[Sap18a] From Conventional Data Analysis Methods to Big Data Analytics

Book chapter: Book title: "Big Data for Insurance Companies", January 2018, Wiley, pp. 27-41, (doi: 10.1002/9781119489368.ch2) (isbn: 9781786300737)

Authors: G. Saporta

Keywords: Big Data, Cross validation, Data analysis, regression, supervised classification

Abstract: Data analysis in this chapter mainly means descriptive and exploratory methods, also known as unsupervised methods. The objective is to describe and structure a set of data that can be represented as a rectangular table crossing n statistical units and p variables. Data analysis methods are essentially dimension reduction methods, divided into two categories: factor methods, and unsupervised classification (clustering) methods. Data mining is a step in the knowledge discovery process that involves applying data analysis algorithms. Data mining seeks predictive models of a response, denoted Y, but from a very different perspective than that of conventional modeling. This chapter distinguishes between regression methods, where Y is quantitative, and supervised classification methods (also called discrimination methods), where Y is categorical, most often with two categories. The chapter also discusses new tools for big data processing, based on validation with data set aside.

BibTeX

@inbook {
	Sap18a,
	title	=	"{Big Data for Insurance Companies}",
	chapter	=	"{From Conventional Data Analysis Methods to Big Data Analytics}",
	author	=	"G. Saporta",
	year	=	2018,
	publisher	=	"Wiley",
	pages	=	"27--41",
	doi	=	"10.1002/9781119489368.ch2",
	isbn	=	"9781786300737",
}