[SAPa15] Which analytic methods for Big Data? International conferences without proceedings: ARS'15, Anacapri, Italy.
Keywords: Big Data, machine learning, sparse
Abstract: Classical inference is not suited to massive data: statistical tests reject any reasonable model, and confidence intervals shrink to nothing. Model validation should be done through cross-validation or sample splitting. Explicit, parsimonious generative models are replaced by predictive algorithms. Model choice is driven by statistical learning theory rather than by penalized likelihood. The analyst's toolbox includes revisited classical data analysis techniques (PCA and MCA as particular cases of SVD, clustering), mainly for exploratory purposes, as well as machine learning methods (SVM, boosting, ensemble learning) for prediction. For high-dimensional data, where the number of variables exceeds the number of units, sparse methods based on L1 regularization provide elegant and simple solutions; we will present a sparse generalization of multiple correspondence analysis. Is the data deluge making the scientific method obsolete, as C. Anderson claimed some years ago? I conclude with some comments on correlation and causality.
Comments: ARS'15: 5th International Workshop "LARGE NETWORKS AND BIG DATA: NEW METHODOLOGICAL CHALLENGES", 29-30 April 2015.