Statistical Methods for Data Mining and Learning

The work of the MSDMA team (Méthodes statistiques de data-mining et apprentissage, Statistical Methods for Data Mining and Learning) falls within the field of data science. It concerns the processing of data with mathematical, statistical and computational methods whose unifying concept is data mining. This discipline aims to discover relationships and structures in data through supervised and unsupervised learning methods. It lies at the intersection of statistics, artificial intelligence and databases. Learning theory provides its conceptual foundations.

The team's members develop exploratory methods and parametric and non-parametric modelling methods, along with the software tools needed to implement them.

This work makes it possible to handle complex data: sparse data, outliers, missing, truncated or censored data, mixed data combining quantitative and qualitative variables, and data structured in blocks or multiple tables.

These data come from a range of fields, including the environment, remote sensing, industrial processes, imaging, medicine, health and the social sciences.

As databases grow in volume and variety, and more generally with the emergence of data mining and big data, the team also addresses the scientific challenges of data science by exploring emerging topics such as the explainability of AI methods.

Team directory

Team leader

Ndeye Niang (Full Professor)

Permanent members

Non-permanent members

Former members

Peer-reviewed conferences and journals

2024

Journal articles

Abdi, H.; Guillemot, V.; Liu, R.; Niang, N.; Saporta, G. and Yu, J-c. From Plain to Sparse Correspondence Analysis: a Generalized SVD Approach. In Statistica Applicata - Italian Journal of Applied Statistics, 35 (3): 301-338, 2024. doi www

Conference papers

Ndao, M-L.; Youness, G.; Niang, N. and Saporta, G. Enhancing Explainability in Predictive Maintenance : Investigating the Impact of Data Preprocessing Techniques on XAI Effectiveness. In Florida Online Journals. Proceedings of FLAIRS, Florida, United States, Special Track: Explainable, Fair, and Trustworthy AI 37, 2024. doi www

2023

Journal articles

Liu, R.; Niang, N.; Saporta, G. and Wang, H. Sparse correspondence analysis for large contingency tables. In Advances in Data Analysis and Classification, 17 (4): 1037-1056, 2023. doi www

Muñoz, J.; Efthimiou, O.; Audigier, V.; de Jong, V. and Debray, T. Multiple imputation of incomplete multilevel data using Heckman selection models. In Statistics in Medicine, 43 (3), 2023. doi www

Bry, X.; Niang, N.; Verron, T. and Bougeard, S. Clusterwise elastic-net regression based on a combined information criterion. In Advances in Data Analysis and Classification, 17: 75-107, 2023. doi www

Youness, G. and Aalah, A. An Explainable Artificial Intelligence Approach for Remaining Useful Life Prediction. In Aerospace, 10 (5): 1-23, 2023. doi www

Conference papers

Youness, G.; Phan, Nu U. T. and Boulakia, B. C. BootBOGS: Hands-on optimizing Grid Search in hyperparameter tuning of MLP. In AICCSA 2023: 20th ACS/IEEE International Conference on Computer Systems and Applications, Giza, Egypt, Track 4: Artificial Intelligence & Cognitive Systems, 2023. www

2022

Journal articles

Daouda, O. S.; Chevance, A.; Temime, L.; Légeron, P.; Gaillard, R.; Saporta, G. and Hocine, M. A new ranking index to identify the work-related psychosocial factors most impacting mental health: a cross-sectional study. In BMJ Open, 12 (12): e046444, 2022. doi www

Le Guen, V. and Thome, N. Deep Time Series Forecasting with Shape and Temporal Criteria. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 45 (1): 342-355, 2022. doi www

Boukela, L.; Zhang, G.; Yacoub, M.; Bouzefrane, S. and Baba Ahmadi, S. B. An approach for unsupervised contextual anomaly detection and characterization. In Intelligent Data Analysis, 26 (5): 1185-1209, 2022. doi www

Premachandra, A.; Wang, X.; Saad, M.; Moussawy, S.; Rouzier, R.; Latouche, A. and Albi-Feldzer, A. Erector spinae plane block versus thoracic paravertebral block for the prevention of acute postsurgical pain in breast cancer surgery: A prospective observational study compared with a propensity score-matched historical cohort. In PLoS ONE, 17 (12): 1-13, 2022. doi www

Audigier, V. and Niang, N. Clustering with missing data: which equivalent for Rubin's rules? In Advances in Data Analysis and Classification, 2022. doi www

Bar-Hen, A. and Audigier, V. An ensemble learning method for variable selection: application to high dimensional data and missing values. In Journal of Statistical Computation and Simulation, 2022. doi www

Conference papers

Calem, L.; Ben-Younes, H.; Perez, P. and Thome, N. Diverse Probabilistic Trajectory Forecasting with Admissibility Constraints. In 2022 26th International Conference on Pattern Recognition (ICPR), pages 3478-3484, IEEE, Montreal, Canada, 2022. doi www

Ameur, Y.; Aziz, R.; Audigier, V. and Bouzefrane, S. Secure and non-interactive k-NN classifier using symmetric fully homomorphic encryption. In Privacy in Statistical Databases. PSD 2022, pages 142-154, Springer International Publishing, Paris, France, Lecture Notes in Computer Science 13463, 2022. doi www

2021

Journal articles

Moins-Teisserenc, H.; Cordeiro, D. J.; Audigier, V.; Ressaire, Q.; Benyamina, M.; Lambert, J.; Maki, G.; Homyrda, L.; Toubert, A. and Legrand, M. Severe Altered Immune Status After Burn Injury Is Associated With Bacterial Infection and Septic Shock. In Frontiers in Immunology, 12: 586195, 2021. doi www

Djennane, N.; Yacoub, M.; Aoudjit, R. and Bouzefrane, S. CPU-based prediction with Self Organizing Map in Dynamic Cloud Data Centers. In International Journal of Sensors, Wireless Communications and Control, 11 (7): 733-747, 2021. doi www

Huang, T.; Saporta, G.; Wang, H. and Wang, S. A robust spatial autoregressive scalar-on-function regression with t-distribution. In Advances in Data Analysis and Classification, 15 (1): 57-81, 2021. doi www

Mboup, B.; Le Tourneau, C. and Latouche, A. Insights for Quantifying the Long-Term Benefit of Immunotherapy Using Quantile Regression. In JCO precision oncology (5): 173-176, 2021. doi www

Bar-Hen, A.; Gey, S. and Poggi, J-M. Spatial CART Classification Trees. In Computational Statistics, 2021. doi www

Yin, Y.; Le Guen, V.; Donà, J.; de Bézenac, E.; Ayed, I.; Thome, N. and Gallinari, P. Augmenting physical models with deep networks for complex dynamics forecasting. In Journal of Statistical Mechanics: Theory and Experiment, 2021 (12): 124012, 2021. doi www

Boukela, L.; Zhang, G.; Yacoub, M.; Bouzefrane, S.; Bagheri, S. and Jelodar, H. A modified LOF based approach for outlier characterization in IoT. In Annals of Telecommunications - annales des télécommunications, 76 (3-4): 145-153, 2021. doi www

Hanczar, B. and Bar-Hen, A. CASCARO: Cascade of classifiers for minimizing the cost of prediction. In Pattern Recognition Letters, 149: 37-43, 2021. doi www

Conference papers

Diallo, A. W.; Niang, N. and Ouattara, M. Sparse Subspace K-means. In 3rd IEEE ICDM Workshop on Deep Learning and Clustering, in conjunction with IEEE ICDM 2021, December 7-10, 2021, pages 678-685, IEEE, Auckland, New Zealand, 2021. doi www

Audigier, V.; Niang, N. and Resche-Rigon, M. Clustering sur données incomplètes : quel modèle d'imputation choisir ? In EPICLIN 2021 - 15e Conférence francophone d'épidémiologie clinique - 28e Journées des statisticiens des centres de lutte contre le cancer, pages S21-S22, Elsevier Masson, Marseille, France, 2021. doi www

2020

Journal articles

Mirouse, A.; Parrot, A.; Audigier, V.; Demoule, A.; Mayaux, J.; Geri, G.; Mariotte, E.; Bréchot, N.; de Prost, N.; Vautier, M.; Neuville, M.; Bigé, N.; de Montmollin, E.; Cacoub, P.; Resche-Rigon, M.; Cadranel, J. and Saadoun, D. Severe diffuse alveolar hemorrhage related to autoimmune disease: a multicenter study. In Critical Care, 24 (1), 2020. doi www

Russolillo, G. and Saporta, G. Using partial least squares regression for conjoint analysis. In Statistica Applicata - Italian Journal of Applied Statistics, 32: 67-84, 2020. doi www

Wang, Z.; Wang, H.; Wang, S.; Lu, S. and Saporta, G. Linear mixed-effects model for longitudinal complex data with diversified characteristics. In Journal of Management Science and Engineering, 5 (2): 105-124, 2020. doi www

Zaffora, B.; Demeyer, S.; Magistris, M.; Ronchetti, E.; Saporta, G. and Theis, C. A Bayesian framework to update scaling factors for radioactive waste characterization. In Applied Radiation and Isotopes: 109092, 2020. doi www

Wang, H.; Liu, R.; Wang, S.; Wang, Z. and Saporta, G. Ultra-high dimensional variable screening via Gram-Schmidt orthogonalization. In Computational Statistics, 35: 1153-1170, 2020. doi www

Meddis, A.; Latouche, A.; Zhou, B.; Michiels, S. and Fine, J. Meta-analysis of clinical trials with competing time-to-event endpoints. In Biometrical Journal, 62 (3): 712-723, 2020. doi www

Torres, R.; Di Bernardino, E.; Laniado, H. and Lillo, R. On the estimation of extreme directional multivariate quantiles. In Communications in Statistics - Theory and Methods, 49 (22): 5504-5534, 2020. doi www

Desjonquères, C.; Rybak, F.; Ulloa, J. S.; Kempf, A.; Bar-Hen, A. and Sueur, J. Monitoring the acoustic activity of an aquatic insect population in relation to temperature, vegetation and noise. In Freshwater Biology, 65 (1): 107-116, 2020. doi www

Chevance, A. M; Daouda, O. S; Salvador, A.; Légeron, P.; Morvan, Y.; Saporta, G.; Hocine, M. N and Gaillard, R. Work-related psychosocial risk factors and psychiatric disorders: A cross-sectional study in the French working population. In PLoS ONE, 15 (5): e0233472, 2020. doi www

Yala, K.; Niang, N.; Brajard, J.; Mejia, C.; Ouattara, M.; El Hourany, R.; Crépon, M. and Thiria, S. Estimation of phytoplankton pigments from ocean-color satellite observations in the Senegalo-Mauritanian region by using an advanced neural classifier. In Ocean Science, 16 (2): 513-533, 2020. doi www

2019

Journal articles

Biermé, H.; Di Bernardino, E.; Duval, C. and Estrade, A. Lipschitz-Killing curvatures of excursion sets for two dimensional random fields. In Electronic Journal of Statistics, 13: 536-581, 2019. www

Bougeard, S.; Chauvin, C.; Saporta, G. and Niang, N. Régression multibloc sur classes latentes. Application à l'usage d'antibiotiques en élevages de lapins. In Epidémiologie et Santé Animale, 76: 43-53, 2019. www

Brogi, G. and Di Bernardino, E. Hidden Markov models for advanced persistent threats. In International Journal of Security and Networks, 14 (4): 181, 2019. doi www

Duchemin, T.; Bar-Hen, A.; Lounissi, R.; Dab, W. and Hocine, M. Hierarchizing Determinants of Sick Leave. In Journal of Occupational and Environmental Medicine, 61 (8): e340-e347, 2019. doi www

Berthelot, G. C. B.; Bar-Hen, A.; Marck, A.; Foulonneau, V.; Douady, S.; Noirez, P.; Zablocki-Thomas, P. B.; Antero, J.; Carter, P. A.; Di Meglio, J-M. and Toussaint, J-F. An integrative modeling approach to the age-performance relationship in mammals at the cellular scale. In Scientific Reports, 2019. doi www

Huang, T.; Wang, H. and Saporta, G. 成分数据的空间自回归模型. In Journal of Beijing University of Aeronautics and Astronautics, 45 (1): 93-98, 2019. doi www

Austin, P.; Latouche, A. and Fine, J. A review of the use of time-varying covariates in the Fine-Gray subdistribution hazard competing risk regression model. In Statistics in Medicine, 2019. doi www

Latouche, A.; Andersen, P. K.; Rey, G. and Moreno-Betancur, M. A note on the measurement of socioeconomic inequalities in life years lost by cause of death. In Epidemiology, 30 (4): 569-572, 2019. doi www

Graffeo, N.; Latouche, A.; Le Tourneau, C. and Chevret, S. ipcwswitch: An R package for inverse probability of censoring weighting with an application to switches in clinical trials. In Computers in Biology and Medicine, 111: 103339, 2019. doi www

Wang, H.; Gu, J.; Wang, S. and Saporta, G. Spatial partial least squares autoregression: Algorithm and applications. In Chemometrics and Intelligent Laboratory Systems, 184: 123-131, 2019. doi www

Conference papers

Corbière, C.; Thome, N.; Bar-Hen, A.; Cord, M. and Pérez, P. Addressing Failure Prediction by Learning Model Confidence. In Advances in Neural Information Processing Systems 32, pages 2898-2909, Curran Associates, Inc., Vancouver, Canada, 2019. www

Le Guen, V. and Thome, N. Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models. In Advances in Neural Information Processing Systems 32 (NIPS 2019) proceedings, pages 4191-4203, Vancouver, Canada, 2019. www

Ben-Younes, H.; Cadene, R.; Thome, N. and Cord, M. BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection. In AAAI 2019 - 33rd AAAI Conference on Artificial Intelligence, Honolulu, United States, 2019. www

Cadene, R.; Ben-Younes, H.; Cord, M. and Thome, N. MUREL: Multimodal Relational Reasoning for Visual Question Answering. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, United States, 2019. www

2018

Journal articles

Saporta, G. Training data scientists: a few challenges. In International Journal of Data Science and Analytics, 6 (3): 201-204, 2018. doi www

Antero, J.; Pohar-Perme, M.; Rey, G.; Toussaint, J-F. and Latouche, A. The heart of the matter: years-saved from cardiovascular and cancer deaths in an elite athlete cohort with over a century of follow-up. In European Journal of Epidemiology, 33 (6): 531-543, 2018. doi www

Woringer, M.; Martiny, N.; Porgho, S.; Bicaba, B. W.; Bar-Hen, A. and Mueller, J. E. Atmospheric dust, early cases, and localized meningitis epidemics in the African meningitis belt: an analysis using high spatial resolution data. In Environmental Health Perspectives, 126 (9): 097002, 2018. doi www

Bougeard, S.; Cariou, V.; Saporta, G. and Niang, N. Prediction for regularized clusterwise multiblock regression. In Applied Stochastic Models in Business and Industry, 34 (6): 852-867, 2018. doi www

Wei, Y.; Wang, H.; Wang, S. and Saporta, G. Incremental modelling for compositional data streams. In Communications in Statistics - Simulation and Computation, 48 (8): 2229-2243, 2018. doi www

Beck, G.; Azzag, H.; Bougeard, S.; Lebbah, M. and Niang, N. A New Micro-Batch Approach for Partial Least Square Clusterwise Regression. In Procedia Computer Science, 144: 239-250, 2018. doi www

Bougeard, S.; Abdi, H.; Saporta, G. and Niang, N. Clusterwise analysis for multiblock component methods. In Advances in Data Analysis and Classification, 12 (2): 285-313, 2018. doi www

Audigier, V.; White, I.; Jolani, S.; Debray, T.; Quartagno, M.; Carpenter, J.; van Buuren, S. and Resche-Rigon, M. Multiple Imputation for Multilevel Data with Continuous and Binary Variables. In Statistical Science, 33 (2): 160-183, 2018. doi www

Chevalier, M.; Thome, N.; Henaff, G. and Cord, M. Classifying low-resolution images by integrating privileged information in deep CNNs. In Pattern Recognition Letters, 116: 29-35, 2018. doi www

Ioannidou, D.; Malherbe, L.; Beauchamp, M.; Saby, N. P. A.; Bonnard, R. and Caudeville, J. Characterization of Environmental Health Inequalities Due to Polyaromatic Hydrocarbon Exposure in France. In International Journal of Environmental Research and Public Health, 15 (12): 2680, 2018. doi www

Massiera, P.; Trinchera, L. and Russolillo, G. Évaluation de la présence des capacités marketing. Proposition d'un index multidimensionnel et hiérarchique. In Recherche et Applications en Marketing (French Edition), 33 (1): 31-55, 2018. doi www

Conference papers

Durand, P.; Ghorbanzadeh, D. and Jaupi, L. Index Theorem and Applications, a Gentle Review. In Series Proc. Computational Mathematics Computational Geometry and Statistics (CMCGS), pages 1-6, Digital Library, Singapore, Singapore, 2018. doi www

Jaupi, L. New Test Methods to Evaluate Potential Performance of Cosmetic Products. In 20th International Conference Materials, Methods & Technologies 2018, Burgas, Bulgaria, 2018. www

Robert, T.; Thome, N. and Cord, M. HybridNet: Classification and Reconstruction Cooperation for Semi-supervised Learning. In Computer Vision - ECCV 2018, 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, pages 158-175, Springer, Munich, Germany, Lecture Notes in Computer Science 11211, 2018. doi www

Durand, P.; Ghorbanzadeh, D. and Jaupi, L. Different approaches for the texture classification of a remote sensing image bank. In Ninth International Conference on Graphic and Image Processing, pages 1-9, SPIE, Qingdao, China, 2018. doi www

2017

Journal articles

Geronimi, J. and Saporta, G. Variable selection for multiply-imputed data with penalized generalized estimating equations. In Computational Statistics and Data Analysis, 110: 103-114, 2017. doi www

Zaffora, B.; Magistris, M.; Chevalier, J-P.; Saporta, G.; Luccioni, C. and Ulrici, L. A new approach to characterize very-low-level radioactive waste produced at hadron accelerators. In Applied Radiation and Isotopes, 122: 141-147, 2017. doi www

Di Bernardino, E.; Estrade, A. and León, J. R. A test of Gaussianity based on the Euler characteristic of excursion sets. In Electronic Journal of Statistics, 11 (1): 843-890, 2017. doi www

Liberati, C.; Camillo, F. and Saporta, G. Advances in credit scoring: combining performance and interpretation in kernel discriminant analysis. In Advances in Data Analysis and Classification, 11 (1): 121-138, 2017. doi www

Di Bernardino, E. and Rullière, D. A note on upper-patched generators for Archimedean copulas. In ESAIM: Probability and Statistics, 2017. doi www

Conference papers

Ben-Younes, H.; Cadene, R.; Cord, M. and Thome, N. MUTAN: Multimodal Tucker Fusion for Visual Question Answering. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2631-2639, IEEE, Venice, Italy, 2017. doi www

Ghorbanzadeh, D.; Durand, P. and Jaupi, L. Generating the Skew Normal random variable. In World Congress on Engineering 2017, pages 113-116, London-UK, United Kingdom, 2017. www

Ghorbanzadeh, D.; Durand, P. and Jaupi, L. A method for the Generate a random sample from a finite mixture distributions. In CMCGS 2017. 6th Annual International Conference on Computational Mathematics, Computational Geometry & Statistics, Singapore, Singapore, 2017. doi www

Scientific dissemination activities

2024

Book chapters

Abdi, H.; Di Ciaccio, A. and Saporta, G. Old and New Perspectives on Optimal Scaling. In Analysis of Categorical Data from Historical Perspectives, pages 131-154, Springer Nature, Behaviormetrics: Quantitative Approaches to Human Behavior 17, 2024. doi www

Conference papers

Saporta, G. Codage optimal et encodage: nouveaux regards sur un ancien problème. In CISEM 2024, 4ème Colloque international statistique et économétrie, Mahdia, Tunisia, 2024. www

Niang, N.; Ouattara, M. and Saporta, G. Clustering variables: a survey and some new developments. In Sensometrics 2024, Paris, France, 2024. www

Saporta, G. Optimal Scaling: New Insights Into an Old Problem. In New perspectives on Statistics and Data Science, pages 97-100, Palermo University Press, Palermo, Italy, 2024. www

Audigier, V. Clustering sur données incomplètes avec clusterMI. In 10èmes Rencontres R, Vannes, France, 2024. www

Miscellaneous

Dieye, N. A.; Niang, N. and Russolillo, G. Sensibilité des indices de qualité d'un classifieur probabiliste. Poster. www

2023

Book chapters

Saporta, G. Préface. In Voyage au bout de l'IA: Ce qu'il faut savoir sur l'intelligence artificielle, pages 5-8, De Boeck Supérieur, 2023. www

Ameur, Y.; Bouzefrane, S. and Audigier, V. Application of Homomorphic Encryption in Machine Learning. In Emerging Trends in Cybersecurity Applications, pages 391-410, Springer International Publishing, 2023. doi www

Saporta, G. and Stoltz, G. Gilbert Saporta : un parcours éclectique. In Les nombres, acteurs de changement, pages 85-104, Presses des Mines, Sciences Sociales, 2023. www

Saporta, G. Histoire et enjeux de l'IA. In L'IA éducative. L'intelligence artificielle dans l'enseignement supérieur, pages 41-50, Bréal, Thèmes & Débats, 2023. www

Conference papers

Niang, N.; Ouattara, M. and Saporta, G. A comparison of some methods for clustering of variables of mixed types. In Programme and Book of Abstracts, pages 85-86, Viana Do Castelo, Portugal, 2023. www

Liu, R.; Niang, N. and Saporta, G. Sparse non-symmetrical correspondence analysis. In CARME - Correspondence Analysis and Related Methods, Bonn, Germany, 2023. www

Béra, M.; Daskalaki, V.; Saporta, G.; Spiliopoulos, K.; Spinakis, K. and Stavropoulos, P. Quantifying the contribution of individual records to the reidentification risk of (pseudo) anonymized datasets. In Proceedings of the 64th ISI World Statistics Congress, Ottawa (Ontario), Canada, 2023. www

Saporta, G. Optimal Scaling: New Insights Into an Old Problem. In ASMDA 2023, 20th Applied Stochastic Models and Data Analysis Conference, Heraklion - Crete, Greece, 2023. www

Audigier, V. and Niang, N. Handling missing data in clustering using multiple imputation. In Ecosta Econometrics and Statistics, Berlin, Germany, 2023. www

Audigier, V. and Niang, N. Multiple imputation for clustering on incomplete data. In ClaDAG 2023, Salerno, Italy, 2023. www

2022

Books

Aimetti, J-P.; Coppet, O. and Saporta, G. Manifeste pour une intelligence artificielle comprise et responsable. Cent Mille Milliards, 2022. www

Gégout-Petit, A.; Maumy-Bertrand, M.; Saporta, G. and Thomas-Agnan, C. Données manquantes. Editions Technip, 2022. www

Book chapters

Audigier, V. Imputation multiple en grande dimension par analyse factorielle. In Données manquantes, Editions TECHNIP, 2022. www

Saporta, G. Algorithmes de recommandation. In Données manquantes, pages 247-252, Editions Technip, 2022. www

Audigier, V. Gestion des données manquantes par imputation multiple. In Données manquantes, Editions TECHNIP, 2022. www

Gégout-Petit, A.; Maumy-Bertrand, M.; Saporta, G. and Thomas-Agnan, C. Une histoire lacunaire. In Données manquantes, pages 1-27, Editions Technip, 2022. www

Conference papers

Saporta, G. Equité et explicabilité des algorithmes : définitions, paradoxes et biais. In CISEM 2022, 3eme Colloque international statistique et économétrie, Mahdia, Tunisia, 2022. www

Huang, T. and Saporta, G. Some spatial regression models for functional and compositional data. In Conference in honor of Christine Thomas-Agnan, Toulouse, France, 2022. www

Saporta, G. On some issues related to the fairness of algorithms. In Compstat 2022, 24th International Conference on Computational Statistics, Bologna, Italy, 2022. www

Audigier, V.; Niang, N. and Resche-Rigon, M. Clustering with missing data: which imputation model for which cluster analysis method? In 17th conference of the International Federation of Classification Societies, Porto, Portugal, 2022. www

Miscellaneous

Charrier, T.; Fresneau, B.; Haddy, N.; Schwartz, B.; Journy, N.; Demoor-Goldschmidt, C.; Diallo, I.; Surun, A.; Aerts, I.; Doz, F.; Souchard, V.; Vu-Bezin, G.; Lemler, S.; Letort, V.; Rubino, C.; de Vathaire, F.; Latouche, A. and Allodji, R. S. Increased Cardiac Risk After a Second Malignant Neoplasm Among Childhood Cancer Survivors, a FCCSS Study. Poster. www

2021

Books

Bertrand, F.; Saporta, G. and Thomas-Agnan, C. Statistique et causalité. Editions Technip, 2021. www

Book chapters

Huang, T.; Saporta, G. and Wang, H. A Spatial Durbin Model for Compositional Data. In Advances in Contemporary Statistics and Econometrics, pages 471-488, Springer Nature, 2021. doi www

Conference papers

Saporta, G. Sparse Correspondence Analysis for Contingency Tables. In Celebrating 40 years of Greek Statistical Institute 1981-2021, Athens, Greece, 2021. www

Saporta, G. Interprétabilité des modèles prédictifs. In ASI 11. 11ème Colloque International sur l'Analyse Statistique Implicative, Belfort, France, 2021. www

Niang-Keita, N.; Ouattara, M. and Saporta, G. Sparse Divisive Feature Clustering. In Program and Book of Abstracts, pages 75-76, Covilhã, Portugal, 2021. www

Hassini, H.; Niang, N. and Audigier, V. SOM-based clusterwise regression. In Data Science, Statistics and Visualisation, Rotterdam, Netherlands, 2021. www

Fateri Gouard, N.; Niang, N. and Ouattara, M. Unbiased Feature selection in Random Forests using Consensus Feature Clustering. In Data Science, Statistics & Visualisation (DSSV) and European Conference on Data Analysis (ECDA), Rotterdam, Netherlands, 2021. www

Saporta, G. From the triumph of black boxes to the right to understand and the search for fairness. In ASMDA 2021, Athens, Greece, 2021. www

Bougeard, S.; Bry, X.; Verron, T. and Niang, N. Combined-information criterion for clusterwise elastic-net regression. Application to omic data. In 8th Channel Network Conference, Paris, France, 2021. www

Boukela, L.; Zhang, G.; Yacoub, M. and Bouzefrane, S. A near-autonomous and incremental intrusion detection system through active learning of known and unknown attacks. In 2021 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), pages 374-379, IEEE, Chengdu, China, 2021. doi www

Audigier, V. and Niang, N. Cluster analysis after multiple imputation. In ASMDA 2021, Athens, Greece, 2021. www

2020

Books

Diday, E.; Guan, R.; Saporta, G. and Wang, H. Advances in Data Science. Symbolic, Complex and Network Data. ISTE-WILEY, Big Data, Artificial Intelligence and Data Analysis, 2020. www

Conference papers

Saporta, G. About Interpreting and Explaining Machine Learning and Statistical Models. In SMTDA 2020; 6th Stochastic Modeling Techniques and Data Analysis International Conference, Barcelona (virtual), Spain, 2020. www

Reports

Bauer, A.; Faron, O.; Richier, J.; Bar-Hen, A.; Béra, M.; Cappelletti, L.; Collomb, A.; Durance, P.; Fleury-Perkins, C.; Fontanet, A.; Gnesotto, N.; Réau, B. and Trainar, P. Chaire Nouveaux risques : rapport 2020. Technical Report, Conservatoire national des arts et métiers (Cnam) ; Allianz France, 2020.

2019

Book chapters

Saporta, G. 50 Years of Data Analysis: From Exploratory Data Analysis to Predictive Modeling and Machine Learning. In Data Analysis and Applications 1. Clustering and Regression, Modeling-estimating, Forecasting and Data Mining, ISTE-Wiley, Data Analysis and Applications, 2019. doi www

Mariadassou, M.; Bar-Hen, A. and Kishino, H. Tree Evaluation and Robustness Testing. In Encyclopedia of Bioinformatics and Computational Biology, pages 736-745, Elsevier, 2019. doi www

Conference papers

Saporta, G.; Liu, R.; Niang Keita, N. and Wang, H. Sparse Methods for Unsupervised Data Analysis. In The 4th International Symposium on Interval Data Modelling (SIDM 2019), Beijing, China, 2019. www

Saporta, G.; Liu, R.; Niang Keita, N. and Wang, H. Sparse Correspondence Analysis. In ASMDA 2019. 18th Conference of the Applied Stochastic Models and Data Analysis International Society, Florence, Italy, 2019. www

Daouda, O.; Chevance, A.; Salvador, A.; Légeron, P.; Morvan, Y.; Saporta, G.; Hocine, M. and Gaillard, R. Impact of work-related psychosocial factors on mental health: A cross-sectional study in the French working population. In Work, Stress and Health 2019 Conference of the American Psychological Association, Philadelphia, United States, 2019. www

Faucheux, L.; Resche-Rigon, M.; Audigier, V.; Curis, E.; Soumelis, V. and Chevret, S. Clustering with missing data: Pooling multiple imputation results with consensus clustering. In 40th Annual Conference of the International Society for Clinical Biostatistics, Leuven, Belgium, 2019. www

Jaupi, L. Combinations of Shewhart and CUSUM Control Charts for Individual Observations. In MMT2019, Burgas, Bulgaria, 2019. www

Audigier, V. and Resche Rigon, M. micemd: a smart multiple imputation R package for missing multilevel data. In UseR!2019, Toulouse, France, 2019. www

Saporta, G. De l'analyse exploratoire à la modélisation prédictive: le chemin de la science des données. In Montpellier: berceau de la Data Science. Colloque en l'honneur du Pr. Yves Escoufier, Montpellier, France, 2019. www

Milliet de Faverges, M.; Picouleau, C.; Russolillo, G.; Merabet, B. and Houzel, B. Impact of calibration of perturbations in simulation: the case of robustness evaluation at a station. In RailNorrköping 2019. 8th International Conference on Railway Operations Modelling and Analysis (ICROMA), Norrköping, Sweden, 2019. www

Saporta, G. Science des données, données massives : défis et nouveaux métiers. In CISEM 2019, Mahdia, Tunisia, 2019. www

Miscellaneous

Daouda, O. S.; Temime, L.; Saporta, G. and Hocine, M. How to prioritize work-related psychosocial factors impacting mental health? Regression and random forest approaches. Poster. www

Daouda, O. S.; Chevance, A.; Saporta, G.; Gaillard, R. and Hocine, M. Impact des facteurs de risque psychosociaux liés au travail sur la santé mentale : étude transversale sur la population active française. Poster. www

2018

Books

Maumy-Bertrand, M.; Saporta, G. and Thomas-Agnan, C. Apprentissage statistique et données massives. Editions Technip, 2018. www

Book chapters

Ghorbanzadeh, D.; Durand, P. and Jaupi, L. Application and generation of the univariate Skew Normal random variable. In Transactions on Engineering Technologies. 25th World Congress on Engineering, pages 129-138, Springer, 2018. doi www

Saporta, G. From Conventional Data Analysis Methods to Big Data Analytics. In Big Data for Insurance Companies, pages 27-41, John Wiley & Sons, Inc., 2018. doi www

Saporta, G. Une brève histoire de l'apprentissage. In Apprentissage statistique et données massives, Editions Technip, 2018. www

Conference papers

Milliet de Faverges, M.; Russolillo, G.; Picouleau, C.; Merabet, B. and Houzel, B. Modelling passenger train arrival delays with Generalized Linear Models and its perspective for scheduling at main stations. In 8th International Conference on Railway Engineering (ICRE 2018), IET, London, United Kingdom, 2018. doi www

Milliet de Faverges, M.; Russolillo, G.; Picouleau, C.; Merabet, B. and Houzel, B. Estimating long-term delay risk with Generalized Linear Models. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pages 2911-2916, IEEE, Maui, United States, 2018. doi www

Huang, T.; Saporta, G.; Wang, H. and Wang, S. SFLM: A mix of a Functional Linear Model and of a Spatial Autoregressive Model for spatially correlated functional data. In CroNoS Workshop on Functional Data Analysis, Iasi, Romania, 2018. www

Jaupi, L. Statistical methods to study consistency between declared and measured values on waste packages. In COMPSTAT 2018, The 23rd International Conference on Computational Statistics, Iasi, Romania, 2018. www

Saporta, G. Clusterwise Methods: a Synthesis and New Developments. In Homenagem a Fernando da Costa Nicolau, pages 23-26, Lisbon, Portugal, 2018. www

Miscellaneous

Niang, N. Multiblock consensus clustering. Poster. www

Reports

Comité Scientifique du Haut Conseil des Biotechnologies; Angevin, F.; Bagnis, C.; Bar-Hen, A.; Barny, M. A. M. A.; Bellivier, F.; Berny, P.; Boireau, P.; Brévault, T.; Chauvel, B. B.; Coléno, F. c.; Couvet, D.; Dassa, E.; Eychenne, N.; Franche, C.; Guerche, P.; Guillemain, J.; Hernandez Raquet, G.; Jestin, A.; Klonjkowski, B.; Lavielle, M.; Le Corre, V. V.; Lemaire, O. O.; Lereclus, D.; Maximilien, R.; Meurs, E.; Moreau de Bellaing, C.; Naffakh, N.; Négre, D.; Noyer, J-L.; Ochatt, S.; Pages, J-C.; Parzy, D.; Regnault-Roger, C.; Renard, M.; Saindrenan, P.; Simonet, P.; Troadec, M-B.; Vaissière, B.; de Verneuil, H. and Vilotte, J-L. Commentaires sur le projet de document consensus de l'OCDE sur les considérations environnementales relatives à l'évaluation des risques associé. Paris, le 23 mai 2018. Technical Report, Haut Conseil des Biotechnologies, 2018.

Comité Scientifique du Haut Conseil des Biotechnologies; Angevin, F.; Bagnis, C.; Bar-Hen, A.; Barny, M. A. M. A.; Bellivier, F.; Berny, P.; Boireau, P.; Brévault, T.; Chauvel, B. B.; Coléno, F. c.; Couvet, D.; Dassa, E.; Eychenne, N.; Franche, C.; Guerche, P.; Guillemain, J.; Hernandez Raquet, G.; Jestin, A.; Klonjkowski, B.; Lavielle, M.; Le Corre, V. V.; Lemaire, O. O.; Lereclus, D.; Maximilien, R.; Meurs, E.; Moreau de Bellaing, C.; Naffakh, N.; Négre, D.; Noyer, J-L.; Ochatt, S.; Pages, J-C.; Parzy, D.; Regnault-Roger, C.; Renard, M.; Saindrenan, P.; Simonet, P.; Troadec, M-B.; Vaissière, B.; de Verneuil, H. and Vilotte, J-L. Avis en réponse à la saisine HCB - dossier C/NL/06/01_001. Paris, le 17 octobre 2018. Technical Report, Haut Conseil des Biotechnologies, 2018.

2017

Books

Bertrand, F.; Droesbeke, J-J.; Saporta, G. and Thomas-Agnan, C. Model Choice and Model Aggregation. Editions Technip, 2017. www

Book chapters

Saporta, G. Des méthodes classiques d'analyse des données au Big Data. In Le big data pour les compagnies d'assurance, pages 41-55, ISTE, Innovation, Entrepreneuriat et Gestion Série Big Data, IA et analyse de données, 2017. www

Petrarca, F.; Russolillo, G. and Trinchera, L. Integrating Non-metric Data in Partial Least Squares Path Models: Methods and Application. In Partial Least Squares Path Modeling, pages 259-279, Springer International Publishing, 2017. doi www

Conference papers

Hocine, M.; Feropontova, N.; Niang, N.; Ait Bouziad, K. and Saporta, G. Importance of factors contributing to work-related stress: comparison of four metrics. In ASMDA 2017, London, United Kingdom, 2017. www

Bougeard, S.; Niang-Keita, N.; Preda, C. and Saporta, G. Clusterwise Sparse PLS. In PLS'17, Macao, Macau SAR China, 2017. www

Jaupi, L. Dual-use performance measures for customer service evaluation in bike-shared systems. In 61st World Statistics Congress -- WSC-ISI2017, Marrakech, Morocco, 2017. www

Jaupi, L. Using Big Data to Display the Quality of Service Provided on a Bike-Shared Network. In 19th International Conference Materials, Methods & Technologies 2017, Burgas, Bulgaria, 2017. www

Renosh, P.; Jourdin, F.; Charantonis, A. A.; Yala, K.; Badran, F.; Thiria, S.; Guillou, N. and Gohin, F. Construction of multi-year time series profiles of suspended particulate inorganic matter concentrations from highly dynamic coastal waters of the English Channel using self-organizing maps and hidden Markov model. In Third International Ocean Colour Science Meeting, Lisbon, Portugal, 2017. www

Saporta, G. Clusterwise methods, past and present. In ISI 2017 61st World Statistics Congress, Marrakech, Morocco, 2017. www

Technology transfer activities

Theses/HDR

2018

Theses and habilitations

Brogi, G. Real-time detection of Advanced Persistent Threats using Information Flow Tracking and Hidden Markov Models. Ph.D. Thesis, Conservatoire national des arts et métiers - CNAM, 2018.

Ioannidou, D. Characterization of environmental inequalities due to Polyaromatic Hydrocarbons in France: developing environmental data processing methods to spatialize exposure indicators for PAH substances. Ph.D. Thesis, Conservatoire national des arts et métiers - CNAM, 2018.

Current projects

DOTATION 2025 MSDMA

Full name: DOTATION 2025 MSDMA - Funder: Laboratoire Cédric
Duration: January 2025 - December 2025
Summary:

FASCINATION-SHOM

Full name: FASCINATION-SHOM - Funder: Etablissement public administratif SHOM
Duration: September 2023 - September 2027
Summary: Geostatistical representation of sound-speed (celerity) fields through homogeneous soundscapes

Past projects

- Full name: CIFRE UTAC 2021-2024
- Duration: July 2021 - July 2024
- Summary: The aim is to investigate statistical analysis methods and machine learning and artificial intelligence algorithms for monitoring vehicle roadworthiness inspections.
- Full name: Méthodes statistiques, data-mining et apprentissage 2021
- Duration: December 2020 - December 2021
- Summary:
- Full name: Conception et Développement des Jeux Pervasifs Adaptables avec la prise en compte des Etats Emotionnels des Joueurs
- Duration: January 2022 - December 2022
- Summary: The project aims to take users' emotional states into account in real time in order to better adapt their environments and interactions. In this project, the approach is applied specifically in pervasive settings.
- Full name: PRivAcy-preserving LocalIzation with MachiNE Learning in IoT
- Duration: January 2022 - December 2022
- Summary: The project aims to propose a solution for localizing connected objects while keeping that location information secure, using machine learning algorithms.
- Full name: Soutien équipe MSDMA 2022
- Duration: January 2022 - December 2022
- Summary:
- Full name: Dotation MSDMA 2023
- Duration: January 2023 - December 2023
- Summary:
- Full name: PEX Praline 2023
- Duration: January 2023 - December 2023
- Summary:
- Full name: Dotation MSDMA 2024
- Duration: January 2024 - December 2024
- Summary:
- Full name: PEX Jurai 2024
- Duration: January 2024 - December 2024
- Summary:
- Full name: MEDIATECH Nafise GOUARD
- Duration: May 2018 - April 2021
- Summary:
- Full name: IMPACT MDS
- Duration: November 2019 - October 2021
- Summary:
- Full name: PRESIDIO
- Duration: January 2015 - July 2019
- Summary:
- Full name: EARLY METRICS
- Duration: May 2017 - September 2019
- Summary:
- Full name: SOCIETE EARLY METRICS 2
- Duration: May 2021 - February 2023
- Summary:
- Full name: Analyse de données issues des patients COVID-19/embolie pulmonaire
- Duration: June 2020 - June 2021
- Summary:
- Full name: NEZ ELECTRONIQUE
- Duration: June 2017 - May 2018
- Summary:
- Full name: CRM SERVICE 2017-2018
- Duration: June 2017 - June 2018
- Summary:
- Full name: CIFRE VELVET
- Duration: December 2019 - December 2023
- Summary:
- Full name: MIXQUEBEC
- Duration: February 2022 - December 2023
- Summary: