<?xml version='1.0' encoding='UTF-8'?>
<rss version='2.0' xmlns:atom='http://www.w3.org/2005/Atom'>
	<channel>
		<title>CEDRIC - MSDMA RSS feed</title>
		<atom:link rel='self' href='http://cedric.cnam.fr/rss/MSDMA.xml'/>
		<atom:link href='http://cedric.cnam.fr/'/>
		<language>fr</language>
		<lastBuildDate>Tue, 14 May 2013 14:12:05 +0200</lastBuildDate>
		<description></description>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2742</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2742</link>
			<title>Paper - Weighted multiblock clustering based on self-organizing topological maps (ConSOM)</title>
			<description></description>
			<pubDate>Tue, 14 May 2013 14:12:05 +0200</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2741</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2741</link>
			<title>Paper - STATIS BASED MULTIBLOCK CLUSTERING</title>
			<description></description>
			<pubDate>Tue, 14 May 2013 12:02:05 +0200</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2740</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2740</link>
			<title>Paper - Soft subspace clustering for multiblock data based on self-organizing topological maps (SOM): 2S-SOM</title>
			<description></description>
			<pubDate>Tue, 14 May 2013 11:50:19 +0200</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2739</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2739</link>
			<title>Paper - Multiblock clustering based on topological maps</title>
			<description></description>
			<pubDate>Tue, 14 May 2013 11:43:53 +0200</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2738</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2738</link>
			<title>Paper - CLUSTERING INDIVIDUALS DESCRIBED BY MULTI BLOCK VARIABLES</title>
			<description>We address the problem of clustering individuals described by several homogeneous blocks of variables. Reformulating it as a problem of consensus of partitions, we propose a method based on the three-way method STATIS to find a unique partition of the individuals. A real example on environmental data illustrates the proposed method.</description>
			<pubDate>Tue, 14 May 2013 11:31:22 +0200</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2737</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2737</link>
			<title>Paper - A competing risks approach for nonparametric estimation of transition probabilities in a non-Markov illness-death model</title>
			<description>Competing risks model time to first event and type of first event. An example from hospital epidemiology is the incidence of hospital-acquired infection, which has to account for hospital discharge of non-infected patients as a competing risk. An illness-death model would allow further study of the hospital outcomes of infected patients. Such a model typically relies on a Markov assumption. However, it is conceivable that the future course of an infected patient depends not only on the time since hospital admission and current infection status but also on the time since infection. We demonstrate how a modified competing risks model can be used for nonparametric estimation of transition probabilities when the Markov assumption is violated.</description>
			<pubDate>Sat, 11 May 2013 15:15:56 +0200</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/labo/membre/view?id=1792</guid>
			<link>http://cedric.cnam.fr/index.php/labo/membre/view?id=1792</link>
			<title>Job - New: Fan Jia</title>
			<description>a</description>
			<pubDate>Thu, 04 Apr 2013 17:07:23 +0200</pubDate>
			<category>MSDMA</category>
			<category>Job</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2720</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2720</link>
			<title>Paper - Latent variable models and mixture models</title>
			<description>This book is devoted to a research area that has seen many developments, particularly over the past fifteen years.
One of the innovations of latent variable models is that they take into account unobservable variables, the causes of phenomena that can themselves be observed directly.
This formalization brings together many methods used in very diverse areas of statistics:
- factor analysis,
- latent class analysis,
- structural models in which blocks of variables are each explained by latent variables, themselves linked to one another by a causal graph,
- finite mixture models of distributions.
This book is the result of a collaboration between some of the most renowned specialists </description>
			<pubDate>Mon, 11 Mar 2013 23:26:25 +0100</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2719</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2719</link>
			<title>Paper - A survey of some sparse methods for high-dimensional data</title>
			<description>High-dimensional data means that the number of variables p is far larger than the number of observations n. This occurs in several fields such as genomic data or chemometrics. This didactic talk starts from a survey of various solutions in linear regression and then presents their extensions to unsupervised "sparse" methods for principal components analysis (PCA) and multiple correspondence analysis (MCA).

When p>n the OLS estimator does not exist for linear regression. Since it is a case of forced multicollinearity, one may use regularized techniques such as ridge regression, principal component regression or PLS regression: these methods provide rather robust estimates through a dimension reduction approach or with explicit (or not) constraints on the regression coefficients. The fact that all the predictors are kept is often considered a positive point.
However, if p>>n it becomes a drawback since a combination of all variables cannot be interpreted. Sparse combinations, i.e. with a large number of zero coefficients, are preferred. Lasso, elastic net and sparse PLS simultaneously perform regularization and variable selection thanks to non-quadratic penalties: L1, SCAD, etc. We will present variants such as the group lasso when the variables are structured in blocks.

In PCA, the singular value decomposition shows that if we regress principal components onto the input variables, the vector of regression coefficients is equal to the factor loadings. It suffices to adapt sparse regression techniques to get sparse versions of PCA and of PCA with groups of variables. We conclude with a presentation of a sparse version of multiple correspondence analysis.</description>
			<pubDate>Sun, 10 Mar 2013 18:55:48 +0100</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2718</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2718</link>
			<title>Paper - YAO poster presentation at Colloque National sur l'Assimilation de Données, Paris, France, December 1-2.</title>
			<description>YAO poster presentation.</description>
			<pubDate>Sun, 03 Mar 2013 12:23:39 +0100</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2717</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2717</link>
			<title>Paper - Formalization and automation of YAO, a code generator for variational data assimilation.</title>
			<description>Variational data assimilation 4D-Var is a well-known technique used in geophysics, and in particular in meteorology and oceanography. This technique consists in estimating the control parameters of a direct numerical model by minimizing a cost function which measures the misfit between the forecast values and some actual observations. The minimization, which is based on a gradient method, requires the computation of the adjoint model (product of the transpose Jacobian matrix and the derivative vector of the cost function at the observation points). In order to perform the 4D-Var technique, we have to cope with complex program implementations, in particular concerning the adjoint model, the parallelization of the code and efficient memory management.

To address these difficulties and to facilitate the implementation of 4D-Var applications, LOCEAN is developing the YAO framework. YAO proposes to represent a direct model with a computation flow graph called a modular graph. Modules depict computation units and edges between modules represent data transfer. Description directives specific to YAO allow users to describe their direct model and to generate the modular graph associated with this model. YAO contains two core algorithms. The first is a forward propagation algorithm on the graph that computes the output of the numerical model; the second is a back propagation algorithm on the graph that computes the adjoint model. The main advantage of the YAO framework is that the direct and adjoint model code is generated automatically once the modular graph has been designed by the user. Moreover, YAO makes it possible to handle many scenarios for running different data assimilation sessions.

This thesis presents computer science research on the YAO framework. As a first step, we formalized the existing YAO specifications in a more general way. We then proposed algorithms that automate some tasks, such as the automatic generation of an "optimal" computational ordering and the automatic parallelization of the generated code on shared memory architectures using OpenMP directives. This thesis lays the foundations that, in the medium term, will make YAO a general and operational platform for 4D-Var data assimilation, able to process high-dimensional applications.</description>
			<pubDate>Sun, 03 Mar 2013 11:36:54 +0100</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/labo/membre/view?id=1782</guid>
			<link>http://cedric.cnam.fr/index.php/labo/membre/view?id=1782</link>
			<title>Job - New: Henri Bertholon</title>
			<description>a</description>
			<pubDate>Fri, 01 Mar 2013 14:07:30 +0100</pubDate>
			<category>MSDMA</category>
			<category>Job</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2715</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2715</link>
			<title>Paper - method for the retrieval of ocean vertical profiles using self-organizing maps and hidden Markov models - Application on Ocean Colour Satellite Image Inversion</title>
			<description>This paper presents a statistical inversion method used to infer 3D data from 2D imaging. The methodology is based on a combination of Self-Organising Maps and Hidden Markov Models. The Self-Organising Maps generate the typical situations of the emissions and the hidden states of the Hidden Markov Model. The method has been validated by inferring the oceanic vertical profiles of Chlorophyll-A based on sea-surface data.</description>
			<pubDate>Wed, 27 Feb 2013 16:30:46 +0100</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2714</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2714</link>
			<title>Paper - YAO: A Generator of Parallel Code for Variational Data Assimilation Applications</title>
			<description>Variational data assimilation consists in estimating control parameters of a numerical model in order to minimize the misfit between the forecast values and the actual observations. The YAO framework is a code generator that facilitates, especially for the adjoint model, the writing and the generation of a variational data assimilation program for a given numerical application. In this paper we present how the modular graph specific to YAO enables the automatic and efficient parallelization of the generated code with OpenMP on shared memory architectures. Thanks to this modular graph we are also able to completely avoid data race conditions (write/write conflicts). Performance tests with actual applications demonstrate good speedups on a multicore CPU.</description>
			<pubDate>Wed, 27 Feb 2013 16:21:03 +0100</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2713</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2713</link>
			<title>Paper - YAO: A Generator of Parallel Code for Variational Data Assimilation Applications</title>
			<description>Variational data assimilation consists in estimating control parameters of a numerical model in order to minimize the misfit between the forecast values and the actual observations. The YAO framework is a code generator that facilitates, especially for the adjoint model, the writing and the generation of a variational data assimilation program for a given numerical application. In this paper we present how the modular graph specific to YAO enables the automatic and efficient parallelization of the generated code with OpenMP on shared memory architectures. Thanks to this modular graph we are also able to completely avoid data race conditions (write/write conflicts). Performance tests with actual applications demonstrate good speedups on a multicore CPU.</description>
			<pubDate>Wed, 27 Feb 2013 16:10:58 +0100</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2710</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2710</link>
			<title>Paper - Automatic segmentation of textures on a database of remote-sensing images and classification by neural network</title>
			<description>.</description>
			<pubDate>Wed, 27 Feb 2013 11:56:54 +0100</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2701</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2701</link>
			<title>Paper - Methodology for the evaluation of vascular surgery manpower in France</title>
			<description>Objectives: The French population is growing and ageing. It is expected to increase by 2.7% by 2020, and the number of individuals over 65 years of age is expected to increase by 3.3 million, a 33% increase, between 2005 and 2020. As the number of vascular surgery procedures is closely associated with the age of a population, it is anticipated that there will be a significant increase in the workload of vascular surgeons.
Study design: A model is presented to predict changes in vascular surgery activity according to population ageing, including other parameters that could affect workload evolution.
Methods: Three types of arterial procedures were studied: infrarenal abdominal aortic aneurysm (AAA) surgery, peripheral arterial occlusive disease (PAOD) procedures and carotid artery (CEA) procedures. Data were selected and extracted from the national PMSI (Medical Information System Program) database. Data obtained from 2000 were used to predict data based on an ageing population for 2008. From this model, a weighted index was defined for each group by comparing expected and observed workloads.
Results: According to the model, over this 8-year period, there was an overall increase in vascular procedures of 52.2%, with an increase of 89% in PAOD procedures. Between 2000 and 2009, the total increase was 58.0%, with 3.9% for AAA procedures, 101.7% for PAOD procedures and 13.2% for CEA procedures. The weighted model based on an ageing population and corrected by a weighted factor predicted this increase.
Conclusion: This weighted model is able to predict the workload of vascular surgeons over the coming years. An ageing population and other factors could result in a significant increase in demand for vascular surgical services.</description>
			<pubDate>Mon, 21 Jan 2013 14:33:46 +0100</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2700</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2700</link>
			<title>Paper - Regression modeling of the cumulative incidence function with missing causes of failure using pseudo-values</title>
			<description>Competing risks arise when patients may fail from several causes. Strategies for modeling cause-specific functionals often assume that the cause of failure is known for all patients, but this is seldom the case. Several authors have addressed the problem of modeling the cause-specific hazard rates with missing causes of failure. In contrast, direct modeling of the cumulative incidence function has received little attention. We provide a general framework for regression modeling of this function in the missing cause setting, encompassing key models such as the Fine and Gray and additive models, by considering two extensions of the Andersen-Klein pseudo-value approach. The first extension is a novel inverse probability weighting method, while the second extension is based on a previously proposed multiple imputation procedure. The gain in using these approaches with small samples was evaluated in an extensive simulation study. Asymptotic properties were verified and variance estimators were suggested and evaluated. We analyzed the data from an ECOG breast cancer treatment clinical trial to illustrate the practical value and ease of implementation of the proposed methods.</description>
			<pubDate>Fri, 18 Jan 2013 16:48:12 +0100</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/labo/membre/view?id=1742</guid>
			<link>http://cedric.cnam.fr/index.php/labo/membre/view?id=1742</link>
			<title>Job - New: Henri Wallard</title>
			<description>a</description>
			<pubDate>Wed, 05 Dec 2012 10:45:36 +0100</pubDate>
			<category>MSDMA</category>
			<category>Job</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2686</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2686</link>
			<title>Paper - A generalisation of sparse PCA to multiple correspondence analysis</title>
			<description>Principal components analysis (PCA) for numerical variables and multiple correspondence analysis (MCA) for categorical variables are well-known dimension reduction techniques. PCA and MCA provide a small number of informative dimensions: the components. However, these components are a combination of all original variables, hence some difficulties in the interpretation. Factor rotation (varimax, quartimax, etc.) has a long history in factor analysis for obtaining simple structure, i.e. looking for combinations with a large number of coefficients either close to zero or to 1 or -1. Only recently have rotations been used in multiple correspondence analysis. Sparse PCA and group sparse PCA are new techniques providing components which are combinations of few original variables: rewriting PCA as a regression problem, null loadings are obtained by imposing the lasso (or similar) constraint on the regression coefficients. When the data matrix has a natural block structure, group sparse PCA gives zero coefficients to entire blocks of variables. Since MCA is a special kind of PCA with blocks of indicator variables, we define sparse MCA as an extension of group sparse PCA. We present an application of sparse MCA to genetic data (640 SNPs with 3 categories measured on 502 women) and a comparison between sparse and rotated components.</description>
			<pubDate>Wed, 05 Dec 2012 10:26:26 +0100</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2663</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2663</link>
			<title>Paper - Identification of microRNA-regulated gene networks by expression analysis of target genes.</title>
			<description>MicroRNAs (miRNAs) and transcription factors control eukaryotic cell proliferation, differentiation, and metabolism through their specific gene regulatory networks. However, differently from transcription factors, our understanding of the processes regulated by miRNAs is currently limited. Here, we introduce gene network analysis as a new means for gaining insight into miRNA biology. A systematic analysis of all human miRNAs based on Co-expression Meta-analysis of miRNA Targets (CoMeTa) assigns high-resolution biological functions to miRNAs and provides a comprehensive, genome-scale analysis of human miRNA regulatory networks. Moreover, gene cotargeting analyses show that miRNAs synergistically regulate cohorts of genes that participate in similar processes. We experimentally validate the CoMeTa procedure through focusing on three poorly characterized miRNAs, miR-519d/190/340, which CoMeTa predicts to be associated with the TGFβ pathway. Using lung adenocarcinoma A549 cells as a model system, we show that miR-519d and miR-190 inhibit, while miR-340 enhances TGFβ signaling and its effects on cell proliferation, morphology, and scattering. Based on these findings, we formalize and propose co-expression analysis as a general paradigm for second-generation procedures to recognize bona fide targets and infer biological roles and network communities of miRNAs.</description>
			<pubDate>Thu, 27 Sep 2012 10:46:14 +0200</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2662</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2662</link>
			<title>Paper - Non-Metric Partial Least Squares</title>
			<description>In this paper I review covariance-based Partial Least Squares (PLS) methods, focusing on common features of their respective algorithms and optimization criteria. I then show how these algorithms can be adjusted for use as optimal scaling tools. Three new PLS-type algorithms are proposed for the analysis of one, two or several blocks of variables: the Non-Metric NIPALS, the Non-Metric PLS Regression and the Non-Metric PLS Path Modeling, respectively. These algorithms extend the applicability of PLS methods to data measured on different measurement scales, as well as to variables linked by non-linear relationships.</description>
			<pubDate>Thu, 27 Sep 2012 09:44:45 +0200</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2646</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2646</link>
			<title>Paper - Modeling a numerical code with a Gaussian process - Application to computing a threshold exceedance probability curve</title>
			<description>Statistical modeling of a numerical code by a Gaussian process provides a Bayesian framework for analyzing that code. For uncertainty propagation, coupling the Gaussian process with a design of computer experiments makes it possible to account for complex relationships (linear correlations, nonlinearity, ...) between the variables from a limited number of calls to the code, in order to evaluate an indicator at the code output. This approach is adapted here to the field of Non-Destructive Testing (NDT), for which it constitutes an efficient method and a conceptual advance in the treatment of uncertainties. We first present the issues involved in statistical modeling in NDT with the aim of obtaining probability of detection curves for defects. We then present a method for estimating Gaussian processes by Gibbs sampling, which allows an original a posteriori construction of these curves. Finally, the complete approach is illustrated on the inspection of a titanium plate by an eddy current inspection method.</description>
			<pubDate>Wed, 19 Sep 2012 13:52:06 +0200</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2627</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2627</link>
			<title>Paper - Kernel discrimination and explicative features: an operative approach</title>
			<description>Kernel-based methods such as SVMs and LS-SVMs have been successfully used for solving various supervised classification and pattern recognition problems in machine learning. Unfortunately, they are heavily dependent on the choice of the optimal kernel function and on tuning parameters. Their solutions, in fact, suffer from a complete lack of interpretation in terms of input variables. This is not a trivial problem, especially when the learning task concerns a critical business asset, such as credit scoring, where the derivation of a classification rule must comply with international regulation.
The following strategy is proposed for solving problems using categorical predictors: replace the predictors by components obtained from MCA, choose the best kernel among several (linear, RBF, Laplace, Cauchy, etc.), and approximate the classifier through a linear model. The loss of performance due to this approximation is balanced by better interpretability for the end user, who can understand and rank the influence of each category of the variables on the prediction. This strategy has been applied to real credit-risk data on small enterprises. The Cauchy kernel was found to be the best and leads to a score much more efficient than classical ones, even after approximation.</description>
			<pubDate>Wed, 22 Aug 2012 10:09:13 +0200</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2626</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2626</link>
			<title>Paper - Stacking prediction for a binary outcome</title>
			<description>A large number of supervised classification models have been proposed in the literature. In order to avoid any bias induced by the use of one single statistical approach, they are combined through a specific "stacking" meta-model.
To deal with the case of a binary outcome and of categorical predictors, we introduce several improvements to stacking: combining models is done through PLS-DA instead of OLS due to the strong correlation between predictions, and a specific methodology is developed for the case of a small number of observations, using repeated sub-sampling for variable selection.
Five very different models (Boosting, Naive Bayes, SVM, Sparse PLS-DA and Expert Scoring) are combined through this improved stacking, and applied in the context of the development of alternative strategies for safety evaluation where multiple in vitro, in silico and physico-chemical parameters are used to classify substances in two classes: "Sensitizer" and "No Sensitizer".
Results show that stacking meta-models perform better than each of the five models taken separately and, furthermore, that stacking provides a better balance between sensitivity and specificity.</description>
			<pubDate>Wed, 22 Aug 2012 09:56:32 +0200</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2625</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2625</link>
			<title>Paper - Sparse principal component analysis for multiblock data and its extension to sparse multiple correspondence analysis</title>
			<description>Two new methods to select groups of variables have been developed for multiblock data: "Group Sparse Principal Component Analysis" (GSPCA) for continuous variables and "Sparse Multiple Correspondence Analysis" (SMCA) for categorical variables. GSPCA is a compromise between the Sparse PCA method of Zou, Hastie and Tibshirani and the "group Lasso" method of Yuan and Lin. PCA is formulated as a regression-type optimization problem and uses the constraints of the group Lasso on regression coefficients to produce modified principal components with sparse loadings. This reduces the number of nonzero coefficients, i.e. the number of selected groups. SMCA is a straightforward extension of GSPCA to groups of indicator variables, with the chi-square metric. Two real examples will be used to illustrate each method. The first one is a data set on 25 trace elements measured in three tissues of 48 crabs (25 blocks of 3 variables). The second one is a data set of 502 women aimed at the identification of genes affecting skin aging with more than 370,000 blocks, each block corresponding to SNPs (Single Nucleotide Polymorphisms) coded into 3 categories.</description>
			<pubDate>Wed, 22 Aug 2012 09:41:56 +0200</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/labo/membre/view?id=1646</guid>
			<link>http://cedric.cnam.fr/index.php/labo/membre/view?id=1646</link>
			<title>Job - New: Elena Di Bernardino</title>
			<description>a</description>
			<pubDate>Wed, 11 Jul 2012 16:04:56 +0200</pubDate>
			<category>MSDMA</category>
			<category>Job</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2602</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2602</link>
			<title>Paper - Probability, data analysis and statistics</title>
			<description>This edition is a complete revision, with additions, of the 1990 and 2006 editions. It includes many developments on recent methods. The 21 chapters are organized into five parts: probabilistic tools, exploratory analysis, inferential statistics, predictive models and data collection. It covers the essentials of probability theory, the various methods of exploratory data analysis (factor analyses and clustering), "classical" statistics with estimation and tests as well as simulation-based methods, linear and logistic regression along with nonlinear techniques, survey sampling theory and the design of experiments.</description>
			<pubDate>Fri, 29 Jun 2012 11:25:31 +0200</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2601</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2601</link>
			<title>Paper - Robust Statistics for Classification of Remote Sensing Data</title>
			<description>.</description>
			<pubDate>Mon, 25 Jun 2012 10:17:15 +0200</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
		<item>
			<guid>http://cedric.cnam.fr/index.php/publis/article/view?id=2600</guid>
			<link>http://cedric.cnam.fr/index.php/publis/article/view?id=2600</link>
			<title>Paper - Variable selection in the context of multivariate process monitoring</title>
			<description>.</description>
			<pubDate>Mon, 25 Jun 2012 10:09:00 +0200</pubDate>
			<category>MSDMA</category>
			<category>Paper</category>
		</item>
	</channel>
</rss>