[Cha10] Une approche générique pour l'analyse croisant contenu et usage des sites Web par des méthodes de bipartitionnement

PhD thesis. Defended: 22 March 2010, 185 pp.
Supervisor: Gilbert Saporta
Reviewer 1: Gerard Govaert
Reviewer 2: Sadok Ben Yahia
Jury member: Elisabeth Métais
Jury member: Mohamed Nadif
Jury member: Yves Lechevallier

Author: M. Charrad

Keywords: Web Mining, Web Usage Mining, Data Mining, Biclustering, Machine Learning, Clustering, Natural Language Processing, Text Mining

Abstract: The Web is now a major source of data. It keeps expanding, driven both by the exponential growth in the number of online documents and by the increasing number of users. Website operators are therefore prompted to analyze user behavior on their sites in order to better meet users' expectations. These considerations have led to major efforts in analyzing users' clickstreams on websites, while other efforts have focused on analyzing the content of Web pages. However, few works have connected the content analysis and the usage analysis of a website. Given the close connection between content and usage, we propose a new approach, WCUM (Web Content and Usage Mining based approach), which links content analysis to usage analysis in order to discover the usage patterns of a website. Our work is organized around two main axes of Web Mining, namely Web Content Mining and Web Usage Mining, and relies on the block clustering algorithm CROKI2, implemented with two different strategies that we compared through experiments on artificially generated data. To overcome the problem of determining the number of clusters on the rows and on the columns, we propose to generalize some indices, originally proposed to evaluate the partitions obtained by clustering algorithms, so that they can evaluate the bipartitions obtained by simultaneous clustering algorithms. To evaluate the performance of these indices on data with a bicluster structure, we propose an algorithm that generates artificial data for running simulations and validating the results. Results of experiments on artificial and real data have been reported in published papers.

Comments: Defended on Monday, March 22, 2010 at CNAM

Team: msdma

BibTeX

@phdthesis{Cha10,
title="{Une approche générique pour l'analyse croisant contenu et usage des sites Web par des méthodes de bipartitionnement}",
author="M. Charrad",
year=2010,
pages="185",
address="{CEDRIC Laboratory, Paris, France}",
note="{Defended on Monday, March 22, 2010 at CNAM.
Supervisor: Gilbert Saporta.
Reviewer 1: Gerard Govaert.
Reviewer 2: Sadok Ben Yahia.
Jury member: Elisabeth Métais.
Jury member: Mohamed Nadif.
Jury member: Yves Lechevallier.}",
}