[SS10a] Automatic Categorization of Job Postings

Workshop, Poster or Demonstration at an International Conference: COMPSTAT'2010, 19th International Conference on Computational Statistics, Paris, August 2010, pp. 258.

Keywords: text categorization, Latent Semantic Analysis, Support Vector Machine, job posting

Abstract: Since the early 1990s, the growing share of job vacancies published on the internet has led to a proliferation of online job search sites (job boards). Consequently, assessing job board performance has become a priority for recruiters. An important obstacle, however, is that each job board uses its own nomenclature to describe the type of position. As part of modeling job posting performance, we need to establish a common classification for the "function" criterion. To this end, we work on a corpus of manually labeled job offer texts and propose a method to categorize these texts into a predefined two-level classification of occupations. First, preprocessing adapted to the particularities of job offer texts is performed (stemming, use of a specific dictionary, etc.). Then, we reduce the dimensionality of the problem with a feature selection method (see Yang and Pedersen (1997) for a comparative study). The texts are represented with the Vector Space Model, and terms are weighted with a function that depends on the term's position in the text (title or mission description). Finally, job postings are classified with an SVM (e.g. Joachims (1997)). Popular performance measures such as recall and precision are used and adapted to our context by weighting errors according to the seriousness of the misclassification. In addition, we explore the effect on classification quality of Probabilistic Latent Semantic Analysis, another dimensionality reduction method that makes it possible to address the issue of synonymy (Hofmann (1999)).
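The abstract does not say which feature selection criterion was retained. The following is a minimal sketch that assumes the chi-square statistic, one of the criteria compared by Yang and Pedersen (1997). It also assumes each offer has already been reduced to a set of stemmed terms. Each term is scored against every category, and the k best terms are kept. The function names, the max-over-categories aggregation and the toy data are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: chi-square feature selection (an assumed criterion,
# not necessarily the one used by the authors). Each document is a pair
# (set of stemmed terms, category label).

def chi_square(term, category, docs):
    """Chi-square statistic between a term and a category."""
    n = len(docs)
    a = sum(1 for terms, label in docs if term in terms and label == category)
    b = sum(1 for terms, label in docs if term in terms and label != category)
    c = sum(1 for terms, label in docs if term not in terms and label == category)
    d = n - a - b - c
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, k):
    """Keep the k terms with the highest chi-square score over all categories."""
    vocabulary = set().union(*(terms for terms, _ in docs))
    categories = {label for _, label in docs}
    scores = {t: max(chi_square(t, cat, docs) for cat in categories) for t in vocabulary}
    # Sort by decreasing score, breaking ties alphabetically for reproducibility.
    return sorted(vocabulary, key=lambda t: (-scores[t], t))[:k]


# Toy corpus of already-stemmed job offers (hypothetical data).
docs = [
    ({"develop", "java", "web"}, "IT"),
    ({"develop", "python", "data"}, "IT"),
    ({"sale", "client", "negoti"}, "Sales"),
    ({"sale", "retail", "client"}, "Sales"),
]
print(select_features(docs, 3))  # ['client', 'develop', 'sale']
```

A term that appears in every offer of one category and in no other offer gets the highest score. A term seen in a single offer scores lower.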
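Position-dependent term weighting can be illustrated with a small TF-IDF variant. In this variant, an occurrence in the title counts more than an occurrence in the mission description. This is a minimal sketch under stated assumptions, because the abstract does not give the actual weighting function. The weights TITLE_WEIGHT and DESCRIPTION_WEIGHT, the log(N/df) IDF and the L2 normalization are all hypothetical choices.

```python
import math
from collections import Counter

# Hypothetical weights: the abstract only says the weight depends on whether
# the term appears in the title or in the mission description.
TITLE_WEIGHT = 3.0
DESCRIPTION_WEIGHT = 1.0


def positional_tf(title_terms, description_terms):
    """Term frequency where each occurrence is weighted by its position."""
    tf = Counter()
    for t in title_terms:
        tf[t] += TITLE_WEIGHT
    for t in description_terms:
        tf[t] += DESCRIPTION_WEIGHT
    return tf


def idf(corpus):
    """corpus: list of (title_terms, description_terms); returns term -> log(N / df)."""
    n = len(corpus)
    df = Counter()
    for title, description in corpus:
        df.update(set(title) | set(description))
    return {t: math.log(n / d) for t, d in df.items()}


def vectorize(title_terms, description_terms, idf_table):
    """Sparse, L2-normalized Vector Space Model representation of one offer."""
    tf = positional_tf(title_terms, description_terms)
    vec = {t: w * idf_table.get(t, 0.0) for t, w in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm > 0 else vec


# Toy corpus (hypothetical data): (title terms, mission description terms).
corpus = [
    (["develop", "java"], ["develop", "web", "team"]),
    (["sale", "manag"], ["client", "team"]),
]
table = idf(corpus)
v = vectorize(*corpus[0], table)
# "develop" (title + description) outweighs "java" (title only) and "web"
# (description only); "team" occurs in every offer, so its weight is 0.
```

Under this choice, a term repeated in the title and the description accumulates both contributions. That is one simple way to let the title dominate without discarding the description.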
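In a two-level classification, one way to weight errors by their seriousness is to give partial credit when only the sub-category is wrong. The sketch below assumes labels are (top-level, sub-level) pairs. It uses a hypothetical cost of 0.5 for a wrong sub-category within the correct top-level category, and a cost of 1 for a wrong top-level category. The abstract does not specify the paper's actual weighting scheme, so SAME_PARENT_COST and the partial-credit formula are illustrative assumptions.

```python
# Hypothetical cost: 0.5 when only the sub-category is wrong, 1.0 when the
# top-level category is wrong, 0.0 for a correct prediction.
SAME_PARENT_COST = 0.5


def cost(true_label, predicted_label):
    """Labels are (top_level, sub_level) tuples."""
    if true_label == predicted_label:
        return 0.0
    if true_label[0] == predicted_label[0]:
        return SAME_PARENT_COST
    return 1.0


def weighted_precision_recall(y_true, y_pred, category):
    """Precision and recall for one category, with partial credit (1 - cost)."""
    pairs = list(zip(y_true, y_pred))
    predicted = [(t, p) for t, p in pairs if p == category]
    actual = [(t, p) for t, p in pairs if t == category]
    precision = sum(1 - cost(t, p) for t, p in predicted) / len(predicted) if predicted else 0.0
    recall = sum(1 - cost(t, p) for t, p in actual) / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical occupations and predictions.
DEV = ("IT", "developer")
ADMIN = ("IT", "system administrator")
SALES = ("Sales", "account manager")
y_true = [DEV, DEV, ADMIN, SALES]
y_pred = [DEV, ADMIN, DEV, DEV]
print(weighted_precision_recall(y_true, y_pred, DEV))  # (0.5, 0.75)
```

Unweighted precision and recall for the same predictions would be 1/3 and 1/2. The partial credit reflects that confusing two IT occupations is a less serious error than confusing IT with Sales.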
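Probabilistic Latent Semantic Analysis (Hofmann (1999)) models each document as a mixture of latent topics, so synonymous terms can be associated with the same topic. The following is a minimal sketch that assumes a small dense document-term count matrix. It fits the asymmetric formulation P(w|d) = sum_z P(z|d) P(w|z) with EM in plain Python. The function name, the random initialization and the fixed iteration count are illustrative choices. The abstract does not say how the topic representation is passed to the classifier.

```python
import math
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA with EM (hypothetical helper, not the authors' code).

    counts[d][w] is the number of occurrences of term w in document d;
    every document must contain at least one term.
    Returns (p_z_given_d, p_w_given_z, log_likelihood_history).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        s = sum(row)
        return [x / s for x in row]

    # Random initialization of P(z|d) and P(w|z).
    p_z_d = [normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]
    history = []
    for _ in range(n_iter):
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w) is proportional to P(z|d) P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                log_lik += n * math.log(total)
                # Accumulate expected counts n(d,w) P(z|d,w).
                for z in range(n_topics):
                    r = n * joint[z] / total
                    new_w_z[z][w] += r
                    new_z_d[d][z] += r
        # M-step: renormalize expected counts into probabilities.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
        history.append(log_lik)
    return p_z_d, p_w_z, history


# Toy document-term counts (hypothetical): rows are offers, columns are terms.
counts = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 4, 1], [0, 1, 3, 2]]
p_z_d, p_w_z, history = plsa(counts, n_topics=2)
```

Each row of p_z_d can then serve as a low-dimensional representation of an offer. As with any EM procedure, the log-likelihood never decreases across iterations, but the result depends on the random initialization.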

Team: msdma


@inproceedings {
title="{Automatic Categorization of Job Postings}",
author="J. Séguéla and G. Saporta",
booktitle="{COMPSTAT'2010, 19th International Conference on Computational Statistics, Paris}",