[MTH18] End-to-End Learning of Latent Deformable Part-based Representations for Object Detection

International peer-reviewed journal: International Journal of Computer Vision (IJCV), 2018

Keywords: Deep Learning, Object Detection, Part-Based Model

Abstract: Object detection methods usually represent objects through rectangular bounding boxes from which they extract features, regardless of the objects' actual shapes. In this paper, we apply deformations to regions in order to learn representations better fitted to objects. We introduce DP-FCN, a deep model implementing this idea by learning to align parts to discriminative elements of objects in a latent way, i.e. without part annotation. This approach has two main assets: it builds invariance to local transformations, thus improving recognition, and it brings geometric information to describe objects more finely, leading to more accurate localization. We further develop both features in a new model named DP-FCN2.0 by explicitly learning interactions between parts. Alignment is done with an in-network joint optimization of all parts based on a CRF with custom potentials, and deformations influence localization through a bilinear product. We validate our models on the PASCAL VOC and MS COCO datasets and show significant gains. DP-FCN2.0 achieves state-of-the-art results of 83.3% and 81.2% on VOC 2007 and 2012, respectively, with VOC data only.



@article {
title="{End-to-End Learning of Latent Deformable Part-based Representations for Object Detection}",
author="T. Mordan and N. Thome and G. Henaff and M. Cord",
journal="International Journal of Computer Vision (IJCV)",