This PhD focuses on the control of generative models for visual scenes and is supervised by Michel Crucianu and Nicolas Audebert from CNAM and by Hervé Le Borgne from CEA-LIST.
One of the most exciting achievements of deep learning, often used to showcase its capabilities, is the ability of recent generative models to produce high-resolution, photo-realistic images. Work in generative modeling has mainly focused on extending the approach to other types of content and on improving the quality of the generated data. For most applications, however, the user needs significant control over the generated content. Conditional generation attempts to provide a rather general solution to this problem by letting the user supply an input to the generation process, e.g. a semantic sketch for ordinary scenes or specific climatic properties for satellite images. Other recent works propose to find meaningful directions in the latent space of generative models; moving along such a direction precisely controls a specific continuous property of the generated image, such as the position or scale of an object. While these proposals do provide some control over the generation process, they all fall short of giving the user sufficient direct control over a broad set of properties of the generated images.

The main goal of this thesis is to define means for refined control over the generated images, comprising several relatively independent “knobs”, each controlling a continuous, a discrete or a structured variable describing the image. Depending on the target application, control may concern: the presence and nature of individual entities in the scene (e.g. cyclist, truck); their pairwise positional relations (e.g. to the left of, far from); the visual properties of individual entities (e.g. wearing glasses, wearing a helmet); visual properties of the scene (e.g. sunny, dark); geometrical properties of the camera or the entities (e.g. camera pose, object orientation); and physical parameters (such as weather conditions or climate) that affect the content of the scene or its visual properties.