Image context for object detection, object context for part detection
MetadataShow full item record
Objects and parts are crucial elements for achieving automatic image understanding. The goal of the object detection task is to recognize and localize all the objects in an image. Similarly, semantic part detection attempts to recognize and localize the object parts. This thesis proposes four contributions. The first two make object detection more efficient by using active search strategies guided by image context. The last two involve parts. One of them explores the emergence of parts in neural networks trained for object detection, whereas the other improves on part detection by adding object context. First, we present an active search strategy for efficient object class detection. Modern object detectors evaluate a large set of windows using a window classifier. Instead, our search sequentially chooses what window to evaluate next based on all the information gathered before. This results in a significant reduction on the number of necessary window evaluations to detect the objects in the image. We guide our search strategy using image context and the score of the classifier. In our second contribution, we extend this active search to jointly detect pairs of object classes that appear close in the image, exploiting the valuable information that one class can provide about the location of the other. This leads to an even further reduction on the number of necessary evaluations for the smaller, more challenging classes. In the third contribution of this thesis, we study whether semantic parts emerge in Convolutional Neural Networks trained for different visual recognition tasks, especially object detection. We perform two quantitative analyses that provide a deeper understanding of their internal representation by investigating the responses of the network filters. Moreover, we explore several connections between discriminative power and semantics, which provides further insights on the role of semantic parts in the network. Finally, the last contribution is a part detection approach that exploits object context. We complement part appearance with the object appearance, its class, and the expected relative location of the parts inside it. We significantly outperform approaches that use part appearance alone in this challenging task.