Show simple item record

dc.contributor.advisor: Ferrari, Vittorio
dc.contributor.advisor: Schmid, Cordelia
dc.contributor.advisor: Herrmann, Michael
dc.contributor.author: Kalogeiton, Vasiliki
dc.date.accessioned: 2018-03-26T11:20:11Z
dc.date.available: 2018-03-26T11:20:11Z
dc.date.issued: 2018-07-02
dc.identifier.uri: http://hdl.handle.net/1842/28984
dc.description.abstract: The rise of deep learning has facilitated remarkable progress in video understanding. This thesis addresses three important tasks of video understanding: video object detection, joint object and action detection, and spatio-temporal action localization. Object class detection is one of the most important challenges in computer vision. Object detectors are usually trained on bounding boxes from still images. Recently, video has been used as an alternative source of data. Yet training an object detector on one domain (either still images or videos) and testing on the other results in a significant performance gap compared to training and testing on the same domain. In the first part of this thesis, we examine the reasons behind this performance gap. We define and evaluate several domain shift factors: spatial location accuracy, appearance diversity, image quality, aspect distribution, and object size and camera framing. We examine the impact of these factors by comparing the detection performance before and after cancelling them out. The results show that all five factors affect the performance of the detectors and that their combined effect explains the performance gap. While most existing approaches for detection in videos focus on objects or human actions separately, in the second part of this thesis we aim at detecting non-human-centric actions, i.e., objects performing actions, such as cat eating or dog jumping. We introduce an end-to-end multitask objective that jointly learns object-action relationships. We compare it with different training objectives, validate its effectiveness for detecting object-action pairs in videos, and show that both tasks of object and action detection benefit from this joint learning. In experiments on the A2D dataset [Xu et al., 2015], we obtain state-of-the-art results on segmentation of object-action pairs.
In the third part, we are the first to propose an action tubelet detector that leverages the temporal continuity of videos instead of operating at the frame level, as state-of-the-art approaches do. Just as modern detectors rely on anchor boxes, our tubelet detector is based on anchor cuboids: it takes as input a sequence of frames and outputs tubelets, i.e., sequences of bounding boxes with associated scores. Our tubelet detector outperforms the state of the art on the UCF-Sports [Rodriguez et al., 2008], J-HMDB [Jhuang et al., 2013a], and UCF-101 [Soomro et al., 2012] action localization datasets, especially at high overlap thresholds. The improvement in detection performance is explained by both more accurate scores and more precise localization. [en]
dc.language.iso: en [en]
dc.publisher: The University of Edinburgh [en]
dc.relation.hasversion: Vicky Kalogeiton, Philippe Weinzaepfel, Vittorio Ferrari and Cordelia Schmid, Action Tubelet Detector for Spatio-Temporal Action Localization, Published to IEEE International Conference on Computer Vision (ICCV), 2017. [en]
dc.relation.hasversion: Vicky Kalogeiton, Philippe Weinzaepfel, Vittorio Ferrari and Cordelia Schmid, Joint learning of object and action detectors, Published to IEEE International Conference on Computer Vision (ICCV), 2017. [en]
dc.relation.hasversion: Vicky Kalogeiton, Vittorio Ferrari and Cordelia Schmid, Analysing domain shift factors between videos and images for object detection, Published to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2016. [en]
dc.subject: action localization [en]
dc.subject: action recognition [en]
dc.subject: object detection [en]
dc.subject: video analysis [en]
dc.subject: computer vision [en]
dc.subject: deep learning [en]
dc.subject: machine learning [en]
dc.title: Localizing spatially and temporally objects and actions in videos [en]
dc.type: Thesis or Dissertation [en]
dc.type.qualificationlevel: Doctoral [en]
dc.type.qualificationname: PhD Doctor of Philosophy [en]

