Scene Understanding in Videos: Segmentation, Tracking and Pose Estimation

Place: Large Lecture Room

Affiliation: Inria Grenoble - Rhône-Alpes - LEAR Project, France.

This talk will focus on some of our work on the scene understanding problem, in particular on the use of temporal information.

Video provides not only rich visual cues such as motion and appearance, but also long-range temporal interactions among objects, which have been much less explored. The first part of the talk will present a method to capture such interactions and to construct a powerful intermediate-level video representation. We also use these interactions for tracking objects, and develop a tracking-by-detection approach that exploits occlusion and motion reasoning. This reasoning is based on long-term trajectories, which are labelled as object or background tracks using an energy-based formulation.

In the second part of the talk, we show how temporal constraints can be used to estimate articulated human poses, a task we cast as an optimization problem. We present a new approximate scheme to solve it, with two steps dedicated to pose estimation. First, our approach takes into account temporal links with subsequent frames for the less certain parts, namely elbows and wrists. Second, our method decomposes poses into limbs, generates limb sequences across time, and recomposes poses by mixing these body-part sequences.
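
For readers unfamiliar with the idea of generating a body-part sequence across time, the following is a minimal illustrative sketch, not the method presented in the talk. It assumes each frame offers a few candidate positions for one part (say, a wrist), each with a detection cost, and picks one candidate per frame by dynamic programming so that the sum of detection costs and frame-to-frame displacement is minimal. The function name `best_part_track`, the `(x, y, cost)` candidate format, and the `smooth_weight` parameter are all hypothetical.

```python
from math import hypot


def best_part_track(candidates, smooth_weight=1.0):
    """Pick one candidate per frame, minimising detection cost plus motion.

    candidates: list over frames; each frame is a non-empty list of
                (x, y, cost) tuples (hypothetical format for this sketch).
    Returns the list of chosen candidate indices, one per frame.
    """
    if not candidates or any(not frame for frame in candidates):
        raise ValueError("every frame needs at least one candidate")

    # total[i]: cheapest cost of any track ending at candidate i of the
    # most recently processed frame.
    total = [c[2] for c in candidates[0]]
    back_pointers = []

    for prev, cur in zip(candidates, candidates[1:]):
        new_total, pointers = [], []
        for x, y, unary in cur:
            # Cost of reaching this candidate from each previous candidate.
            reach = [
                total[j] + smooth_weight * hypot(x - px, y - py)
                for j, (px, py, _) in enumerate(prev)
            ]
            best_j = min(range(len(prev)), key=reach.__getitem__)
            new_total.append(reach[best_j] + unary)
            pointers.append(best_j)
        total = new_total
        back_pointers.append(pointers)

    # Trace the cheapest track back from the last frame to the first.
    i = min(range(len(total)), key=total.__getitem__)
    path = [i]
    for pointers in reversed(back_pointers):
        i = pointers[i]
        path.append(i)
    path.reverse()
    return path


# Toy example: three frames, two candidates each.
frames = [
    [(0, 0, 0.1), (10, 10, 0.0)],
    [(0, 1, 0.1), (10, 10, 0.5)],
    [(0, 2, 0.1), (50, 50, 0.0)],
]
smooth_path = best_part_track(frames, smooth_weight=1.0)
greedy_path = best_part_track(frames, smooth_weight=0.0)
```

With the smoothness term enabled, the selected track follows the slowly moving candidate in every frame; with `smooth_weight=0.0` it simply takes the cheapest detection per frame, which is the kind of temporally inconsistent result that temporal constraints are meant to avoid.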