Human Segmentation, Pose Estimation and Applications

October 13, 2017 at 11:00 am

Place: CVC Sala d’actes


Dr. Manuel Jesus Marin-Jimenez – Departamento de Informática y Análisis Numérico – Universidad de Córdoba

Dr. Laura Igual Muñoz – Departament Matemàtiques i Informàtica – Universitat de Barcelona

Dr. David Masip Rodo – Departament Informàtica Multimèdia i Telecomunicacions – Universitat Oberta de Catalunya


Thesis Supervisors
Dr. Sergio Escalera – Department of Mathematics and Informatics – UB

Dr. Jordi Gonzalez – Dept. Ciències de la computació & Centre de Visió per Computador – UAB




Automatically analyzing humans in photographs or videos has great potential for applications in computer vision, including medical diagnosis, sports, entertainment, movie editing and surveillance, to name a few. The body, face and hands are the most studied components of humans. The body varies widely in shape and clothing and has many degrees of freedom in pose. The face has many muscles that cause visible deformations, besides variable shape and hair style. The hand is a small object that moves fast and has many degrees of freedom.

Adding individual human characteristics to all of the aforementioned variabilities makes human analysis quite a challenging task. In this thesis, we developed human segmentation methods for different modalities. In a first scenario, we segmented the human body and hand in depth images using example-based shape warping. We developed a shape descriptor based on shape context and class probabilities of shape regions to extract nearest neighbors. We then compared rigid affine alignment against non-rigid iterative shape warping. In a second scenario, we segmented the face in RGB images using convolutional neural networks (CNN). We modeled a conditional random field with recurrent neural networks. In our model, the pairwise kernels are not fixed but learned during training. We trained the network end-to-end using adversarial networks, which improved hair segmentation by a large margin.
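To make the shape context idea concrete, the following is a minimal sketch of the classic log-polar shape context histogram for one reference point of a 2D contour. It is an illustration only and not the descriptor developed in the thesis, which also uses class probabilities of shape regions; the function name, the bin counts, the radius limits and the choice to clamp out-of-range points into the outermost bins are all assumptions made for this example.

```python
import math

def shape_context(points, index, n_radial=5, n_angular=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of all other points relative to points[index].

    Hypothetical helper for illustration; parameters are assumed defaults.
    """
    px, py = points[index]
    offsets = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    dists = [math.hypot(dx, dy) for dx, dy in offsets]
    mean_dist = sum(dists) / len(dists) if dists else 0.0
    hist = [[0] * n_angular for _ in range(n_radial)]
    if mean_dist == 0.0:
        return hist  # no other points, or all points coincide with the reference
    log_min, log_max = math.log(r_min), math.log(r_max)
    for (dx, dy), d in zip(offsets, dists):
        if d == 0.0:
            continue  # skip points that coincide with the reference point
        # Dividing by the mean distance makes the descriptor scale invariant.
        r = d / mean_dist
        t = (math.log(r) - log_min) / (log_max - log_min)
        # Assumption: radii outside [r_min, r_max] are clamped to the end bins.
        r_bin = min(max(int(t * n_radial), 0), n_radial - 1)
        theta = math.atan2(dy, dx) % (2.0 * math.pi)
        a_bin = min(int(theta / (2.0 * math.pi) * n_angular), n_angular - 1)
        hist[r_bin][a_bin] += 1
    return hist

# Usage: descriptor of the center point of a small point set.
square = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
descriptor = shape_context(square, 0)
```

Two shapes can then be compared by a histogram distance between such descriptors, which is one common way to retrieve nearest-neighbor examples.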

We also worked on 3D hand pose estimation in depth images. In a generative approach, we fitted a finger model separately to each finger, based on our example-based rigid hand segmentation. We minimized an energy function based on overlapping area, depth discrepancy and finger collisions. We also applied linear models in joint trajectory space to refine occluded joints, based on the error of the visible joints and the trajectory smoothness of the invisible joints. In a CNN-based approach, we developed a tree-structured network to train specific features for each finger and fused them for global pose consistency. We also formulated physical and appearance constraints as loss functions.
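The idea of refining occluded joints from the error on visible joints plus a trajectory smoothness term can be sketched with a toy energy over a single joint coordinate. This is a simplified stand-in and not the linear trajectory-space model of the thesis: the energy form, the function name, the weights and the plain gradient descent solver are all assumptions made for illustration.

```python
def refine_trajectory(observed, smooth_weight=1.0, step=0.02, iterations=5000):
    """Fill occluded frames (None) of a 1D joint trajectory.

    Minimizes, by gradient descent, the assumed energy
        sum over visible t of (x[t] - observed[t])^2
        + smooth_weight * sum over t of (x[t-1] - 2*x[t] + x[t+1])^2.
    The default step is chosen for smooth_weight near 1; larger weights
    need a smaller step to stay stable.
    """
    n = len(observed)
    visible_values = [v for v in observed if v is not None]
    mean = sum(visible_values) / len(visible_values)
    # Start occluded frames at the mean of the visible observations.
    x = [v if v is not None else mean for v in observed]
    for _ in range(iterations):
        grad = [0.0] * n
        # Data term: pull visible frames toward their observations.
        for t in range(n):
            if observed[t] is not None:
                grad[t] += 2.0 * (x[t] - observed[t])
        # Smoothness term: penalize the discrete second derivative.
        for t in range(1, n - 1):
            g = 2.0 * smooth_weight * (x[t - 1] - 2.0 * x[t] + x[t + 1])
            grad[t - 1] += g
            grad[t] -= 2.0 * g
            grad[t + 1] += g
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x

# Usage: the third frame is occluded and is recovered from its neighbors.
refined = refine_trajectory([0.0, 1.0, None, 3.0, 4.0, 5.0])
```

In this toy setting the visible frames lie on a straight line, so the smoothness term drives the occluded frame onto that line while the data term keeps the visible frames in place.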

Finally, we developed a number of applications, namely human soft biometrics measurement and garment retexturing. We also generated several datasets in this thesis, covering human segmentation, synthetic hand pose, garment retexturing and Italian gestures.

