Weakly-Supervised 3D Pose Estimation from a Single Image using Multi-View Consistency

Guillaume Rochette, Chris Russell, Richard Bowden: Weakly-Supervised 3D Pose Estimation from a Single Image using Multi-View Consistency. In: 30th British Machine Vision Conference (BMVC 2019), BMVC, 2019.

Abstract

We present a novel data-driven regularizer for weakly-supervised learning of 3D
human pose estimation that eliminates the drift problem that affects existing approaches.
We do this by moving the stereo reconstruction problem into the loss of the network
itself. This avoids the need to reconstruct 3D data prior to training and, unlike previous
semi-supervised approaches, avoids the need for a warm-up period of supervised training.
The conceptual and implementational simplicity of our approach is fundamental to its
appeal. Not only is it straightforward to augment many weakly-supervised approaches
with our additional re-projection-based loss, but it is obvious how it shapes reconstructions
and prevents drift. As such, we believe it will be a valuable tool for any researcher working
in weakly-supervised 3D reconstruction. Evaluating on Panoptic, the largest multi-camera
and markerless dataset available, we obtain an accuracy that is essentially indistinguishable
from a strongly-supervised approach making full use of 3D ground truth in training.
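At its core, the method trains a single-image 3D pose network using agreement across calibrated camera views: the 3D prediction from one view, re-projected into another view, should match the 2D evidence observed there. The sketch below is a minimal illustration of that kind of re-projection consistency loss, not the authors' implementation; the function and variable names (project, reprojection_consistency_loss, R_ab, t_ab, K_b) are placeholders assumed for the example.

# Minimal sketch of a multi-view re-projection consistency loss (illustrative only,
# not the authors' code). Assumes a network predicting 3D joints in camera A's frame,
# known extrinsics (R_ab, t_ab) mapping camera A coordinates to camera B, and
# intrinsics K_b for camera B.
import torch

def project(points_3d, K):
    # Pinhole projection of (J, 3) camera-frame points with intrinsics K (3, 3).
    uvw = points_3d @ K.T              # (J, 3) homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # divide by depth -> (J, 2) pixel coordinates

def reprojection_consistency_loss(joints3d_cam_a, joints2d_cam_b, R_ab, t_ab, K_b):
    # Penalise disagreement between 3D joints predicted from view A,
    # re-projected into view B, and the 2D joints observed in view B.
    joints3d_cam_b = joints3d_cam_a @ R_ab.T + t_ab   # rigid transform A -> B
    reprojected = project(joints3d_cam_b, K_b)        # (J, 2)
    return torch.mean(torch.norm(reprojected - joints2d_cam_b, dim=-1))

In practice, a term of this form would be summed over the available camera pairs and combined with whatever 2D supervision is used during training.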

BibTeX

@inproceedings{surrey852639,
title = {Weakly-Supervised 3D Pose Estimation from a Single Image using Multi-View Consistency},
author = {Guillaume Rochette and Chris Russell and Richard Bowden},
url = {http://epubs.surrey.ac.uk/852639/},
year  = {2019},
date = {2019-01-01},
booktitle = {30th British Machine Vision Conference (BMVC 2019)},
publisher = {BMVC},
abstract = {We present a novel data-driven regularizer for weakly-supervised learning of 3D
human pose estimation that eliminates the drift problem that affects existing approaches.
We do this by moving the stereo reconstruction problem into the loss of the network
itself. This avoids the need to reconstruct 3D data prior to training and, unlike previous
semi-supervised approaches, avoids the need for a warm-up period of supervised training.
The conceptual and implementational simplicity of our approach is fundamental to its
appeal. Not only is it straightforward to augment many weakly-supervised approaches
with our additional re-projection-based loss, but it is obvious how it shapes reconstructions
and prevents drift. As such, we believe it will be a valuable tool for any researcher working
in weakly-supervised 3D reconstruction. Evaluating on Panoptic, the largest multi-camera
and markerless dataset available, we obtain an accuracy that is essentially indistinguishable
from a strongly-supervised approach making full use of 3D ground truth in training.},
keywords = {University of Surrey},
pubstate = {published},
tppubtype = {inproceedings}
}