Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos

Oscar Koller, Necati Cihan Camgöz, Hermann Ney, Richard Bowden: Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2019.

Abstract

In this work we present a new approach to the field of weakly supervised learning in the video domain. Our method is relevant to sequence learning problems that can be split into sub-problems which occur in parallel. Here, we experiment with sign language data. The approach exploits sequence constraints within each independent stream and combines them by explicitly imposing synchronisation points to make use of the parallelism that all sub-problems share. We do this with multi-stream HMMs while adding intermediate synchronisation constraints among the streams. We embed powerful CNN-LSTM models in each HMM stream following the hybrid approach. This allows the discovery of attributes which on their own lack sufficient discriminative power to be identified. We apply the approach to the domain of sign language recognition, exploiting the sequential parallelism to learn sign language, mouth shape and hand shape classifiers. We evaluate the classifiers on three publicly available benchmark data sets featuring challenging real-life sign language with over 1000 classes, full-sentence lip reading and articulated hand shape recognition on a fine-grained hand shape taxonomy featuring over 60 different hand shapes. We clearly outperform the state of the art on all data sets and observe significantly faster convergence using the parallel alignment approach.
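The core idea of the abstract — independent streams aligned on their own, but forced to meet at shared synchronisation points — can be sketched in a few lines. The code below is an illustrative assumption, not the authors' implementation: it uses a plain left-to-right forced alignment per stream (in the paper, frame scores come from CNN-LSTM models inside an HMM; here they are just NumPy arrays), and the function names `align_segment` and `multistream_align` are hypothetical.

```python
import numpy as np

def align_segment(log_probs, n_states):
    """Monotonic forced alignment of a left-to-right chain of n_states
    states to T frames, maximising the summed frame log-probabilities.
    log_probs: (T, n_states) array of per-frame state scores.
    Returns a list of per-frame state indices."""
    T = log_probs.shape[0]
    # dp[t, s]: best score over frames 0..t with state s active at frame t
    dp = np.full((T, n_states), -np.inf)
    back = np.zeros((T, n_states), dtype=int)
    dp[0, 0] = log_probs[0, 0]
    for t in range(1, T):
        for s in range(n_states):
            stay = dp[t - 1, s]
            move = dp[t - 1, s - 1] if s > 0 else -np.inf
            if move > stay:
                dp[t, s] = move + log_probs[t, s]
                back[t, s] = s - 1
            else:
                dp[t, s] = stay + log_probs[t, s]
                back[t, s] = s
    # backtrace from the final state at the last frame
    path = [n_states - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]

def multistream_align(streams, sync_points):
    """Align each stream independently inside every segment delimited by
    the shared synchronisation points, so all streams re-synchronise at
    the same frames. streams: list of (log_probs, states_per_segment);
    state indices restart at 0 within each segment (sketch only)."""
    alignments = []
    for log_probs, states_per_seg in streams:
        bounds = [0] + list(sync_points) + [log_probs.shape[0]]
        path = []
        for (a, b), n in zip(zip(bounds, bounds[1:]), states_per_seg):
            path.extend(align_segment(log_probs[a:b], n))
        alignments.append(path)
    return alignments
```

Each stream (e.g. hand shape, mouth shape) gets its own frame scores and its own state inventory, yet all streams are constrained to cross the segment boundaries — the synchronisation points — at identical frames, which is the constraint the paper imposes between its HMM streams.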

BibTeX

@article{surrey851776,
title = {Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos},
author = {Oscar Koller and Necati Cihan Camgöz and Hermann Ney and Richard Bowden},
url = {http://epubs.surrey.ac.uk/851776/},
doi = {10.1109/TPAMI.2019.2911077},
year = {2019},
date = {2019-04-01},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
pages = {1--1},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
abstract = {In this work we present a new approach to the field of weakly supervised learning in the video domain. Our method is relevant to sequence learning problems which can be split up into sub-problems that occur in parallel. Here, we experiment with sign language data. The approach exploits sequence constraints within each independent stream and combines them by explicitly imposing synchronisation points to make use of parallelism that all sub-problems share. We do this with multi-stream HMMs while adding intermediate synchronisation constraints among the streams. We embed powerful CNN-LSTM models in each HMM stream following the hybrid approach. This allows the discovery of attributes which on their own lack sufficient discriminative power to be identified. We apply the approach to the domain of sign language recognition exploiting the sequential parallelism to learn sign language, mouth shape and hand shape classifiers. We evaluate the classifiers on three publicly available benchmark data sets featuring challenging real-life sign language with over 1000 classes, full sentence based lip-reading and articulated hand shape recognition on a fine-grained hand shape taxonomy featuring over 60 different hand shapes. We clearly outperform the state-of-the-art on all data sets and observe significantly faster convergence using the parallel alignment approach.},
keywords = {Assistive technology, Continuous sign language recognition, Gesture recognition, Hand shape recognition, Hidden Markov models, Hybrid CNN-LSTM-HMMs, Lip reading, Shape, Speech recognition, Supervised learning, Synchronization, University of Surrey, Weakly supervised learning},
pubstate = {published},
tppubtype = {article}
}