Progressive Transformers for End-to-End Sign Language Production

Ben Saunders, Necati Cihan Camgöz, Richard Bowden: Progressive Transformers for End-to-End Sign Language Production. In: European Conference on Computer Vision (ECCV), Forthcoming.

Abstract

The goal of automatic Sign Language Production (SLP) is to translate spoken language into a continuous stream of sign language video at a level comparable to a human translator. If this were achievable, it would revolutionise Deaf-hearing communication. Previous work on predominantly isolated SLP has shown the need for architectures that are better suited to the continuous domain of full sign sequences. In this paper, we propose Progressive Transformers, the first SLP model to translate from discrete spoken language sentences to continuous 3D sign pose sequences in an end-to-end manner. A novel counter decoding technique is introduced that enables continuous sequence generation at both training and inference. We present two model configurations: an end-to-end network that produces sign directly from text, and a stacked network that utilises a gloss intermediary. We also provide several data augmentation processes to overcome the problem of drift and drastically improve the performance of SLP models. We propose a back-translation evaluation mechanism for SLP, presenting benchmark quantitative results on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset and setting baselines for future research. Code is available at https://github.com/BenSaunders27/ProgressiveTransformersSLP.
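The counter decoding idea described in the abstract can be sketched in a few lines: alongside each predicted pose frame, the decoder emits a counter value that rises from 0 to 1 over the sequence, so the model itself signals when generation should stop. The sketch below is a minimal illustration of that stopping mechanism only; `decode_step` and `dummy_decoder` are hypothetical stand-ins, not the authors' trained Progressive Transformer.

```python
from typing import Callable, List, Tuple

Pose = List[float]  # flattened 3D joint coordinates for one frame

def generate_sign_sequence(
    decode_step: Callable[[List[Pose], float], Tuple[Pose, float]],
    max_frames: int = 500,
) -> List[Pose]:
    """Autoregressively decode pose frames until the counter reaches 1.0.

    The counter replaces a discrete end-of-sequence token, which is what
    allows decoding over a continuous output space.
    """
    frames: List[Pose] = []
    counter = 0.0
    while counter < 1.0 and len(frames) < max_frames:
        pose, counter = decode_step(frames, counter)
        frames.append(pose)
    return frames

# Toy decoder, purely to show the stopping behaviour: it advances the
# counter by 0.25 per frame, so decoding halts after four frames.
def dummy_decoder(history: List[Pose], counter: float) -> Tuple[Pose, float]:
    return [0.0, 0.0, 0.0], counter + 0.25

seq = generate_sign_sequence(dummy_decoder)  # produces 4 frames
```

In the real model the counter would be predicted by the network for every time step, giving it an implicit notion of sequence length and progress.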

BibTeX (Download)

@inproceedings{surrey858238,
title = {Progressive Transformers for End-to-End Sign Language Production},
author = {Ben Saunders and Necati Cihan Camgöz and Richard Bowden},
url = {http://epubs.surrey.ac.uk/858238/},
year  = {2020},
date = {2020-07-01},
booktitle = {European Conference on Computer Vision (ECCV)},
abstract = {The goal of automatic Sign Language Production (SLP) is to translate spoken language into a continuous stream of sign language video at a level comparable to a human translator. If this were achievable, it would revolutionise Deaf-hearing communication. Previous work on predominantly isolated SLP has shown the need for architectures that are better suited to the continuous domain of full sign sequences. In this paper, we propose Progressive Transformers, the first SLP model to translate from discrete spoken language sentences to continuous 3D sign pose sequences in an end-to-end manner. A novel counter decoding technique is introduced that enables continuous sequence generation at both training and inference. We present two model configurations: an end-to-end network that produces sign directly from text, and a stacked network that utilises a gloss intermediary. We also provide several data augmentation processes to overcome the problem of drift and drastically improve the performance of SLP models. We propose a back-translation evaluation mechanism for SLP, presenting benchmark quantitative results on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset and setting baselines for future research. Code is available at https://github.com/BenSaunders27/ProgressiveTransformersSLP.},
keywords = {Sign language production, University of Surrey},
pubstate = {forthcoming},
tppubtype = {inproceedings}
}