Neural Sign Language Translation

Necati Cihan Camgöz, Simon Hadfield, Oscar Koller, Hermann Ney, Richard Bowden: Neural Sign Language Translation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7784–7793, IEEE, 2018.

Abstract

Sign Language Recognition (SLR) has been an active research field for the last two decades. However, most research to date has considered SLR as a naive gesture recognition problem. SLR seeks to recognize a sequence of continuous signs but neglects the underlying rich grammatical and linguistic structures of sign language that differ from spoken language. In contrast, we introduce the Sign Language Translation (SLT) problem. Here, the objective is to generate spoken language translations from sign language videos, taking into account the different word orders and grammar.
We formalize SLT in the framework of Neural Machine Translation (NMT) for both end-to-end and pretrained settings (using expert knowledge). This allows us to jointly learn the spatial representations, the underlying language model, and the mapping between sign and spoken language.
To evaluate the performance of Neural SLT, we collected the first publicly available Continuous SLT dataset, RWTH-PHOENIX-Weather 2014T. It provides spoken language translations and gloss level annotations for German Sign Language videos of weather broadcasts. Our dataset contains over 0.95M frames with >67K signs from a sign vocabulary of >1K and >99K words from a German vocabulary of >2.8K. We report quantitative and qualitative results for various SLT setups to underpin future research in this newly established field. The upper bound for translation performance is calculated at 19.26 BLEU-4, while our end-to-end frame-level and gloss-level tokenization networks were able to achieve 9.58 and 18.13 respectively.

BibTeX

@inproceedings{surrey846335,
title = {Neural Sign Language Translation},
author = {Necati Cihan Camgöz and Simon Hadfield and Oscar Koller and Hermann Ney and Richard Bowden},
url = {http://epubs.surrey.ac.uk/846335/},
doi = {10.1109/CVPR.2018.00812},
year  = {2018},
date = {2018-01-01},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018},
journal = {Proceedings CVPR 2018},
pages = {7784--7793},
publisher = {IEEE},
abstract = {Sign Language Recognition (SLR) has been an active research field for the last two decades. However, most research to date has considered SLR as a naive gesture recognition problem. SLR seeks to recognize a sequence of continuous signs but neglects the underlying rich grammatical and linguistic structures of sign language that differ from spoken language. In contrast, we introduce the Sign Language Translation (SLT) problem. Here, the objective is to generate spoken language translations from sign language videos, taking into account the different word orders and grammar.
We formalize SLT in the framework of Neural Machine Translation (NMT) for both end-to-end and pretrained settings (using expert knowledge). This allows us to jointly learn the spatial representations, the underlying language model, and the mapping between sign and spoken language.
To evaluate the performance of Neural SLT, we collected the first publicly available Continuous SLT dataset, RWTH-PHOENIX-Weather 2014T. It provides spoken language translations and gloss level annotations for German Sign Language videos of weather broadcasts. Our dataset contains over 0.95M frames with $>$67K signs from a sign vocabulary of $>$1K and $>$99K words from a German vocabulary of $>$2.8K. We report quantitative and qualitative results for various SLT setups to underpin future research in this newly established field. The upper bound for translation performance is calculated at 19.26 BLEU-4, while our end-to-end frame-level and gloss-level tokenization networks were able to achieve 9.58 and 18.13 respectively.},
keywords = {University of Surrey},
pubstate = {published},
tppubtype = {inproceedings}
}