These are the test sets for the WMT shared translation task. They are small parallel data sets used for testing machine translation (MT) systems, and are typically created by translating a selection of crawled articles from online news sites. The WMT17 test sets are available at http://data.statmt.org/wmt17/translation-task/test.tgz
Cracker has contributed to the German-English and Czech-English test sets from 2015 to 2018, as well as a different guest language in each of these years. The guest language pair for 2017 was Latvian-English. We also included Russian, Turkish, Chinese, Estonian and Kazakh with funding from other sources, as well as Finnish in 2017. The source data are crawled from online news sites and carry the licensing conditions of their respective sources.