WMT 2015 : TENTH WORKSHOP ON STATISTICAL MACHINE TRANSLATION
Link: http://www.statmt.org/wmt15/

Call For Papers
IMPORTANT DATES
Release of training data for translation task: Early January, 2015
Release of training data for automatic post-editing task: January 31, 2015
Release of MT system for tuning task: February 9, 2015
Release of training data for quality estimation task: February 15, 2015
Registration for complimentary manual evaluation (tuning task): February 22, 2015
Submission deadline for tuning task: April 20, 2015
Test set distributed for translation task: April 20, 2015
Submission deadline for translation task: April 27, 2015
Test set distributed for automatic post-editing task: April 27, 2015
System outputs distributed for metrics task: May 4, 2015
Test sets distributed for quality estimation task: May 4, 2015
Submission deadline for automatic post-editing task: May 15, 2015
Submission deadline for metrics task: May 25, 2015
Submission deadline for quality estimation task: May 25, 2015
Start of manual evaluation period: May 4, 2015
End of manual evaluation: June 1, 2015
Paper submission deadline: June 28, 2015
Notification of acceptance: July 21, 2015
Camera-ready deadline: August 11, 2015

OVERVIEW
This year's workshop will feature five shared tasks:
- a translation task,
- a (pilot) automatic post-editing task (NEW),
- a tuning task (optimize a given MT system; NEW, a follow-up of the WMT11 tunable metrics task),
- a quality estimation task (assess MT quality without access to any reference),
- a metrics task (assess MT quality given a reference translation).

In addition to the shared tasks, the workshop will also feature scientific papers on topics related to MT. Topics of interest include, but are not limited to:
- word-based, phrase-based, syntax-based, semantics-based SMT
- using comparable corpora for SMT
- incorporating linguistic information into SMT
- decoding
- system combination
- error analysis
- manual and automatic methods for evaluating MT
- scaling MT to very large data sets

We encourage authors to evaluate their approaches to the above topics using the common data sets created for the shared tasks.
TRANSLATION TASK
The first shared task will examine translation between the following language pairs:
- English-German and German-English
- English-French and French-English
- English-Finnish and Finnish-English (NEW)
- English-Czech and Czech-English
- English-Russian and Russian-English

The text for all the test sets will be drawn from news articles, except for the French-English set (NEW), which will be drawn from user-generated comments on the news articles. Participants may submit translations for any or all of the language directions. In addition to the common test sets, the workshop organizers will provide optional training resources, including a newly expanded release of the Europarl corpora and out-of-domain corpora.

All participants who submit entries will have their translations evaluated. We will evaluate translation performance by human judgment. To facilitate the human evaluation, we will require participants in the shared tasks to manually judge some of the submitted translations. For each team, this will amount to ranking 300 sets of 5 translations per language pair submitted.

We also provide baseline machine translation systems, with performance comparable to the best systems from last year's shared task.

AUTOMATIC POST-EDITING TASK
This shared task will examine automatic methods for correcting errors produced by machine translation (MT) systems. Automatic post-editing (APE) aims at improving MT output in black-box scenarios, in which the MT system is used "as is" and cannot be modified.
From the application point of view, APE components would make it possible to:
- Cope with systematic errors of an MT system whose decoding process is not accessible
- Provide professional translators with improved MT output quality to reduce (human) post-editing effort

In this first edition of the task, the evaluation will focus on one language pair (English-Spanish), measuring systems' capability to reduce the distance (HTER) that separates an automatic translation from its human-revised version approved for publication. Training and test data are provided by Unbabel.

QUALITY ESTIMATION TASK
Quality estimation systems aim at producing an estimate of the quality of a given translation at system run-time, without access to a reference translation. This topic is particularly relevant from a user perspective. Among other applications, it can (i) help decide whether a given translation is good enough for publishing as is; (ii) filter out sentences that are not good enough for post-editing; (iii) select the best translation among options from multiple MT and/or translation memory systems; (iv) inform readers of the target language of whether or not they can rely on a translation; and (v) spot parts (words or phrases) of a translation that are potentially incorrect.

Research on this topic has shown promising results in the last couple of years. Building on the last three years' experience, the quality estimation track of the WMT15 workshop and shared task will focus on English, Spanish and German as languages and provide new training and test sets, along with evaluation metrics and baseline systems for variants of the task at three different levels of prediction: word, sentence, and document.
METRICS TASK
The metrics task (also called evaluation task) will assess automatic evaluation metrics' ability to:
- Rank systems on their overall performance on the test set
- Rank systems on a sentence-by-sentence level

Participants in the shared evaluation task will use their automatic evaluation metrics to score the output from the translation task and the tunable metrics task. In addition to MT outputs from the other two tasks, the participants will be provided with reference translations. We will measure the correlation of automatic evaluation metrics with the human judgments.

TUNING TASK
The tuning task is a follow-up of the WMT11 invitation-only tunable metrics task. The task will assess your team's ability to optimize the parameters of a given hierarchical MT system (Moses). Participants in the tuning task will be given complete Moses models for English-to-Czech and Czech-to-English translation and the standard development sets from the translation task. The participants are expected to submit the moses.ini for one or both of the translation directions. We will use the configuration and a fixed revision of Moses to translate the official WMT15 test set. The outputs of the various configurations of the system will be scored using the standard manual evaluation procedure.

PAPER SUBMISSION INFORMATION
Submissions will consist of regular full papers of 6-10 pages, plus additional pages for references, formatted following the EMNLP 2015 guidelines. In addition, shared task participants will be invited to submit short papers (4-6 pages) describing their systems or their evaluation metrics. Both submission and review processes will be handled electronically. Note that regular papers must be anonymized, while system descriptions do not need to be.
We encourage individuals who are submitting research papers to evaluate their approaches using the training resources provided by this workshop and past workshops, so that their experiments can be repeated by others using these publicly available corpora.

POSTER FORMAT
The posters will be attached to self-standing posterboards measuring 3 ft high and 4 ft wide. The posterboards will sit on top of tables, so there will be space for laptops and handouts as well. We will provide pushpins, double-sided tape, that putty-like substance, and clips to affix the posters to the posterboards.