Computational Linguistics in the Netherlands (CLIN) is an annual conference on computational linguistics organised in Flanders (Belgium) and the Netherlands.
CLIN is a splendid occasion for the whole Dutch and Belgian computational linguistics community to get together and present their current work, even if it is still in progress. International researchers often take part in the conference as well. Submissions usually cover a diverse range of computational linguistics topics, resulting in a stimulating event where researchers exchange early research results, compare approaches and get a good sense of the direction of computational linguistics in the Netherlands and Belgium as a whole.
Joachim Van den Bagaert from CrossLang will be presenting the following paper:
Factored and hierarchical models for Statistical Machine Translation have been around for some time, but few results for Dutch have been reported so far. A likely reason is that simple phrase-based models require less effort to train while performing on par with more involved approaches.
In this presentation, we discuss a couple of experiments we conducted using some of Moses SMT’s more advanced features (multiple translation and generation steps, and multiple decoding paths), while leveraging monolingual data to predict inflexion from lemmata and part-of-speech information.
In a follow-up discussion, we focus on the asymmetric nature of Machine Translation from English into Dutch and from Dutch into English, and present some strategies we found for reducing data sparsity within the factored model framework. Reordering will be discussed by comparing hierarchical models with pre-ordering strategies.
We conclude the presentation with an overview of how different strategies can be combined and to what extent such strategies remain effective when larger data sets are used.