Which language combinations yield the best results in Google Translate?

  • Published: 29 November 2017


  1. Aljoscha Burchardt says:

    First, it is very difficult to assess translation quality, even for humans, let alone with automatic tools. I have been working on this topic for a long time, but this would be a separate discussion.

    The different engines are, by and large, algorithmically identical, but they are trained on different data, i.e. bilingual (parallel) corpora for the respective language pair. So data availability (including coverage of domains, etc.) has a high impact on the quality we can expect. Another factor is data sparseness: in languages with a rich morphology, some word forms appear only rarely, which makes it difficult for systems to learn the generalisations. For these reasons, translation from Finnish to Hungarian will most probably be worse than from Spanish to English. As with translation memories, results will of course be better the more similar the material to be translated is to the training data.
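
    The data sparseness point can be illustrated with a small simulation. This is only a rough sketch on a synthetic corpus, not a measurement on real languages: the function name `rare_form_share`, the Zipf-like lemma frequencies, the uniform choice of inflected form and all parameter values are assumptions made for illustration.

```python
import random
from collections import Counter

def rare_form_share(forms_per_lemma, n_lemmas=200, n_tokens=20_000,
                    min_count=5, seed=0):
    """Share of possible word forms seen fewer than min_count times
    in a synthetic corpus (hypothetical toy model, not real language data)."""
    rng = random.Random(seed)
    # Assumption: Zipf-like lemma frequencies, the r-th lemma has weight 1/r.
    weights = [1.0 / rank for rank in range(1, n_lemmas + 1)]
    lemmas = rng.choices(range(n_lemmas), weights=weights, k=n_tokens)
    # Assumption: each token uses one inflected form of its lemma, chosen uniformly.
    counts = Counter((lemma, rng.randrange(forms_per_lemma)) for lemma in lemmas)
    total_forms = n_lemmas * forms_per_lemma
    # A Counter returns 0 for forms that never occurred, so unseen forms count as rare.
    rare = sum(1
               for lemma in range(n_lemmas)
               for form in range(forms_per_lemma)
               if counts[(lemma, form)] < min_count)
    return rare / total_forms

# Same corpus size, different morphological richness.
poor = rare_form_share(forms_per_lemma=2)
rich = rare_form_share(forms_per_lemma=30)
print(f"few forms per lemma:  {poor:.0%} of forms are rare")
print(f"many forms per lemma: {rich:.0%} of forms are rare")
```

    With the corpus size held fixed, the share of rarely seen forms grows as each lemma is spread over more inflected forms, which is the effect described above for morphologically rich languages.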

    Apart from these rules of thumb, it is not possible to present a list of “quality levels” per language pair. Results from several scientific comparisons can be found here (mostly news texts, automatic evaluation): http://matrix.statmt.org.