As machine translation research output soars again before the (virtual) academic conference season, Facebook is introducing M2M-100, a multilingual neural machine translation (NMT) model designed to avoid English as the intermediary (or pivot) language between source and target languages. These so-called massively multilingual models are important in machine translation because of the promised efficiency gains and the option to use transfer learning.
For Facebook, the model was important enough to proactively inform media outlets, including Slator, of its release, an unusual move for the company.
Facebook has already open-sourced the model, training, and evaluation setup. An October 19, 2020 Facebook AI blog post described the new model as the “culmination of years of Facebook AI’s foundational work in machine translation.”
Facebook’s LASER, a library for calculating multilingual sentence embeddings, has been available on GitHub since 2018. In a November 2019 paper, Facebook researchers detailed how they used LASER to mine billions of parallel sentences from web data, a significant portion of which were not aligned with English, and then open-sourced the resulting mined data, referred to as CCMatrix.
According to an October 19, 2020 comment on GitHub by lead author Holger Schwenk, the dataset now contains 10.8 billion sentences for about 80 languages, and the team plans to “release scripts to reproduce the data this week.”
Trained on 2,200 language directions and touted as being able to translate “between any pair of 100 languages without relying on English data,” M2M-100 is notable for its scale. Facebook also reports that it achieves better BLEU scores than English-centric multilingual models.
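To put those figures in context, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the only numbers taken from the article are the 100 languages and the 2,200 trained directions. Counting each ordered source-target pair as one direction is an assumption about how directions are tallied.

```python
# Illustrative arithmetic only; function names are hypothetical.
# Assumption: a "direction" is an ordered (source, target) pair.

NUM_LANGUAGES = 100        # from the article
TRAINED_DIRECTIONS = 2200  # from the article


def directed_pairs(num_languages: int) -> int:
    """Count every ordered pair of distinct languages."""
    return num_languages * (num_languages - 1)


def english_centric_pairs(num_languages: int) -> int:
    """Count only directions into or out of one pivot language."""
    return 2 * (num_languages - 1)


total = directed_pairs(NUM_LANGUAGES)
pivot_only = english_centric_pairs(NUM_LANGUAGES)

print(f"All possible directions:        {total}")
print(f"Directions touching one pivot:  {pivot_only}")
print(f"Share covered by training data: {TRAINED_DIRECTIONS / total:.1%}")
```

Under that assumption, 100 languages yield 9,900 possible directions, of which only 198 involve any single pivot language such as English. The 2,200 trained directions therefore amount to roughly a fifth of all possible pairs.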
Facebook AI researcher Angela Fan told Slator that Facebook also conducted human evaluation across 21 non-English directions, where the judges were asked to rate accuracy and fluency, and provide written feedback about translation issues.
NMT models of this size can show translation quality degradation for high-resource languages, as observed in a July 2019 paper on Google’s “massively multilingual” NMT model. Fan said M2M-100 counteracts that risk via two methods: dense scaling (increasing the model capacity by adding more neural network layers and widening each layer) and sparse scaling (allocating language-specific parameters so that dedicated capacity is provided for languages).
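The difference between the two approaches can be illustrated with a toy parameter count. The sketch below is not Facebook's architecture: it assumes a simplified stack of square linear layers, and the function names, layer counts, widths, and number of language groups are all hypothetical values chosen for illustration.

```python
# Toy illustration of dense vs. sparse scaling; NOT the M2M-100 architecture.
# Assumption: each layer is a square linear layer (width x width weights + bias).
from typing import Tuple


def dense_params(num_layers: int, width: int) -> int:
    """Parameters in a stack of square linear layers, weights plus biases."""
    return num_layers * (width * width + width)


def sparse_params(
    shared_layers: int, width: int, num_groups: int, layers_per_group: int
) -> Tuple[int, int]:
    """Return (total, active) parameters when each language group owns
    dedicated layers and a sentence passes through one group's layers only."""
    shared = dense_params(shared_layers, width)
    per_group = dense_params(layers_per_group, width)
    total = shared + num_groups * per_group
    active = shared + per_group
    return total, active


# Dense scaling: more layers and wider layers, all used for every sentence.
small = dense_params(num_layers=6, width=512)
large = dense_params(num_layers=12, width=1024)
print(f"Dense, small model: {small:,} parameters")
print(f"Dense, large model: {large:,} parameters")

# Sparse scaling: capacity grows with the number of language groups,
# but each sentence only activates the shared part plus one group.
total, active = sparse_params(
    shared_layers=6, width=512, num_groups=10, layers_per_group=2
)
print(f"Sparse model: {total:,} total, {active:,} active per sentence")
```

In this simplified picture, dense scaling raises the cost of every translation, while sparse scaling adds dedicated capacity for language groups without increasing the number of parameters used for any single sentence.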
M2M-100 has yet to make its debut on Facebook, and a July 2020 MT snafu highlighted why improving automated translations across low-resource languages has been and remains a priority for the social network.
“We’re currently building more specialized computation architectures that are necessary to bring this research to production,” Fan said. “We’re also making the system more robust and fair before we deploy it. We want to conduct more experiments with gendered words, generation fluency, and semantic accuracy before using this system at Facebook scale.”
For more on the current state of the art in machine translation, watch this September 2020 SlatorPod episode with Modelfront CEO Adam Bittlingmayer.