Machine translation training and customization are on the rise: MT specialist and MT director positions now appear across a wide range of localization teams. Despite the growing importance of MT training as a profession, little independent information is available on how trained MT actually performs in localization. MT developers tend to present an optimistic view of MT performance. Language service providers often downplay its effect to fend off client requests for discounts, or exaggerate it when negotiating discounts with translators. MT trainers inside enterprises lack the public voice and the means of disseminating information that companies have.
Custom.MT is one of the few companies on the market that specializes in machine translation training while remaining completely independent of any translation company. Founded at the end of 2020 by language industry researcher Konstantin Dranch, the company has now trained and evaluated its 150th MT engine and has compiled key statistics from that work into a report.
Select findings from the report:
- 13% was the average edit distance: human expert translators changed 13% of the words in the output of trained engines
- The best-ever score was 1.6% of words corrected, for Brazilian Portuguese software manuals, meaning linguists changed fewer than one word in 50
- 33% was the average improvement in BLEU score after training, which points to a significant increase in localization productivity
- Half of engine training runs did not produce a BLEU gain, so several MT brands need to be trained for each language to be sure of finding a high performer
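The report does not say exactly how the edit-distance percentage was normalized. The sketch below assumes the common approach used in HTER: a word-level Levenshtein distance between the raw MT output and the linguist's post-edited version, divided by the number of words in the post-edited text. The function names `word_edit_distance`, `edit_rate` and `corpus_edit_rate` are hypothetical and chosen for illustration only.

```python
from typing import Iterable, Tuple


def word_edit_distance(mt_output: str, post_edited: str) -> int:
    """Levenshtein distance over whitespace-separated words
    (insertions, deletions and substitutions each cost 1)."""
    a, b = mt_output.split(), post_edited.split()
    prev = list(range(len(b) + 1))  # distance from empty prefix of a
    for i, wa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr[j] = min(prev[j] + 1,         # delete wa
                          curr[j - 1] + 1,     # insert wb
                          prev[j - 1] + cost)  # keep or substitute
        prev = curr
    return prev[len(b)]


def edit_rate(mt_output: str, post_edited: str) -> float:
    """Share of words changed, normalized by post-edited length (assumption)."""
    ref_len = len(post_edited.split())
    if ref_len == 0:
        return 0.0 if not mt_output.split() else 1.0
    return word_edit_distance(mt_output, post_edited) / ref_len


def corpus_edit_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total edits divided by total post-edited words across all segments."""
    edits = words = 0
    for mt, pe in pairs:
        edits += word_edit_distance(mt, pe)
        words += len(pe.split())
    return edits / words if words else 0.0
```

For example, `edit_rate("the cat sat on mat", "the cat sat on the mat")` is 1/6, because the linguist inserted one word into a six-word sentence. Computing the rate over the whole test set rather than averaging per-segment rates keeps short segments from dominating the figure.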
Custom.MT trains best-of-breed machine translation engines.
A project begins with a single dataset, prepared by technically cleaning up the client's translation memories in TMX format; this cleanup removes 42% of the lines on average. Then a set of 5 to 10 candidate engines is trained, typically including Google AutoML, Microsoft Custom Translator, Amazon ACT, ModernMT, Yandex Translate, Globalese, PangeaMT, and other brands on demand. For European companies with security requirements, only EU-based MT is trained and evaluated. For evaluation, Custom.MT first runs a script that scores each engine's output against a reference translation with BLEU, hLEPOR and BERTScore. In the next round, three human experts in the relevant domain complete a blind test, editing the output so that edit distance can be calculated. Custom.MT then configures the best-performing engines in the translation management systems.
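The text does not list the cleanup rules behind the 42% figure. As an illustration only, here is a minimal sketch of a TMX cleanup step using Python's standard `xml.etree.ElementTree`. The filtering rules (drop empty sides, source copied unchanged into target, extreme length ratios, exact duplicates), the `max_ratio` threshold and the function names are all assumptions, not Custom.MT's method.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# TMX 1.4 marks languages with xml:lang; ElementTree expands the predefined
# "xml" prefix to this namespace. Older TMX versions use a plain "lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text: str, src_lang: str, tgt_lang: str) -> List[Tuple[str, str]]:
    """Extract (source, target) segment pairs from TMX text.
    Language codes are matched exactly, ignoring case (assumption)."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Flattens inline markup and normalizes whitespace; a real
                # pipeline would handle TMX inline tags deliberately.
                segs[lang] = " ".join("".join(seg.itertext()).split())
        pairs.append((segs.get(src_lang.lower(), ""), segs.get(tgt_lang.lower(), "")))
    return pairs


def clean_pairs(pairs: List[Tuple[str, str]], max_ratio: float = 3.0) -> List[Tuple[str, str]]:
    """Apply simple, hypothetical filters and keep the first copy of each pair."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        if not src or not tgt:
            continue  # one side missing or empty
        if src == tgt:
            continue  # source copied into target, likely untranslated
        if max(len(src), len(tgt)) / min(len(src), len(tgt)) > max_ratio:
            continue  # lengths too different, likely misaligned
        if (src, tgt) in seen:
            continue  # exact duplicate
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept
```

The share of lines removed is then `1 - len(kept) / len(pairs)`. Which filters are appropriate depends on the content: the source-equals-target rule, for instance, would also drop legitimate segments such as product names, so thresholds like these would need tuning per client.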
Custom.MT is a specialist service that helps localization directors and LSPs train, customize, implement and maintain machine learning models. Based in Prague, Custom.MT is a team of consultants, engineers, project producers, lexicographers and data scientists. The company builds internal automation systems to speed up data preparation and model training.
Silvia Schiavoni, Marketing Manager
Konstantin Dranch, CEO