The Age Of Machine Translation – CIOReview

Neil Currie, Head of Data Management, Crowe

When Covid arrived on the world stage, employees and companies solved a problem that many had been working on for over a decade: how to make remote working a reality. Overnight, millions of people were told to work from home, infrastructure transitioned to support virtual desktops, and meetings via video conference became normal. Companies like Zoom saw their value multiply, and a new operational model began to emerge. Of lesser note was the increased adoption of some of the other tools necessary to support a new globalised workforce in which many of the traditional boundaries had been removed.

These included translation engines, many of which had been operating on a smaller scale in the background but were now becoming part of everyday business life. As we know, adoption drives development, and we are now starting to see the greater sophistication that machine-based translation has long needed. This is a realm where nuances, context, and ontologies matter. To take a simple example, if you saw the acronym “HA” in a text, context becomes very important: to a family doctor it might mean headache, to a cardiologist heart attack, and to an endocrinologist hepatitis A. If you saw the word “body”, to the medical fraternity it might refer to a person, to the corporate world a corporate entity, and to an association its membership. The translation of a word is shaped by the surrounding text and by where the document originated.

Manual translation carries multiple nuances that depend on the translator’s knowledge, experience, and interpretation of what is meant, and it is limited by their level of understanding. Machine-based translation has none of these nuances: the translation is driven by dictionaries tailored to different contexts and environments. Embedded images remain in their original format, and documents that are not machine-readable fail translation altogether. None of these issues is a surprise to the people who use these technologies to support their products, but they remain a source of frustration for information consumers, who expect to see a perfect translation fit for their use.

Fed by the “on-demand” expectations that now shape our interactions, the need to refine this secondary tier of background technologies is growing. In the same way that an industry sector has developed to provide training datasets for machine learning, it is likely that one will develop to provide translation dictionaries to support this growing need.
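The context-tailored dictionaries described above can be pictured as a lookup keyed by both the term and the setting in which it appears. What follows is a minimal sketch in Python, not a description of how any particular translation engine works: the `GLOSSARY` table, the context labels, and the `resolve_term` function are all hypothetical names introduced for illustration, and the entries simply mirror the “HA” and “body” examples.

```python
# Hypothetical context-tailored glossary: term -> {context label -> meaning}.
# Entries mirror the "HA" and "body" examples in the text; the context
# labels are illustrative assumptions, not taken from any real product.
GLOSSARY = {
    "HA": {
        "family_medicine": "headache",
        "cardiology": "heart attack",
        "endocrinology": "hepatitis A",
    },
    "body": {
        "medical": "person",
        "corporate": "corporate entity",
        "association": "membership",
    },
}


def resolve_term(term, context):
    """Return the meaning of `term` in `context`, or None if no entry exists.

    Returning None instead of guessing lets a caller route the term to a
    human reviewer when the dictionary does not cover the context.
    """
    return GLOSSARY.get(term, {}).get(context)
```

Under these assumptions, `resolve_term("HA", "cardiology")` returns `"heart attack"`, while a context the dictionary does not cover returns `None`. This is the gap a market for curated translation dictionaries would aim to close.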

Today we see ready adoption of the current flawed model in places where accuracy is less important, for example on-demand translation of websites and instruction books. In some fields it is unclear when adoption will become mainstream; these include the legal profession, general medicine, and others where a greater level of accuracy is important. Academic papers, scientific journals, and other items requiring peer review and opinion are likely to remain manually translated, because nuance, interpretation, and environmental distinctions are needed to support the proposed hypothesis and its review by peers.

Today, voice technologies such as Alexa, Google, and Siri are already part of our lives. The AI-supported technologies that allow them to operate in multiple languages, and to handle the tonal nuances of people from different areas, can hopefully be used to support the contextual interpretation and nuance required to mature document-based translation as these technologies continue to develop.