Amazon recently announced a new Live Translation feature for Alexa, which lets people who speak different languages converse through Alexa. The assistant acts as an interpreter, translating each side of the conversation.
With the new feature, a customer can ask Alexa to start a translation session for a pair of languages. During the conversation, Alexa automatically identifies which language is being spoken and translates in real time. The launch covers 6 language pairs: English paired with Spanish, French, German, Italian, Brazilian Portuguese, or Hindi.
Live Translation builds on several existing Amazon systems, including Alexa's automatic speech recognition (ASR) system, Alexa Text-to-Speech, and Amazon Translate.
Discovering Alexa Live Translation
During a Live Translation session, Alexa runs two ASR models in parallel, one for each language in the pair, along with a separate model for language identification. Based on the language ID model's decision, only one ASR output is sent to the translation engine. Running the recognizers in parallel keeps latency down, because speech recognition does not have to wait for the language ID result before it starts.
During development, Amazon found that the language ID model works best when it draws on both acoustic information from the speech signal and the outputs of the two ASR models. The ASR outputs help in particular with non-native speakers, whose speech often keeps the same acoustic properties regardless of the language they are speaking.
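The selection step described above can be illustrated with a minimal sketch. It is not Amazon's implementation: `run_asr`, `acoustic_language_scores`, the confidence values, and the equal weighting are all hypothetical stand-ins chosen to show the idea of running two recognizers in parallel and forwarding only one output.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class AsrResult:
    language: str
    text: str
    confidence: float  # hypothetical recognizer confidence in [0, 1]


def run_asr(language: str, audio: bytes) -> AsrResult:
    """Hypothetical stand-in for a real recognizer; returns canned results."""
    canned = {
        "en-US": AsrResult("en-US", "where is the train station", 0.91),
        "es-US": AsrResult("es-US", "ver es de tren estacion", 0.42),
    }
    return canned[language]


def acoustic_language_scores(audio: bytes) -> dict:
    """Hypothetical stand-in for an acoustic language-ID model."""
    return {"en-US": 0.55, "es-US": 0.45}


def pick_language(audio: bytes, languages: list, acoustic_weight: float = 0.5) -> AsrResult:
    # Run both recognizers at the same time so recognition does not
    # wait for the language decision.
    with ThreadPoolExecutor(max_workers=len(languages)) as pool:
        results = list(pool.map(lambda lang: run_asr(lang, audio), languages))

    acoustic = acoustic_language_scores(audio)

    # Combine acoustic evidence with each recognizer's confidence
    # (an assumed, simple linear mix).
    def combined(result: AsrResult) -> float:
        return (acoustic_weight * acoustic[result.language]
                + (1 - acoustic_weight) * result.confidence)

    # Only the winning transcript would be passed on to translation.
    return max(results, key=combined)


selected = pick_language(b"", ["en-US", "es-US"])
print(selected.language, "->", selected.text)
```

In this toy setup the acoustic scores alone are close to a tie, and the recognizer confidences settle the decision, which mirrors why combining both signals can be more reliable than either one.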
Once the language ID system has selected an output, that text is sent to Amazon Translate, and the translation is passed to Alexa's text-to-speech system for playback. Like most ASR systems, the ones used for Live Translation include both an acoustic model and a language model. The acoustic model converts audio into phonemes, while the language model encodes probabilities for particular strings of words.
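To make "probabilities for strings of words" concrete, here is a toy bigram language model. It is only an illustration of the general idea, trained on three made-up sentences, and says nothing about how Alexa's models are actually built.

```python
from collections import Counter


def train_bigram(sentences):
    """Count how often each word follows each preceding word."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()  # <s> marks the sentence start
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return context_counts, bigram_counts


def sequence_probability(sentence, context_counts, bigram_counts):
    """Multiply P(word | previous word) over the sentence (no smoothing)."""
    tokens = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        if context_counts[prev] == 0:
            return 0.0  # unseen context: the toy model assigns zero
        prob *= bigram_counts[(prev, cur)] / context_counts[prev]
    return prob


corpus = ["turn on the light", "turn off the light", "turn on the radio"]
contexts, bigrams = train_bigram(corpus)

likely = sequence_probability("turn on the light", contexts, bigrams)
unlikely = sequence_probability("light the on turn", contexts, bigrams)
print(likely, unlikely)
```

A recognizer can use scores like these to prefer a plausible word sequence over an acoustically similar but unlikely one.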
Delivering a Detailed Outcome
Each of the ASR systems in Amazon’s solution also comes with two language models: a traditional model that encodes probabilities for short word strings, and a neural model that can handle longer strings. These models have been trained to handle a wide range of conversational speech topics.
Amazon also modified Alexa's end-pointer, the component that determines when a customer has finished talking. It normally distinguishes between a pause within a sentence and the end of a sentence. For Live Translation, the end-pointer now tolerates the longer pauses that are common when people speak at length in conversation.
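The effect of a more tolerant end-pointer can be sketched with a simple silence-timeout rule. This is an assumed, simplified model, not a description of Alexa's end-pointer: the function name, the per-frame speech flags, and the thresholds are all hypothetical.

```python
def find_endpoint(is_speech, frame_ms, max_pause_ms):
    """Return the index of the frame where the utterance is declared
    finished, or None if the pause threshold is never reached.

    is_speech: one boolean per audio frame (True = speech detected).
    """
    silence_ms = 0
    heard_speech = False
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silence_ms = 0  # any speech resets the pause timer
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= max_pause_ms:
                return i
    return None


# 100 ms frames: speech, a 400 ms mid-sentence pause, more speech, then silence.
frames = [True] * 5 + [False] * 4 + [True] * 3 + [False] * 10

short_tolerance = find_endpoint(frames, frame_ms=100, max_pause_ms=300)
long_tolerance = find_endpoint(frames, frame_ms=100, max_pause_ms=800)
print(short_tolerance, long_tolerance)
```

With the short threshold the speaker is cut off during the mid-sentence pause, while the longer threshold waits until the final silence, which is the behavior a conversational setting needs.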
Amazon Translate’s neural machine translation system was designed to work with text input, so the Live Translation system adjusts for disfluencies in the ASR output before translating it. In the future, Amazon plans to keep improving the Live Translation experience, exploring approaches such as semi-supervised learning.
Amazon has further explained that, to improve the robustness and fluency of translations, it is adapting the neural translation engine to conversational speech data and to generate translations that make better use of context. That context might include tone of voice and whether a formal or informal translation is appropriate.