While intelligence analysts are trying to prise open low-resource languages from the outside, native speakers of those languages are also taking matters into their own hands. They, too, want access to urgent information in other languages – not for espionage, but to improve their everyday lives.
“When this Covid-19 pandemic happened, there was a sudden need to translate basic health tips into many languages. And we couldn’t do this with machine translation models, because of the quality,” says David Ifeoluwa Adelani, a doctoral student in computer science at Saarland University in Saarbrücken, Germany. “I think this has really taught us that it’s important that we have technology that works for low-resource languages, especially in time of need.”
Adelani, who is originally from Nigeria and a native Yorùbá speaker, has been building a Yorùbá-English database as part of a non-profit project called Cracking the Language Barrier for a Multilingual Africa. He and his team created a new dataset by gathering translated movie scripts, news, literature and public talks. They then used this dataset to fine-tune a model already trained on religious texts, such as Jehovah’s Witnesses publications, improving its performance. Similar efforts are underway for other African languages such as Ewe, Fongbe, Twi and Luganda, helped by grassroots communities such as Masakhane, a network of researchers from all over Africa.
One day, all of us may be using multilingual search engines in our everyday lives, unlocking the world’s knowledge at the click of a button. Until then, the best way to really understand a low-resource language is probably to learn it – and join the multilingual, online human chatter that trains the world’s translation robots.