Facebook makes Flores-101 dataset open source for more accurate AI translation – Illinoisnewstoday.com

Facebook Inc. is now open sourcing a dataset called Flores-101 that can be used to develop artificial intelligence models that translate text between different languages.

Building an AI model involves training a neural network on a large amount of data until it learns to identify useful patterns. Developers then have the model process a test dataset to see whether its results are accurate enough for production use. Flores-101 is such a test dataset for evaluating translation models: it contains sentences translated into 101 languages.
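The evaluation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `translate` callable stands in for whatever model is being tested, the sentence pairs are made up rather than taken from Flores-101, and the word-overlap F1 score is a deliberately crude stand-in for the metrics real benchmarks use.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple


def token_f1(hypothesis: str, reference: str) -> float:
    """Crude word-overlap F1 between a system output and a reference translation."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((hyp & ref).values())  # words shared by both, counted with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def evaluate(translate: Callable[[str], str],
             test_pairs: Iterable[Tuple[str, str]]) -> float:
    """Average the per-sentence score of `translate` over (source, reference) pairs."""
    scores = [token_f1(translate(src), ref) for src, ref in test_pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data and a toy lookup "model", both hypothetical.
toy_pairs = [("hola mundo", "hello world"), ("buenos dias", "good morning")]
toy_model = {"hola mundo": "hello world", "buenos dias": "good day"}.get

print(evaluate(toy_model, toy_pairs))
```

Running the same `evaluate` call before and after a change to the model is what lets a developer tell whether the change helped.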

The Facebook researchers who worked on Flores-101 say it addresses a large gap in the AI ecosystem. Measuring a model’s accuracy is an important part of any machine learning project, because without a reliable way to evaluate results, developers cannot determine whether fine-tuning a model has improved its performance.

However, the test datasets commonly used for evaluation cover only a limited number of widely spoken languages, such as English and Spanish. As a result, developers building AI software to translate between other languages often struggle to assess the accuracy of their models.

“Imagine you’re trying to bake a cake, but you can’t taste it,” the Flores-101 team explained in a blog post. “It’s almost impossible to know if it’s good, and it’s even harder to know how to improve the recipe for future attempts.”

Flores-101 consists of text blocks extracted from news articles, travel guides and other sources, translated into 101 languages. According to Facebook’s researchers, more than 80% of these languages previously had few AI training datasets available, or none at all.
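To get a sense of scale, the short sketch below counts how many translation directions 101 languages give rise to. It assumes every text block is available in all 101 languages, so that any language can serve as source or target; the `lang_000`-style codes are hypothetical placeholders, not the dataset’s own identifiers.

```python
from itertools import permutations

NUM_LANGUAGES = 101
# Hypothetical placeholder codes standing in for the real language identifiers.
languages = [f"lang_{i:03d}" for i in range(NUM_LANGUAGES)]

# Each ordered (source, target) pair is a separate translation direction.
directions = list(permutations(languages, 2))

# Directions that involve one particular language, as either source or target.
pivot = "lang_000"
pivot_directions = [d for d in directions if pivot in d]

print(len(directions))        # 101 * 100 = 10100
print(len(pivot_directions))  # 100 as source + 100 as target = 200
```

Only 200 of the 10,100 directions touch any single language, which illustrates why benchmarks centered on one or two widely spoken languages leave most pairs untested.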

In recent years, computer scientists have sought to make AI translation models more accurate by configuring them to analyze words and sentences in the context of the surrounding text. According to Facebook, Flores-101 can support projects that take this approach. “FLORES is built to translate multiple adjacent sentences from a selected document, which means models can measure whether document-level context improves the quality of translations,” the company’s researchers wrote.
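One way a project might exploit adjacent sentences is to pair each sentence with the sentences that precede it in the same document. The sketch below is a minimal illustration of that idea; the `Sentence` record, its field names and the `window` parameter are assumptions made for the example and do not describe the actual Flores-101 file format.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Sentence(NamedTuple):
    doc_id: str    # hypothetical identifier of the source document
    position: int  # index of the sentence within that document
    text: str


def with_context(sentences: List[Sentence], window: int = 2) -> List[dict]:
    """Attach up to `window` preceding sentences from the same document to each sentence."""
    by_doc: Dict[str, List[Sentence]] = defaultdict(list)
    for s in sentences:
        by_doc[s.doc_id].append(s)

    examples = []
    for doc_id, doc_sents in by_doc.items():
        doc_sents.sort(key=lambda s: s.position)  # restore document order
        for i, s in enumerate(doc_sents):
            context = [p.text for p in doc_sents[max(0, i - window):i]]
            examples.append({"doc_id": doc_id, "context": context, "source": s.text})
    return examples


# Made-up sentences, deliberately out of order.
sample = [
    Sentence("guide", 1, "It opens at nine."),
    Sentence("guide", 0, "The museum is near the station."),
    Sentence("news", 0, "The vote was postponed."),
    Sentence("guide", 2, "Tickets are sold at the door."),
]
for example in with_context(sample, window=1):
    print(example)
```

A model could then be scored twice, once with `context` supplied and once without, to see whether the extra context changes translation quality.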

In addition, the social network included metadata alongside the text, such as tags describing the topic of each text block. Such information can help machine learning models infer the meaning of sentences more easily and improve the quality of their translations.

Facebook assembled the text that makes up Flores-101 in a multi-step process. First, the company had a team of professional translators translate each text into the supported languages. Editors then checked each document for errors before handing it to another team of translators to complete the dataset.

“It’s hard to build a good benchmark,” Facebook’s researchers wrote. “It needs to accurately reflect meaningful differences between models so that researchers can use it to make decisions. Translation benchmarks can be particularly difficult, because all languages, not just the few for which translators are readily available, must meet the same quality standard.”

“Efforts like FLORES are very valuable because FLORES not only focuses on poorly serviced languages, but also immediately invites and actively promotes research in all these languages,” said Antonios Anastasopoulos, an assistant professor of computer science at George Mason University.

Facebook has begun collaborating with Microsoft Corp. and the Workshop on Machine Translation to facilitate the development of AI translation models for languages that currently have limited training datasets available. As part of this initiative, Facebook is sponsoring a grant that enables researchers to use graphics processing units in Microsoft’s Azure cloud platform for their projects. The social network says the grant offers “thousands of GPU hours” for free.

Photo: Eston Bond / Flickr
