
The Learning Curve, Part 2: How to Build an AI for Diverse Dialects

May 16, 2024


Tales from the Middle East on the complexity of creating AI tools for Arabic, a language with many facets

Galaxy AI now supports 16 languages, helping more people to lower language barriers with real-time and on-device translation. Samsung opened the door to a new era of mobile AI, so we are visiting Samsung Research centers all over the world to learn how Galaxy AI came to life and what it took to overcome the challenges of AI development. While part one of the series examines the task of determining what data is needed, this instalment looks at the complex task of accounting for dialects.

Teaching a language to an AI model is a complex process, but what if it isn’t a single language, but a collection of diverse dialects? That was the challenge faced by the team at Samsung R&D Institute Jordan (SRJO). While “Arabic” was added as a language option for Galaxy AI features such as Live Translate, the team had to cater to the various Arabic dialects that span the Middle East and North Africa, each differing in pronunciation, vocabulary, and grammar.

Arabic is one of the top six most widely spoken languages around the world, used daily by more than 400 million people¹. The language is categorized into two forms: Fus'ha (Modern Standard Arabic) and Ammiya (the dialects of Arabic). Fus'ha is typically used in public and official events, as well as in news broadcasts, while Ammiya is more commonly used for day-to-day conversations. Over 20 countries use Arabic, and there are currently around 30 dialects in the region.

Unwritten Rules
Recognizing the variation presented by these dialects, the team at SRJO employed a range of techniques to discern and process the unique linguistic features inherent in each. This approach was crucial in ensuring that Galaxy AI could understand and respond in a way that accurately reflects the regional nuances.

"Unlike other languages, the pronunciation of the object in Arabic varies depending on the subject and verb in the sentence," says Mohammad Haweeleh, head of the Arabic Text-to-Speech (TTS) team. "Our goal is to develop a model that understands all these dialects and can answer in standard Arabic."

TTS is the component of Galaxy AI’s Live Translate feature that vocally reproduces translated text. Once a speaker's words have been converted into written text and translated, TTS reads the result aloud, letting users interact with speakers of different languages. The TTS team faced a unique challenge, caused by a quirk of working with Arabic.

Arabic uses diacritics, marks that guide the pronunciation of words. They are written out only in some contexts, such as religious texts, poetry, and books for language learners. Native speakers widely understand diacritics, but they are absent from everyday writing. This makes it difficult for a machine to convert raw text into phonemes, the basic units of sound that are the building blocks of speech.
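To illustrate why missing diacritics are a problem for a machine, here is a minimal Python sketch. It is not the SRJO team's code; the helper name `strip_diacritics` and the three sample words are assumptions chosen for illustration. It shows three differently pronounced words collapsing into one identical spelling once the diacritics are removed, as happens in everyday writing.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Drop combining marks, which include the Arabic short-vowel diacritics."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))

# Three fully diacritized words built from the same letters (kaf, ta, ba).
# Written with escapes so the example does not depend on file encoding.
forms = {
    "\u0643\u064E\u062A\u064E\u0628\u064E": "kataba (he wrote)",
    "\u0643\u064F\u062A\u0650\u0628\u064E": "kutiba (it was written)",
    "\u0643\u064F\u062A\u064F\u0628": "kutub (books)",
}

# Everyday writing omits the marks, so all three share one spelling.
bare_spellings = {strip_diacritics(word) for word in forms}

for word, gloss in forms.items():
    print(strip_diacritics(word), "<-", gloss)
print("distinct spellings without diacritics:", len(bare_spellings))
```

A text-to-speech system that sees only the bare spelling has to work out from context which of these pronunciations is meant before it can produce phonemes.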

“There is a shortage of high-quality and reliable datasets that accurately represent how diacritics are correctly used,” explains Haweeleh. “We had to design a neural model that can predict and restore those missing diacritics with high accuracy.”

Neural models learn patterns from examples, loosely as human brains do. To predict diacritics, a model needs to study lots of Arabic text, learn the language's rules, and understand how words are used in different contexts. For instance, the pronunciation of a word can vary greatly depending on the action or gender it describes. Extensive training by the team was the key to enhancing the Arabic TTS model’s accuracy.
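The idea of using context to choose a pronunciation can be sketched with a toy example. The Python below is not a neural model and not the team's method. It is a simple counting baseline, swapped in to keep the example short, and the class name `ContextLookupRestorer`, the transliterated sample words, and the training pairs are all hypothetical. It learns which full form of a bare word most often follows a given previous word, and falls back to the word's most common form when the context is new.

```python
from collections import Counter, defaultdict

class ContextLookupRestorer:
    """Toy baseline: pick the full form seen most often in a given context."""

    def __init__(self):
        self.by_context = defaultdict(Counter)  # (previous word, bare word) -> form counts
        self.by_word = defaultdict(Counter)     # bare word -> form counts

    def train(self, examples):
        # Each example is (previous word, bare word, full form with vowels).
        for prev, bare, full in examples:
            self.by_context[(prev, bare)][full] += 1
            self.by_word[bare][full] += 1

    def restore(self, prev, bare):
        # Prefer evidence from the same context, then from the word alone.
        if (prev, bare) in self.by_context:
            return self.by_context[(prev, bare)].most_common(1)[0][0]
        if bare in self.by_word:
            return self.by_word[bare].most_common(1)[0][0]
        return bare  # unseen word: leave it unchanged

# Hypothetical, transliterated training data ("ktb" stands for the bare spelling).
restorer = ContextLookupRestorer()
restorer.train([
    ("huwa", "ktb", "kataba"),      # "he" + verb: he wrote
    ("huwa", "ktb", "kataba"),
    ("thalathat", "ktb", "kutub"),  # "three" + noun: books
])

print(restorer.restore("huwa", "ktb"))
print(restorer.restore("thalathat", "ktb"))
```

A neural model generalizes far beyond such exact lookups, but the principle is the same: the surrounding words supply the evidence for which vowels to restore.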

Enhancing Understanding
The SRJO team also had to collect diverse audio recordings of the dialects from various sources. These recordings then had to be transcribed, with a focus on unique sounds, words, and phrases. “We assembled a team of native speakers of the dialects who were well-versed in the nuances and variations,” says Ayah Hasan, whose team was responsible for database creation. “They listened to the recordings and manually converted the spoken words into text.”

This work was crucial for enhancing the Automatic Speech Recognition (ASR) process so that Galaxy AI could handle the rich tapestry of Arabic dialects. ASR is pivotal in enabling Galaxy AI’s real-time understanding and response capabilities.

“Building an ASR system that supports multiple dialects in a single model is a complex undertaking,” says Mohammad Hamdan, ASR lead for the project. “It demands a thorough understanding of the language's intricacies, careful data selection, and advanced modeling techniques.”

The Culmination of Innovation
After months of planning, building, and testing, the team was ready to release Arabic as a language option for Galaxy AI, enabling many more people to communicate across borders. This single team has made Galaxy AI services accessible to Arabic speakers, lowering the language and cultural barriers between them and people all over the world. In doing so, they have established new best practices that can be rolled out globally. This success is only the beginning: the team continues to refine their models and enhance the quality of Galaxy AI’s language capabilities.

In the next episode, we go to Vietnam to see how the team makes language data better. Plus, what does it take to train an effective AI model?

Arabic is just one of the languages and dialects newly supported by Galaxy AI and available for download from the Settings app. Galaxy AI’s language features such as Live Translate and Interpreter are available on Galaxy devices running Samsung’s One UI 6.1 update.²

¹UNESCO, World Arabic Language Day 2023
²One UI 6.1 was first released on Galaxy S24 series devices, with a wider rollout to other Galaxy devices including S23 series, S23 FE, S22 series, S21 series, Z Fold5, Z Fold4, Z Fold3, Z Flip5, Z Flip4, Z Flip3, Tab S9 series and Tab S8 series.