Samsung Mobile Press

The Learning Curve, Part 4: A New AI Model and an Evolving Language

June 4, 2024

This visit to Samsung Research in China is part of a series about the people and innovations behind the democratization of mobile AI

As Samsung continues to pioneer premium mobile AI experiences, we visit Samsung Research centers around the world to learn how Galaxy AI is enabling more users to maximize their potential. Galaxy AI now supports 16 languages, so more people can expand their language capabilities, even when offline, thanks to on-device translation in features such as Live Translate, Interpreter, Note Assist and Browsing Assist. But what does AI language development involve? Last time we visited Vietnam to learn about preparing the data that’s used to train AI models. This time, we’re seeing how teams made Galaxy AI a unique offering for both the Chinese mainland and Hong Kong.

AI tools built on large language models (LLMs) have grown rapidly worldwide, and China is no exception. With Baidu’s ERNIE Bot and Meitu’s MiracleVision emerging as popular choices in China, Samsung R&D Institute China partnered with both companies to help build Galaxy AI features for the country.

Samsung R&D Institute China in Guangzhou (SRC-G) and Beijing (SRC-B) worked to ensure Mandarin speakers in China had the same Galaxy AI experience as other users around the world, despite the back-end technology looking very different. The team took advantage of the dedicated Chinese dialect resources of their third-party partners and built a unique Galaxy AI solution for China.

“We have the advantage of blending global best practices with China’s local practices, as well as creating new features and constantly improving them through daily communication with Chinese consumers,” says Hairong Zhang, Software Innovation Group Leader at SRC-G. “With rich development experience from the Galaxy S24, I’m proud of how our team co-operated with local Chinese AI companies such as Baidu and Meitu to provide a solution that resonates in China.”

At the beginning, the teams had to acclimatize to each other’s working styles and iron out the initial kinks of information asymmetry. Daijun Zhang, head of SRC-B, established a task force to ensure the project followed the development schedule and moved quickly toward its goals.

Thanks to the Beijing team’s experience in generating large-scale models and its collaboration with the third-party partners, all the generative AI features launched successfully in China. The result is a solution that has local relevance and market-specific features such as Touch to Search.

Expanding on Chinese to develop for the Cantonese dialect
Chinese for mainland China (Mandarin) arrived on Galaxy AI with the launch of the Galaxy S24 in January 2024. But the job for Samsung R&D Institute China was far from finished. The team was also tasked with developing the AI model for Chinese in Hong Kong (Cantonese), an effort that builds on the work already carried out for Mandarin but brings an entirely new set of language features to address.

In developing for Cantonese, the China R&D team faced major cultural challenges that they needed to address in order to fully support localization for the market. The first is that writing and speech follow two different systems. Hong Kong locals use grammar and expressions similar to Mandarin when writing but adopt a completely different colloquial grammar in daily conversation. Also, Cantonese has nine tones for pronunciation whereas Mandarin has four.

Another cultural phenomenon is that the Cantonese dialect itself evolves with the times. Add to that the fact that people often blend Cantonese and English in conversation, and it’s easy to see why creating test cases and validating language packs was complicated.

“Cantonese is a very unique dialect that varies in different Cantonese speaking regions,” says Jing Li, who leads the operation for testing the Cantonese AI solution. “Some of the slang, phrases, vocabularies, and even the tones are varied from place to place. Therefore, we conducted a large amount of work in verifying the Hong Kong specific data, as well as proofreading tens of thousands of relevant test cases.”

With these complexities in mind, SRC-G and SRC-B worked together on three fronts: supporting deep code-mixing of Cantonese and English in speech recognition, handling both written and spoken expressions in machine translation, and reflecting current pronunciations in speech synthesis.

Cultural impact of communication
When Galaxy AI launched the Chinese (Hong Kong) language option, the customer feedback showed that the hard work of the Samsung R&D team was justified.

For both the Chinese mainland and Hong Kong, Samsung’s Galaxy AI activities show the importance of a global brand having local presence and expertise, as well as the power of open collaboration with other organizations. In Hong Kong, Cantonese is a key part of the cultural identity of those who live there. That’s why it was so important for the team to get the AI language model right.

“Language and communication are crucial in every region and in all walks of life,” says Henry Wat, who heads the engineering group at Samsung Electronics Hong Kong. “No matter the language, any tool that helps people communicate is invaluable. I believe our work is meaningful.”

In the next episode of The Learning Curve, we will head to Brazil to see how a team works across cultures and borders to bring Galaxy AI to more people.