VNExpress/Reuters-Feb 9

Like millions worldwide, Southeast Asians have been trying out large language models (LLMs) such as Meta’s Llama 2 and Mistral AI, but in their native Bahasa Indonesia or Thai. The result has usually been gibberish in English. This leaves them at a disadvantage, tech experts warn, as generative artificial intelligence transforms education, work and governance worldwide.

A Singapore government-led initiative aims to correct the imbalance with a Southeast Asian LLM, the first in a family of models named SEA-LION (Southeast Asian Languages in One Network), trained on the region’s languages and cultural norms. Trained on data in 11 Southeast Asian languages including Vietnamese, Thai and Bahasa Indonesia, the open-sourced model is a cheaper and more efficient option for the region’s businesses, governments and academia, said Leslie Teo at AI Singapore.

There are over 7,000 languages spoken worldwide. Yet LLMs including OpenAI’s GPT-4 and Meta’s Llama 2, which are used to build AI systems such as chatbots and other tools, have largely been developed for, and trained on, the English language. Governments and tech firms are trying to bridge this gap, with India creating datasets in local languages, an LLM in the United Arab Emirates powering generative AI tools in Arabic, and AI models in China, Japan and Vietnam being built in local languages.

These models can help local populations participate more equitably in the global AI economy, which is largely dominated by big tech firms, said Nuurrianti Jalli, an assistant professor at Oklahoma State University’s school of communications.

Read more at: https://e.vnexpress.net/news/news/singapore-builds-ai-model-to-represent-southeast-asians-4710746.html