AI that became a linguistic genius: the multilingual (Polyglot) model (1)
2024-07-04
“The limits of my language mean the limits of my world.”


Image: Ludwig Wittgenstein (source: LITHUB)


So said Ludwig Wittgenstein, one of the leading philosophers of the 20th century. As he suggested, humans think in language and live within its framework. As Koreans, we think and live within the framework of the Korean language, so the world we understand is bound to differ from that of people in the English-speaking world.


To understand the world more broadly and more deeply, then, we need to expand our horizons through language. But learning a new language isn't easy. Understanding a language properly takes more than a bigger vocabulary: you also need to know the country, region, culture, and people it belongs to.


The world is big, and there are many languages. However...


There are said to be around 7,100 languages in the world. With so many languages, a great deal of human knowledge and information has probably never been shared beyond the community that speaks it. Unfortunately, any one person can learn only a handful of them.


Image: The world's languages (source: The Washington Post)


Meanwhile, the online world is dominated by English. The Internet is often described as an open information space, but in my view that holds mainly for English speakers. In reality, many people who don't speak English face a huge gap in access to knowledge and information.


Image: Ranking of languages (source: MADTIMES)


The shortcomings of English-centric natural language processing


Historically, natural language processing (NLP) research, including machine translation and language modeling, has focused on English. That is hardly surprising, since the field developed mainly in the US and other Western countries. As a result, apart from a few languages such as English and Spanish, most languages were left out of NLP research.


Most multilingual AI models also rely on English as an intermediate (pivot) language. For example, to translate German into Korean, they first translate German into English and then English into Korean. The bizarre mistranslations that machine translators used to be known for were probably due in large part to this.
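To make the pivot idea concrete, here is a minimal Python sketch, assuming toy word-level lookup tables. The names `DE_TO_EN`, `EN_TO_KO`, `DE_TO_KO_DIRECT`, `pivot_translate`, and `direct_translate` are hypothetical, and real translation systems are neural models, not dictionaries. It shows one way information can be lost at the pivot: German distinguishes informal *du* from formal *Sie*, English collapses both into "you", so a German-to-English-to-Korean chain can no longer choose between Korean forms, while a direct German-to-Korean model could.

```python
# Toy word-level lookup tables (hypothetical; real MT systems are neural models).
DE_TO_EN = {"du": "you", "Sie": "you"}       # English has no formal/informal "you"
EN_TO_KO = {"you": "너"}                      # so the pivot path can map to only one Korean form
DE_TO_KO_DIRECT = {"du": "너", "Sie": "당신"}  # direct model keeps the distinction (simplified Korean usage)


def pivot_translate(word: str) -> str:
    """Translate German -> English -> Korean by composing two lookups."""
    english = DE_TO_EN[word]    # step 1: German to the English pivot
    return EN_TO_KO[english]    # step 2: English pivot to Korean


def direct_translate(word: str) -> str:
    """Translate German -> Korean in a single step, with no pivot."""
    return DE_TO_KO_DIRECT[word]


for w in ("du", "Sie"):
    print(w, "pivot:", pivot_translate(w), "direct:", direct_translate(w))
```

On the pivot path both German pronouns come out as the same Korean word, while the direct path keeps them apart. Real systems fail in subtler ways, but the principle is the same: anything the pivot language cannot express is gone before the second step begins.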


Image: Example of a machine mistranslation (source: Seoul Newspaper)


Meanwhile, globalization keeps making NLP technology more important, as more and more of everyday life requires communicating across language barriers. Unfortunately, many people around the world are still excluded from the benefits of technological advances such as AI translation.


As is well known, NLP research requires large amounts of linguistic data. Languages with little data available for training AI language models are called low-resource languages. As a result, AI language tools are largely limited to speakers of a select few widely used languages out of the roughly 7,100 in the world.


In fact, according to Meta AI, “more than 20% of the world's population does not have access to commercial translation technology.”* In other words, a digital divide prevents speakers of low-resource languages from communicating freely. This is why solutions are needed for people who are excluded from the global exchange of knowledge, information, and culture because of their language.


Image: Meta AI


Wrapping up


Before looking at multilingual AI in earnest, this post examined why languages other than English are becoming important in NLP research. Recently, there have been more and more attempts to make language models and translation models multilingual. Given the circumstances described above, this is great news for the many people around the world who have been marginalized until now.


In the next post, I'll take a closer look at this topic through real research and development examples from industry.


* Source: https://www.ciokorea.com/t/22000/AI/243970#csidxaf4c5dbdb5bf6318b0d338efe81a7fa


References

[1] https://www.washingtonpost.com/news/worldviews/wp/2015/04/23/the-worlds-languages-in-7-maps-and-charts/

[2] https://www.ethnologue.com/guides

[3] https://edu.krlo.co.kr/2018/05/09/q-001/

[4] https://ai.facebook.com/blog/teaching-ai-to-translate-100s-of-spoken-and-written-languages-in-real-time/


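🚀 Experience LETR WORKS, the data intelligence platform, right now.

• Try out what you read in this note for yourself! (One-month free trial available 🎉)
• Curious how the AI technology is applied? (Request a POC sample 💌)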

Twigfarm Co.,Ltd.
Company registration number: 556-81-00254 | Mail-order sales number: 2021-Seoul Jongno-1929
CEO: Sunho Baek | Personal information manager: Hyuntaek Park
Seoul head office: (03187) 6F, 6 Jong-ro, Jongno-gu, Seoul, Republic of Korea
Gwangju branch: (61472) 203, 193-22 Geumnam-ro, Dong-gu, Gwangju, Republic of Korea
Singapore Asia office: (048581) 16 Raffles Quay #33-07, Hong Leong Building, Singapore
ⓒ 2024 LETR WORKS. All rights reserved.