Hi again! 👋
In Part 1, we established the concept of Sovereign AI. In Part 2, we dive deeper and introduce our recent project, the Korean Video Understanding Data Construction Project, through the lens of Sovereign AI.
🎥 The Korean Video Understanding Data Project and Sovereign AI
The 'Korean Video Understanding Data' project built AI training data for video understanding, focused specifically on Korea. The dataset contains approximately 41,000 Korea-related images and a total of 205,000 descriptive sentences, five for each image.
We built this data in four steps:
1. Image Extraction & Expert Review: We extracted Korea-related images from broadcast and video content, ensuring rigorous inspection by subject matter experts.
2. Caption Generation (The Core Task): We created five descriptive sentences for each image. This is the work of image captioning: generating rich, contextual descriptions of the visual data.
3. Review & Refinement: We reviewed the images and descriptive sentences, then corrected and supplemented any inappropriate or inaccurate expressions.
4. Quality Assurance: We conducted final quality verification and validity evaluation.
Through this meticulous process, we built AI training data designed to help models understand Korea's unique culture more deeply and accurately.
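To make the four steps concrete, here is a minimal sketch of how one image and its captions could be represented and checked. The schema, field names, and checks are our own assumptions for illustration, not the project's actual data format; only the "five sentences per image" rule and the dataset totals come from the description above.

```python
from dataclasses import dataclass, field
from typing import List

CAPTIONS_PER_IMAGE = 5  # from the article: five descriptive sentences per image


@dataclass
class CaptionRecord:
    """One image with its captions (hypothetical schema, for illustration only)."""
    image_id: str
    source: str                          # e.g. the program the frame was taken from
    captions: List[str] = field(default_factory=list)
    expert_reviewed: bool = False        # step 1: subject matter expert inspection
    refined: bool = False                # step 3: review and correction completed
    qa_passed: bool = False              # step 4: final quality verification


def validate(record: CaptionRecord) -> List[str]:
    """Return a list of problems; an empty list means the record looks complete."""
    problems = []
    if len(record.captions) != CAPTIONS_PER_IMAGE:
        problems.append(
            f"expected {CAPTIONS_PER_IMAGE} captions, got {len(record.captions)}"
        )
    if any(not caption.strip() for caption in record.captions):
        problems.append("empty caption")
    if len(set(record.captions)) != len(record.captions):
        problems.append("duplicate captions")
    if not record.expert_reviewed:
        problems.append("image not yet reviewed by an expert")
    if not record.refined:
        problems.append("captions not yet reviewed and refined")
    if not record.qa_passed:
        problems.append("final quality check not passed")
    return problems


def expected_sentence_count(num_images: int) -> int:
    """Total number of sentences if every image has the full set of captions."""
    return num_images * CAPTIONS_PER_IMAGE
```

With roughly 41,000 images, `expected_sentence_count(41_000)` gives 205,000, which matches the dataset size mentioned earlier.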
So, why exactly is this type of specialized data so important?
AI models learn how to reason and express themselves from their training data. Large, general-purpose models such as ChatGPT and Gemini, built by global Big Tech companies, are trained on a vast array of worldwide data, but relatively little of it accurately reflects the daily life and culture specific to Koreans.
Therefore, the most desirable approach is for Koreans to directly build and train AI models that can truly understand data rooted in the Korean context.
For example, when we asked ChatGPT and Gemini to generate an image that comes to mind when they hear the word 'palace,' they created images like this:
<Images of 'Palace' generated by ChatGPT and Gemini>
Does this match the image of a palace you envisioned? It might look quite different from the image of a palace that people in non-Western countries imagine.
From this perspective, our project is closely linked to the Sovereign AI strategy, because the core of Sovereign AI is building AI tailored to the social and cultural context of one's own country. The same holds true for other nations.
These kinds of projects will become an important foundation for AI to more precisely understand and reflect the cultural context of various countries around the world. This is more than just technological development; it's a way for countries globally to safeguard their cultural sovereignty.
🖼️ Image Captioning Test
So, how exactly was the actual data created? From here on, we’ll walk you through the Korean Video Understanding Data Construction Project by looking at a practical case study: Image Captioning using ChatGPT and Gemini.
Here is a photo of a typical Korean autumn landscape, found on a stock image site. For this test, let's assume the image was captured from a program produced by a Korean broadcaster.
<Autumn Landscape (Olympic Park), Source: Unsplash>
The captioning task is to write five descriptive sentences for the image. In the project, the sentences were first written in Korean and then translated into English, so we followed the same order here: we showed the photo to ChatGPT and Gemini, told them the location (Seoul's Olympic Park), and asked each to generate five sentences in Korean. We then asked them to translate those sentences into English.
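The two-step flow above (Korean first, then English) can be sketched as a pair of prompt builders plus a small parser for the numbered reply. This is only an illustration: the prompt wording and function names are hypothetical and are not the exact prompts we typed, and the sketch deliberately leaves out any model API call, since we ran the test through the chat interfaces.

```python
import re
from typing import List

NUM_SENTENCES = 5  # from the article: five descriptive sentences per image


def build_korean_prompt(location: str, n: int = NUM_SENTENCES) -> str:
    """Prompt asking for n Korean sentences about the attached photo (hypothetical wording)."""
    return (
        f"The attached photo was taken at {location}. "
        f"Write exactly {n} descriptive sentences about it in Korean, "
        f"numbered 1 to {n}, one per line."
    )


def build_translation_prompt(korean_sentences: List[str]) -> str:
    """Prompt asking for an English translation that keeps the numbering."""
    numbered = "\n".join(
        f"{i}. {sentence}" for i, sentence in enumerate(korean_sentences, start=1)
    )
    return (
        "Translate each of the following Korean sentences into English, "
        "keeping the same numbering:\n" + numbered
    )


def parse_numbered_sentences(reply: str, n: int = NUM_SENTENCES) -> List[str]:
    """Extract lines like '1. sentence' from a reply; raise if the count is not n."""
    sentences = []
    for line in reply.splitlines():
        match = re.match(r"^\s*\d+[.)]\s*(.+?)\s*$", line)
        if match:
            sentences.append(match.group(1))
    if len(sentences) != n:
        raise ValueError(f"expected {n} sentences, found {len(sentences)}")
    return sentences
```

In use, the reply to `build_korean_prompt("Olympic Park, Seoul")` would be passed through `parse_numbered_sentences`, and the resulting list handed to `build_translation_prompt`. Checking the sentence count at each step catches replies that drop or merge a sentence before they reach human review.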
The two models generated the following sentences:
<English Sentences Generated by ChatGPT and Gemini>
Both models managed to generate smooth English sentences based on the descriptions they first created in Korean.
In Part 2, we introduced our Korean Video Understanding Data Construction Project through the lens of Sovereign AI, using an image captioning test with ChatGPT and Gemini as a case study.
In Part 3, we'll continue our exploration by looking at the critical steps of Human Labeling and Quality Assurance (Correction) Testing.
Thank you for reading, and stay tuned for the next part!