KorNAT (Korean National Alignment Test) is a benchmark dataset developed to assess the alignment of Large Language Models (LLMs) with South Korean social values and common knowledge. It evaluates models across two primary dimensions:
- Social Value Alignment: Measures how well a model understands and reflects the collective viewpoints of South Korean citizens on critical societal issues.
- Common Knowledge Alignment: Assesses the model’s grasp of foundational information broadly recognized and understood by the South Korean populace.
Key Characteristics:
- Data Collection: The social value dataset comprises multiple-choice questions generated using GPT-3.5-Turbo, based on current social issues identified from news articles and reports. These questions were refined through human review, and their reference answers were obtained from a large-scale survey of 6,174 Korean participants. The common knowledge dataset consists of questions written by human workers from Korean textbooks and General Educational Development (GED) reference materials, covering subjects such as history, science, and mathematics.
- Dataset Composition: KorNAT contains 4,000 samples in the social value dataset and 6,000 samples in the common knowledge dataset.
- Evaluation Metrics: Social value alignment is scored with three variants of one metric: Social Value Alignment (SVA), Aggregated SVA (A-SVA), and Neutral-processed SVA (N-SVA). Each measures how closely a model’s responses match the opinions expressed by the survey respondents. Common knowledge alignment is evaluated using accuracy.
- Leaderboard: A leaderboard is maintained to evaluate models using the KorNAT benchmark, providing a standardized platform for comparison.
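To make the two kinds of scoring described under Evaluation Metrics more concrete, here is a minimal sketch in Python. It is not the official KorNAT scoring code. It assumes a simple share-based alignment score (the fraction of survey respondents who picked the same option as the model, averaged over questions) and plain accuracy for the common knowledge questions. The function names, the question IDs, and the toy data are all hypothetical; the exact definitions of SVA and its variants should be taken from the KorNAT authors.

```python
from collections import Counter


def response_shares(survey_answers):
    """Convert one question's survey answers into a mapping of option -> share."""
    if not survey_answers:
        raise ValueError("survey_answers must not be empty")
    counts = Counter(survey_answers)
    total = sum(counts.values())
    return {option: n / total for option, n in counts.items()}


def share_alignment(model_choices, survey):
    """Assumed alignment score: mean share of respondents who chose the
    same option as the model. This is an illustrative stand-in, not the
    official SVA formula."""
    if not model_choices:
        raise ValueError("model_choices must not be empty")
    scores = []
    for qid, choice in model_choices.items():
        shares = response_shares(survey[qid])
        # An option nobody picked contributes a share of 0.
        scores.append(shares.get(choice, 0.0))
    return sum(scores) / len(scores)


def accuracy(predictions, answer_key):
    """Fraction of common knowledge questions answered correctly."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    correct = sum(1 for qid, choice in predictions.items()
                  if answer_key[qid] == choice)
    return correct / len(predictions)


# Hypothetical toy data: two social value questions, four respondents each.
survey = {
    "q1": ["A", "A", "A", "B"],  # 75% of respondents chose A
    "q2": ["C", "D", "D", "D"],  # 25% of respondents chose C
}
model_choices = {"q1": "A", "q2": "C"}

# Hypothetical toy data: two common knowledge questions.
predictions = {"k1": "B", "k2": "A"}
answer_key = {"k1": "B", "k2": "C"}

print(share_alignment(model_choices, survey))  # (0.75 + 0.25) / 2 = 0.5
print(accuracy(predictions, answer_key))       # 1 of 2 correct = 0.5
```

In this toy run the model sides with the larger group on one question and the smaller group on the other, so the share-based score lands at 0.5. A score like this rewards agreement in proportion to how many respondents hold the chosen view, which is one reasonable reading of "alignment with public opinion" but only an assumption here.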
Applications:
- LLM Evaluation: Provides a standardized benchmark to assess how well LLMs align with South Korean social values and common knowledge.
- Model Improvement: Identifies areas where models may need further training to better align with specific cultural and knowledge aspects.
- Policy Making: Assists in understanding how AI models interact with cultural norms, informing regulations and guidelines.
Why It Matters:
KorNAT addresses the need for culturally specific evaluation benchmarks, making it possible to check whether AI systems deployed in South Korea are aligned with local social values and knowledge. This alignment is crucial for the effective and ethical deployment of AI technologies within the country.