---
title: README
emoji: 🔥
colorFrom: indigo
colorTo: blue
sdk: static
pinned: false
---

# Welcome to HAERAE


We are a non-profit research lab focused on the interpretability and evaluation of Korean language models. Our mission is to advance the field with insightful benchmarks and tools. Below is an overview of our projects.

## High-Quality Korean Corpora
- [Korean WebText](https://huggingface.co/datasets/HAERAE-HUB/KOREAN-WEBTEXT): A collection of 2B tokens of Korean text collected from the web.
- [Korean SyntheticText](https://huggingface.co/datasets/HAERAE-HUB/KOREAN-SyntheticText-1.5B): A collection of 1.5B tokens of synthetically generated Korean text.


## Evaluation Benchmarks
- **HAE_RAE_BENCH Series**:
  - [HAE_RAE_BENCH_1.0](https://huggingface.co/datasets/HAERAE-HUB/HAE_RAE_BENCH_1.0): An evaluation suite for Korean knowledge. See the [paper](https://arxiv.org/abs/2309.02706) for further information.
  - [HAE_RAE_BENCH_1.1](https://huggingface.co/datasets/HAERAE-HUB/HAE_RAE_BENCH_1.1): An ongoing project to refine HAE_RAE_BENCH_1.0, enhancing its depth and coverage.

- **KMMLU**:
  - [KMMLU](https://huggingface.co/datasets/HAERAE-HUB/KMMLU): A Korean reimplementation of MMLU, focusing on comprehensive language understanding across a wide range of subjects. See the [paper](https://arxiv.org/abs/2402.11548) for further information.
  - [KMMLU-HARD](https://huggingface.co/datasets/HAERAE-HUB/KMMLU-HARD): A subset of KMMLU, with chain-of-thought (CoT) samples.
    
## Bias and Fairness
- [QARV](https://huggingface.co/datasets/HAERAE-HUB/QARV-preview): An ongoing project aiming to benchmark regional bias in large language models (LLMs).
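
All of the datasets linked above are hosted under the `HAERAE-HUB` organization on the Hugging Face Hub. As a minimal sketch, assuming the `datasets` library is installed, they could be loaded as shown below. The helper names `repo_id` and `load_haerae_dataset` are hypothetical and exist only for this illustration; configuration and split names are not stated in this README, so check each dataset card before passing them.

```python
from typing import Optional

# Organization name taken from the dataset URLs listed in this README.
HUB_ORG = "HAERAE-HUB"


def repo_id(dataset_name: str) -> str:
    """Build the Hub repository ID for a dataset name (hypothetical helper)."""
    return f"{HUB_ORG}/{dataset_name}"


def load_haerae_dataset(
    dataset_name: str,
    config: Optional[str] = None,
    split: Optional[str] = None,
):
    """Load one of the datasets with the Hugging Face `datasets` library.

    `config` and `split` are assumptions left to the caller: some datasets
    may require a configuration name, and the available splits differ per
    dataset. The import is done lazily so that `repo_id` can be used
    without `datasets` installed.
    """
    from datasets import load_dataset

    return load_dataset(repo_id(dataset_name), name=config, split=split)


# Example (requires network access; the arguments are placeholders to adapt
# after consulting the dataset card):
# kmmlu = load_haerae_dataset("KMMLU", config="<subject>", split="<split>")
```

Keeping the repository ID construction separate from the loading call makes it easy to reuse the same IDs with other Hub tooling.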

If you have any inquiries or are interested in joining our team, please contact us at `[email protected]`.