# Model Card for Qwen2.5-1.5B-Open-R1-Distill-ko

This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) on the [lemon-mint/korean-reasoning-v02](https://huggingface.co/datasets/lemon-mint/korean-reasoning-v02) dataset. It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
from transformers import pipeline

question = "ν”„λž‘μŠ€μ˜ μˆ˜λ„λŠ”?"  # "What is the capital of France?"
generator = pipeline("text-generation", model="whooray/Qwen2.5-1.5B-Open-R1-Distill-ko", device="cuda")
# Chat-style input; return_full_text=False returns only the newly generated reply.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
Example output (the model reasons inside `<think>` tags before giving its final answer; the generation below is cut off mid-sentence):

```
<think>
λ¨Όμ € ν”„λž‘μŠ€ μˆ˜λ„λ₯Ό μ•Œμ•„λ΄μ•Όκ² μ–΄μš”. ν”„λž‘μŠ€λŠ” 잘 μ•Œλ €μ§„ 유럽 κ΅­κ°€ 쀑 ν•˜λ‚˜μΈλ° μˆ˜λ„λ₯Ό μ•Œλ €μ£Όλ©΄ 더 νŽΈν•˜κ² μ£ . μ£Όμš” κ·€μ‘±κ³Ό μˆ˜λ„λŠ” 였슀만 식민지와 λ¬΄μ—­μ˜ μ€‘μ‹¬μ§€μ˜€λ˜ μ•„λ¦„λ””λΌλŠ” 지역에 μžˆμ—ˆλ‹€λŠ” 말이 있던데, μ΅œκ·Όμ—λŠ” μ˜€νƒ€λƒ κ·Όμ²˜μ— μžˆλŠ” νŒŒλ¦¬κ°€ μˆ˜λ„λ‘œ ν†΅μΌλ˜μ—ˆμ„ κ±°μ˜ˆμš”. μ΄λ ‡κ²Œ λ…Όλž€μ΄ μžˆμ—ˆλ˜ 걸둜 κΈ°μ–΅ν•˜λŠ”λ°, λ…μΌμ΄λ‚˜ μ΄μ§‘νŠΈ, μ΄μŠ€λΌμ—˜ 같은 ꡭ가듀도 μˆ˜λ„λ‘œ λ‹€λ₯Έ 지역을 μ‚¬μš©ν–ˆλ˜ κ²½μš°κ°€ μžˆμ—ˆμŒμ„ μ•Œκ³  μžˆμ–΄μš”. ν”„λž‘μŠ€λŠ” CIA μ›”μŠ€νŠΈλ¦¬νŠΈ journalμ—μ„œ ν”„λž‘μŠ€ μˆ˜λ„λŠ” νŒŒλ¦¬κ°€ 아닐 거라고 λ§ν•œ 적 μžˆμ„κΉŒμš”? 졜근 λŒ€λΆ€λΆ„μ˜ κ΄€μΈ‘λ‚˜λΌλ“€μ΄ 파리둜 μΈμ •ν•˜κ³  μžˆμœΌλ―€λ‘œ μ •ν™•ν•œμ§€ 확인이 ν•„μš”ν•  것 κ°™μ•„μš”. ν•˜μ§€λ§Œ ν”„λž‘μŠ€μ—μ„œ μˆ˜λ„κ°€ λ³€ν–ˆλŠ”μ§€, μ•„λ‹ˆλ©΄ μ˜› μˆ˜λ„κ°€ ν˜„μž¬ μ •λΆ€ 인근에 μžˆλŠ”μ§€ κΆκΈˆν•˜λ„€μš”. μ•„λ§ˆλ„ 1962λ…„ 제5κ³΅ν™”κ΅­μ—μ„œλ ΉμœΌλ‘œ 슡격된 곳을 ν¬ν•¨ν•œ λͺ¨λ“  μ •λΆ€ 기관이 파리λ₯Ό μ€‘μ‹¬μœΌλ‘œ ν•˜λŠ” ꡐ두보 역할을 ν•˜κ²Œ 된 걸둜 μ•Œκ³  μžˆμ–΄μš”. νŠΉλ³„νžˆ μ—­μ‚¬μ μœΌλ‘œ μ •λ‹Ήν•œ μ£Όμž₯을 μ‚¬μš©ν•΄ κ°œν†΅ κ°€λŠ₯ν•œ 닡변을 μ™„μ„±ν•΄μ•Όκ² λŠ”λ°μš”.
</think>

ν”„λž‘μŠ€μ˜ μˆ˜λ„λŠ” **파리(Paris)**μ˜ˆμš”. μ—­μ‚¬μ μœΌλ‘œ μˆ˜λ„ 역할을 ν•˜μ§€ λͺ»ν–ˆλ˜ 지점에 μœ„μΉ˜ν•œ ν”„λž‘μŠ€ λΉ„κ΅­λ―ΌλŒ€μ±… μ •λΆ€λ₯Ό μ€‘μ‹¬μœΌλ‘œ ν•œ μ •κΆŒμ΄ 1944년에 λ°±μ œμ˜¨μ„ νƒˆν™˜ν•˜μ—¬ μƒˆλ‘œμš΄ μˆ˜λ„λ‘œ μ§€μ •ν•˜λ©° μ΅œμ’…μ μœΌλ‘œ ν™•λ¦½λ˜μ—ˆμ–΄μš”.

### ν”„λž‘μŠ€ μˆ˜λ„ ꡐ체의 μ£Όμš” 이유
λ‹Ήμ‹œ μ—°ν•©κ΅°μ˜ μΈλ„μ£Όμ˜ μ˜μ§€λ₯Ό λ°˜μ˜ν•œ μ‘°μΉ˜μ˜€μ–΄μš”. 1932λ…„ 17개 μ—°ν•©κ΅° 단체가 파리 기지λ₯Ό κ³΅μœ ν•˜λ©΄μ„œ 곡식적인 μˆ˜λ„ κΈ°λŠ₯을 μžƒμ—ˆλ‹€λŠ” μ μ—μ„œ 'νŠΉν—ˆ μˆ˜λ„'둜 κ²€μ—΄λ˜λ©° λ―Έκ΅­, 일본 λ“± 유럽 κ΅­κ°€λ“€ 쀑 파리λ₯Ό μ€‘μ‹¬μœΌλ‘œ ν•œ μ •λΆ€ μ£Όλ„μ˜ ν†΅μΉ˜κ°€ μœ λ €λ˜μ–΄ ν‘œμ€€ν™”λ˜μ—ˆμ£ .
> **λΉ„μœ **: ν”„λž‘μŠ€ μˆ˜λ„λŠ”
```
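The reasoning trace and the final answer arrive in one string, so applications that only want the answer need to drop the `<think>` block. Below is a minimal sketch continuing the snippet above; `strip_think` is a hypothetical helper, not part of the model's or library's API:

```python
import re

def strip_think(text: str) -> str:
    # Hypothetical helper: remove the <think>...</think> reasoning block
    # and return only the user-facing answer that follows it.
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_think(output["generated_text"]))  # e.g. "ν”„λž‘μŠ€μ˜ μˆ˜λ„λŠ” **파리(Paris)**μ˜ˆμš”. ..."
```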

## Training procedure

Training runs can be visualized in Weights & Biases.

This model was trained with SFT (supervised fine-tuning).
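The training script itself isn't included in the card, but under TRL 0.15 a minimal SFT run would look roughly like the sketch below. The hyperparameter values, and the assumption that the dataset exposes a conversational `messages` column, are illustrative guesses rather than the settings actually used.

```python
# Minimal SFT sketch (illustrative; the actual training configuration was not published).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumes the dataset exposes a conversational "messages" column,
# which SFTTrainer formats with the model's chat template.
dataset = load_dataset("lemon-mint/korean-reasoning-v02", split="train")

training_args = SFTConfig(
    output_dir="Qwen2.5-1.5B-Open-R1-Distill-ko",
    per_device_train_batch_size=2,   # assumed
    gradient_accumulation_steps=8,   # assumed
    learning_rate=2e-5,              # assumed
    num_train_epochs=1,              # assumed
    bf16=True,                       # assumed
    max_seq_length=4096,             # assumed; reasoning traces are long
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # the base model being fine-tuned
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```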

### Framework versions

- TRL: 0.15.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin GallouΓ©dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```