
QuantFactory/eagle-3b-preview-GGUF

This is a quantized version of etri-lirs/eagle-3b-preview created using llama.cpp.

Original Model Card

EAGLE: ETRI's Advanced-lightweight Generative Language Engine

(๊ณผ๊ฑฐ์— eGPT๋กœ ๋ถˆ๋ ธ์œผ๋ฉฐ, 2024.11.14 ์— ์ด๋ฆ„์„ ๋ณ€๊ฒฝํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ถ”ํ›„ ๋ฆด๋ฆฌ์ฆˆ๋˜๋Š” ๋ชจ๋ธ์˜ prefix๋Š” egpt- ๋Œ€์‹  eagle-๋กœ ๋ณ€๊ฒฝ๋ฉ๋‹ˆ๋‹ค)

๋ณธ ๋ชจ๋ธ์€ ์‚ฌ์ „ํ•™์Šต๋งŒ ์ˆ˜ํ–‰๋œ ๋ชจ๋ธ์ด๋ฉฐ, ๋ณ„๋„์˜ Instruction Tuning ๋“ฑ์ด ์ ์šฉ๋˜์ง€ ์•Š์€ ๊ธฐ์ดˆ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. ์ฑ—๋ด‡ ์Šคํƒ€์ผ์˜ ์ž…์ถœ๋ ฅ์ด ํ•„์š”ํ•œ ๊ฒฝ์šฐ, ๋ณ„๋„์˜ ๋ฏธ์„ธ์กฐ์ •์„ ๋ฐ˜๋“œ์‹œ ์ˆ˜ํ–‰ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

๋ชจ๋ธ ์ •๋ณด

A 3.1B-parameter decoder-only causal language model. It aims to be a small language model specialized for STEM domains, including mathematics and quantitative reasoning. Because it is not intended to serve as a general-purpose language model, it may show low performance on common general language-understanding benchmarks (e.g., hellaswag, sentineg). Please note that this model may be updated irregularly as training data changes and training methods are revised or improved.

Tokenizer๋Š” LLaMa์˜ ๊ตฌ์„ฑ๊ณผ ์œ ์‚ฌํ•˜๊ฒŒ byte-fallbacked BPE + digit ๋ถ„๋ฆฌ ๊ตฌ์„ฑ์„ ๊ฐ€์ง€๋‚˜, BOS/EOS(e.g. <s>,</s>) ํ† ํฐ์ด ๋ชจ๋‘ EOS(</s>)๋กœ ํ†ต์ผ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. ํ† ํฌ๋‚˜์ด์ € ์„ค์ •์—์„œ PAD ํ† ํฐ์€ ๋ณ„๋„๋กœ ์ง€์ •๋˜์–ด ์žˆ์ง€ ์•Š์œผ๋‚˜, Byte-level BPE์˜ ํŠน์„ฑ์ƒ <unk> ์‹ฌ๋ณผ์ด ์‚ฌ์šฉ๋˜์ง€ ์•Š์œผ๋ฏ€๋กœ, ๋ฏธ์„ธ์กฐ์ • ๋‹จ๊ณ„์—์„œ๋Š” <unk> ํ† ํฐ์„ PAD ํ† ํฐ์œผ๋กœ ์ง€์ •ํ•˜์—ฌ ํ™œ์šฉํ•  ๊ฒƒ์„ ๊ถŒ์žฅํ•ฉ๋‹ˆ๋‹ค. LLaMA ํ˜ธํ™˜ ์•„ํ‚คํ…์ณ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์œผ๋ฉฐ, A100 80GB PCIE * 8์žฅ์—์„œ ์•ฝ 720B tokens๋ฅผ from-scratch๋กœ ์‚ฌ์ „ ํ•™์Šตํ•˜์—ฌ ํš๋“๋œ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.

์—…๋ฐ์ดํŠธ ๊ธฐ๋ก/Update log

| Date | Version (git tags, revision ID) | Details |
|---|---|---|
| 2024.10.28 | v24.10 (current version) | First public release candidate; trained on about 720B tokens |

Acknowledgement

  • ์ด ๋ชจ๋ธ์€ 2024๋…„๋„ ์ •๋ถ€(๊ณผํ•™๊ธฐ์ˆ ์ •๋ณดํ†ต์‹ ๋ถ€)์˜ ์žฌ์›์œผ๋กœ ์ •๋ณดํ†ต์‹ ๊ธฐํšํ‰๊ฐ€์›์˜ ์ง€์›์„ ๋ฐ›์•„ ์ˆ˜ํ–‰๋œ ์—ฐ๊ตฌ์ž„ (RS-2023-00216011, ์‚ฌ๋žŒ์ฒ˜๋Ÿผ ๊ฐœ๋…์ ์œผ๋กœ ์ดํ•ด/์ถ”๋ก ์ด ๊ฐ€๋Šฅํ•œ ๋ณตํ•ฉ์ธ๊ณต์ง€๋Šฅ ์›์ฒœ๊ธฐ์ˆ  ์—ฐ๊ตฌ)
  • This work was supported by Institute of Information & Communications Technology Planning & Evaluation(IITP) grant funded by the Korea government(MSIT) (RS-2023-00216011, Development of artificial complex intelligence for conceptually understanding and inferring like human)

์ œํ•œ์  ๋ชจ๋ธ ์ ‘๊ทผ ๋ฐ, ๋ชจ๋ธ ์ ‘๊ทผ ํ—ˆ๊ฐ€์™€ ๊ด€๋ จํ•œ ๊ฐœ์ธ์ •๋ณด ์ˆ˜์ง‘ ๋ฐ ์‚ฌ์šฉ ์•ˆ๋‚ด/Information on Collection and Use of Personal Information for Gated Model Access

๋ณธ ๋ชจ๋ธ์€ ์—ฐ๊ตฌ์™€ ๊ต์œก ๋ชฉ์ ์œผ๋กœ๋งŒ ์‚ฌ์šฉ ๋  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ํ˜„์žฌ ๋ณ„๋„์˜ ์Šน์ธ ์—†์ด, Huggingface ๊ณ„์ •์œผ๋กœ ๋กœ๊ทธ์ธ ํ›„ ์Šน์ธ ์š”์ฒญ์„ ์ˆ˜ํ–‰ํ•˜์‹œ๋ฉด ์ž๋™์œผ๋กœ ๋ชจ๋ธ์„ ๋ฐ›์œผ์‹ค ์ˆ˜ ์žˆ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๋ชจ๋ธ ์–ต์„ธ์Šค์™€ ๊ด€๋ จํ•ด์„œ ๋ฌธ์˜ ์‚ฌํ•ญ์ด ์žˆ์œผ์‹œ๋ฉด jhshin82 at etri.re.kr (__at__์„ @์œผ๋กœ ์น˜ํ™˜)๋กœ ๋ฌธ์˜ํ•˜์‹œ๋ฉด ๋ฉ๋‹ˆ๋‹ค.

๋ณธ ๋ชจ๋ธ๊ณผ ๊ด€๋ จํ•ด ์‚ฌํšŒ์ , ๋ฒ•์  ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•  ๊ฒฝ์šฐ ๋ชจ๋ธ์˜ ์‚ฌ์šฉ์„ ์ œํ•œํ•˜๊ณ , ๋ฐฐํฌ๋ฅผ ์ฒ ํšŒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด ๋ชจ๋ธ ์ ‘๊ทผ ํ—ˆ๊ฐ€์— ์‚ฌ์šฉ๋œ ์ด๋ฉ”์ผ ์ฃผ์†Œ๋ฅผ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ˆ˜์ง‘, ๋ณด์œ , ์ด์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๊ฐœ์ธ์ •๋ณด ์ˆ˜์ง‘๋™์˜/Concent to collection of Personal Information

๋ณธ ๋ชจ๋ธ์˜ ์‚ฌ์šฉ๊ณผ ๊ด€๋ จ, ๋ฐฐํฌ/์‚ฌ์šฉ ์ œํ•œ/์ฒ ํšŒ, ๊ทธ ์™ธ ์‚ฌ์šฉ์ž์˜ ์ด์ต์— ๊ด€๊ณ„๋œ ๋ผ์ด์„ ์Šค ๋ณ€๊ฒฝ ์‹œ ์ด๋ฅผ ํ†ต์ง€ํ•˜๊ธฐ ์œ„ํ•ด, ์•„๋ž˜์™€ ๊ฐ™์ด ๊ฐœ์ธ์ •๋ณด๋ฅผ ์ˆ˜์ง‘, ์ด์šฉํ•ฉ๋‹ˆ๋‹ค.

์ˆ˜์ง‘ ๋ชฉ์  ์ˆ˜์ง‘ ํ•ญ๋ชฉ ๋ณด์œ , ์ด์šฉ๊ธฐ๊ฐ„
๋ชจ๋ธ์˜ ์‚ฌ์šฉ์ œํ•œ/์ฒ ํšŒ ์š”์ฒญ ๋ชฉ์  ์ด๋ฉ”์ผ ์ฃผ์†Œ, huggingface hub ID ๋ณธ ๋ชจ๋ธ์˜ ๊ณต๊ฐœ ๊ธฐ๊ฐ„ ๋ฐ ์ด์šฉ ๋ชฉ์  ๋‹ฌ์„ฑ ์‹œ
๋ชจ๋ธ์˜ ์‚ฌ์šฉ ๋ผ์ด์„ ์Šค ๋“ฑ ๋ณ€๊ฒฝ ์•ˆ๋‚ด ์ด๋ฉ”์ผ ์ฃผ์†Œ, huggingface hub ID ๋ณธ ๋ชจ๋ธ์˜ ๊ณต๊ฐœ ๊ธฐ๊ฐ„ ๋ฐ ์ด์šฉ ๋ชฉ์  ๋‹ฌ์„ฑ ์‹œ

๋ณธ ๋ชจ๋ธ์— ๋Œ€ํ•œ ์ ‘๊ทผ ์š”์ฒญ์„ ์ˆ˜ํ–‰ํ•˜๊ณ , ๋ชจ๋ธ์— ์ ‘๊ทผํ•˜์‹œ๋Š” ํ–‰์œ„๋Š” ์•„๋ž˜์— ์•ˆ๋‚ด๋œ ์•ˆ๋‚ด์‚ฌํ•ญ, ๋ณธ ๋ชจ๋ธ์˜ ํ•œ๊ณ„, ์ฑ…์ž„์žˆ๋Š” AI ์—ฐ๊ตฌ์— ๋Œ€ํ•œ ์ •๋ณด, ๊ฐœ์ธ์ •๋ณด ์ˆ˜์ง‘/์ด์šฉ์— ๋™์˜ํ•˜์‹  ๊ฒƒ์œผ๋กœ ๊ฐ„์ฃผํ•ฉ๋‹ˆ๋‹ค. ์‚ฌ์šฉ์ž๋Š” ๋™์˜๋ฅผ ๊ฑฐ๋ถ€ํ•˜์‹ค ๊ถŒ๋ฆฌ๊ฐ€ ์žˆ์œผ๋ฉฐ, ๋™์˜๋ฅผ ๊ฑฐ๋ถ€ํ•˜์‹ค ๊ฒฝ์šฐ ๋ชจ๋ธ ์‚ฌ์šฉ์ด ์ œํ•œ๋˜๋ฉฐ, ์ด์— ๊ด€๋ จํ•œ ์‚ฌ์šฉ, ๊ฒฐ๊ณผ์— ๋Œ€ํ•œ ์ฑ…์ž„์€ ์‚ฌ์šฉ์ž์—๊ฒŒ ์žˆ์Œ์„ ์•Œ๋ ค๋“œ๋ฆฝ๋‹ˆ๋‹ค. ์‚ฌ์šฉ ํ›„ ๋™์˜ ์ฒ ํšŒ, ๊ฐœ์ธ์ •๋ณด ํ๊ธฐ์— ๋Œ€ํ•œ ์‚ฌํ•ญ์€ ์ƒ๊ธฐ ์•ˆ๋‚ด๋œ ๋ฉ”์ผ ์ฃผ์†Œ ๋˜๋Š” Community tab์„ ํ†ตํ•ด์„œ ์š”์ฒญํ•˜์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋ชจ๋ธ์˜ ํ•œ๊ณ„, ์ฑ…์ž„์žˆ๋Š” AI ์—ฐ๊ตฌ๋ฅผ ์œ„ํ•œ ๊ด€๋ จ ์ •๋ณด ์•ˆ๋‚ด

๋ณธ ๋ชจ๋ธ์˜ ๊ฐœ๋ฐœ๊ณผ ๊ด€๋ จํ•œ ๊ฐœ๋ฐœ์ž ๋ฐ ์กฐ์ง์€ ์ฑ…์ž„์žˆ๋Š” AI ์—ฐ๊ตฌ๋ฅผ ์ค€์ˆ˜ํ•˜๊ณ ์ž ๋…ธ๋ ฅํ•˜๊ณ  ์žˆ์œผ๋ฉฐ, ์ด์™€ ๊ด€๋ จํ•ด AI ์—ฐ๊ตฌ์— ์‚ฌ์šฉ๋˜๋Š” ์ž…์ถœ๋ ฅ ๋ฐ์ดํ„ฐ ๋‚ด ํฌํ•จ๋œ ์š•์„ค, ์Œ๋ž€, ์ •์น˜์  ๋‚ด์šฉ ๋ฐ ๊ธฐํƒ€ ๊ฑฐ์นœ ์–ธ์–ด์— ๋Œ€ํ•œ ์ฒ˜๋ฆฌ๋ฅผ ์ˆ˜ํ–‰ํ•˜๊ณ ์ž ๋…ธ๋ ฅํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿผ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ , ์›์‹œ ์›น ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ์˜ ํŠน์„ฑ ์ƒ ์ด๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•ด ํ•™์Šต๋œ ๋ณธ ์ƒ์„ฑ ์–ธ์–ด ๋ชจ๋ธ์€ ๊ฒฝ๋„๋œ ์‚ฌ์ƒ์„ ํฌํ•จํ•˜๊ฑฐ๋‚˜, ์‚ฌํšŒ์ ์œผ๋กœ ์šฉ์ธ๋  ์ˆ˜ ์—†๋Š” ํ…์ŠคํŠธ๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋‹ค๋ฅธ ์–ธ์–ด ๋ชจ๋ธ๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ํŠน์ • ํ”„๋กฌํ”„ํŠธ์™€ ๊ณต๊ฒฉ์ ์ธ ์ฝ˜ํ…์ธ ๊ฐ€ ๋ฐ˜ํ™˜๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ํฌํ•จ, ๋ณธ ๋ชจ๋ธ์˜ ์ถœ๋ ฅ/์ƒ์„ฑ ๊ฒฐ๊ณผ์™€ ๊ด€๋ จํ•œ ๋‚ด์šฉ์€ ๊ฐœ๋ฐœ์ž ๋ฐ ๊ฐœ๋ฐœ์ž๊ฐ€ ์†ํ•œ ์กฐ์ง์˜ ์‚ฌ์ƒ, ์˜๋„์™€ ์ „ํ˜€ ๊ด€๋ จ์ด ์—†์Œ์„ ์•Œ๋ ค๋“œ๋ฆฝ๋‹ˆ๋‹ค.

ํ…Œ์ŠคํŠธ์ค‘์— ๋ฐœ์ƒํ•œ ๋น„์ •์ƒ์ ์ธ ํ˜น์€ ์‚ฌํšŒ์ ์œผ๋กœ ์šฉ์ธ๋˜์ง€ ์•Š๋Š” ํ…์ŠคํŠธ๊ฐ€ ์ƒ์„ฑ๋œ ๊ฒฝ์šฐ jhshin82 at etri.re.kr๋กœ (__at__์„ @๋กœ ์น˜ํ™˜) ์ถœ๋ ฅ ์œ ๋„์— ์‚ฌ์šฉ๋œ ์ž…๋ ฅ๋ฌธ(ํ”„๋กฌํ”„ํŠธ), ์‚ฌ์šฉ๋œ ์ƒ˜ํ”Œ๋ง ๊ธฐ๋ฒ• ๋ฐ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ(์˜ˆ: top-p=0.8, temperature, repetition-penalty ๋“ฑ), ์ด๋ฅผ ํ†ตํ•ด ์ƒ์„ฑ๋œ ์ถœ๋ ฅ ๊ฒฐ๊ณผ๋ฅผ ํ•จ๊ป˜ ๋ณด๋‚ด์ฃผ์‹œ๋ฉด, ์ด๋ฅผ ์–ต์ œํ•˜๊ธฐ ์œ„ํ•œ ๋…ธ๋ ฅ์„ ๊ธฐ์šธ์ด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

Evaluations

์‚ฌ์ „ํ•™์Šต ๋ชจ๋ธ์˜ KOBEST ํ‰๊ฐ€

Evaluation was performed with EleutherAI/lm-evaluation-harness v0.4.2 on the KoBEST benchmark (Kim et al., 2022), in zero-shot, 5-shot, and 10-shot settings without fine-tuning. (Because KOBEST scores from lm-evaluation-harness vary across versions, results obtained with a recent lm-evaluation-harness (version 0.4.2 or later) are presented separately below.)

| Zero-shot | KB-BOOLQ (F1) | KB-COPA (F1) | KB-HELLASWAG (F1) | KB-SENTINEG (F1) | KB-WIC (F1) | Average (F1) |
|---|---|---|---|---|---|---|
| eagle-3b-preview (v24.08) | 0.3393 | 0.5353 | 0.3446 | 0.5653 | 0.3280 | 0.3994 |
| eagle-3b-preview (v24.09) | 0.3343 | 0.5367 | 0.3383 | 0.4991 | 0.3280 | 0.3917 |
| eagle-3b-preview (v24.10) | 0.3778 | 0.5648 | 0.3369 | 0.4763 | 0.3280 | 0.4092 |
| eagle-3b-preview (v24.11) | 0.3651 | 0.5893 | 0.3551 | 0.4473 | 0.3280 | 0.4101 |

| 5-shot | KB-BOOLQ (F1) | KB-COPA (F1) | KB-HELLASWAG (F1) | KB-SENTINEG (F1) | KB-WIC (F1) | Average (F1) |
|---|---|---|---|---|---|---|
| eagle-3b-preview (v24.08) | 0.4680 | 0.5580 | 0.3332 | 0.4950 | 0.4830 | 0.4795 |
| eagle-3b-preview (v24.09) | 0.5087 | 0.5599 | 0.3257 | 0.4207 | 0.4212 | 0.4681 |
| eagle-3b-preview (v24.10) | 0.5207 | 0.5791 | 0.3511 | 0.5959 | 0.4712 | 0.5078 |
| eagle-3b-preview (v24.11) | 0.4753 | 0.5924 | 0.3592 | 0.5810 | 0.4930 | 0.5024 |

| 10-shot | KB-BOOLQ (F1) | KB-COPA (F1) | KB-HELLASWAG (F1) | KB-SENTINEG (F1) | KB-WIC (F1) | Average (F1) |
|---|---|---|---|---|---|---|
| eagle-3b-preview (v24.08) | 0.4243 | 0.5673 | 0.3364 | 0.4232 | 0.4265 | 0.4465 |
| eagle-3b-preview (v24.09) | 0.5001 | 0.5597 | 0.3377 | 0.3498 | 0.3578 | 0.4432 |
| eagle-3b-preview (v24.10) | 0.5101 | 0.5894 | 0.3675 | 0.5101 | 0.4650 | 0.4994 |
| eagle-3b-preview (v24.11) | 0.4151 | 0.6143 | 0.3718 | 0.5883 | 0.5134 | 0.4963 |
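The scores above should be approximately reproducible with the harness's Python API; the sketch below assumes the kobest_* task names of lm-evaluation-harness v0.4.x and is not the authors' exact evaluation script:

```python
import lm_eval

# Sketch: KoBEST evaluation with EleutherAI/lm-evaluation-harness v0.4.x.
# Task names and batch size are assumptions, not the authors' exact settings.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=etri-lirs/eagle-3b-preview,dtype=auto",
    tasks=["kobest_boolq", "kobest_copa", "kobest_hellaswag",
           "kobest_sentineg", "kobest_wic"],
    num_fewshot=5,   # use 0, 5, or 10 to match the tables above
    batch_size=8,
)
print(results["results"])
```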

์ „์ดํ•™์Šต ๋Šฅ๋ ฅ ํ‰๊ฐ€

In preparation.

๋ชจ๋ธ GSM8k test ๋น„๊ณ 
- - -

์‚ฌ์ „ํ•™์Šต์— ์ฐธ์—ฌํ•œ ๋ฐ์ดํ„ฐ์…‹ ์ •๋ณด/Datasets

  • FIXME: The training data list needs to be revised and updated.

์•„๋ž˜์˜ ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•™์Šตํ•˜์˜€์Šต๋‹ˆ๋‹ค:

How to Use

Inference can be performed with transformers>=4.28 using the code below.

import sys

from transformers import (
    AutoTokenizer, AutoModelForCausalLM, GenerationConfig
)


def load_model(mdl_path):
    tokenizer = AutoTokenizer.from_pretrained(mdl_path)
    # The device_map argument requires the accelerate package to be installed.
    model = AutoModelForCausalLM.from_pretrained(mdl_path, device_map="auto",
                                                 torch_dtype="auto")
    return tokenizer, model


if __name__ == '__main__':
    # FIXME: adjust the model path if needed (the repo was renamed from egpt- to eagle-).
    tokenizer, model = load_model("etri-lirs/eagle-3b-preview")
    # print(model.hf_device_map)
    # Adjust the generation options below as needed.
    gen_cfg = GenerationConfig(max_new_tokens=256, min_length=0,
                               max_time=10.0, do_sample=True,
                               top_p=0.9, epsilon_cutoff=3e-4,)

    print("** Now Ready to input from stdin.")
    for aline in sys.stdin:
        aline = aline.rstrip("\n\r\t")
        # Place inputs on the same device as the model (works on CPU or GPU).
        input_cond = tokenizer(aline, add_special_tokens=False,
                               return_tensors="pt").to(model.device)
        outs = model.generate(**input_cond, generation_config=gen_cfg)
        out_str = tokenizer.batch_decode(outs, skip_special_tokens=True,
                                         clean_up_tokenization_spaces=True)
        print(">> " + ' '.join(out_str))