radlab/pLLama3.2-3B-DPO

pkedzia committed on Oct 20, 2024

Commit 0c17108 · verified · 1 Parent(s): 366105d

Update README.md

Files changed (1)

README.md +28 -2

@@ -1,7 +1,33 @@
 ---
 license: llama3.2
 ---
-Parameters:
 * temperature: 0.6
-* repetition_penalty: 1.0

 ---
 license: llama3.2
+language:
+- pl
+- en
+- es
+- de
 ---
+### Intro
+We have released a collection of radlab/pLLama3.2 models, which we have fine-tuned for Polish. The fine-tuned versions communicate with the user more precisely than the base meta-llama/Meta-Llama-3.2 models. The collection includes models in the 1B and 3B architectures.
+Each model is available in two configurations:
+- radlab/pLLama3-1B, the 1B model after fine-tuning only
+- radlab/pLLama3-1B-DPO, the 1B model after fine-tuning and the DPO process
+- radlab/pLLama3-3B, the 3B model after fine-tuning only
+- radlab/pLLama3-3B-DPO, the 3B model after fine-tuning and the DPO process
+### Dataset
+In addition to the instruction datasets publicly available for Polish, we developed our own dataset of about 650,000 instructions. This data was generated semi-automatically from other publicly available datasets.
+We also built a training dataset for the DPO process. It contains 100k examples that teach the model to prefer correctly written versions of texts over versions with language errors.
+### Training
+The training process was divided into two stages:
+- Post-training (fine-tuning) on a set of 650k instructions in Polish, run for 5 epochs.
+- DPO training after the fine-tuning (FT) stage, on 100k examples of correct writing in Polish, run for 15k steps.
+### Proposed parameters:
 * temperature: 0.6
+* repetition_penalty: 1.0
+### Outro
+Enjoy!
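
The proposed parameters above can be passed to a text-generation call. The following is a minimal sketch, not the authors' own usage code: it assumes the Hugging Face `transformers` library, takes the repository id from the page header, and feeds the model a plain prompt without a chat template. `do_sample`, `max_new_tokens`, and the helper names `build_generation_kwargs` and `generate` are assumptions introduced for illustration; only `temperature` and `repetition_penalty` come from the README.

```python
# Sampling settings proposed in the README.
PROPOSED_PARAMS = {"temperature": 0.6, "repetition_penalty": 1.0}

# Assumption: repository id taken from the page header.
MODEL_ID = "radlab/pLLama3.2-3B-DPO"


def build_generation_kwargs(max_new_tokens=256):
    """Merge the proposed parameters with settings assumed for this sketch."""
    return {
        **PROPOSED_PARAMS,
        "do_sample": True,  # assumption: temperature only has an effect when sampling
        "max_new_tokens": max_new_tokens,  # assumption: not specified in the README
    }


def generate(prompt, model_id=MODEL_ID):
    """Load the model and generate a reply (downloads the weights on first call)."""
    # Imported lazily so build_generation_kwargs works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **build_generation_kwargs())
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call such as `print(generate("Napisz krótkie powitanie."))` would then sample a reply with the proposed settings. Keeping the parameters in one dictionary makes it easy to try other values without touching the loading code.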