pkedzia committed · verified · commit 0c17108 · 1 parent: 366105d

Update README.md

Files changed (1): README.md (+28 −2)
README.md CHANGED
@@ -1,7 +1,33 @@
 ---
 license: llama3.2
+language:
+- pl
+- en
+- es
+- de
 ---
 
-Parameters:
+
+### Intro
+We have released radlab/pLLama3.2, a collection of models adapted to Polish. The adapted versions follow user instructions more precisely than the base meta-llama/Meta-Llama-3.2 models. The collection covers the 1B and 3B architectures, each in two configurations:
+- radlab/pLLama3-1B, the 1B model after fine-tuning only
+- radlab/pLLama3-1B-DPO, the 1B model after fine-tuning and DPO
+- radlab/pLLama3-3B, the 3B model after fine-tuning only
+- radlab/pLLama3-3B-DPO, the 3B model after fine-tuning and DPO
+
+### Dataset
+In addition to the publicly available instruction datasets for Polish, we built our own dataset of about 650,000 instructions, generated semi-automatically from other publicly available datasets.
+We also built a training dataset of 100k examples for the DPO stage, in which the model learns to prefer correctly written versions of texts over versions containing language errors.
+
+### Learning
+The training process had two stages:
+- Fine-tuning on the 650k Polish instructions, with the fine-tuning time set to 5 epochs.
+- After the FT stage, DPO training on the 100k correct-writing preference examples, with the training time set to 15k steps.
+
+### Proposed parameters
 * temperature: 0.6
-* repetition_penalty: 1.0
+* repetition_penalty: 1.0
+
+### Outro
+Enjoy!
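The proposed parameters map directly onto `generate()` keyword arguments in 🤗 transformers. A minimal sketch follows; only the `temperature` and `repetition_penalty` values come from the card, while the `generation_kwargs` name, the `max_new_tokens` choice, and the commented usage are illustrative assumptions:

```python
# Decoding parameters proposed in the card, collected as keyword
# arguments for transformers' model.generate() (illustrative sketch).
generation_kwargs = {
    "do_sample": True,          # sampling must be enabled for temperature to apply
    "temperature": 0.6,         # proposed in the card
    "repetition_penalty": 1.0,  # proposed in the card (1.0 = no penalty)
    "max_new_tokens": 256,      # illustrative choice, not from the card
}

# Hypothetical usage (downloads a checkpoint listed above):
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("radlab/pLLama3-3B-DPO")
#   model = AutoModelForCausalLM.from_pretrained("radlab/pLLama3-3B-DPO")
#   inputs = tok("Napisz krótki wiersz o jesieni.", return_tensors="pt")
#   out = model.generate(**inputs, **generation_kwargs)
#   print(tok.decode(out[0], skip_special_tokens=True))
```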