Update README.md
README.md CHANGED
@@ -12,7 +12,7 @@ tags:
 - pre-trained
 ---
 
-The Simbolo's Myanmarsar-GPT symbol is trained on a dataset of
+Simbolo's Myanmarsar-GPT is trained on a dataset of 100,000 Burmese sentences and pre-trained using the GPT-2 architecture. Its purpose is to serve as a foundational pre-trained model for the Burmese language, facilitating fine-tuning for downstream tasks such as creative writing, chatbots, and machine translation.
 
 ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6598b82502c4796342239a35/rFId3-xyzWW-juDq_er9k.jpeg)
 
@@ -33,7 +33,7 @@ output = model.generate(input_ids, max_length=50)
 print(tokenizer.decode(output[0], skip_special_tokens=True))
 ```
 ### Data
-The data utilized comprises
+The [data](https://huggingface.co/datasets/Simbolo-Servicio/wiki-burmese-sentences) utilized comprises 100,000 sentences sourced from Wikipedia.
 
 ### Contributors
 Main Contributor: [Sa Phyo Thu Htet](https://github.com/SaPhyoThuHtet)
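
For context, the usage snippet that the second hunk touches can be pieced together as a minimal sketch. The repository id below is an assumption (it is not stated in this diff); substitute the actual model id. The rest follows the standard `transformers` auto classes and the generate/decode calls already shown in the README:

```python
# Minimal sketch: load the model and continue a Burmese prompt.
# NOTE: the repository id is an assumption, not confirmed by this diff.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Simbolo-Servicio/Myanmarsar-GPT"  # hypothetical id; replace with the real one
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Encode a prompt, generate a continuation, and decode it, mirroring the
# generate/decode lines visible in the README's example.
input_ids = tokenizer.encode("မြန်မာနိုင်ငံ", return_tensors="pt")
output = model.generate(input_ids, max_length=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```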
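The dataset linked in the Data section can be inspected directly with the `datasets` library. A brief sketch, assuming the repository exposes a standard `train` split (split and column names are not stated in this diff):

```python
# Minimal sketch: download the pre-training corpus referenced in the Data
# section and look at one record. The "train" split name is an assumption.
from datasets import load_dataset

dataset = load_dataset("Simbolo-Servicio/wiki-burmese-sentences")
print(dataset)              # available splits and sizes
print(dataset["train"][0])  # first sentence record
```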