KomeijiForce committed on
Commit df8dc88 · 1 Parent(s): ad00394

Update README.md

Files changed (1):
  1. README.md +57 -48
README.md CHANGED
@@ -1,51 +1,60 @@
 ---
-tags:
-- generated_from_trainer
-model-index:
-- name: bart-base-emolm-translate-rev
-  results: []
+datasets:
+- KomeijiForce/Text2Emoji
+language:
+- en
+metrics:
+- bertscore
+pipeline_tag: text2text-generation
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
-# bart-base-emolm-translate-rev
-
-This model is a fine-tuned version of [./saved_models/bart-base](https://huggingface.co/./saved_models/bart-base) on an unknown dataset.
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
-## Training procedure
-
-### Training hyperparameters
-
-The following hyperparameters were used during training:
-- learning_rate: 3e-05
-- train_batch_size: 16
-- eval_batch_size: 64
-- seed: 42
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 2000
-- num_epochs: 2.0
-
-### Training results
-
-
-
-### Framework versions
-
-- Transformers 4.29.2
-- Pytorch 2.0.0+cu117
-- Datasets 2.12.0
-- Tokenizers 0.12.1
+# EmojiLM
+
+This is a [BART](https://huggingface.co/facebook/bart-base) model pre-trained on the [Text2Emoji](https://huggingface.co/datasets/KomeijiForce/Text2Emoji) dataset to translate emojis into text.
+
+For instance, "🍕😍" will be translated into "I love pizza".
+
+An example implementation for translation:
+
+```python
+from transformers import BartTokenizer, BartForConditionalGeneration
+
+def translate(sentence, **argv):
+    inputs = tokenizer(sentence, return_tensors="pt")
+    generated_ids = generator.generate(inputs["input_ids"], **argv)
+    decoded = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
+    return decoded
+
+path = "KomeijiForce/bart-base-emojilm-e2t"
+tokenizer = BartTokenizer.from_pretrained(path)
+generator = BartForConditionalGeneration.from_pretrained(path)
+
+sentence = "🍣🍱😋"
+decoded = translate(sentence, num_beams=4, do_sample=True, max_length=100)
+print(decoded)
+```
+
+You will probably get some output like "Sushi is my go-to comfort food."
+
+If you find this model & dataset resource useful, please consider citing our paper:
+
+```
+@article{DBLP:journals/corr/abs-2311-01751,
+  author     = {Letian Peng and
+                Zilong Wang and
+                Hang Liu and
+                Zihan Wang and
+                Jingbo Shang},
+  title      = {EmojiLM: Modeling the New Emoji Language},
+  journal    = {CoRR},
+  volume     = {abs/2311.01751},
+  year       = {2023},
+  url        = {https://doi.org/10.48550/arXiv.2311.01751},
+  doi        = {10.48550/ARXIV.2311.01751},
+  eprinttype = {arXiv},
+  eprint     = {2311.01751},
+  timestamp  = {Tue, 07 Nov 2023 18:17:14 +0100},
+  biburl     = {https://dblp.org/rec/journals/corr/abs-2311-01751.bib},
+  bibsource  = {dblp computer science bibliography, https://dblp.org}
+}
+```
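
Since the updated card declares `pipeline_tag: text2text-generation`, the same checkpoint can also be driven through the higher-level `pipeline` API instead of loading the tokenizer and model separately. The sketch below is not part of the commit; it assumes the `KomeijiForce/bart-base-emojilm-e2t` repository named in the card is reachable (the weights are downloaded on first use):

```python
from transformers import pipeline

# Load the card's checkpoint as a text2text-generation pipeline;
# the pipeline bundles the tokenizer and the BART model together.
translator = pipeline("text2text-generation",
                      model="KomeijiForce/bart-base-emojilm-e2t")

# Translate an emoji sequence into English text, mirroring the card's example.
outputs = translator("🍕😍", num_beams=4, max_length=100)
print(outputs[0]["generated_text"])
```

Generation keyword arguments such as `num_beams` and `max_length` are forwarded to `generate`, so they behave the same as in the `translate` helper shown in the diff above.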