Commit d00a20a by renix-codex (verified)
Parent: 63c78c8

Update README.md

Files changed (1)
  1. README.md +145 -35
README.md CHANGED
@@ -1,36 +1,146 @@

- # Formal Language T5 Model
-
- This model is fine-tuned from T5-base for formal language correction.
-
- ## Model Details
- - Base Model: T5-base
- - Training Dataset: Grammarly/COEDIT
- - Version: v1.0.0
- - Training Time: 0:59:08.871110
- - Final Loss: 0.0814
-
- ## Usage
- ```python
- from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
-
- model = AutoModelForSeq2SeqLM.from_pretrained("renix-codex/formal-lang-rxcx-model")
- tokenizer = AutoTokenizer.from_pretrained("renix-codex/formal-lang-rxcx-model")
-
- text = "make formal: your informal text here"
- inputs = tokenizer(text, return_tensors="pt")
- outputs = model.generate(**inputs)
- formal_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
- ```
-
- ## Training Configuration
- - Batch Size: 2
- - Gradient Accumulation: 16
- - Learning Rate: 3e-05
- - Sequence Length: 128
- - Training Examples: 69071
-
- ## Performance
- - Average Loss: 0.0814
- - Training Time: 0:59:08.871110
-
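The removed card's training configuration implies an effective batch size of 2 × 16 = 32 examples per optimizer update. The sketch below works that arithmetic through for the 69,071 training examples. It rests on two assumptions. First, training ran on a single GPU (the new card lists one A100). Second, a partial final accumulation window still counts as a step. The card states neither the number of epochs nor how the trainer handled the remainder, so treat the step count as an estimate.

```python
import math

# Values copied from the "Training Configuration" section of the previous README.
PER_DEVICE_BATCH_SIZE = 2
GRAD_ACCUMULATION_STEPS = 16
TRAINING_EXAMPLES = 69071


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Examples contributing to one optimizer update (num_devices=1 is an assumption)."""
    return per_device * accumulation * num_devices


def optimizer_steps_per_epoch(num_examples: int, batch: int) -> int:
    """Optimizer updates per epoch, counting a partial final batch as one step (assumption)."""
    return math.ceil(num_examples / batch)


batch = effective_batch_size(PER_DEVICE_BATCH_SIZE, GRAD_ACCUMULATION_STEPS)
steps = optimizer_steps_per_epoch(TRAINING_EXAMPLES, batch)
print(f"effective batch size: {batch}")  # effective batch size: 32
print(f"optimizer steps per epoch: {steps}")  # optimizer steps per epoch: 2159
```

A trainer that discards the incomplete final window would instead take 2,158 steps per epoch (69,071 // 32).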
+ ---
+ language: en
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text2text-generation
+ tags:
+ - text-generation
+ - formal-language
+ - grammar-correction
+ - t5
+ - english
+ - text-formalization

+ model-index:
+ - name: formal-lang-rxcx-model
+   results:
+   - task:
+       type: text2text-generation
+       name: formal language correction
+     metrics:
+     - type: loss
+       value: 2.1 # Replace with your actual training loss
+       name: training_loss
+     - type: rouge1
+       value: 0.85 # Replace with your actual ROUGE score
+       name: rouge1
+     - type: accuracy
+       value: 0.82 # Replace with your actual accuracy
+       name: accuracy
+     dataset:
+       name: grammarly/coedit
+       type: grammarly/coedit
+       split: train
+
+ datasets:
+ - grammarly/coedit
+
+ model-type: t5-base
+ inference: true
+ base_model: t5-base
+
+ widget:
+ - text: "make formal: hey whats up"
+ - text: "make formal: gonna be late for meeting"
+ - text: "make formal: this is kinda cool project"
+
+ extra_gated_prompt: This is a fine-tuned T5 model for converting informal text to formal language.
+
+ extra_gated_fields:
+   Company/Institution: text
+   Purpose: text
+
+ ---
+
+ # Formal Language T5 Model
+
+ This model is fine-tuned from T5-base for text formalization (rewriting informal English in a formal register) and grammar correction.
+
+ ## Model Description
+
+ - **Model Type:** Fine-tuned T5-base
+ - **Language:** English
+ - **Task:** Text Formalization and Grammar Correction
+ - **License:** Apache 2.0
+ - **Base Model:** t5-base
+
+ ## Intended Uses & Limitations
+
+ ### Intended Uses
+ - Converting informal text to formal language
+ - Making text more professional in tone
+ - Correcting grammar
+ - Polishing business communication
+ - Improving academic writing
+
+ ### Limitations
+ - Works best with English text
+ - Maximum input length: 128 tokens (the sequence length used in training)
+ - May not preserve domain-specific terminology
+ - Best suited to business and academic contexts
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+
+ model = AutoModelForSeq2SeqLM.from_pretrained("renix-codex/formal-lang-rxcx-model")
+ tokenizer = AutoTokenizer.from_pretrained("renix-codex/formal-lang-rxcx-model")
+
+ # Inputs should carry the "make formal: " prefix used in training
+ text = "make formal: hey whats up"
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
+ outputs = model.generate(**inputs, max_new_tokens=128)  # explicit cap; the default generation length can be short
+ formal_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ ```
+
+ ## Example Inputs and Outputs
+
+ | Informal Input | Formal Output |
+ |----------------|---------------|
+ | "hey whats up" | "Hello, how are you?" |
+ | "gonna be late for meeting" | "I will be late for the meeting." |
+ | "this is kinda cool" | "This is quite impressive." |
+
+ ## Training
+
+ The model was trained on the Grammarly/COEDIT dataset with the following specifications:
+ - Base Model: T5-base
+ - Training Hardware: A100 GPU
+ - Sequence Length: 128 tokens
+ - Input Format: "make formal: [informal text]"
+
+ ## License
+
+ Apache License 2.0
+
+ ## Citation
+
+ ```bibtex
+ @misc{formal-lang-rxcx-model,
+   author = {renix-codex},
+   title = {Formal Language T5 Model},
+   year = {2024},
+   publisher = {HuggingFace},
+   howpublished = {HuggingFace Model Hub},
+   url = {https://huggingface.co/renix-codex/formal-lang-rxcx-model}
+ }
+ ```
+
+ ## Developer
+
+ Model developed by renix-codex.
+
+ ## Ethical Considerations
+
+ This model is intended to assist in formal writing while maintaining the original meaning of the text. Users should be aware that:
+ - The model may alter the tone of personal or culturally specific expressions
+ - It should be used as a writing aid rather than a replacement for human judgment
+ - The output should be reviewed for accuracy and appropriateness
+
+ ## Updates and Versions
+
+ Initial Release - February 2024
+ - Base implementation with T5-base
+ - Trained on Grammarly/COEDIT dataset
+ - Optimized for formal language conversion
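The new card specifies the input format `make formal: [informal text]` and a 128-token training sequence length. Below is a minimal sketch of batch inference under those constraints. `build_formal_prompt`, `build_batch`, and `formalize` are hypothetical helpers; they are not part of the model repository or of `transformers`. The loading code assumes `AutoModelForSeq2SeqLM`, the standard `transformers` class for T5-style sequence-to-sequence models.

```python
from typing import Iterable, List

PREFIX = "make formal: "  # task prefix from the card's "Input Format"


def build_formal_prompt(text: str) -> str:
    """Hypothetical helper: collapse whitespace and apply the task prefix exactly once."""
    cleaned = " ".join(text.split())
    bare_prefix = PREFIX.strip()  # "make formal:"
    if cleaned.lower().startswith(bare_prefix):
        # Strip an existing prefix (any casing/spacing) so it is not added twice.
        cleaned = cleaned[len(bare_prefix):].lstrip()
    return PREFIX + cleaned


def build_batch(texts: Iterable[str]) -> List[str]:
    """Hypothetical helper: prefix every input in a batch."""
    return [build_formal_prompt(t) for t in texts]


def formalize(texts: Iterable[str], model_id: str = "renix-codex/formal-lang-rxcx-model") -> List[str]:
    """Hypothetical helper: run the model on a batch (downloads weights on first use)."""
    # Imported lazily so the prompt helpers above work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    # Pad to the longest input and truncate at the 128-token training length.
    inputs = tokenizer(build_batch(texts), return_tensors="pt", padding=True,
                       truncation=True, max_length=128)
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


if __name__ == "__main__":
    print(build_batch(["hey   whats up", "make formal: gonna be late for meeting"]))
    # ['make formal: hey whats up', 'make formal: gonna be late for meeting']
```

Normalizing the prefix on the caller's side means text that already starts with `make formal:` is not prefixed twice. Because `formalize` needs network access to fetch the model, the example run only exercises the prompt helpers.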