Update README.md
README.md CHANGED
@@ -3,8 +3,6 @@ language:
 - en
 tags:
 - toxic comments classification
-licenses:
-- cc-by-nc-sa
 license: openrail++
 base_model:
 - FacebookAI/roberta-large
@@ -12,6 +10,10 @@ datasets:
 - google/jigsaw_toxicity_pred
 ---
 
+## Provenance
+
+garak-llm backup of https://huggingface.co/s-nlp/roberta_toxicity_classifier
+
 ## Toxicity Classification Model
 
 This model is trained for toxicity classification task. The dataset used for training is the merge of the English parts of the three datasets by **Jigsaw** ([Jigsaw 2018](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge), [Jigsaw 2019](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification), [Jigsaw 2020](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification)), containing around 2 million examples. We split it into two parts and fine-tune a RoBERTa model ([RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692)) on it. The classifiers perform closely on the test set of the first Jigsaw competition, reaching the **AUC-ROC** of 0.98 and **F1-score** of 0.76.
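
For context, a minimal inference sketch for the classifier described above. It assumes the checkpoint uses the standard `RobertaForSequenceClassification` head with two output classes (toxic vs. non-toxic); the actual label names and their order should be read from `model.config.id2label` rather than hard-coded.

```python
# Minimal usage sketch (assumption: standard sequence-classification head,
# two labels; consult model.config.id2label for the label order).
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

model_id = "s-nlp/roberta_toxicity_classifier"  # or the garak-llm backup
tokenizer = RobertaTokenizer.from_pretrained(model_id)
model = RobertaForSequenceClassification.from_pretrained(model_id)
model.eval()

batch = tokenizer("You are amazing!", return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits

# Convert logits to per-class probabilities and map indices to label names.
probs = torch.softmax(logits, dim=-1)[0]
for idx, p in enumerate(probs.tolist()):
    print(model.config.id2label[idx], round(p, 4))
```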