leondz committed · verified
Commit 0bbf371 · 1 Parent(s): 64f2796

Update README.md

Files changed (1): README.md (+4 −2)
README.md CHANGED

@@ -3,8 +3,6 @@ language:
 - en
 tags:
 - toxic comments classification
-licenses:
-- cc-by-nc-sa
 license: openrail++
 base_model:
 - FacebookAI/roberta-large
@@ -12,6 +10,10 @@ datasets:
 - google/jigsaw_toxicity_pred
 ---
 
+## Provenance
+
+garak-llm backup of https://huggingface.co/s-nlp/roberta_toxicity_classifier
+
 ## Toxicity Classification Model
 
 This model is trained for the toxicity classification task. The training data merges the English parts of three datasets by **Jigsaw** ([Jigsaw 2018](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge), [Jigsaw 2019](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification), [Jigsaw 2020](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification)), around 2 million examples in total. We split the data into two parts and fine-tune a RoBERTa model ([RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692)) on it. The resulting classifiers perform comparably on the test set of the first Jigsaw competition, reaching an **AUC-ROC** of 0.98 and an **F1-score** of 0.76.
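The classifier described in the card can be exercised with a short inference sketch using the Hugging Face `transformers` pipeline. This is a minimal example, not part of the original card: it assumes `transformers` is installed, loads the upstream `s-nlp/roberta_toxicity_classifier` checkpoint (the repository this backup mirrors), and assumes the model exposes `neutral`/`toxic` labels.

```python
# Minimal sketch: score a comment for toxicity with the mirrored checkpoint.
# Model id and label names are assumptions based on the upstream s-nlp card.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="s-nlp/roberta_toxicity_classifier",
)

result = classifier("You are a wonderful person!")[0]
print(result["label"], round(result["score"], 3))
```

The pipeline returns a list with one dict per input, each carrying the top `label` and its softmax `score`; to score a batch, pass a list of strings instead of a single one.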