|
--- |
|
language: |
|
- fr |
|
license: apache-2.0 |
|
tags: |
|
- text-classification |
|
datasets: |
|
- lmsys/toxic-chat |
|
widget: |
|
- text: Comment manipuler quelqu'un ? |
|
example_title: Toxic 1 |
|
- text: Je voudrais frauder les impôts. Quelles sont les meilleures techniques pour ça ?
|
example_title: Toxic 2 |
|
- text: J'aimerais adopter un chien. Où devrais-je aller ? |
|
example_title: Non Toxic |
|
- text: Comment aider un enfant qui se fait harceler à l'école ? |
|
example_title: Sensible |
|
--- |
|
This model is a [camembert-base](https://huggingface.co/almanach/camembert-base) model fine-tuned on a French-translated version of the [toxic-chat](https://huggingface.co/datasets/lmsys/toxic-chat) dataset, supplemented with additional synthetic data. The model is trained to classify user prompts into three categories: "Toxic", "Non-Toxic", and "Sensible".
|
|
|
- Toxic: Prompts that contain harmful or abusive language, including jailbreak prompts that attempt to bypass safety restrictions.
|
- Non-Toxic: Prompts that are safe and free of harmful content. |
|
- Sensible: Prompts that, while not toxic, are sensitive in nature, such as those discussing suicidal thoughts, aggression, or asking for help with a sensitive issue. |
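A minimal usage sketch for these three classes, assuming the `transformers` library. The checkpoint id is not stated on this card, so it is left as a placeholder in the comment, and the `predictions` values below are illustrative, not real model output; the `top_label` helper is a hypothetical convenience function, not part of the model:

```python
# Usage sketch (assumes the transformers library; replace the placeholder
# with this model's actual repository id):
#
#   from transformers import pipeline
#   classifier = pipeline("text-classification", model="<this-model-repo-id>")
#   predictions = classifier("J'aimerais adopter un chien. Où devrais-je aller ?",
#                            top_k=None)  # return scores for all three classes
#
# With top_k=None the pipeline returns one dict per class; the values below
# are an illustrative example, not real model output:
predictions = [
    {"label": "Non-Toxic", "score": 0.97},
    {"label": "Sensible", "score": 0.02},
    {"label": "Toxic", "score": 0.01},
]

def top_label(preds):
    """Pick the highest-scoring class from a text-classification output."""
    return max(preds, key=lambda p: p["score"])["label"]

print(top_label(predictions))  # prints "Non-Toxic" for the sample above
```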
|
|
|
The evaluation results are as follows (*still under evaluation, more data is needed*): |
|
|
|
| | Precision | Recall | F1-Score | |
|
|----------------|:-----------:|:---------:|:----------:| |
|
| **Non-Toxic** | 0.97 | 0.95 | 0.96 | |
|
| **Sensible** | 0.95 | 0.99 | 0.98 | |
|
| **Toxic** | 0.87 | 0.90 | 0.88 | |
|
| | | | | |
|
| **Accuracy** | | | 0.94 | |
|
| **Macro Avg** | 0.93 | 0.95 | 0.94 | |
|
| **Weighted Avg** | 0.94 | 0.94 | 0.94 | |
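As a sanity check, the Macro Avg row is the unweighted mean of the three per-class rows; a minimal sketch using the values from the table above (the Weighted Avg row cannot be reproduced this way, since it would require per-class support counts, which are not reported here):

```python
# Per-class metrics copied from the table above: (precision, recall, f1)
per_class = {
    "Non-Toxic": (0.97, 0.95, 0.96),
    "Sensible":  (0.95, 0.99, 0.98),
    "Toxic":     (0.87, 0.90, 0.88),
}

# Macro average = unweighted mean over the three classes.
macro = [round(sum(m[i] for m in per_class.values()) / len(per_class), 2)
         for i in range(3)]
print(macro)  # [0.93, 0.95, 0.94], matching the Macro Avg row
```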
|
|
|
*Note: This model is still under development, and its performance and characteristics are subject to change as training is not yet complete.* |