Update README.md
README.md
CHANGED
pipeline_tag: zero-shot-classification
---

# Presentation
We introduce the Bloomz-3b-NLI model, fine-tuned from the [Bloomz-3b-chat-dpo](https://huggingface.co/cmarkea/bloomz-3b-dpo-chat) foundation model.
This model is trained on a Natural Language Inference (NLI) task in a language-agnostic manner. The NLI task involves determining the semantic relationship
between a hypothesis and a set of premises, often expressed as pairs of sentences.

The goal is to predict textual entailment (does sentence A imply/contradict/neither sentence B?) and is a classification task (given two sentences, predict one of the
three labels).
If sentence A is called the *premise* and sentence B the *hypothesis*, then the goal of the modeling is to estimate the following:
$$P(premise=c\in\{contradiction, entailment, neutral\}\vert hypothesis)$$
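
As a minimal sketch of what estimating this probability looks like in practice (assuming the checkpoint loads as a standard Hugging Face sequence-classification model, with the label names read from its config rather than hard-coded), one can score a single premise/hypothesis pair as follows:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cmarkea/bloomz-3b-nli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Encode the (premise, hypothesis) pair as a single input; the two languages
# may differ, since the model was trained in a language-agnostic fashion.
inputs = tokenizer(
    "Le film était magnifique.",  # premise (French)
    "The movie was good.",        # hypothesis (English)
    return_tensors="pt",
)

# P(premise = c | hypothesis) over {contradiction, entailment, neutral}.
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[0]

# The label order is taken from the checkpoint's config, not hard-coded here.
for idx, p in enumerate(probs.tolist()):
    print(model.config.id2label[idx], round(p, 3))
```
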
### Language-agnostic approach
It should be noted that hypotheses and premises are randomly chosen between English and French, with each language combination representing a probability of 25%.

### Performance

| **class** | **precision (%)** | **f1-score (%)** | **support** |
| :----------------: | :---------------: | :--------------: | :---------: |
| **global** | 81.96 | 81.07 | 5,010 |
| **contradiction** | 81.80 | 84.04 | 1,670 |
| **entailment** | 84.82 | 81.96 | 1,670 |
| **neutral** | 76.85 | 77.20 | 1,670 |

### Benchmark

Here are the performances for both the hypothesis and premise in French:

| **model** | **accuracy (%)** | **MCC (x100)** |
| :--------------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 77.45 | 66.24 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 81.72 | 72.67 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 83.43 | 75.15 |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 68.70 | 53.57 |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 81.08 | 71.66 |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 83.13 | 74.89 |

And now the hypothesis in French and the premise in English (cross-language context):

| **model** | **accuracy (%)** | **MCC (x100)** |
| :--------------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 16.89 | -26.82 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 74.59 | 61.97 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 85.15 | 77.74 |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 68.84 | 53.55 |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 82.12 | 73.22 |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 85.43 | 78.25 |

# Zero-shot Classification
The primary interest of training such models lies in their zero-shot classification performance. This means that the model is able to classify any text with any label
without specific training. What sets the Bloomz-3b-NLI LLMs apart in this domain is their ability to model and extract information from significantly more complex
and lengthy text structures compared to models like BERT, RoBERTa, or CamemBERT.

The zero-shot classification task can be summarized by:
$$P(hypothesis=i\in\mathcal{C}\vert premise)=\frac{e^{P(premise=entailment\vert hypothesis=i)}}{\sum_{j\in\mathcal{C}}e^{P(premise=entailment\vert hypothesis=j)}}$$
With *i* denoting a hypothesis built by filling a template (for example, "This text is about {}.") with one of the *#C* candidate labels ("cinema", "politics", etc.), the set
of hypotheses is composed of {"This text is about cinema.", "This text is about politics.", ...}. It is these hypotheses that we will measure against the premise, which
is the sentence we aim to classify.
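
To make the formula concrete, here is a minimal hand-rolled sketch of that computation (not necessarily the pipeline's exact internals; it assumes the checkpoint's config exposes an "entailment" entry in `label2id`): score each templated hypothesis against the premise, then take a softmax over the entailment probabilities:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cmarkea/bloomz-3b-nli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "The soundtrack alone makes this film worth watching."
template = "This text is about {}."
candidate_labels = ["cinema", "politics", "technology"]

# Assumed: the config maps the label name "entailment" to its class index.
entail_id = model.config.label2id["entailment"]

# P(premise = entailment | hypothesis = i) for each templated hypothesis i.
entail_probs = []
for label in candidate_labels:
    inputs = tokenizer(premise, template.format(label), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    entail_probs.append(torch.softmax(logits, dim=-1)[0, entail_id])

# Softmax over the entailment probabilities, as in the formula above.
scores = torch.softmax(torch.stack(entail_probs), dim=0)
for label, score in zip(candidate_labels, scores.tolist()):
    print(label, round(score, 3))
```
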
### Performance

The model is evaluated on a sentiment analysis task using the French film review dataset [Allociné](https://huggingface.co/datasets/allocine). The dataset is labeled
into two classes, positive comments and negative comments. We then use the hypothesis template "Ce commentaire est {}." and the candidate classes "positif" and "negatif".

# How to use Bloomz-3b-NLI

```python
from transformers import pipeline
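
# The rest of this snippet is a reconstruction (the original block is truncated
# here): it assumes the checkpoint works with the standard
# zero-shot-classification pipeline and reuses the Allociné setup above.
classifier = pipeline(
    task="zero-shot-classification",
    model="cmarkea/bloomz-3b-nli",
)

# Classify a French film review with the "Ce commentaire est {}." template.
result = classifier(
    sequences="Un film magnifique, porté par des acteurs remarquables.",
    candidate_labels=["positif", "negatif"],
    hypothesis_template="Ce commentaire est {}.",
)
print(result)
```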