Cyrile committed · Commit 3daa517 (verified) · 1 parent: 1e1f07c

Update README.md

Files changed (1): README.md (+41 −9)
README.md CHANGED
@@ -9,36 +9,68 @@ pipeline_tag: zero-shot-classification
 
---

# Presentation
We introduce the Bloomz-3b-NLI model, fine-tuned from the [Bloomz-3b-chat-dpo](https://huggingface.co/cmarkea/bloomz-3b-dpo-chat) foundation model.
This model is trained on a Natural Language Inference (NLI) task in a language-agnostic manner. The NLI task involves determining the semantic relationship between a hypothesis and a set of premises, often expressed as pairs of sentences.
The goal is to predict textual entailment (does sentence A imply, contradict, or remain neutral toward sentence B?) as a classification task: given two sentences, predict one of three labels.
If sentence A is called the *premise* and sentence B the *hypothesis*, the goal of the modeling is to estimate:
$$P(premise=c\in\{contradiction, entailment, neutral\}\vert hypothesis)$$
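Concretely, the model's three class scores for a (premise, hypothesis) pair are turned into this probability distribution with a softmax. A minimal sketch with made-up logits (the values are illustrative, not real model outputs):

```python
import math

# Hypothetical raw scores (logits) a fine-tuned NLI head could assign
# to one (premise, hypothesis) pair, one score per class.
logits = {"contradiction": -1.2, "entailment": 2.3, "neutral": 0.4}

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

probs = softmax(logits)
# The predicted relation is the argmax over the three classes.
prediction = max(probs, key=probs.get)
```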

### Language-agnostic approach
Note that hypotheses and premises are drawn at random between English and French, so that each of the four language combinations occurs with a probability of 25%.
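The language pairing described above can be sketched as a uniform draw over the four (premise language, hypothesis language) combinations; this is an illustrative reconstruction, not the authors' training code:

```python
import random

# All (premise language, hypothesis language) combinations.
combinations = [("en", "en"), ("en", "fr"), ("fr", "en"), ("fr", "fr")]

def sample_pair_languages(rng=random):
    # Uniform choice over the four combinations => 25% probability each.
    return rng.choice(combinations)
```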

### Performance

| **class**         | **precision (%)** | **f1-score (%)** | **support** |
| :---------------: | :---------------: | :--------------: | :---------: |
| **global**        | 81.96             | 81.07            | 5,010       |
| **contradiction** | 81.80             | 84.04            | 1,670       |
| **entailment**    | 84.82             | 81.96            | 1,670       |
| **neutral**       | 76.85             | 77.20            | 1,670       |

### Benchmark

Here are the performances when both the hypothesis and premise are in French:

| **model** | **accuracy (%)** | **MCC (x100)** |
| :--------------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 77.45 | 66.24 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 81.72 | 72.67 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 83.43 | 75.15 |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 68.70 | 53.57 |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 81.08 | 71.66 |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 83.13 | 74.89 |

And now with the hypothesis in French and the premise in English (cross-language context):

| **model** | **accuracy (%)** | **MCC (x100)** |
| :--------------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 16.89 | -26.82 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 74.59 | 61.97 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 85.15 | 77.74 |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 68.84 | 53.55 |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 82.12 | 73.22 |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 85.43 | 78.25 |
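For reference, the MCC reported above is the Matthews correlation coefficient. A generic sketch of its standard multiclass form, computed from true and predicted labels (an illustration, not the evaluation code used for these benchmarks):

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """Multiclass Matthews correlation coefficient in [-1, 1]."""
    labels = sorted(set(y_true) | set(y_pred))
    s = len(y_true)                                   # number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly predicted
    t_k = [y_true.count(k) for k in labels]           # true counts per class
    p_k = [y_pred.count(k) for k in labels]           # predicted counts per class
    num = c * s - sum(t * p for t, p in zip(t_k, p_k))
    den = math.sqrt((s**2 - sum(p**2 for p in p_k)) * (s**2 - sum(t**2 for t in t_k)))
    return num / den if den else 0.0
```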
# Zero-shot Classification
The primary interest of training such models lies in their zero-shot classification performance. This means that the model is able to classify any text with any label without specific training. What sets the Bloomz-3b-NLI LLMs apart in this domain is their ability to model and extract information from significantly more complex and lengthy text structures compared to models like BERT, RoBERTa, or CamemBERT.

The zero-shot classification task can be summarized by:
$$P(hypothesis=i\in\mathcal{C}|premise)=\frac{e^{P(premise=entailment\vert hypothesis=i)}}{\sum_{j\in\mathcal{C}}e^{P(premise=entailment\vert hypothesis=j)}}$$
With *i* representing a hypothesis composed of a template (for example, "This text is about {}.") and one of the candidate labels in $\mathcal{C}$ ("cinema", "politics", etc.), the set of hypotheses is composed of {"This text is about cinema.", "This text is about politics.", ...}. It is these hypotheses that we will measure against the premise, which is the sentence we aim to classify.
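The scheme above can be sketched as: build one hypothesis per candidate label from the template, collect the entailment score of each one, and apply a softmax over those scores. The entailment scores below are made-up values standing in for real model outputs:

```python
import math

# Build one hypothesis per candidate label from the template.
template = "This text is about {}."
candidate_labels = ["cinema", "politics"]
hypotheses = [template.format(label) for label in candidate_labels]

# Hypothetical entailment scores P(premise=entailment | hypothesis=i)
# that the NLI model could assign to each hypothesis.
entailment_scores = {"cinema": 2.1, "politics": -0.7}

# Softmax over entailment scores, as in the formula above.
exps = {k: math.exp(v) for k, v in entailment_scores.items()}
total = sum(exps.values())
label_probs = {k: v / total for k, v in exps.items()}
predicted = max(label_probs, key=label_probs.get)
```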

### Performance

The model is evaluated through sentiment analysis on the French film review dataset [Allociné](https://huggingface.co/datasets/allocine). The dataset is labeled into two classes, positive and negative reviews. We then use the hypothesis template "Ce commentaire est {}." and the candidate classes "positif" and "negatif".

# How to use Bloomz-3b-NLI

```python
from transformers import pipeline

# Load the model with the standard Hugging Face zero-shot pipeline
# (the model card's pipeline_tag is zero-shot-classification).
classifier = pipeline(
    task="zero-shot-classification",
    model="cmarkea/bloomz-3b-nli"
)

result = classifier(
    "Ce film est un pur chef-d'œuvre.",  # the premise to classify
    candidate_labels=["positif", "negatif"],
    hypothesis_template="Ce commentaire est {}."
)
```