---
base_model:
- manueldeprada/FactCC
datasets:
- divyapatel4/Microsoft-PeNS
language:
- en
license: apache-2.0
base_model_relation: finetune
---

# FactCC model for PENS dataset

**The model has been fine-tuned on the PENS dataset to better adapt to the task of factuality assessment for news headlines.**


Original paper: [Evaluating the Factual Consistency of Abstractive Text Summarization](https://arxiv.org/abs/1910.12840)

PENS paper: [PENS: A Dataset and Generic Framework for Personalized News Headline Generation](https://aclanthology.org/2021.acl-long.7)

Related paper: [Fact-Preserved Personalized News Headline Generation](https://ieeexplore.ieee.org/abstract/document/10415680)



Example of how to calculate the FactCC score:
```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

model_path = 'THEATLAS/FactCC-PENS'

tokenizer = BertTokenizer.from_pretrained(model_path)
model = BertForSequenceClassification.from_pretrained(model_path)

text='''The US has "passed the peak" on new coronavirus cases, the White House reported. They predict that some states would reopen this month.
The US has over 637,000 confirmed Covid-19 cases and over 30,826 deaths, the highest for any country in the world.'''
wrong_summary = '''The pandemic has almost not affected the US'''

# Truncate only the source text so the summary is always kept whole.
input_dict = tokenizer(text, wrong_summary, max_length=512, padding='max_length', truncation='only_first', return_tensors='pt')
with torch.no_grad():  # inference only, no gradients needed
    logits = model(**input_dict).logits

# Probability of label index 0; check model.config.id2label to confirm it maps to CORRECT.
probs = torch.nn.functional.softmax(logits, dim=1)
fact_scores = probs[0][0].item()

print(f"fact_scores: {fact_scores}")
```
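
To make the score computation concrete, here is a minimal pure-Python sketch of how a two-class logit pair becomes a FactCC score, and how scores might be averaged over several headlines. The helper names (`softmax`, `factcc_score`, `mean_factcc_score`), the example logits, and the assumption that index 0 is the CORRECT class are illustrative and not part of this repository.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def factcc_score(logits, correct_index=0):
    """Probability of the CORRECT class (assumed here to be index 0)."""
    return softmax(logits)[correct_index]

def mean_factcc_score(batch_logits, correct_index=0):
    """Average score over a batch of per-headline logit pairs."""
    scores = [factcc_score(row, correct_index) for row in batch_logits]
    return sum(scores) / len(scores)

# Hypothetical logits for three headlines, made up for illustration.
example_logits = [[2.0, -1.0], [-3.0, 3.0], [0.0, 0.0]]
print([round(factcc_score(row), 4) for row in example_logits])
print(round(mean_factcc_score(example_logits), 4))
```

Averaging is only one possible aggregation; a corpus-level report could equally use the fraction of headlines whose score exceeds 0.5.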


---

**The following introduction is copied from the manueldeprada/FactCC repository.**



This is a more modern implementation of the model and code from [the original GitHub repo](https://github.com/salesforce/factCC).

This model is trained to predict whether a summary is factual with respect to the original text. Basic usage:
```python
from transformers import BertForSequenceClassification, BertTokenizer
model_path = 'THEATLAS/FactCC-PENS'

tokenizer = BertTokenizer.from_pretrained(model_path)
model = BertForSequenceClassification.from_pretrained(model_path)

text='''The US has "passed the peak" on new coronavirus cases, the White House reported. They predict that some states would reopen this month.
The US has over 637,000 confirmed Covid-19 cases and over 30,826 deaths, the highest for any country in the world.'''
wrong_summary = '''The pandemic has almost not affected the US'''

input_dict = tokenizer(text, wrong_summary, max_length=512, padding='max_length', truncation='only_first', return_tensors='pt')
logits = model(**input_dict).logits
pred = logits.argmax(dim=1)
print(model.config.id2label[pred.item()])  # prints: INCORRECT
```
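
The `truncation='only_first'` argument used above shortens only the source text and leaves the summary untouched. The sketch below illustrates that idea on plain token lists. It is not the tokenizer's actual implementation: `truncate_only_first` is a hypothetical helper, the count of three special tokens assumes the BERT pair layout `[CLS] text [SEP] summary [SEP]`, and raising an error for an over-long summary is a choice made for this sketch.

```python
def truncate_only_first(text_tokens, summary_tokens, max_length=512, num_special=3):
    """Trim the end of text_tokens so the encoded pair fits in max_length.

    num_special=3 assumes the layout [CLS] text [SEP] summary [SEP].
    """
    budget = max_length - num_special - len(summary_tokens)
    if budget < 0:
        # Sketch-level choice: refuse rather than cut into the summary.
        raise ValueError("summary alone does not fit in max_length")
    return text_tokens[:budget], summary_tokens

# Hypothetical token ids: a 600-token article and a 10-token headline.
article = list(range(600))
headline = list(range(10))
kept_article, kept_headline = truncate_only_first(article, headline)
print(len(kept_article), len(kept_headline))
```

Because the headline is the text being judged, losing the tail of a long article is usually the lesser evil compared with cutting the headline itself.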

It can also be used with a pipeline. Beware that pipelines are not designed for sentence pairs, so you have to use this double-list hack:
```python
>>> from transformers import pipeline

>>> pipe = pipeline(model="THEATLAS/FactCC-PENS")
>>> pipe([[[text1, summary1]], [[text2, summary2]]], truncation='only_first', padding='max_length')
# output [{'label': 'INCORRECT', 'score': 0.9979124665260315}, {'label': 'CORRECT', 'score': 0.879124665260315}]
```
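
As a possible alternative to the nested lists, the text-classification pipeline documentation describes passing sentence pairs as dictionaries with `text` and `text_pair` keys. The sketch below assumes your installed `transformers` version supports that input form. `to_pipeline_inputs` and `classify_pairs` are hypothetical helper names, and the behavior has not been verified against this particular checkpoint.

```python
def to_pipeline_inputs(pairs):
    """Turn (text, summary) tuples into text/text_pair dicts."""
    return [{"text": text, "text_pair": summary} for text, summary in pairs]

def classify_pairs(pairs, model_name="THEATLAS/FactCC-PENS"):
    """Run the text-classification pipeline over (text, summary) pairs.

    transformers is imported lazily so to_pipeline_inputs works without it.
    Assumes the pipeline accepts dict inputs with text/text_pair keys.
    """
    from transformers import pipeline

    pipe = pipeline("text-classification", model=model_name)
    return pipe(
        to_pipeline_inputs(pairs),
        truncation="only_first",
        padding="max_length",
    )

# Building the inputs needs no model download.
print(to_pipeline_inputs([("source article", "candidate headline")]))
```

Calling `classify_pairs([(text1, summary1), (text2, summary2)])` should then return one label/score dictionary per pair, in the same shape as the output shown above.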