---
license: apache-2.0
---
base model: https://huggingface.co/google-bert/bert-base-uncased
dataset: https://github.com/ramybaly/Article-Bias-Prediction
training parameters:
- batch_size: 100
- epochs: 5
- dropout: 0.05
- max_length: 512
- learning_rate: 3e-5
- warmup_steps: 100
- random_state: 239
training methodology:
- sanitize the dataset following a specific rule set, and use the random split provided with the dataset
- train on the train split and evaluate on the validation split after each epoch
- evaluate the test split only with the model that achieved the lowest validation loss (see the sketch after this list)
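The training script itself is not part of this card; the following is a minimal sketch of how the parameters and methodology above could be wired into the Hugging Face `Trainer` API. The label count (`num_labels=3`, for left / center / right), the `content` field name, the F1 averaging mode, and the `output_dir` are assumptions, not details from the original setup.

```python
import numpy as np
from sklearn.metrics import f1_score
from transformers import (
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
)

BASE = "google-bert/bert-base-uncased"

# assumption: three classes (left / center / right) from Article-Bias-Prediction
config = AutoConfig.from_pretrained(
    BASE,
    num_labels=3,
    hidden_dropout_prob=0.05,           # dropout: 0.05
    attention_probs_dropout_prob=0.05,
)
model = AutoModelForSequenceClassification.from_pretrained(BASE, config=config)
tokenizer = AutoTokenizer.from_pretrained(BASE)


def tokenize(batch):
    # max_length: 512 -- longer articles are truncated
    # assumption: the article text lives in a "content" field
    return tokenizer(batch["content"], truncation=True, max_length=512)


def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # assumption: the card does not state the F1 averaging mode
    return {"f1": f1_score(labels, preds, average="macro")}


training_args = TrainingArguments(
    output_dir="political-bias-bert",    # assumption: output path
    per_device_train_batch_size=100,     # batch_size: 100
    num_train_epochs=5,                  # epochs: 5
    learning_rate=3e-5,                  # learning_rate: 3e-5
    warmup_steps=100,                    # warmup_steps: 100
    seed=239,                            # random_state: 239
    evaluation_strategy="epoch",         # evaluate the validation split each epoch
    save_strategy="epoch",
    load_best_model_at_end=True,         # keep the checkpoint with the lowest
    metric_for_best_model="eval_loss",   # validation loss for the final test run
    greater_is_better=False,
)

# The sanitized random splits would then be passed to
# Trainer(model=model, args=training_args, train_dataset=...,
#         eval_dataset=..., tokenizer=tokenizer,
#         compute_metrics=compute_metrics) and run with trainer.train().
```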
result summary:
- across the five training epochs, the model from the second epoch achieved the lowest validation loss, 0.3314
- on the test split, the second-epoch model achieved an F1 score of 0.9041
usage:
```python
from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer


def main(repository: str):
    # Load the fine-tuned classifier and its tokenizer from the Hub
    model = AutoModelForSequenceClassification.from_pretrained(repository)
    tokenizer = AutoTokenizer.from_pretrained(repository)
    # Wrap them in a text-classification pipeline for single-call inference
    nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)
    print(nlp("the masses are controlled by media."))


if __name__ == "__main__":
    main(repository="premsa/political-bias-prediction-allsides-BERT")
```
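The pipeline call returns a list of dictionaries with `label` and `score` keys; the label names printed follow the model's `id2label` mapping.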