---
title: BEER
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
  BEER 2.0 (BEtter Evaluation as Ranking) is a trained machine translation evaluation metric that correlates highly with human judgment at both the sentence and corpus level. It is a linear model-based metric for sentence-level evaluation in machine translation (MT) that combines 33 relatively dense features, including character n-grams and reordering features. It employs a learning-to-rank framework to differentiate between function and non-function words and weights each word type according to its importance for evaluation. The model is trained on ranking similar translations using a vector of feature values for each system output. BEER outperforms the strong baseline metric METEOR in five out of eight language pairs, showing that less sparse features at the sentence level can lead to state-of-the-art results. Features on character n-grams are crucial, and higher-order character n-grams are less prone to sparse counts than word n-grams.
---
# Metric Card for BEER
## Metric description
BEER 2.0 (BEtter Evaluation as Ranking) is a trained machine translation evaluation metric that correlates highly with human judgment at both the sentence and corpus level. It is a linear model-based metric for sentence-level evaluation in machine translation (MT) that combines 33 relatively dense features, including character n-grams and reordering features.
It employs a learning-to-rank framework to differentiate between function and non-function words and weights each word type according to its importance for evaluation.
The model is trained on ranking similar translations using a vector of feature values for each system output.
BEER outperforms the strong baseline metric METEOR in five out of eight language pairs, showing that less sparse features at the sentence level can lead to state-of-the-art results.
Features on character n-grams are crucial, and higher-order character n-grams are less prone to sparse counts than word n-grams.
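To make the character n-gram idea concrete, here is a minimal sketch of an F1 overlap score over character n-grams. It is an illustration only: it is not BEER's actual feature set, and it involves no trained weights. The function names `char_ngrams` and `char_ngram_f1` are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count the character n-grams of order n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def char_ngram_f1(hypothesis: str, reference: str, n: int) -> float:
    """F1 of the clipped character n-gram overlap (illustrative, not BEER itself)."""
    hyp = char_ngrams(hypothesis, n)
    ref = char_ngrams(reference, n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `char_ngram_f1("obeys", "obey", 3)` gives 0.8, whereas the two strings would not match at all as whole words. This is the sense in which character n-grams yield denser counts than word n-grams.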
## How to use
BEER has two mandatory arguments:
- `predictions`: a `list` of predictions to score. Each prediction should be a string with tokens separated by spaces.
- `references`: a `list` of references, exactly one per prediction (multiple references per prediction are not allowed). Each reference should be a string with tokens separated by spaces.
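The constraints on the two arguments can be checked before calling the metric. The following is a minimal sketch; `check_beer_inputs` is a hypothetical helper, not part of the module.

```python
def check_beer_inputs(predictions, references):
    """Validate inputs: equal-length lists, one string reference per prediction."""
    if len(predictions) != len(references):
        raise ValueError(
            f"Got {len(predictions)} predictions but {len(references)} references."
        )
    for i, (pred, ref) in enumerate(zip(predictions, references)):
        # A list in place of a string would mean multiple references,
        # which the metric does not allow.
        if not isinstance(pred, str) or not isinstance(ref, str):
            raise TypeError(f"Item {i}: prediction and reference must both be strings.")
```

Calling it on the lists you plan to score raises early instead of failing inside the external BEER process.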
## Prerequisites
This module downloads and executes the original authors' BEER package. You must have Java installed to run it; the module will fail to load otherwise.
Because the module calls the external BEER executable rather than running Python code, it is much faster to pass a batch of predictions and references in a single call than to call the metric repeatedly with one prediction and reference at a time.
```python
>>> import evaluate
>>> beer = evaluate.load('beer')
>>> predictions = ["It is a guide to action which ensures that the military always obeys the commands of the party", "hello there general kenobi"]
>>> references = ["It is a guide to action that ensures that the military will forever heed Party commands", "hello general kenobi"]
>>> results = beer.compute(predictions=predictions, references=references)
```
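Since loading fails without Java, it can help to check for it first and report a clearer error. This is a minimal sketch, assuming that a `java` executable on `PATH` is what the BEER package needs (the card does not say how Java is located). `java_available` and `load_beer` are hypothetical helper names.

```python
import shutil


def java_available() -> bool:
    """Return True if a `java` executable is found on PATH."""
    return shutil.which("java") is not None


def load_beer():
    """Load the metric, failing early with a clear message if Java is missing."""
    if not java_available():
        raise RuntimeError("BEER needs Java: no `java` executable found on PATH.")
    import evaluate  # imported lazily so the Java check runs first
    return evaluate.load('beer')
```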
## Output values
The metric outputs a dictionary containing the BEER score and the scores of the individual sentences:
```
{'beer': 0.4557488704361114,
'beer_scores': [0.6380935618609037, 0.7291530494474304]}
```
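The per-sentence scores in `beer_scores` are in the same order as the inputs, so they can be paired back with the predictions. The sketch below reuses the example values shown in this card and ranks the predictions by score. The variable names are illustrative.

```python
# Example output and inputs copied from this card.
results = {
    "beer": 0.4557488704361114,
    "beer_scores": [0.6380935618609037, 0.7291530494474304],
}
predictions = [
    "It is a guide to action which ensures that the military always obeys the commands of the party",
    "hello there general kenobi",
]

# Pair each prediction with its sentence-level score, best first.
ranked = sorted(
    zip(predictions, results["beer_scores"]),
    key=lambda pair: pair[1],
    reverse=True,
)
for sentence, score in ranked:
    print(f"{score:.4f}  {sentence}")
```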
## Citation
```bibtex
@inproceedings{stanojevic-simaan-2014-fitting,
title = "Fitting Sentence Level Translation Evaluation with Many Dense Features",
author = "Stanojevi{\'c}, Milo{\v{s}} and
Sima{'}an, Khalil",
booktitle = "Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing ({EMNLP})",
month = oct,
year = "2014",
address = "Doha, Qatar",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D14-1025",
doi = "10.3115/v1/D14-1025",
pages = "202--206",
}
```
## Further References
- [BEER -- Official GitHub](https://github.com/stanojevic/beer)