Manuel de Prada committed
Commit · 21d7631
1 Parent(s): 80dcff0
beer metric
README.md
CHANGED
@@ -11,11 +11,7 @@ tags:
 - evaluate
 - metric
 description: >-
-  BEER 2.0 (BEtter Evaluation as Ranking) is a trained machine translation evaluation metric with high correlation with human judgment both on sentence and corpus level. It is a linear model-based metric for sentence-level evaluation in machine translation (MT) that combines 33 relatively dense features, including character n-grams and reordering features.
-  It employs a learning-to-rank framework to differentiate between function and non-function words and weighs each word type according to its importance for evaluation.
-  The model is trained on ranking similar translations using a vector of feature values for each system output.
-  BEER outperforms the strong baseline metric METEOR in five out of eight language pairs, showing that less sparse features at the sentence level can lead to state-of-the-art results.
-  Features on character n-grams are crucial, and higher-order character n-grams are less prone to sparse counts than word n-grams.
+  BEER 2.0 (BEtter Evaluation as Ranking) is a trained machine translation evaluation metric with high correlation with human judgment both on sentence and corpus level. It is a linear model-based metric for sentence-level evaluation in machine translation (MT) that combines 33 relatively dense features, including character n-grams and reordering features. It employs a learning-to-rank framework to differentiate between function and non-function words and weighs each word type according to its importance for evaluation. The model is trained on ranking similar translations using a vector of feature values for each system output. BEER outperforms the strong baseline metric METEOR in five out of eight language pairs, showing that less sparse features at the sentence level can lead to state-of-the-art results. Features on character n-grams are crucial, and higher-order character n-grams are less prone to sparse counts than word n-grams.
 ---
 
 # Metric Card for BEER
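
The description in this diff mentions a linear model over dense features such as character n-grams, used to rank competing translations. As a rough illustration only, the sketch below computes character n-gram F1 features and combines them with a weighted sum. It is not BEER's implementation: the function names, the three n-gram orders, and the `WEIGHTS` values are all hypothetical, and the real metric uses 33 trained features, including reordering features that are not modelled here.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return a multiset of the character n-grams of order n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_f1(hypothesis, reference, n):
    """F1 overlap between hypothesis and reference character n-grams."""
    hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
    overlap = sum((hyp & ref).values())  # clipped match counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def features(hypothesis, reference, orders=(1, 2, 3)):
    """Feature vector: one character n-gram F1 value per order."""
    return [ngram_f1(hypothesis, reference, n) for n in orders]


def linear_score(feats, weights):
    """Linear model: weighted sum of the feature values."""
    return sum(w * f for w, f in zip(weights, feats))


# Hypothetical weights, chosen by hand for illustration (not trained).
WEIGHTS = [0.2, 0.3, 0.5]


def prefer(hyp_a, hyp_b, reference, weights=WEIGHTS):
    """Pairwise ranking decision: True when hyp_a outscores hyp_b."""
    score_a = linear_score(features(hyp_a, reference), weights)
    score_b = linear_score(features(hyp_b, reference), weights)
    return score_a > score_b
```

In a learning-to-rank setup the weights would be fitted so that `prefer` agrees with human pairwise judgments as often as possible; here they are fixed constants so the example stays self-contained.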