English-to-Japanese Translation Project

Overview

This project builds a system for English-to-Japanese translation using state-of-the-art multilingual models. Two models are used: mT5 as the primary model and mBART as the secondary model. Together they balance versatility across multilingual tasks with high-quality translation.


Models Used

1. mT5 (Primary Model)

  • Reason for Selection:

    • mT5 is highly versatile and trained on a broad multilingual corpus, making it suitable for translation as well as other tasks such as summarization and question answering.
    • It performs well without extensive fine-tuning, saving computational resources.
  • Strengths:

    • Handles translation naturally with minimal training.
    • Can perform additional tasks beyond translation.
  • Limitations:

    • Sometimes lacks precision in detailed translations.

2. mBART (Secondary Model)

  • Reason for Selection:

    • mBART specializes in multilingual translation and provides highly accurate translations when fine-tuned (a short usage sketch follows this section).
  • Strengths:

    • Optimized for translation accuracy, particularly on long sentences where contextual consistency matters.
    • Produces grammatically sound output and maintains context across sentences.
  • Limitations:

    • Less flexible for tasks like summarization or question answering compared to mT5.
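
Below is a minimal English-to-Japanese inference sketch for mBART. The checkpoint name facebook/mbart-large-50-many-to-many-mmt and the example sentence are illustrative assumptions; substitute your own fine-tuned checkpoint if you have one.

```python
# Minimal English-to-Japanese inference sketch with mBART-50.
# Checkpoint name and example text are illustrative, not the project's exact setup.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model_name = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name, src_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained(model_name)

text = "The weather is nice today."
inputs = tokenizer(text, return_tensors="pt")

# Force the decoder to start with the Japanese language token.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["ja_XX"],
    max_new_tokens=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```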

Evaluation Strategy

To evaluate model performance, the following metrics were used:

  1. BLEU Score:

    • Measures the n-gram overlap between the model's output and reference translations.
    • Chosen because it is the standard automatic metric for translation quality.
  2. Training Loss:

    • Tracks how well the model is learning during training.
    • A lower loss shows better learning and fewer errors.
  3. Perplexity:

    • Measures how confidently the model predicts each token; it is the exponential of the cross-entropy loss.
    • Lower perplexity generally corresponds to more fluent translations (a sketch of computing these metrics follows this list).
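
A minimal sketch of how these metrics can be computed, assuming held-out hypothesis/reference pairs and the mean per-token cross-entropy loss from an evaluation run; sacrebleu is an assumption here, and any BLEU implementation works.

```python
# Evaluation sketch: corpus BLEU via sacrebleu, perplexity from cross-entropy loss.
# Hypotheses, references, and the loss value below are placeholders.
import math
import sacrebleu

hypotheses = ["今日はいい天気です。"]        # model outputs on held-out data
references = [["今日は天気が良いです。"]]    # one reference stream, aligned with hypotheses

# Character-level tokenization is a simple choice for Japanese;
# "ja-mecab" (requires sacrebleu[ja]) is the more common option.
bleu = sacrebleu.corpus_bleu(hypotheses, references, tokenize="char")
print(f"BLEU: {bleu.score:.2f}")

# Perplexity is the exponential of the mean per-token cross-entropy loss,
# e.g. the eval_loss reported by a Hugging Face Trainer.
eval_loss = 2.1  # placeholder value
print(f"Perplexity: {math.exp(eval_loss):.2f}")
```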

Steps Taken

  1. Fine-tuned both models using a dataset of English-Japanese text pairs to improve translation accuracy.
  2. Tested the models on unseen data to measure their real-world performance.
  3. Applied optimizations such as 4-bit quantization to reduce memory usage during evaluation (see the loading sketch below).
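
A sketch of loading a checkpoint in 4-bit via the bitsandbytes integration in transformers. The checkpoint name and quantization settings are illustrative assumptions, and a CUDA GPU is required.

```python
# Load a seq2seq checkpoint in 4-bit to cut memory use during evaluation.
# Requires the bitsandbytes package and a CUDA GPU; settings are illustrative.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig

model_name = "facebook/mbart-large-50-many-to-many-mmt"  # assumed checkpoint

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)
```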

Results

  • mT5:

    • Handled translation well and also performed additional tasks such as summarization and question answering.
    • Showed versatility, but sometimes lacked fine-grained accuracy in translations.
  • mBART:

    • Delivered precise and contextually accurate translations, especially for longer sentences.
    • Required fine-tuning but outperformed mT5 in translation-focused tasks.
  • Overall Conclusion:
    mT5 is a flexible model for multilingual tasks, while mBART ensures high-quality translations. Together, they balance versatility and accuracy, making them ideal for English-to-Japanese translations.


How to Use

  1. Load the models from the Hugging Face Hub.
  2. Fine-tune the models on your own dataset of English-Japanese text pairs (a minimal sketch follows this list).
  3. Evaluate performance using BLEU Score, training loss, and perplexity.
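
A minimal fine-tuning sketch for step 2, assuming mT5 and a tiny in-memory dataset of English-Japanese pairs. The checkpoint name, hyperparameters, and toy dataset are illustrative assumptions, not the project's exact configuration.

```python
# Fine-tuning sketch: mT5 on English-Japanese pairs with Seq2SeqTrainer.
# Checkpoint, hyperparameters, and the toy dataset are illustrative only.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq,
                          Seq2SeqTrainingArguments, Seq2SeqTrainer)

model_name = "google/mt5-small"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Replace this toy dataset with your own English-Japanese pairs.
pairs = Dataset.from_dict({
    "en": ["Hello.", "How are you?"],
    "ja": ["こんにちは。", "お元気ですか？"],
})

def preprocess(batch):
    model_inputs = tokenizer(batch["en"], max_length=128, truncation=True)
    labels = tokenizer(text_target=batch["ja"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = pairs.map(preprocess, batched=True, remove_columns=pairs.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="mt5-en-ja",
    per_device_train_batch_size=8,
    learning_rate=3e-4,
    num_train_epochs=3,
    logging_steps=10,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```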

Future Work

  • Expand the dataset for better fine-tuning.
  • Explore task-specific fine-tuning for mT5 to improve its translation accuracy.
  • Optimize the models further for deployment in resource-constrained environments.
