---
base_model: unsloth/llama-3.1-8b-instruct-unsloth-bnb-4bit
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - llama
  - trl
  - grpo
  - grammar-correction
license: apache-2.0
language:
  - en
  - uk
---

# Uploaded model

- **Developed by:** Daniil Maksymenko
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3.1-8b-instruct-unsloth-bnb-4bit

The model is still in development and needs more data and training runs, so use it with caution and please report bugs in the Community section. Some bugs are already known, such as failures on short inputs and difficulties with specific numeric values, and fixes for them are planned.

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Llama 3.1 8B was trained on the UA-GEC Fluency dataset to fix grammar, style, and spelling mistakes in Ukrainian text. Training was done with GRPO only, with no SFT or DPO stage.
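Unlike SFT, GRPO optimizes the policy against scalar reward functions rather than supervised targets. This card does not document which rewards were used for this model; as a purely hypothetical illustration, a reward that scores a candidate correction by character-level similarity to a reference correction, using TRL's reward-function signature, could look like:

```python
import difflib

def similarity_reward(completions, reference, **kwargs):
    """Hypothetical GRPO reward for grammar correction.

    TRL's GRPOTrainer calls reward functions with the sampled
    completions (extra dataset columns arrive via keyword arguments)
    and expects one float back per completion. The rewards actually
    used to train this model are not documented; this is only a
    sketch of the interface.
    """
    rewards = []
    for completion, ref in zip(completions, reference):
        # Ratio of matching characters between the candidate and the
        # reference correction, in [0.0, 1.0].
        ratio = difflib.SequenceMatcher(None, completion, ref).ratio()
        rewards.append(ratio)
    return rewards
```

With TRL, such a function would be passed to the trainer as `reward_funcs=[similarity_reward]`.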

TODO: train on short texts and single words / word combinations to avoid hallucinations caused by short inputs; also gather more data overall.
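To try the model, the usual `transformers` chat workflow should apply. This is a minimal sketch: the repo id and the system prompt below are assumptions (the card does not document the instruction used during training), so adjust both as needed.

```python
# Assumed repo id; check the model page for the exact name.
MODEL_ID = "thedanmaks/Llama-3.1-8B-UA-GEC"

def build_messages(text):
    """Wrap the input text in a Llama 3.1 Instruct chat message list.

    The system instruction is a guess; the prompt used during GRPO
    training is not documented in the card.
    """
    return [
        {"role": "system",
         "content": "Fix grammar, spelling and style mistakes in the text."},
        {"role": "user", "content": text},
    ]

if __name__ == "__main__":
    # Imported lazily so the prompt helper above works without
    # transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    # Example Ukrainian input with a spelling mistake ("неможу").
    inputs = tokenizer.apply_chat_template(
        build_messages("Я люблю читати, але неможу знайти час."),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    output = model.generate(inputs, max_new_tokens=256)
    # Decode only the newly generated tokens.
    print(tokenizer.decode(output[0][inputs.shape[-1]:],
                           skip_special_tokens=True))
```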