DMaS-LLaMa-Lite-step-23.9k
This repository provides access to DMaS-LLaMa-Lite-step-23.9k, a 1.7-billion-parameter language model based on the LLaMa architecture. The model has been trained from scratch as part of the DMaS-LLaMa-Lite project using approximately 20 billion tokens of high-quality educational content.
Model Overview
- Architecture: LLaMa-based
- Parameters: 1.7B (36 layers, 32 attention heads, RMSNorm)
- Tokenizer: GPT-2 tokenizer
- Training Data: FineWeb-Edu subset (educational text)
- Training Steps: 23,900
- Optimizer: AdamW with linear warmup and decay
- Hardware: Trained on 1-2 RTX A6000 GPUs with PyTorch DDP
- Dataset Source: FineWeb-Edu Dataset
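The optimizer entry above mentions linear warmup and decay. As a rough illustration of what such a schedule looks like, here is a minimal sketch. It assumes the decay phase is also linear, which this card does not state, and `peak_lr`, `warmup_steps`, `total_steps`, and `min_lr` are hypothetical placeholder values, not the settings used for this model.

```python
def lr_at_step(step, peak_lr=3e-4, warmup_steps=1_000, total_steps=40_000, min_lr=0.0):
    """Learning rate at a given optimizer step.

    All default values are hypothetical placeholders, not the project's settings.
    """
    if step < warmup_steps:
        # Linear warmup: ramp from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        # Past the end of the schedule, hold the floor value.
        return min_lr
    # Linear decay (an assumption): interpolate from peak_lr down to min_lr.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr + (min_lr - peak_lr) * progress


for step in (0, 500, 1_000, 20_500, 40_000):
    print(step, lr_at_step(step))
```

To drive a PyTorch optimizer with this function through `torch.optim.lr_scheduler.LambdaLR`, which expects a multiplicative factor on the optimizer's base learning rate, the returned value would be divided by `peak_lr`.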
The training process emphasizes qualitative improvements in coherence, fluency, and factual grounding, and the model achieves competitive results despite being trained on fewer tokens than larger-scale models.
This checkpoint represents the model's state at 23,900 training steps. Validation loss and downstream performance benchmarks demonstrate notable early improvements in text fluency and alignment with prompts.
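For a rough sense of how much memory the weights of this checkpoint need, the sketch below multiplies the nominal 1.7B parameter count by the bytes per parameter for common dtypes. The helper name `weight_memory_gb` is hypothetical, the dtype in which the checkpoint is stored is not stated here, and the estimate covers weights only (no activations, KV cache, or optimizer state).

```python
# Bytes used by one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params, dtype="float32"):
    """Approximate weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    # 1.7e9 is the nominal parameter count quoted in this card.
    print(f"{dtype}: {weight_memory_gb(1.7e9, dtype):.1f} GB")
```

Under these assumptions the weights come to about 6.8 GB in 32-bit precision and about 3.4 GB in 16-bit precision.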
Training Code
The training script, including configurations and instructions, is open-sourced and available here:
📄 DMaS-LLaMa-Lite Training Code
Usage
You can load the model with the Hugging Face Transformers library:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "McGill-DMaS/DMaS-LLaMa-Lite-step-23.9k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
inputs = tokenizer("The Pyramids of Giza in Egypt are some of the oldest man-made structures in the world.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
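Note that `max_length` in `generate` counts the prompt tokens as well, so the snippet above produces fewer than 50 new tokens. A variant that bounds only the continuation and uses sampling might look like the sketch below. The helper names `sampling_kwargs` and `complete` are hypothetical, the sampling values are illustrative rather than tuned, and the padding fallback assumes the tokenizer may not define a pad token.

```python
def sampling_kwargs(max_new_tokens=50, temperature=0.8, top_p=0.95):
    """Illustrative generation settings; the values are assumptions, not tuned."""
    return {
        "max_new_tokens": max_new_tokens,  # limits only the generated continuation
        "do_sample": True,                 # sample instead of greedy decoding
        "temperature": temperature,
        "top_p": top_p,
    }


def complete(prompt, model_name="McGill-DMaS/DMaS-LLaMa-Lite-step-23.9k", **overrides):
    """Download the model and return the prompt plus a sampled continuation."""
    # Imported here so sampling_kwargs() can be used without loading the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer(prompt, return_tensors="pt")
    kwargs = {**sampling_kwargs(), **overrides}
    # Assumption: if no pad token is defined, fall back to the EOS token id.
    pad_id = tokenizer.pad_token_id
    if pad_id is None:
        pad_id = tokenizer.eos_token_id
    kwargs.setdefault("pad_token_id", pad_id)

    outputs = model.generate(**inputs, **kwargs)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Calling `complete("The Pyramids of Giza in Egypt are")` would then return the prompt followed by up to 50 sampled tokens. Keyword overrides such as `temperature=0.5` replace the defaults.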
Citation
If you use this model or its training insights in your work, please cite the following paper:
@article{li2024effectiveness,
title={Experience of Training a 1.7B-Parameter LLaMa Model From Scratch},
author={Li, Miles Q and Fung, Benjamin and Huang, Shih-Chia},
journal={arXiv preprint arXiv:2412.13335},
year={2024}
}
License
This model and code are released under the Apache License 2.0. Please check the respective repositories for detailed terms.