Llama-3.1-5B-Instruct

Llama-3.1 is a collection of multilingual large language models (LLMs) that includes pretrained and instruction-tuned generative models in various sizes. The Llama-3.1-5B-Instruct model is part of the series optimized for multilingual dialogue use cases, offering powerful conversational abilities and outperforming many open-source and closed chat models on key industry benchmarks.

Model Overview

  • Size: 5B parameters
  • Model Architecture: Llama-3.1 is an auto-regressive language model using an optimized transformer architecture.
  • Training: The model is fine-tuned using Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) to align it with human preferences for helpfulness, safety, and natural conversation.

The Llama-3.1-5B-Instruct model is optimized for multilingual text generation and excels in a variety of dialog-based use cases. It is designed to handle a wide array of tasks, including question answering, translation, and instruction following.
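
To make "auto-regressive" concrete, here is a minimal toy sketch of the decoding loop: each new token is picked from what has been generated so far and appended, until a stop token or the token budget is reached. The lookup table `TOY_NEXT_TOKEN` and the function `toy_generate` are hypothetical stand-ins for illustration only. They are not part of the model or of any library.

```python
# Toy illustration of auto-regressive decoding. The lookup table is a
# hypothetical stand-in for the transformer, not the real model.
TOY_NEXT_TOKEN = {
    "<s>": "Ahoy",
    "Ahoy": ",",
    ",": "matey",
    "matey": "!",
    "!": "</s>",
}


def toy_generate(prompt_tokens, max_new_tokens=8):
    """Extend prompt_tokens one token at a time until a stop token or the budget."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # A real model conditions on the whole sequence; this toy only
        # looks at the most recent token.
        next_token = TOY_NEXT_TOKEN.get(tokens[-1], "</s>")
        if next_token == "</s>":
            break
        tokens.append(next_token)
    return tokens


print(toy_generate(["<s>"]))
```

The `max_new_tokens` argument plays the same role as the parameter of that name in the pipeline call in the example code: it caps how many tokens are appended.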

How to Use

Requirements

  • Install the latest version of Transformers, together with Accelerate, which device_map="auto" in the example code depends on:

    pip install --upgrade transformers accelerate
    
  • Ensure you have PyTorch installed with support for bfloat16:

    pip install torch
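
As a quick sanity check before loading the model, something like the following sketch can confirm that the packages are importable and whether the GPU supports bfloat16. The helper `missing_packages` is a hypothetical name introduced here, and `accelerate` is included on the assumption that `device_map="auto"` is used as in the example code.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "accelerate" is checked on the assumption that device_map="auto" is used.
missing = missing_packages(["transformers", "torch", "accelerate"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    import torch

    if torch.cuda.is_available():
        # Reports whether the current CUDA device can run bfloat16 kernels.
        print("bfloat16 supported on GPU:", torch.cuda.is_bf16_supported())
    else:
        print("No CUDA device found; the model would be placed on CPU.")
```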
    

Example Code

Below is an example of how to use the Llama-3.1-5B-Instruct model for conversational inference:

import transformers
import torch

# Define the model ID
model_id = "prithivMLmods/Llama-3.1-5B-Instruct"

# Set up the pipeline for text generation
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",  # Use the best device available
)

# Define conversation messages
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Generate a response
outputs = pipeline(
    messages,
    max_new_tokens=256,
)

# Print the last message in the returned conversation: the assistant's reply,
# as a dict with "role" and "content" keys
print(outputs[0]["generated_text"][-1])
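
When the pipeline is called with a list of messages, `generated_text` holds the whole conversation as a list of role/content dicts, with the new assistant message at the end. If only the reply text is needed, a small helper along these lines can pull it out. `last_assistant_reply` is a hypothetical name, and `sample_outputs` is hand-written data that mimics the output shape. It is not real model output.

```python
def last_assistant_reply(outputs):
    """Return the text of the final assistant message from chat-style pipeline output."""
    conversation = outputs[0]["generated_text"]
    for message in reversed(conversation):
        if message["role"] == "assistant":
            return message["content"]
    raise ValueError("no assistant message found in the pipeline output")


# Hand-written stand-in for pipeline output (illustrative, not generated).
sample_outputs = [
    {
        "generated_text": [
            {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "Arr, I be a pirate chatbot!"},
        ]
    }
]

print(last_assistant_reply(sample_outputs))
```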

Model Details

  • Model Type: Instruction-Tuned Large Language Model (LLM)
  • Training: Trained using supervised fine-tuning and reinforcement learning from human feedback.
  • Supported Tasks: Dialogue generation, question answering, translation, and other text-based tasks.
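
The supported tasks can all be phrased as chat messages in the same role/content format used in the example code. The sketch below shows one possible way to build such message lists. The system prompts in `TASK_PROMPTS` and the helper `build_messages` are hypothetical and chosen for illustration. They are not prompts recommended by the model authors.

```python
# Hypothetical system prompts, chosen for illustration only.
TASK_PROMPTS = {
    "qa": "Answer the user's question concisely.",
    "translation": "Translate the user's text into {target_language}.",
}


def build_messages(task, text, target_language="English"):
    """Build a chat message list for the given task and user text."""
    system_prompt = TASK_PROMPTS[task].format(target_language=target_language)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text},
    ]


messages = build_messages("translation", "Bonjour tout le monde", target_language="English")
for message in messages:
    print(f'{message["role"]}: {message["content"]}')
```

The resulting list can be passed to the text-generation pipeline in place of the hand-written `messages` list.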

Performance

The Llama-3.1-5B-Instruct model outperforms many existing models on several benchmarks, making it a reliable choice for conversational AI tasks in multilingual environments.

Notes

  • This model is optimized for safety and helpfulness, ensuring a positive user experience.
  • The torch_dtype is set to bfloat16 to optimize memory usage and performance.
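
The memory saving from bfloat16 can be estimated with simple arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below uses the 5.41B parameter count listed for the checkpoint. It covers weights only and ignores activations, the KV cache, and framework overhead, so treat the figures as a lower bound. `weight_memory_gb` is a hypothetical helper name.

```python
# Bytes used per parameter for common tensor types.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params, bytes_per_param):
    """Approximate weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 5.41e9  # parameter count listed for the checkpoint

for dtype, size in BYTES_PER_PARAM.items():
    print(f"{dtype}: {weight_memory_gb(NUM_PARAMS, size):.2f} GB")
```

At 2 bytes per parameter the weights take about 10.8 GB, half of the roughly 21.6 GB needed at float32.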

Safetensors checkpoint: 5.41B parameters, FP16 tensor type.