🦙 ALLaM-7B-Instruct-GGUF

This repository provides quantized GGUF versions of ALLaM-7B-Instruct, optimized for efficient inference using llama.cpp.

โš ๏ธ Acknowledgment

The original model was developed by ALLaM-AI and is available here:
🔗 ALLaM-7B-Instruct-Preview

This repository only provides quantized versions of that model, for efficient inference on a wider range of hardware.


✨ Overview

ALLaM-7B-Instruct is an Arabic-centric instruction-tuned model based on Meta's LLaMA architecture, designed for natural language understanding and generation in Arabic.

🚀 What's New?

✅ GGUF Format – optimized for llama.cpp
✅ Multiple Quantization Levels – balance between precision and efficiency
✅ Runs on CPUs & Low-Resource Devices – no need for high-end GPUs!


📂 Available Model Quantizations

| Model Variant                 | Precision | Size       | Best For                 |
|-------------------------------|-----------|------------|--------------------------|
| ALLaM-7B-Instruct-f16.gguf    | FP16      | Large      | High-precision tasks     |
| ALLaM-7B-Instruct-Q8_0.gguf   | 8-bit     | Medium     | Balanced quality & speed |
| ALLaM-7B-Instruct-Q6_K.gguf   | 6-bit     | Small      | Good trade-off           |
| ALLaM-7B-Instruct-Q5_0.gguf   | 5-bit     | Small      | Alternative quantization |
| ALLaM-7B-Instruct-Q5_K_M.gguf | 5-bit     | Smaller    | Fast inference           |
| ALLaM-7B-Instruct-Q4_0.gguf   | 4-bit     | Very small | Legacy format            |
| ALLaM-7B-Instruct-Q4_K_M.gguf | 4-bit     | Very small | Low-memory devices       |
| ALLaM-7B-Instruct-Q2_K.gguf   | 2-bit     | Smallest   | Extreme efficiency       |
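The sizes above can be sanity-checked with a back-of-envelope estimate: a GGUF file is roughly the parameter count times the bits per weight. A minimal sketch, assuming ~7 billion parameters and nominal bit widths (real quantized files run somewhat larger, since llama.cpp quants store scales and keep some tensors at higher precision):

```python
# Back-of-envelope GGUF size estimate: params * bits-per-weight / 8 bytes.
# Nominal bit widths only -- actual llama.cpp quants (Q4_K_M, Q2_K, ...)
# use slightly more bits per weight due to scales and mixed precision.
PARAMS = 7e9  # ALLaM-7B has roughly 7 billion weights

def approx_size_gb(bits_per_weight: float) -> float:
    """Approximate on-disk size in gigabytes for the given precision."""
    return PARAMS * bits_per_weight / 8 / 1e9

for name, bits in [("FP16", 16), ("Q8_0", 8), ("Q4_0", 4), ("Q2_K", 2)]:
    print(f"{name}: ~{approx_size_gb(bits):.1f} GB")
# FP16 comes out at ~14 GB, Q4 at ~3.5 GB
```

This is why a 4-bit quant of a 7B model fits comfortably in 8 GB of RAM while FP16 does not.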

📖 Installation & Setup

1️⃣ Install llama.cpp

Clone and build llama.cpp:

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make

Note: newer versions of llama.cpp have switched to CMake; if make fails, build with cmake -B build && cmake --build build --config Release instead.

2๏ธโƒฃ Download the Model

Choose and download a .gguf file from this repository.
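If you prefer to script the download, files on the Hugging Face Hub are served from a predictable resolve URL. A small sketch (the repo id below is this repository's; the filename is whichever variant you chose):

```python
# Build the direct-download URL for a file hosted on the Hugging Face Hub.
# Hub files resolve at: https://huggingface.co/<repo_id>/resolve/<revision>/<filename>
REPO_ID = "eltay89/ALLaM-7B-Instruct-GGUF"

def gguf_url(filename: str, revision: str = "main") -> str:
    """Return the direct URL for one GGUF file in this repository."""
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{filename}"

print(gguf_url("ALLaM-7B-Instruct-Q4_K_M.gguf"))
```

The printed URL can be fetched with curl -LO; alternatively, huggingface-cli download handles caching and resuming for you.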

3๏ธโƒฃ Run Inference

Use llama.cpp to generate responses:

./main -m ALLaM-7B-Instruct-Q4_0.gguf -p "كيف أجهز كوب شاهي؟"

The prompt asks: "How do I prepare a cup of tea?" (In recent llama.cpp builds the main binary has been renamed llama-cli.)

Expected Output:

لتحضير كوب شاي، اغلي الماء، ضع الشاي في الكوب، واسكب الماء الساخن فوقه. اتركه لدقائق ثم استمتع بمذاقه!

Translation: "To prepare a cup of tea, boil the water, put the tea in the cup, and pour the hot water over it. Leave it for a few minutes, then enjoy the taste!"

📊 Benchmarks & Performance

| Quantization Format | Model Size | CPU (tokens/sec) | GPU (tokens/sec) |
|---------------------|------------|------------------|------------------|
| FP16                | Large      | ~2               | ~15              |
| Q8_0                | Medium     | ~4               | ~30              |
| Q6_K                | Smaller    | ~6               | ~40              |
| Q5_0                | Small      | ~7               | ~42              |
| Q5_K_M              | Smaller    | ~8               | ~45              |
| Q4_0                | Very small | ~9               | ~48              |
| Q4_K_M              | Very small | ~10              | ~50              |
| Q2_K                | Smallest   | ~12              | ~55              |

Performance may vary based on hardware and configuration.
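Because the absolute numbers depend heavily on your machine, the more stable takeaway is the relative speedup over FP16, which you can compute directly from the table (figures copied from above; treat them as illustrative):

```python
# Relative CPU speedup of each quantization vs. the FP16 baseline,
# using the approximate tokens/sec figures from the table above.
cpu_tokens_per_sec = {
    "FP16": 2, "Q8_0": 4, "Q6_K": 6, "Q5_0": 7,
    "Q5_K_M": 8, "Q4_0": 9, "Q4_K_M": 10, "Q2_K": 12,
}

def speedup_vs_fp16(fmt: str) -> float:
    """Throughput multiple of `fmt` relative to FP16 on CPU."""
    return cpu_tokens_per_sec[fmt] / cpu_tokens_per_sec["FP16"]

for fmt in cpu_tokens_per_sec:
    print(f"{fmt}: {speedup_vs_fp16(fmt):.1f}x")
# e.g. Q4_K_M is about 5x faster than FP16 on CPU in this table
```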


📜 License

This model follows the ALLaM-AI license. Refer to their Hugging Face repository for details.

โค๏ธ Acknowledgments

  • ALLaM-AI for developing the original ALLaM-7B-Instruct model.
  • llama.cpp by ggerganov for optimized inference.

โญ Contributions & Feedback

If you find this quantized model useful, feel free to contribute, provide feedback, or share your results!
