Granite-3.1-2B-Reasoning-GGUF (Quantized for Efficiency)

Model Overview

This is a GGUF quantized version of ruslanmv/granite-3.1-2b-Reasoning, fine-tuned from ibm-granite/granite-3.1-2b-instruct. The GGUF format allows for efficient inference on CPU and GPU, optimized for use with Kbit quantization levels (4-bit, 5-bit, and 8-bit).

Developed by: ruslanmv
License: Apache 2.0
Base Model: ibm-granite/granite-3.1-2b-instruct
Fine-tuned for: Logical reasoning, structured problem-solving, long-context tasks
Quantized GGUF versions available:
- 4-bit: Q4_K_M
- 5-bit: Q5_K_M
- 8-bit: Q8_0
Supported Languages: English
Architecture: Granite
Model Size: 2.53B params

Why Use the GGUF Quantized Version?

The GGUF format is designed for optimized CPU and GPU inference, enabling:

✅ Lower memory usage for running on consumer hardware
✅ Faster inference speeds without compromising reasoning ability
✅ Compatibility with popular inference engines like llama.cpp, ctransformers, and KoboldCpp

Installation & Usage

To use this model with llama.cpp, install the required dependencies:

pip install llama-cpp-python

Running the Model

To run the model using llama.cpp:

from llama_cpp import Llama

model_path = "path/to/ruslanmv/granite-3.1-2b-Reasoning-GGUF.Q4_K_M.gguf"

llm = Llama(model_path=model_path)

input_text = "Can you explain the difference between inductive and deductive reasoning?"
output = llm(input_text, max_tokens=400)

print(output["choices"][0]["text"])

Alternatively, using ctransformers:

pip install ctransformers

from ctransformers import AutoModelForCausalLM

model_path = "path/to/ruslanmv/granite-3.1-2b-Reasoning-GGUF.Q4_K_M.gguf"

model = AutoModelForCausalLM.from_pretrained(model_path, model_type="llama", gpu_layers=50)

input_text = "What are the key principles of logical reasoning?"
output = model(input_text, max_new_tokens=400)

print(output)

Intended Use

Granite-3.1-2B-Reasoning-GGUF is optimized for efficient inference while maintaining strong reasoning capabilities, making it ideal for:

Logical and analytical problem-solving
Text-based reasoning tasks
Mathematical and symbolic reasoning
Advanced instruction-following

This model is particularly useful for CPU-based deployments and users who need low-memory, high-performance text generation.

License & Acknowledgments

This model is released under the Apache 2.0 license. It is fine-tuned from IBM’s Granite 3.1-2B-Instruct model and quantized using GGUF for optimal efficiency. Special thanks to the IBM Granite Team for developing the base model.

For more details, visit the IBM Granite Documentation.

Citation

If you use this model in your research or applications, please cite:

@misc{ruslanmv2025granite,
  title={Fine-Tuning and GGUF Quantization of Granite-3.1 for Advanced Reasoning},
  author={Ruslan M.V.},
  year={2025},
  url={https://huggingface.co/ruslanmv/granite-3.1-2b-Reasoning-GGUF}
}

ruslanmv
/

granite-3.1-2b-Reasoning-GGUF