# MiniMax-Text-01

## 1. Introduction

MiniMax-Text-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. To better unlock its long-context capabilities, MiniMax-Text-01 adopts a hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). It relies on advanced parallel strategies and innovative compute-communication overlap methods, such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, and Expert Tensor Parallel (ETP). With these, its training context length extends to 1 million tokens, and it can handle a context of up to 4 million tokens at inference time. On various academic benchmarks, MiniMax-Text-01 also delivers the performance of a top-tier model.

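Most of the 456B parameters sit in the MoE experts. Because each MoE layer routes a token to only 2 of its 32 experts (the top-2 routing listed in the architecture section), only about 45.9B parameters are active per token. The plain-Python sketch below illustrates generic top-2 gating for a single token. The softmax-then-renormalize weighting and the toy scalar "experts" are assumptions for illustration, not MiniMax's actual gate implementation.

```python
import math
from typing import Callable, List, Tuple

NUM_EXPERTS = 32  # from the architecture section
TOP_K = 2         # top-2 routing

def top2_route(gate_logits: List[float]) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) for the TOP_K highest-scoring experts.

    Assumption: softmax over all experts, then renormalize the selected
    probabilities so the weights sum to 1.
    """
    m = max(gate_logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(x - m) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:TOP_K]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x: float, gate_logits: List[float],
                experts: List[Callable[[float], float]]) -> float:
    """Weighted sum over the selected experts only; the other 30 are never called."""
    return sum(w * experts[i](x) for i, w in top2_route(gate_logits))

# Toy usage: hypothetical scalar "experts" standing in for expert FFNs.
experts = [lambda x, k=k: (k + 1) * x for k in range(NUM_EXPERTS)]
logits = [0.0] * NUM_EXPERTS
logits[3], logits[17] = 2.0, 1.0
routes = top2_route(logits)
print(routes)  # experts 3 and 17, weights ≈ 0.731 and 0.269
print(moe_forward(1.0, logits, experts))
```

Only the two selected experts run, so per-token compute scales with `TOP_K` rather than with `NUM_EXPERTS`.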
<p align="center">
<img width="100%" src="figures/TextBench.png">
</p>

## 2. Model Architecture

The architecture of MiniMax-Text-01 is briefly described as follows:

- Total Parameters: 456B
- Activated Parameters per Token: 45.9B
- Number of Layers: 80
- Hybrid Attention: one softmax attention layer is positioned after every 7 lightning attention layers.
- Number of Attention Heads: 64
- Attention Head Dimension: 128
- Mixture of Experts:
  - Number of Experts: 32
  - Expert Hidden Dimension: 9216
  - Top-2 Routing Strategy
- Positional Encoding: Rotary Position Embedding (RoPE) applied to half of the attention head dimension, with a base frequency of 10,000,000
- Hidden Size: 6144
- Vocab Size: 200,064

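The hybrid-attention and RoPE entries above can be made concrete with a small plain-Python sketch. `attention_type` maps a 0-based layer index to its attention type. It assumes the 80-layer stack starts with seven lightning layers, so every eighth layer uses softmax attention. `apply_partial_rope` rotates only half of a 128-dimensional head vector with base 10,000,000. Which half is rotated and how dimensions are paired (the "rotate-half" convention here) are assumptions, not taken from the MiniMax code.

```python
import math
from typing import List

NUM_LAYERS = 80
SOFTMAX_PERIOD = 8          # 7 lightning attention layers, then 1 softmax layer
HEAD_DIM = 128
ROTARY_DIM = HEAD_DIM // 2  # RoPE covers half of the head dimension
ROPE_BASE = 10_000_000.0

def attention_type(layer_idx: int) -> str:
    """Map a 0-based layer index to its attention type (assumed pattern)."""
    return "softmax" if (layer_idx + 1) % SOFTMAX_PERIOD == 0 else "lightning"

def apply_partial_rope(vec: List[float], pos: int) -> List[float]:
    """Rotate the first ROTARY_DIM entries of one head vector at position `pos`.

    Assumptions: the leading half is rotated, and dimension j is paired with
    j + ROTARY_DIM // 2 ("rotate-half" convention). The rest passes through.
    """
    half = ROTARY_DIM // 2
    out = list(vec)
    for j in range(half):
        theta = pos * ROPE_BASE ** (-2 * j / ROTARY_DIM)
        c, s = math.cos(theta), math.sin(theta)
        a, b = vec[j], vec[j + half]
        out[j] = a * c - b * s
        out[j + half] = a * s + b * c
    return out

layout = [attention_type(i) for i in range(NUM_LAYERS)]
print(layout.count("softmax"), layout.count("lightning"))  # 10 70

q = [float(i % 7) for i in range(HEAD_DIM)]
q_rot = apply_partial_rope(q, pos=1000)
print(q_rot[ROTARY_DIM:] == q[ROTARY_DIM:])  # True: unrotated half is unchanged
```

Under these assumptions the stack has 10 softmax attention layers and 70 lightning attention layers. The rotation preserves each head vector's norm while leaving the non-rotary half untouched.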
## 3. Evaluation

### Core Academic Benchmarks

| **Tasks** | **GPT-4o (11-20)** | **Claude-3.5-Sonnet (10-22)** | **Gemini-1.5-Pro (002)** | **Gemini-2.0-Flash (exp)** | **Qwen2.5-72B-Inst.** | **DeepSeek-V3** | **Llama-3.1-405B-Inst.** | **MiniMax-Text-01** |
|-------------------------------|--------------------|-------------------------------|--------------------------|----------------------------|-----------------------|-----------------|--------------------------|---------------------|
| **General** | | | | | | | | |
| MMLU<sup>*</sup> | 85.7 | 88.3 | 86.8 | 86.5 | 86.1 | 88.5 | **88.6** | 88.5 |
| MMLU-Pro<sup>*</sup> | 74.4 | **78.0** | 75.8 | 76.4 | 71.1 | 75.9 | 73.3 | 75.7 |
| SimpleQA | **39.0** | 28.1 | 23.4 | 26.6 | 10.3 | 24.9 | 23.2 | 23.7 |
| C-SimpleQA | 64.6 | 56.8 | 59.4 | 63.3 | 52.2 | 64.8 | 54.7 | **67.4** |
| IFEval _(avg)_ | 84.1 | **90.1** | 89.4 | 88.4 | 87.2 | 87.3 | 86.4 | 89.1 |
| Arena-Hard | **92.4** | 87.6 | 85.3 | 72.7 | 81.2 | 91.4 | 63.5 | 89.1 |
| **Reasoning** | | | | | | | | |
| GPQA<sup>*</sup> _(diamond)_ | 46.0 | **65.0** | 59.1 | 62.1 | 49.0 | 59.1 | 50.7 | 54.4 |
| DROP<sup>*</sup> _(F1)_ | 89.2 | 88.8 | 89.2 | 89.3 | 85.0 | 91.0 | **92.5** | 87.8 |
| **Mathematics** | | | | | | | | |
| GSM8k<sup>*</sup> | 95.6 | **96.9** | 95.2 | 95.4 | 95.8 | 96.7 | 96.7 | 94.8 |
| MATH<sup>*</sup> | 76.6 | 74.1 | **84.6** | 83.9 | 81.8 | **84.6** | 73.8 | 77.4 |
| **Coding** | | | | | | | | |
| MBPP + | 76.2 | 75.1 | 75.4 | 75.9 | 77.0 | **78.8** | 73.0 | 71.7 |
| HumanEval | 90.2 | **93.7** | 86.6 | 89.6 | 86.6 | 92.1 | 89.0 | 86.9 |

<sup>*</sup> Evaluated in a _0-shot CoT_ setting.

### Long Benchmarks

#### 4M Needle In A Haystack Test

<p align="center">
<img width="90%" src="figures/niah.png">
</p>

#### RULER

| Model | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M |
|-------|----|----|-----|-----|-----|------|------|------|----|
| **GPT-4o (11-20)** | **0.970** | 0.921 | 0.890 | 0.888 | 0.884 | - | - | - | - |
| **Claude-3.5-Sonnet (10-22)** | 0.965 | 0.960 | 0.957 | 0.950 | **0.952** | 0.938 | - | - | - |
| **Gemini-1.5-Pro (002)** | 0.962 | 0.960 | **0.960** | **0.958** | 0.938 | 0.917 | 0.916 | 0.861 | 0.850 |
| **Gemini-2.0-Flash (exp)** | 0.960 | 0.960 | 0.951 | 0.957 | 0.937 | 0.860 | 0.797 | 0.709 | - |
| **MiniMax-Text-01** | 0.963 | **0.961** | 0.953 | 0.954 | 0.943 | **0.947** | **0.945** | **0.928** | **0.910** |

#### LongBench v2

| **Model** | **Overall** | **Easy** | **Hard** | **Short** | **Medium** | **Long** |
|----------------------------|-------------|----------|----------|------------|------------|----------|
| Human | 53.7 | 100.0 | 25.1 | 47.2 | 59.1 | 53.7 |
| **w/ CoT** | | | | | | |
| GPT-4o (11-20) | 51.4 | 54.2 | 49.7 | 59.6 | 48.6 | 43.5 |
| Claude-3.5-Sonnet (10-22) | 46.7 | 55.2 | 41.5 | 53.9 | 41.9 | 44.4 |
| DeepSeek-V3 | - | - | - | - | - | - |
| Qwen2.5-72B-Inst. | 43.5 | 47.9 | 40.8 | 48.9 | 40.9 | 39.8 |
| **MiniMax-Text-01** | **56.5** | **66.1** | **50.5** | **61.7** | **56.7** | **47.2** |
| **w/o CoT** | | | | | | |
| GPT-4o (11-20) | 50.1 | 57.4 | 45.6 | 53.3 | 52.4 | 40.2 |
| Claude-3.5-Sonnet (10-22) | 41.0 | 46.9 | 37.3 | 46.1 | 38.6 | 37.0 |
| DeepSeek-V3 | 48.7 | - | - | - | - | - |
| Qwen2.5-72B-Inst. | 42.1 | 42.7 | 41.8 | 45.6 | 38.1 | **44.4** |
| **MiniMax-Text-01** | **52.9** | **60.9** | **47.9** | **58.9** | **52.6** | 43.5 |

#### MTOB

| **Context Type** | **no context** | **half book** | **full book** | **Δ half book** | **Δ full book** |
|------------------|----------------|---------------|---------------|------------------|-----------------|
| **eng → kalam (ChrF)** | | | | | |
| GPT-4o (11-20) | 9.90 | **54.30** | - | 44.40 | - |
| Claude-3.5-Sonnet (10-22) | 20.22 | 53.62 | 55.65 | 33.39 | 35.42 |
| Gemini-1.5-Pro (002) | 16.79 | 53.68 | **57.90** | 36.89 | 41.11 |
| Gemini-2.0-Flash (exp) | 12.20 | 49.50 | 53.30 | 37.30 | 41.10 |
| Qwen-Long | 16.55 | 48.48 | 45.94 | 31.92 | 29.39 |
| **MiniMax-Text-01** | 6.0 | 51.74 | 51.60 | **45.7** | **45.6** |
| **kalam → eng (BLEURT)** | | | | | |
| GPT-4o (11-20) | 33.20 | 58.30 | - | 25.10 | - |
| Claude-3.5-Sonnet (10-22) | 31.42 | 59.70 | 62.30 | 28.28 | 30.88 |
| Gemini-1.5-Pro (002) | 32.02 | **61.52** | **63.09** | **29.50** | **31.07** |
| Gemini-2.0-Flash (exp) | 33.80 | 57.50 | 57.00 | 23.70 | 23.20 |
| Qwen-Long | 30.13 | 53.14 | 32.15 | 23.01 | 2.02 |
| **MiniMax-Text-01** | 33.65 | 57.10 | 58.00 | 23.45 | 24.35 |

The Δ columns give the gain over the no-context score (for example, 54.30 − 9.90 = 44.40 for GPT-4o on eng → kalam).

## 4. Quickstart

The following example loads the tokenizer and an int8-quantized model spread across 8 GPUs, then generates a short response.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, QuantoConfig, GenerationConfig

# load hf config
hf_config = AutoConfig.from_pretrained("MiniMax-Text-01", trust_remote_code=True)

# quantization config, int8 is recommended (QuantoConfig needs the optimum-quanto package)
quantization_config = QuantoConfig(
    weights="int8",
    modules_to_not_convert=[
        "lm_head",
        "embed_tokens",
    ] + [f"model.layers.{i}.coefficient" for i in range(hf_config.num_hidden_layers)]
    + [f"model.layers.{i}.block_sparse_moe.gate" for i in range(hf_config.num_hidden_layers)]
)

# assume 8 GPUs; world_size must be defined before the device map uses it
world_size = 8

# set device map: embeddings on the first GPU, final norm and LM head on the last
device_map = {
    'model.embed_tokens': 'cuda:0',
    'model.norm': f'cuda:{world_size - 1}',
    'lm_head': f'cuda:{world_size - 1}'
}
# assign decoder layers to GPUs in contiguous chunks
layers_per_device = hf_config.num_hidden_layers // world_size
for i in range(world_size):
    for j in range(layers_per_device):
        device_map[f'model.layers.{i * layers_per_device + j}'] = f'cuda:{i}'

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained("MiniMax-Text-01")
prompt = "Hello!"
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by MiniMax based on MiniMax-Text-01 model."}]},
    {"role": "user", "content": [{"type": "text", "text": prompt}]},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# tokenize and move to device
model_inputs = tokenizer(text, return_tensors="pt").to("cuda")

# load bfloat16 model, move to device, and apply quantization
quantized_model = AutoModelForCausalLM.from_pretrained(
    "MiniMax-Text-01",
    torch_dtype="bfloat16",
    device_map=device_map,
    quantization_config=quantization_config,
    trust_remote_code=True,
    offload_buffers=True,
)

# generate response
generation_config = GenerationConfig(
    max_new_tokens=20,
    eos_token_id=200020,
    use_cache=True,
)
generated_ids = quantized_model.generate(**model_inputs, generation_config=generation_config)
print(f"generated_ids: {generated_ids}")
# drop the prompt tokens so only newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

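The device map in the example assumes an even split. With `hf_config.num_hidden_layers // world_size`, any remainder layers never get an entry (for instance, layers 78 and 79 when 80 layers are spread over 6 GPUs). The hypothetical helper below, `build_device_map`, is a sketch that assigns leftover layers to the first GPUs instead. It keeps the same placement for the embeddings, final norm, and LM head. As a rough sizing guide, int8 weights take about 1 byte per parameter. The 456B parameters therefore need roughly 456 GB, or about 57 GB per GPU on 8 GPUs, before counting the modules kept in bf16, the KV cache, and activations.

```python
from typing import Dict

def build_device_map(num_layers: int, world_size: int) -> Dict[str, str]:
    """Hypothetical helper: contiguous layer chunks per GPU, remainder to the first GPUs."""
    device_map = {
        "model.embed_tokens": "cuda:0",
        "model.norm": f"cuda:{world_size - 1}",
        "lm_head": f"cuda:{world_size - 1}",
    }
    base, extra = divmod(num_layers, world_size)
    layer = 0
    for gpu in range(world_size):
        # the first `extra` GPUs each take one additional layer
        for _ in range(base + (1 if gpu < extra else 0)):
            device_map[f"model.layers.{layer}"] = f"cuda:{gpu}"
            layer += 1
    return device_map

# 80 layers on 6 GPUs: GPUs 0-1 get 14 layers, GPUs 2-5 get 13
dm = build_device_map(80, 6)
print(sum(key.startswith("model.layers.") for key in dm))  # 80
print(dm["model.layers.79"])  # cuda:5
```

Passing `device_map=build_device_map(hf_config.num_hidden_layers, world_size)` to `from_pretrained` would replace the hand-written loop. For 80 layers on 8 GPUs it produces the same 10-layers-per-GPU layout as the original.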
## 5. Chatbot & API

For general use and evaluation, we provide a [Chatbot](https://www.hailuo.ai/) with online search capabilities and the [online API](https://intl.minimaxi.com) for developers.

Contact us at [[email protected]](mailto:[email protected]).