license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-14B-Instruct-1M
pipeline_tag: text-generation
library_name: transformers
tags:
- opus
- 14b
- CoCo
- reasoning
- cosine
model-index:
- name: Calcium-Opus-14B-Elite-1M
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: IFEval (0-Shot)
type: wis-k/instruction-following-eval
split: train
args:
num_few_shot: 0
metrics:
- type: inst_level_strict_acc and prompt_level_strict_acc
value: 56.13
name: averaged accuracy
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite-1M
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BBH (3-Shot)
type: SaylorTwift/bbh
split: test
args:
num_few_shot: 3
metrics:
- type: acc_norm
value: 46.94
name: normalized accuracy
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite-1M
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MATH Lvl 5 (4-Shot)
type: lighteval/MATH-Hard
split: test
args:
num_few_shot: 4
metrics:
- type: exact_match
value: 29.53
name: exact match
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite-1M
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GPQA (0-shot)
type: Idavidrein/gpqa
split: train
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 13.65
name: acc_norm
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite-1M
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MuSR (0-shot)
type: TAUR-Lab/MuSR
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 18.28
name: acc_norm
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite-1M
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU-PRO (5-shot)
type: TIGER-Lab/MMLU-Pro
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 46.13
name: accuracy
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite-1M
name: Open LLM Leaderboard
Calcium-Opus-14B-Elite-1M
Calcium-Opus-14B-Elite-1M builds upon the Qwen 2.5 14B architecture, optimized for massive-scale applications, with over 1 million fine-tuning iterations. Designed for unparalleled reasoning capabilities, it incorporates next-gen features for multi-modal reasoning, expanded knowledge graphs, and real-time adaptability, making it a cutting-edge tool for advanced AI applications.
Key Improvements Over 14B-Elite
Next-Level Multimodal Reasoning:
Introduces multi-modal inputs, seamlessly integrating text, images, and tabular data for enriched context understanding and reasoning.Knowledge Expansion:
Enriched with 1M+ fine-tuning steps on high-quality datasets across specialized domains, including legal, medical, finance, and technical documentation.Enhanced Mathematical Toolkit:
A new symbolic reasoning module significantly improves performance on tasks like calculus, algebra, and combinatorics.Adaptability for Real-Time Applications:
Fine-tuned for real-time adaptability in dynamic and live environments, including chatbots, live translations, and recommendation systems.Augmented Context Support:
Supports up to 256K context tokens, doubling the original capacity, with an improved compression mechanism for handling long-chain CoT reasoning.Improved Model Robustness:
Equipped with enhanced error correction and self-reflection mechanisms, significantly reducing errors in long-form responses.Multi-Language Expertise:
Supports over 50 languages, with specialized tuning for underrepresented languages such as Swahili, Tamil, and Tagalog.Energy Efficiency:
Optimized using low-rank adaptation (LoRA) and quantized fine-tuning for improved inference speed, reducing CO₂ consumption by 40% compared to 14B-Elite.
Quickstart with Transformers
Here’s an updated example of how to load and use the 1M model efficiently with multimodal input support:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "prithivMLmods/Calcium-Opus-14B-Elite-1M"
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="bfloat16",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Example input with text and image embedding
prompt = "Analyze this data and generate a summary."
messages = [
{"role": "system", "content": "You are a multimodal AI capable of analyzing text and images."},
{"role": "user", "content": prompt},
{"role": "user", "content": {"image_path": "example_image.png"}}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
**model_inputs,
max_new_tokens=1024
)
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(response)
Intended Use
Advanced Research:
Designed for scientific research, legal analysis, and policy-making, with a focus on detailed reasoning and structured output generation.Multimodal Integration:
Excels at text-to-image and text-to-table reasoning tasks, supporting applications in data visualization, diagnostics, and multimedia reporting.Real-Time Solutions:
Ideal for real-time customer support, business intelligence, and adaptive user experiences, offering unparalleled responsiveness.Global Accessibility:
Multi-language proficiency enables applications like global news analysis, cross-lingual communication, and multi-region content generation.
Limitations
Resource Constraints:
Despite optimizations, high-performance GPUs or TPUs remain essential for smooth operation at large contexts.Multimodal Bias:
While multimodal reasoning has improved, data biases in less-resourced combinations (e.g., image + low-resource languages) may persist.Overhead in Long Tasks:
Performance on extremely long, creative tasks may sometimes result in redundant outputs.Real-Time Fine-Tuning Limitations:
While adaptable, the model’s fine-tuning capabilities are non-real-time, requiring batch updates.Dependency on Infrastructure:
Due to its 256K token context support, the model is heavily reliant on systems with high memory bandwidth.
Open LLM Leaderboard Evaluation Results
Detailed results can be found here! Summarized results can be found here!
Metric | Value (%) |
---|---|
Average | 35.11 |
IFEval (0-Shot) | 56.13 |
BBH (3-Shot) | 46.94 |
MATH Lvl 5 (4-Shot) | 29.53 |
GPQA (0-shot) | 13.65 |
MuSR (0-shot) | 18.28 |
MMLU-PRO (5-shot) | 46.13 |