Overview
This model was built with Mergekit and fine-tuning (FT), with the goal of combining the long-form generation ability of QwQ with the performance of R1.
How to use
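from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "DataPilot/SKYDRIVE-32B-v0.1"
# Leave empty to load the tokenizer from the same repository as the model.
tokenizer_name = ""
if tokenizer_name == "":
    tokenizer_name = model_name

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

# User prompt (Japanese): "Describe what everyday life looks like in a future where
# nurture intelligence, a self-evolving AI that analyzes metadata, has become reality."
prompt = "メタデータを解析し、自己進化をするAIであるnurture intelligenceが実現した未来の日常生活の姿を教えてください。"
messages = [
    # System prompt (Japanese): "You are an excellent Japanese assistant and a
    # long-thinking model. Think through how to solve the problem before answering."
    {"role": "system", "content": "あなたは優秀な日本語アシスタントであり長考モデルです。問題解決をするための思考をした上で回答を行ってください。"},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and append the assistant turn marker.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4096
)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)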
Acknowledgements
Thanks to everyone who created this model and to VOLTMIND for lending the compute resources. Thanks also to hayashi for helping with troubleshooting.
Mergekit config
merge_method: slerp
base_model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
models:
  - model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
  - model: NovaSky-AI/Sky-T1-32B-Flash
parameters:
  t: 0.4
dtype: bfloat16
name: SKYCAVE_element_Sky_jp
---
merge_method: breadcrumbs_ties
base_model: Qwen/Qwen2.5-32B
tokenizer_source: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
name: SKYDRIVE_element_jp_01
models:
  - model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
    parameters:
      weight: 1.0
  - model: FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview
    parameters:
      weight: 0.75
dtype: bfloat16
---
merge_method: task_arithmetic
base_model: Qwen/Qwen2.5-32B
tokenizer_source: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
name: SKYDRIVE_element_jp_02
models:
  - model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
    parameters:
      weight: 1.0
  - model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
    parameters:
      weight: 0.9
dtype: bfloat16
---
merge_method: slerp
base_model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
models:
  - model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
  - model: TeamDelta/ABEJA-Qwen2.5-32B-base-jp-v0.1
parameters:
  t: 0.5
dtype: bfloat16
name: SKYDRIVE_element_jp_03
---
merge_method: model_stock
base_model: Qwen/Qwen2.5-32B-Instruct
models:
  - model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
  - model: SKYCAVE_element_Sky_jp
  - model: SKYDRIVE_element_jp_01
  - model: SKYDRIVE_element_jp_02
  - model: SKYDRIVE_element_jp_03
dtype: bfloat16
pad_to_multiple_of: 512
tokenizer_source: base
name: SKYDRIVE-32B-v0.1
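
The config above names its merge methods without showing what they compute. As a rough illustration only, the sketch below applies two of them, slerp and task arithmetic, to tiny weight vectors stored as plain Python lists. It is not Mergekit's implementation: the function names `slerp` and `task_arithmetic` are hypothetical, real merges run per tensor on full checkpoints, and the convention that `t = 0` returns the first (base) vector is an assumption made for this example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.

    Assumed convention: t=0 returns v0 (the base), t=1 returns v1.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors, clamped for acos.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < 1e-6:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


def task_arithmetic(base, models, weights):
    """Add weighted task vectors (model - base) back onto the base weights."""
    merged = list(base)
    for model, weight in zip(models, weights):
        for i, (m, b) in enumerate(zip(model, base)):
            merged[i] += weight * (m - b)
    return merged


if __name__ == "__main__":
    # Toy stand-ins for model weights; the values are made up for illustration.
    print(slerp(0.4, [1.0, 0.0], [0.0, 1.0]))
    print(task_arithmetic([1.0, 1.0], [[2.0, 1.0], [1.0, 3.0]], [1.0, 0.9]))
```

The `0.4` passed to `slerp` and the `[1.0, 0.9]` weights passed to `task_arithmetic` mirror the `t` and `weight` values in the config; the vectors themselves carry no meaning.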