When will config.json be available?

#10
by DevJain7 - opened

I want to use the raw weights locally by downloading them, but without config.json I get errors.

Is there any other way to use the weights? I don't want to use inference.
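A likely cause of the errors: `from_pretrained` in transformers reads `config.json` to decide which model class to build, so a folder without it fails before any weights are loaded. The stdlib-only sketch below reports which of the relevant files a local folder contains. The helper name `describe_weight_folder` is hypothetical, and so is the assumption that a mistral-inference-format download ships a `params.json` instead of a `config.json`.

```python
from pathlib import Path


def describe_weight_folder(directory):
    """Summarise which model files a local folder contains (hypothetical helper)."""
    folder = Path(directory)
    names = {p.name for p in folder.iterdir() if p.is_file()}
    return {
        # transformers' from_pretrained needs this file to pick the model class
        "has_config_json": "config.json" in names,
        # assumption: mistral-inference-style folders ship params.json instead
        "has_params_json": "params.json" in names,
        # weight files in either safetensors or PyTorch pickle format
        "weight_files": sorted(
            n for n in names if n.endswith((".safetensors", ".bin"))
        ),
    }
```

For example, `describe_weight_folder("/home/admin/mamba_codestral/model_weights")` showing `has_config_json: False` confirms that the folder is not in the layout `AutoModelForCausalLM.from_pretrained` expects.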

Mistral AI_ org

Hi @DevJain7, how are you running the model? Mamba Codestral is based on the Mamba architecture, so it won't work like other Transformer-based models. That is why we advise using mistral-inference.
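To make the architectural difference concrete: a Transformer attends over all previous tokens, while a Mamba-style layer carries a fixed-size recurrent state from one token to the next. The sketch below is a heavily simplified, scalar, single-channel scan, h_t = a_t * h_{t-1} + b_t * x_t and y_t = c_t * h_t, where the per-step coefficients stand in for Mamba's input-dependent parameters. It only illustrates the idea; it is not Mamba-2's actual kernel or this model's code.

```python
def selective_scan(x, a, b, c):
    """Toy scalar state-space recurrence (illustrative only, not Mamba-2's kernel).

    h_t = a_t * h_{t-1} + b_t * x_t
    y_t = c_t * h_t
    """
    if not (len(x) == len(a) == len(b) == len(c)):
        raise ValueError("x, a, b and c must have the same length")
    h = 0.0  # the whole history is compressed into this single state value
    ys = []
    for x_t, a_t, b_t, c_t in zip(x, a, b, c):
        h = a_t * h + b_t * x_t  # decay the old state, mix in the new input
        ys.append(c_t * h)       # read the output from the current state
    return ys
```

For example, `selective_scan([1, 2, 3], [0.5] * 3, [1] * 3, [1] * 3)` returns `[1.0, 2.5, 4.25]`. The state stays the same size however long the input gets, which is one reason the checkpoint layout differs from an attention-based model's.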

@pandora-s I want to use the model like this:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
local_directory = "/home/admin/mamba_codestral/model_weights"

model = AutoModelForCausalLM.from_pretrained(local_directory, device_map=device)
tokenizer = AutoTokenizer.from_pretrained(local_directory)

user_query = "some_sample_query"

# Then I want to generate text based on the user_query
inputs = tokenizer(user_query, return_tensors="pt").to(device)
output = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=8160,
    num_return_sequences=1,
    do_sample=True,
    top_p=0.9,
    temperature=0.1,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Hope this gives you an idea of how I intend to use the model.
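For reference, the `temperature` and `top_p` arguments in the `generate` call above reshape the next-token distribution before sampling. Below is a stdlib sketch of the standard definitions: divide the logits by the temperature, apply softmax, then keep the smallest set of most probable tokens whose cumulative probability reaches `top_p`. The function name is hypothetical, and transformers' own logits processors may differ in edge cases. With `temperature=0.1` the distribution becomes very peaked, so sampling behaves almost greedily.

```python
import math


def nucleus_candidates(logits, temperature=1.0, top_p=1.0):
    """Return (token_index, probability) pairs kept by temperature + top-p filtering.

    Hypothetical helper illustrating the standard definitions.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exp for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    probs = sorted(((i, e / total) for i, e in enumerate(exps)),
                   key=lambda pair: pair[1], reverse=True)

    kept, cumulative = [], 0.0
    for i, p in probs:
        kept.append((i, p))
        cumulative += p
        if cumulative >= top_p:  # smallest prefix whose mass reaches top_p
            break
    # renormalise so the kept probabilities sum to 1 again
    mass = sum(p for _, p in kept)
    return [(i, p / mass) for i, p in kept]
```

With logits `[2.0, 1.0, 0.0]` and `top_p=0.9`, a temperature of 1.0 keeps tokens 0 and 1, while a temperature of 0.1 leaves only token 0.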

Hi @DevJain7, you can use an HF revision with the most recent version of transformers; we recently ported the model to transformers (the latest version is required). The weights are in this repo on a revision other than main, namely refs/pr/9. I don't think it has the AutoModel mapping yet, but this should work:

from transformers import Mamba2ForCausalLM, AutoTokenizer

model_id = 'mistralai/Mamba-Codestral-7B-v0.1'
# The transformers-format weights live on the refs/pr/9 revision, not on main
tokenizer = AutoTokenizer.from_pretrained(model_id, revision='refs/pr/9', from_slow=True, legacy=False)
model = Mamba2ForCausalLM.from_pretrained(model_id, revision='refs/pr/9')
input_ids = tokenizer("Hey how are you doing?", return_tensors="pt")["input_ids"]

out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))

If you want to download the weights locally, I suggest using something like:

pip install --upgrade huggingface_hub
huggingface_cli login  # add your token here when prompted
huggingface_cli download 'mistralai/Mamba-Codestral-7B-v0.1' --local_dir . --revision="refs/pr/9"

That being said, mistral-inference should work as well!

Hey,
I am able to run it with mistral-inference.
Since I want to connect it to continue-dev, I am exploring other options.

The Hugging Face CLI commands above are incorrect. These work:

pip install -U "huggingface_hub[cli]"
huggingface-cli login --token $HF_TOKEN
huggingface-cli download 'mistralai/Mamba-Codestral-7B-v0.1' --local-dir . --revision="refs/pr/9"
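If you would rather do the same download from Python, `huggingface_hub` also provides `snapshot_download`, which accepts a `revision` (here the refs/pr/9 ref mentioned above) and a `local_dir`. A minimal sketch, assuming you are already logged in via `huggingface-cli login` (or have `HF_TOKEN` set). The wrapper name `download_mamba_codestral` is only for illustration.

```python
def download_mamba_codestral(local_dir="."):
    """Download the transformers-format weights from the refs/pr/9 revision.

    Hypothetical wrapper around huggingface_hub.snapshot_download; needs
    `pip install -U huggingface_hub` and a valid Hugging Face token.
    """
    # Imported inside the function so this file loads even without huggingface_hub
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="mistralai/Mamba-Codestral-7B-v0.1",
        revision="refs/pr/9",  # PR revision holding the transformers port
        local_dir=local_dir,
    )
```

Calling `download_mamba_codestral("./mamba_codestral")` returns the local folder path, which can then be passed to `Mamba2ForCausalLM.from_pretrained` instead of the repo id.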


Regarding the transformers port on the refs/pr/9 revision: that is good news for those of us who normally use things like text-gen instead of custom code.

Current progress on running codestral-mamba locally: https://github.com/slabstech/llm-recipes/tree/main/tutorials/mamba

I need to quantise the model, since it doesn't fit on a 24 GB RTX 4090. I also have some issues getting it running on 4x 4090s;
hope to solve it tonight.

I tried running a Space at https://huggingface.co/spaces/gaganyatri/codestral-api using the weights from https://huggingface.co/gaganyatri/codestral-7B.
I'm waiting for a free GPU to verify the modifications.

What is the expected VRAM requirement for the Mamba model? It does not fit on a 24 GB card.
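As a rough answer, weight memory is the parameter count times the bytes per parameter. The sketch below assumes a nominal 7 billion parameters (the exact count is not given in this thread) and counts weights only, ignoring activations, the recurrent state, and CUDA overhead, so real usage will be somewhat higher. At float32 the weights alone come to about 26 GiB, which would explain a 24 GB card overflowing if the model is loaded at full precision; bfloat16 halves that, and 8-bit or 4-bit quantisation shrinks it further.

```python
# Bytes needed to store one parameter in common formats (4-bit packs two per byte)
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params, dtype):
    """Approximate memory for the weights alone, in GiB (1 GiB = 2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Assumption: a nominal 7e9 parameters; print the estimate for each format
for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gib(7e9, dtype):5.1f} GiB")
```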

DevJain7 changed discussion status to closed
DevJain7 changed discussion status to open
