When will config.json be available?
I want to download the raw weights and use them locally, but without config.json I am getting errors.
Is there any other way to use the weights? I don't want to use inference.
@pandora-s
I want to use the model like this:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" if torch.cuda.is_available() else "cpu"
local_directory = "/home/admin/mamba_codestral/model_weights"
model = AutoModelForCausalLM.from_pretrained(local_directory, device_map=device)
tokenizer = AutoTokenizer.from_pretrained(local_directory)
user_query = "some_sample_query"
# Then I want to generate text based on the user_query
inputs = tokenizer(user_query, return_tensors="pt").to(device)
output = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=8160,
    num_return_sequences=1,
    do_sample=True,
    top_p=0.9,
    temperature=0.1,
)
Hope this gives you an idea of how I intend to use the model.
Hi @DevJain7, you can use an HF revision with the most recent version of transformers; we ported it to transformers recently (the latest version is necessary). The weights are in this repo on a different revision than main, namely refs/pr/9. I don't think it has the AutoModel mapping just yet, but this should work:
import torch
from transformers import AutoTokenizer, Mamba2ForCausalLM
model_id = 'mistralai/Mamba-Codestral-7B-v0.1'
# The transformers-compatible files are on the refs/pr/9 revision, not on main
tokenizer = AutoTokenizer.from_pretrained(model_id, revision='refs/pr/9', from_slow=True, legacy=False)
model = Mamba2ForCausalLM.from_pretrained(model_id, revision='refs/pr/9')
# Tokenize a prompt, generate a few tokens, and decode the result
input_ids = tokenizer("Hey how are you doing?", return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
If you want to download the weights locally I suggest using something like
pip install --upgrade huggingface_hub
huggingface_cli login # add your token here when prompted
huggingface_cli download 'mistralai/Mamba-Codestral-7B-v0.1' --local_dir . --revision="refs/pr/9"
That being said, mistral-inference should work as well!
Hey,
I am able to run it with mistral-inference.
Since I want to connect it with continue-dev, I am exploring other options.
The Hugging Face commands above are incorrect. The correct ones are:
pip install -U "huggingface_hub[cli]"
huggingface-cli login --token $HF_TOKEN
huggingface-cli download 'mistralai/Mamba-Codestral-7B-v0.1' --local-dir . --revision="refs/pr/9"
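For anyone who prefers to do the download from Python and then load from a local folder (which is what the original question asked for), here is a minimal sketch. It assumes the refs/pr/9 revision contains the config.json that transformers expects, and that huggingface_hub and a recent transformers are installed. The helper names has_config, download_weights and load_local are made up for illustration; they are not part of any library.

```python
from pathlib import Path

REPO_ID = "mistralai/Mamba-Codestral-7B-v0.1"
REVISION = "refs/pr/9"  # revision named earlier in this thread


def has_config(local_dir: str) -> bool:
    """Return True if local_dir contains a config.json file."""
    return (Path(local_dir) / "config.json").is_file()


def download_weights(local_dir: str) -> str:
    """Download the given revision into local_dir and return the folder path."""
    # Imported lazily so the file can be imported without huggingface_hub installed
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, revision=REVISION, local_dir=local_dir)


def load_local(local_dir: str):
    """Load tokenizer and model from an already downloaded folder."""
    if not has_config(local_dir):
        raise FileNotFoundError(f"config.json not found in {local_dir}")
    # Imported lazily; Mamba2ForCausalLM needs a recent transformers release
    from transformers import AutoTokenizer, Mamba2ForCausalLM

    # Tokenizer arguments copied from the snippet posted earlier in this thread
    tokenizer = AutoTokenizer.from_pretrained(local_dir, from_slow=True, legacy=False)
    model = Mamba2ForCausalLM.from_pretrained(local_dir)
    return tokenizer, model


# Example usage (downloads several GB, so it is left commented out):
# path = download_weights("./model_weights")
# tokenizer, model = load_local(path)
```

The config.json check runs before anything heavy is imported, so a missing file shows up as a clear error rather than a failure deep inside from_pretrained.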
Hi @DevJain7, you can use an HF revision with the most recent version of transformers, we ported it to transformers recently (latest version is necessary). Weights are in this repo on a different revision than
That is good news for those of us who normally use things like text-gen instead of custom code.
Current progress on running codestral-mamba locally: https://github.com/slabstech/llm-recipes/tree/main/tutorials/mamba
I need to quantise the model since it does not fit on a 24GB 4090 card, and there is some issue getting it to run on 4x 4090.
I hope to solve it tonight.
I tried to run a Space at https://huggingface.co/spaces/gaganyatri/codestral-api using the weights from https://huggingface.co/gaganyatri/codestral-7B
I am waiting for a free GPU to verify the modifications.
What is the expected VRAM requirement for the mamba model? It does not fit on a 24GB card.
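A rough back-of-the-envelope estimate may help here. The sketch below only counts the memory for the weights, and it assumes roughly 7.3 billion parameters (an assumption based on the "7B" in the model name, not a number confirmed in this thread). Activations and the generation state add more on top. The names ASSUMED_PARAMS and weight_memory_gib are made up for illustration.

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

ASSUMED_PARAMS = 7.3e9  # assumption: roughly 7.3B parameters


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    gib = weight_memory_gib(ASSUMED_PARAMS, dtype)
    fits = "fits" if gib < 24 else "does not fit"
    print(f"{dtype:>8}: {gib:5.1f} GiB ({fits} in 24 GB, weights only)")
```

Under these assumptions the weights take about 27 GiB in float32 and about 14 GiB in bfloat16. As far as I know, from_pretrained loads in float32 unless a dtype is passed, which would explain why a 24GB card runs out of memory. Passing torch_dtype=torch.bfloat16 to from_pretrained might be enough before resorting to quantisation, though I have not verified that for this model.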