Where is the tokenizer?

#1 by MichaelLowrance - opened
MichaelLowrance changed discussion status to closed

The tokenizer is built directly into the GGUF model file, so there is no separate tokenizer to download.
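
For example, with llama-cpp-python the tokenizer stored inside the GGUF file is used automatically when the file is loaded. This is just a sketch, and the filename below is a placeholder:

from llama_cpp import Llama

# Loading the GGUF file also loads the tokenizer embedded in it.
llm = Llama(model_path="./Uncensored_gemma_2b.Q4_K_M.gguf")  # placeholder filename

# Tokenize and detokenize with the embedded vocabulary.
tokens = llm.tokenize(b"Where is the tokenizer?")
text = llm.detokenize(tokens)
print(tokens, text)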

Could you share the source files for the Uncensored_gemma_2b model? I'm interested in the full set for testing purposes. I'd really appreciate the help!

As much as I would love to provide you with the source files of the model, I don't have them. The only models I've kept in safetensors format are:

  1. Uncensored_Phi-3_Mini_128k_Instruct_SEAFTENSOR
  2. Uncensored_Phi_3_mini_128k_Safetensors
  3. Uncensored_PHI_3_mini_safetensors
  4. Uncensored_gemma_9b_Safetensors
  5. Uncensored_llama_3.2_3b_safetensors

The best I can offer is the fine-tuning script I used and the datasets on which it was trained, if that would be of any help to you.

Thanks for the response! The fine-tuning script and datasets used for training the model would work as well. Could you please share those?

Link to the script used:

https://drive.google.com/file/d/1i60_Ai3gBBoTAFNOE1Q1kJwhN7cc4tvl/view?usp=drive_link

Datasets used:

   "teknium/openhermes",# 328MB
   "xzuyn/tv-alpaca-open-instruct-uncensored-blend",# 52.6MB
   "V3N0M/Jenna-54K",# 52.5MB
   "arafatar/toxic_uncensored_LGBTQ_csv",# 2.2MB
   "Xennon-BD/Alpaca-uncensored",# 64.5KB 
   "diffnamehard/toxic-dpo-v0.1-NoWarning-alpaca",# 682 kB

If I remember correctly, Unsloth does support this model. My script tends to use more VRAM, so if you don't have enough GPU power, it may be better to use Unsloth.
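
For reference, a lower-VRAM Unsloth setup might look roughly like this (the base checkpoint name, sequence length, and LoRA settings below are placeholders, not the values from my script):

from unsloth import FastLanguageModel

# Load the base model in 4-bit to keep VRAM usage low.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-2b",  # placeholder base checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)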

I also found this information, which might be helpful. It was posted by Vaibhavs10 (a Transformers collaborator) on Jun 7, 2024:

As of today, you should be able to not just load and run inference but also convert a Mistral or Llama checkpoint over to the Transformers format with the latest Transformers release!

Linking the relevant docs here: Transformers docs

Transformers supports conversion from all the major quantisation formats:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

To further persist the model checkpoint, you can:

tokenizer.save_pretrained('directory')
model.save_pretrained('directory')

We're hoping to extend this functionality further, based on whichever architectures the community requests!

Please log your requests over on this discussion thread: https://github.com/ggerganov/llama.cpp/discussions/3770
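
To round that out on my side: once the checkpoint has been converted and saved this way, it reloads like any other Transformers model (reusing the 'directory' name from the quoted snippet):

tokenizer = AutoTokenizer.from_pretrained('directory')
model = AutoModelForCausalLM.from_pretrained('directory')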

If you have any questions, feel free to let me know.

Hello, I'm encountering an issue when trying to convert my model to FP16.

Here's my script:

model = model.to(torch.float32)  # or float16
# Now convert the model to FP16
model_fp16_path = "./model_fp16"
model.save_pretrained(model_fp16_path)
tokenizer.save_pretrained(model_fp16_path)

But I get the following error:

ValueError: You cannot cast a bitsandbytes model in a new `dtype`. Make sure to load the model using `from_pretrained` using the desired `dtype` by passing the correct `torch_dtype` argument.

If I specify torch_dtype=torch.float32, I encounter the same error:

ValueError: You cannot cast a bitsandbytes model in a new `dtype`. Make sure to load the model using `from_pretrained` using the desired `torch_dtype` argument.

Could someone guide me on how to resolve this error? Thanks in advance!

The issue is that you initially ran the 4-bit save step. That path merges the LoRA adapters into the 4-bit model (without saving the adapters separately) and produces this warning:

UserWarning: Merge lora module to 4-bit linear may get different generations due to rounding errors.

To avoid this, run only the save-to-FP16 step, using either model = model.to(torch.float32) or model = model.to(torch.float16), as follows:

# Option 1: save the merged model in FP32
model = model.to(torch.float32)
model_fp32_path = "./model_fp32"
model.save_pretrained(model_fp32_path)
tokenizer.save_pretrained(model_fp32_path)

# Option 2: save the merged model in FP16 (equivalent to model.half())
model = model.to(torch.float16)
model_fp16_path = "./model_fp16"
model.save_pretrained(model_fp16_path)
tokenizer.save_pretrained(model_fp16_path)
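
If the model in memory was loaded through bitsandbytes in 4-bit, .to() will still raise that casting error. In that case, a cleaner route (just a sketch; the base checkpoint name and adapter directory below are placeholders) is to reload the base model in FP16 and merge the LoRA adapter with PEFT:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-2b"       # placeholder base checkpoint
adapter_path = "./lora_adapter"   # placeholder LoRA adapter directory

# Reload the base model directly in FP16 (no 4-bit quantisation),
# so there is no bitsandbytes model left to cast.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Attach the trained LoRA adapter and merge it into the FP16 weights.
model = PeftModel.from_pretrained(base, adapter_path)
model = model.merge_and_unload()

model_fp16_path = "./model_fp16"
model.save_pretrained(model_fp16_path)
tokenizer.save_pretrained(model_fp16_path)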

I apologize if that confused you. It's a script I haven't refined much since I'm the only one using it.

I hope you didn't lose any money on compute time because of the error. Though, considering my late response, you've probably already found the cause of the issue.

If you need any further clarification, please don't hesitate to ask.
