Where is the tokenizer?

#1 by MichaelLowrance - opened
MichaelLowrance changed discussion status to closed

The tokenizer is built directly into the GGUF model file, so there is no separate tokenizer to download.
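
For example, with llama-cpp-python the tokenizer stored inside the GGUF file is used automatically when the file is loaded. This is just a sketch, and the filename below is a placeholder:

from llama_cpp import Llama

# Loading the GGUF file also loads the tokenizer embedded in it.
llm = Llama(model_path="./Uncensored_gemma_2b.Q4_K_M.gguf")  # placeholder filename

# Tokenize and detokenize with the embedded vocabulary.
tokens = llm.tokenize(b"Where is the tokenizer?")
text = llm.detokenize(tokens)
print(tokens, text)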

Could you share the source files for the Uncensored_gemma_2b model? I'm interested in the full set for testing purposes. I'd really appreciate the help!

As much as I would love to provide you with the source files of the model, I don't have them. The only models I've kept in safetensors format are:

  1. Uncensored_Phi-3_Mini_128k_Instruct_SEAFTENSOR
  2. Uncensored_Phi_3_mini_128k_Safetensors
  3. Uncensored_PHI_3_mini_safetensors
  4. Uncensored_gemma_9b_Safetensors
  5. Uncensored_llama_3.2_3b_safetensors

The best I can offer is the fine-tuning script I used and the datasets on which it was trained, if that would be of any help to you.

Thanks for the response! The fine-tuning script and datasets used for training the model would work as well. Could you please share those?

Link to the script used:

https://drive.google.com/file/d/1i60_Ai3gBBoTAFNOE1Q1kJwhN7cc4tvl/view?usp=drive_link

Datasets used:

   "teknium/openhermes",# 328MB
   "xzuyn/tv-alpaca-open-instruct-uncensored-blend",# 52.6MB
   "V3N0M/Jenna-54K",# 52.5MB
   "arafatar/toxic_uncensored_LGBTQ_csv",# 2.2MB
   "Xennon-BD/Alpaca-uncensored",# 64.5KB 
   "diffnamehard/toxic-dpo-v0.1-NoWarning-alpaca",# 682 kB

If I remember correctly, Unsloth does support this model. My script tends to use more VRAM, so if you don't have enough GPU power, it may be better to use Unsloth.
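
For reference, a lower-VRAM Unsloth setup might look roughly like this (the base checkpoint name, sequence length, and LoRA settings below are placeholders, not the values from my script):

from unsloth import FastLanguageModel

# Load the base model in 4-bit to keep VRAM usage low.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-2b",  # placeholder base checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)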

I also found this information, which might be helpful. It was posted by Vaibhavs10 (a Transformers collaborator) on Jun 7, 2024:

As of today, you should be able to not just load and run inference but also convert a Mistral or Llama checkpoint over to the Transformers format with the latest Transformers release!

Linking the relevant docs here: Transformers docs

Transformers supports conversion from all the major quantisation formats:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

To further persist the model checkpoint, you can:

tokenizer.save_pretrained('directory')
model.save_pretrained('directory')

We're hoping to extend this functionality further, based on whichever architectures the community requests!

Please log your requests over on this discussion thread: https://github.com/ggerganov/llama.cpp/discussions/3770
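
To round that out on my side: once the checkpoint has been converted and saved this way, it reloads like any other Transformers model (reusing the 'directory' name from the quoted snippet):

tokenizer = AutoTokenizer.from_pretrained('directory')
model = AutoModelForCausalLM.from_pretrained('directory')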

If you have any questions, feel free to let me know.

Hello, I'm encountering an issue when trying to convert my model to FP16.

Here's my script:

model = model.to(torch.float32)  # or float16
# Now convert the model to FP16
model_fp16_path = "./model_fp16"
model.save_pretrained(model_fp16_path)
tokenizer.save_pretrained(model_fp16_path)

But I get the following error:

ValueError: You cannot cast a bitsandbytes model in a new `dtype`. Make sure to load the model using `from_pretrained` using the desired `dtype` by passing the correct `torch_dtype` argument.

If I specify torch_dtype=torch.float32, I encounter the same error:

ValueError: You cannot cast a bitsandbytes model in a new `dtype`. Make sure to load the model using `from_pretrained` using the desired `torch_dtype` argument.

Could someone guide me on how to resolve this error? Thanks in advance!

The issue is that you initially ran the 4-bit save step. That path merges the LoRA adapters into the 4-bit model (without saving the adapters separately) and produces this warning:

UserWarning: Merge lora module to 4-bit linear may get different generations due to rounding errors.

To avoid this, run only the save-to-FP16 step, using either model = model.to(torch.float32) or model = model.to(torch.float16), as follows:

# Option 1: save the merged model in FP32
model = model.to(torch.float32)
model_fp32_path = "./model_fp32"
model.save_pretrained(model_fp32_path)
tokenizer.save_pretrained(model_fp32_path)

# Option 2: save the merged model in FP16 (equivalent to model.half())
model = model.to(torch.float16)
model_fp16_path = "./model_fp16"
model.save_pretrained(model_fp16_path)
tokenizer.save_pretrained(model_fp16_path)
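
If the model in memory was loaded through bitsandbytes in 4-bit, .to() will still raise that casting error. In that case, a cleaner route (just a sketch; the base checkpoint name and adapter directory below are placeholders) is to reload the base model in FP16 and merge the LoRA adapter with PEFT:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-2b"       # placeholder base checkpoint
adapter_path = "./lora_adapter"   # placeholder LoRA adapter directory

# Reload the base model directly in FP16 (no 4-bit quantisation),
# so there is no bitsandbytes model left to cast.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Attach the trained LoRA adapter and merge it into the FP16 weights.
model = PeftModel.from_pretrained(base, adapter_path)
model = model.merge_and_unload()

model_fp16_path = "./model_fp16"
model.save_pretrained(model_fp16_path)
tokenizer.save_pretrained(model_fp16_path)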

I apologize if that confused you. It's a script I haven't refined much since I'm the only one using it.

I hope you didn't lose any money on compute time because of the error. Though, considering my late response, you've probably already found the cause of the issue.

If you need any further clarification, please don't hesitate to ask.
