How can you convert the pth file into a gguf model?

#1
by mzwing - opened

Hi, thanks for your awesome work!

I want to create a more comprehensive set of quantization variants of the original model, but I couldn't find a way to handle the pth file format. What's worse, the convert_rwkv_checkpoint_to_hf.py script provided by transformers also complained:

Traceback (most recent call last):
  File "/home/mzwing/AI/runner/tools/convert_rwkv_checkpoint_to_hf.py", line 201, in <module>
    convert_rmkv_checkpoint_to_hf_format(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        args.repo_id,
        ^^^^^^^^^^^^^
    ...<5 lines>...
        model_name=args.model_name,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/mzwing/AI/runner/tools/convert_rwkv_checkpoint_to_hf.py", line 151, in convert_rmkv_checkpoint_to_hf_format
    torch.save({k: v.cpu().clone() for k, v in state_dict.items()}, os.path.join(output_dir, shard_file))
                                               ^^^^^^^^^^^^^^^^
AttributeError: 'Tensor' object has no attribute 'items'. Did you mean: 'item'?

If I ignore the error and continue converting it to gguf, llama.cpp's convert_hf_to_gguf.py will throw this:

Traceback (most recent call last):
  File "/home/mzwing/AI/repos/llama.cpp/./convert_hf_to_gguf.py", line 5140, in <module>
    main()
    ~~~~^^
  File "/home/mzwing/AI/repos/llama.cpp/./convert_hf_to_gguf.py", line 5112, in main
    model_architecture = hparams["architectures"][0]
                         ~~~~~~~^^^^^^^^^^^^^^^^^
KeyError: 'architectures'

So, how did you convert the pth file into the gguf model? Could you please help me? Thanks a lot!

Sorry for the late reply 😥. You need to use the pth_to_hf.py script to convert the pth file to HF format, and then convert the HF model to gguf. Below is the content of pth_to_hf.py ❤

# Convert the .pth checkpoint into pytorch_model.bin (HF layout)
import torch
 
SOURCE_MODEL="./v6-FinchX-14B-pth/rwkv-14b-final.pth"
TARGET_MODEL="./v6-Finch-14B-HF/pytorch_model.bin"
 
# delete target model
import os
if os.path.exists(TARGET_MODEL):
    os.remove(TARGET_MODEL)
 
model = torch.load(SOURCE_MODEL, mmap=True, map_location='cpu')
# Rename all the keys, to include "rwkv."
new_model = {}
for key in model.keys():
 
    # If the keys start with "blocks"
    if key.startswith("blocks."):
        new_key = "rwkv." + key
        # Replace .att. with .attention.
        new_key = new_key.replace(".att.", ".attention.")
        # Replace .ffn. with .feed_forward.
        new_key = new_key.replace(".ffn.", ".feed_forward.")
        # Replace `0.ln0.` with `0.pre_ln.`
        new_key = new_key.replace("0.ln0.", "0.pre_ln.")
    else:
        # No rename needed
        new_key = key
 
        # Rename `emb.weight` to `rwkv.embeddings.weight`
        if key == "emb.weight":
            new_key = "rwkv.embeddings.weight"
 
        # Rename `ln_out.x` to `rwkv.ln_out.x`
        if key.startswith("ln_out."):
            new_key = "rwkv." + key
 
    print("Renaming key:", key, "--to-->", new_key)
    new_model[new_key] = model[key]
 
# Save the new model
print("Saving the new model to:", TARGET_MODEL)
torch.save(new_model, TARGET_MODEL)

Thanks for your reply!

However, if I use this script to convert the pth file, llama.cpp's convert_hf_to_gguf.py complains:

INFO:hf-to-gguf:Loading model: RWKV6-3B-Chn-UnlimitedRP-mini-chat-HF
Traceback (most recent call last):
  File "/content/llama.cpp/./convert_hf_to_gguf.py", line 5140, in <module>
    main()
  File "/content/llama.cpp/./convert_hf_to_gguf.py", line 5108, in main
    hparams = Model.load_hparams(dir_model)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/content/llama.cpp/./convert_hf_to_gguf.py", line 468, in load_hparams
    with open(dir_model / "config.json", "r", encoding="utf-8") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '../RWKV6-3B-Chn-UnlimitedRP-mini-chat-HF/config.json'

BTW, does this script come from https://rwkv.cn/llamacpp#appendix-code? I believe it does convert the model into HF format, but it doesn't write the model info to config.json, etc.

The required config.json and other files can be found here: https://huggingface.co/RWKV/rwkv-6-world-3b
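
(For reference, a minimal sketch of pulling those files into the converted HF folder with huggingface_hub — the folder name is taken from the script above and the file list is illustrative, since exactly which files are needed depends on the model:)

# Sketch: fetch config.json (and any other required files) from the official
# HF repo into the locally converted folder. File list is illustrative.
from huggingface_hub import hf_hub_download

TARGET_DIR = "./v6-Finch-14B-HF"  # the folder holding pytorch_model.bin
for filename in ["config.json", "tokenizer_config.json"]:  # extend as needed
    hf_hub_download(repo_id="RWKV/rwkv-6-world-3b", filename=filename, local_dir=TARGET_DIR)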

Yes! This Python script comes from https://rwkv.cn/llamacpp#appendix-code

Oh, thanks a lot! I will give it a try later. ❤️

This time, when I ran convert_hf_to_gguf.py, I encountered a new traceback 😢:

> python ./convert_hf_to_gguf.py --outtype f16 --outfile ../RWKV6-3B-Chn-UnlimitedRP-mini-chat-GGUF.F16.gguf ../RWKV6-3B-Chn-UnlimitedRP-mini-chat-HF/
INFO:hf-to-gguf:Loading model: RWKV6-3B-Chn-UnlimitedRP-mini-chat-HF
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'pytorch_model.bin'
INFO:hf-to-gguf:token_embd.weight,                    torch.bfloat16 --> F16, shape = {2560, 65536}
............
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
Traceback (most recent call last):
  File "/content/llama.cpp/./convert_hf_to_gguf.py", line 5140, in <module>
    main()
  File "/content/llama.cpp/./convert_hf_to_gguf.py", line 5134, in main
    model_instance.write()
  File "/content/llama.cpp/./convert_hf_to_gguf.py", line 440, in write
    self.prepare_metadata(vocab_only=False)
  File "/content/llama.cpp/./convert_hf_to_gguf.py", line 433, in prepare_metadata
    self.set_vocab()
  File "/content/llama.cpp/./convert_hf_to_gguf.py", line 3330, in set_vocab
    assert (self.dir_model / "rwkv_vocab_v20230424.txt").is_file()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

What can I do next?

OK, I think I found that rwkv_vocab_v20230424.txt is actually available here: https://huggingface.co/RWKV/v6-Finch-1B6-HF/blob/main/rwkv_vocab_v20230424.txt
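
(A small sketch of dropping that file into the HF model folder with huggingface_hub — the repo and folder names are just the ones already mentioned in this thread:)

# Sketch: convert_hf_to_gguf.py expects rwkv_vocab_v20230424.txt inside the
# model directory, so download it there first.
from huggingface_hub import hf_hub_download

hf_hub_download(repo_id="RWKV/v6-Finch-1B6-HF",
                filename="rwkv_vocab_v20230424.txt",
                local_dir="../RWKV6-3B-Chn-UnlimitedRP-mini-chat-HF")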

However, I still cannot convert it successfully. It now reports a new traceback:

Traceback (most recent call last):
  File "/content/llama.cpp/./convert_hf_to_gguf.py", line 5140, in <module>
    main()
  File "/content/llama.cpp/./convert_hf_to_gguf.py", line 5134, in main
    model_instance.write()
  File "/content/llama.cpp/./convert_hf_to_gguf.py", line 440, in write
    self.prepare_metadata(vocab_only=False)
  File "/content/llama.cpp/./convert_hf_to_gguf.py", line 433, in prepare_metadata
    self.set_vocab()
  File "/content/llama.cpp/./convert_hf_to_gguf.py", line 3340, in set_vocab
    assert len(parts) >= 3
           ^^^^^^^^^^^^^^^
AssertionError

I found a GitHub repo that can help with the conversion: https://github.com/BBuf/RWKV-World-HF-Tokenizer

However, it still requires some manual editing (see its README.md for more details).

I think I will fork it to make some improvements.

BTW, if you want to use the repo, you may also need to pin transformers==4.46.3, or the script will just refuse to work 😢...

And the file I mentioned above still needs to be placed in the HF folder.

(Looks a bit complicated...)

You can try using the files in this repo 🤔: https://huggingface.co/RWKV/v6-Finch-3B-HF

Since I don't have LLM-related expertise, I don't quite understand how that repo works, sorry.

It worked with the files from https://huggingface.co/RWKV/v6-Finch-3B-HF!

Looks like both conversion routes produce identical results!

Thanks for your patient reply!!!!!

My result: https://huggingface.co/mzwing/RWKV6-3B-Chn-UnlimitedRP-mini-chat-GGUF/tree/main
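
(For anyone after the additional quantization variants mentioned at the top of this thread: once the F16 GGUF exists, they can be produced with llama.cpp's quantize tool — in recent builds the binary is named llama-quantize; the paths below are illustrative:)

> ./llama-quantize ../RWKV6-3B-Chn-UnlimitedRP-mini-chat-GGUF.F16.gguf ../RWKV6-3B-Chn-UnlimitedRP-mini-chat.Q4_K_M.gguf Q4_K_M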

mzwing changed discussion status to closed

😊 I'm glad I solved your problem.
