Can I use DeepSpeed with the vision fine-tuning code?
Hello,
Thank you for sharing such a great model.
When trying to fine-tune the model with DeepSpeed ZeRO stages 2 and 3 using your provided vision fine-tuning code, I encountered the following error.
Is this model compatible with DeepSpeed? If not, could you please guide me on how to modify the code to enable multi-GPU training?
I also found a similar issue reported in the repository below:
https://github.com/modelscope/ms-swift/issues/3380
Thank you!
[rank0]: File "/home/***/.conda/envs/****/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4111, in from_pretrained
[rank0]: model = cls(config, *model_args, **model_kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/***/.conda/envs/****/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 511, in wrapper
[rank0]: f(module, *args, **kwargs)
[rank0]: File "/data/projects/***/****/src/pig_mllm/models/phi4mm.py", line 40, in __init__
[rank0]: super().__init__(config)
[rank0]: File "/home/***/.conda/envs/****/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 511, in wrapper
[rank0]: f(module, *args, **kwargs)
[rank0]: File "/home/***/.cache/huggingface/modules/transformers_modules/microsoft/Phi-4-multimodal-instruct/985802b4e1db71df6d366368508d5b30bd743c42/modeling_phi4mm.py", line 1942, in __init__
[rank0]: self.model = Phi4MMModel(config)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/***/.conda/envs/****/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 511, in wrapper
[rank0]: f(module, *args, **kwargs)
[rank0]: File "/home/***/.cache/huggingface/modules/transformers_modules/microsoft/Phi-4-multimodal-instruct/985802b4e1db71df6d366368508d5b30bd743c42/modeling_phi4mm.py", line 1633, in __init__
[rank0]: self.embed_tokens_extend = Phi4MMImageAudioEmbedding(config, **embedding_config)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/***/.conda/envs/****/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 511, in wrapper
[rank0]: f(module, *args, **kwargs)
[rank0]: File "/home/***/.cache/huggingface/modules/transformers_modules/microsoft/Phi-4-multimodal-instruct/985802b4e1db71df6d366368508d5b30bd743c42/modeling_phi4mm.py", line 671, in __init__
[rank0]: self.image_embed = Phi4MMImageEmbedding(config, **self.image_embd_layer_kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/***/.conda/envs/****/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 511, in wrapper
[rank0]: f(module, *args, **kwargs)
[rank0]: File "/home/***/.cache/huggingface/modules/transformers_modules/microsoft/Phi-4-multimodal-instruct/985802b4e1db71df6d366368508d5b30bd743c42/modeling_phi4mm.py", line 91, in __init__
[rank0]: L, D = pe_weight.size()
[rank0]: ^^^^
[rank0]: ValueError: not enough values to unpack (expected 2, got 1)
Hi, thanks for your interest in Phi-4-MM.
Unfortunately, we don't plan to add official support for DeepSpeed. Since our LoRA-based model keeps the optimizer's GPU memory footprint small, DeepSpeed ZeRO-1 and ZeRO-2 won't help much. For ZeRO-3, our experience suggests that the additional communication overhead may not be worth the memory savings, given that we are training a ~4B model.
That said, it is still possible to combine the code with DeepSpeed ZeRO; it just usually isn't worth the effort, so we would prefer to keep our implementation simple.
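For context on why the unpack in the traceback fails: under ZeRO-3, DeepSpeed's `zero.Init` partitions parameters while modules are being constructed, replacing them with flattened one-dimensional placeholders, so a two-value shape read like `L, D = pe_weight.size()` sees only a single dimension. A minimal plain-Python sketch of this failure mode (the shapes below are hypothetical stand-ins, not the real embedding sizes):

```python
# Illustrative sketch only: mimics the shape read in modeling_phi4mm.py
# (`L, D = pe_weight.size()`) and why it breaks under ZeRO-3 partitioning.

def read_positional_embedding_shape(shape):
    """Unpack a 2-D [seq_len, dim] shape, as the model's __init__ does."""
    L, D = shape  # raises ValueError when the shape has only one element
    return L, D

full_shape = (577, 1024)   # hypothetical full [seq_len, dim] shape
partitioned_shape = (0,)   # ZeRO-3 placeholder: one flattened dimension

# Before partitioning, the read succeeds:
print(read_positional_embedding_shape(full_shape))  # (577, 1024)

# After ZeRO-3 partitions the parameter at construction time, it fails:
try:
    read_positional_embedding_shape(partitioned_shape)
except ValueError as e:
    print(e)  # not enough values to unpack (expected 2, got 1)
```

If you still want to experiment, one common mitigation (untested with this model) is to keep parameter partitioning out of model construction, e.g. by using ZeRO-2, or by setting `zero3_init_flag: false` in the Accelerate/DeepSpeed config so `from_pretrained` builds the full weights before they are sharded.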