Can I use DeepSpeed with the vision fine-tuning code?

#35
by GoGiants1 - opened

Hello,
Thank you for sharing such a great model.
While trying to fine-tune the model with DeepSpeed ZeRO stages 2 and 3 using your provided vision fine-tuning code, I encountered the error below.
Is this model compatible with DeepSpeed? If not, could you please guide me on how to modify the code to enable multi-GPU training?

I also found a similar issue reported in the repository below:
https://github.com/modelscope/ms-swift/issues/3380

Thank you!

[rank0]:   File "/home/***/.conda/envs/****/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4111, in from_pretrained
[rank0]:     model = cls(config, *model_args, **model_kwargs)
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/***/.conda/envs/****/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 511, in wrapper
[rank0]:     f(module, *args, **kwargs)
[rank0]:   File "/data/projects/***/****/src/pig_mllm/models/phi4mm.py", line 40, in __init__
[rank0]:     super().__init__(config)
[rank0]:   File "/home/***/.conda/envs/****/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 511, in wrapper
[rank0]:     f(module, *args, **kwargs)
[rank0]:   File "/home/***/.cache/huggingface/modules/transformers_modules/microsoft/Phi-4-multimodal-instruct/985802b4e1db71df6d366368508d5b30bd743c42/modeling_phi4mm.py", line 1942, in __init__
[rank0]:     self.model = Phi4MMModel(config)
[rank0]:                  ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/***/.conda/envs/****/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 511, in wrapper
[rank0]:     f(module, *args, **kwargs)
[rank0]:   File "/home/***/.cache/huggingface/modules/transformers_modules/microsoft/Phi-4-multimodal-instruct/985802b4e1db71df6d366368508d5b30bd743c42/modeling_phi4mm.py", line 1633, in __init__
[rank0]:     self.embed_tokens_extend = Phi4MMImageAudioEmbedding(config, **embedding_config)
[rank0]:                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/***/.conda/envs/****/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 511, in wrapper
[rank0]:     f(module, *args, **kwargs)
[rank0]:   File "/home/***/.cache/huggingface/modules/transformers_modules/microsoft/Phi-4-multimodal-instruct/985802b4e1db71df6d366368508d5b30bd743c42/modeling_phi4mm.py", line 671, in __init__
[rank0]:     self.image_embed = Phi4MMImageEmbedding(config, **self.image_embd_layer_kwargs)
[rank0]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/***/.conda/envs/****/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 511, in wrapper
[rank0]:     f(module, *args, **kwargs)
[rank0]:   File "/home/***/.cache/huggingface/modules/transformers_modules/microsoft/Phi-4-multimodal-instruct/985802b4e1db71df6d366368508d5b30bd743c42/modeling_phi4mm.py", line 91, in __init__
[rank0]:     L, D = pe_weight.size()
[rank0]:     ^^^^
[rank0]: ValueError: not enough values to unpack (expected 2, got 1)
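For context (my reading of the traceback, not an official explanation): under ZeRO-3, `deepspeed.zero.Init` partitions parameters at module construction time, leaving a flattened 1-D placeholder behind on each rank, so the two-value unpack `L, D = pe_weight.size()` in `modeling_phi4mm.py` sees a single dimension and fails. A minimal torch-only sketch reproducing the shape mismatch (the partitioned tensor is simulated; no DeepSpeed required, and the 577x1024 shape is illustrative, not the model's actual one):

```python
import torch

# A normal 2-D positional-embedding weight unpacks fine.
pe_weight = torch.empty(577, 1024)
L, D = pe_weight.size()
print(L, D)  # 577 1024

# Under ZeRO-3's zero.Init, the weight is replaced by an empty 1-D
# placeholder, so .size() yields one dimension instead of two.
partitioned = torch.empty(0)
try:
    L, D = partitioned.size()
except ValueError as err:
    print(err)  # not enough values to unpack (expected 2, got 1)
```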

Hi, thanks for your interest in Phi-4-MM.

Unfortunately, we don't plan to add official DeepSpeed support. Since our LoRA-based model keeps the optimizer's GPU memory footprint small, DeepSpeed ZeRO-1 and ZeRO-2 wouldn't help much. As for ZeRO-3, our experience suggests the additional communication overhead may not be worth the memory savings, given that we are training a ~4B model.

That being said, it is still possible to make the code work with DeepSpeed ZeRO; it usually just isn't worth the effort, so we would like to keep our implementation simple.
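For anyone who still wants to experiment despite the maintainers' recommendation: the traceback above goes through DeepSpeed's `zero.Init` wrapper, which only activates at ZeRO stage 3, so a plain ZeRO-2 configuration avoids that particular init-time partitioning (whether stage 2 hits a separate issue, as the question suggests, would need its own traceback). A minimal `ds_config.json` sketch, with standard DeepSpeed config keys but illustrative values, and not an officially supported setup:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```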
