Can I use DeepSpeed with the vision fine-tuning code?

#35
by GoGiants1 - opened

Hello,
Thank you for sharing such a great model.
While trying to fine-tune the model with DeepSpeed ZeRO stages 2 and 3 using your provided vision fine-tuning code, I encountered the error below.
Is this model compatible with DeepSpeed? If not, could you please guide me on how to modify the code to enable multi-GPU training?

I also found a similar issue reported in the repository below:
https://github.com/modelscope/ms-swift/issues/3380

Thank you!

[rank0]:   File "/home/***/.conda/envs/****/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4111, in from_pretrained
[rank0]:     model = cls(config, *model_args, **model_kwargs)
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/***/.conda/envs/****/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 511, in wrapper
[rank0]:     f(module, *args, **kwargs)
[rank0]:   File "/data/projects/***/****/src/pig_mllm/models/phi4mm.py", line 40, in __init__
[rank0]:     super().__init__(config)
[rank0]:   File "/home/***/.conda/envs/****/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 511, in wrapper
[rank0]:     f(module, *args, **kwargs)
[rank0]:   File "/home/***/.cache/huggingface/modules/transformers_modules/microsoft/Phi-4-multimodal-instruct/985802b4e1db71df6d366368508d5b30bd743c42/modeling_phi4mm.py", line 1942, in __init__
[rank0]:     self.model = Phi4MMModel(config)
[rank0]:                  ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/***/.conda/envs/****/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 511, in wrapper
[rank0]:     f(module, *args, **kwargs)
[rank0]:   File "/home/***/.cache/huggingface/modules/transformers_modules/microsoft/Phi-4-multimodal-instruct/985802b4e1db71df6d366368508d5b30bd743c42/modeling_phi4mm.py", line 1633, in __init__
[rank0]:     self.embed_tokens_extend = Phi4MMImageAudioEmbedding(config, **embedding_config)
[rank0]:                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/***/.conda/envs/****/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 511, in wrapper
[rank0]:     f(module, *args, **kwargs)
[rank0]:   File "/home/***/.cache/huggingface/modules/transformers_modules/microsoft/Phi-4-multimodal-instruct/985802b4e1db71df6d366368508d5b30bd743c42/modeling_phi4mm.py", line 671, in __init__
[rank0]:     self.image_embed = Phi4MMImageEmbedding(config, **self.image_embd_layer_kwargs)
[rank0]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/***/.conda/envs/****/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 511, in wrapper
[rank0]:     f(module, *args, **kwargs)
[rank0]:   File "/home/***/.cache/huggingface/modules/transformers_modules/microsoft/Phi-4-multimodal-instruct/985802b4e1db71df6d366368508d5b30bd743c42/modeling_phi4mm.py", line 91, in __init__
[rank0]:     L, D = pe_weight.size()
[rank0]:     ^^^^
[rank0]: ValueError: not enough values to unpack (expected 2, got 1)
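For context (my reading of the traceback, not an official explanation): under ZeRO-3, `deepspeed.zero.Init` partitions parameters at module construction time, leaving a flattened 1-D placeholder behind on each rank, so the two-value unpack `L, D = pe_weight.size()` in `modeling_phi4mm.py` sees a single dimension and fails. A minimal torch-only sketch reproducing the shape mismatch (the partitioned tensor is simulated; no DeepSpeed required, and the 577x1024 shape is illustrative, not the model's actual one):

```python
import torch

# A normal 2-D positional-embedding weight unpacks fine.
pe_weight = torch.empty(577, 1024)
L, D = pe_weight.size()
print(L, D)  # 577 1024

# Under ZeRO-3's zero.Init, the weight is replaced by an empty 1-D
# placeholder, so .size() yields one dimension instead of two.
partitioned = torch.empty(0)
try:
    L, D = partitioned.size()
except ValueError as err:
    print(err)  # not enough values to unpack (expected 2, got 1)
```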

Hi, thanks for your interest in Phi-4-MM.

Unfortunately, we don't plan to add official DeepSpeed support. Since our LoRA-based model keeps the optimizer's GPU memory footprint small, DeepSpeed ZeRO-1 and ZeRO-2 wouldn't help much. As for ZeRO-3, our experience suggests the additional communication overhead may not be worth the memory savings, given that we are training a ~4B model.

That being said, it is still possible to make the code work with DeepSpeed ZeRO; it usually just isn't worth the effort, so we would like to keep our implementation simple.
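For anyone who still wants to experiment despite the maintainers' recommendation: the traceback above goes through DeepSpeed's `zero.Init` wrapper, which only activates at ZeRO stage 3, so a plain ZeRO-2 configuration avoids that particular init-time partitioning (whether stage 2 hits a separate issue, as the question suggests, would need its own traceback). A minimal `ds_config.json` sketch, with standard DeepSpeed config keys but illustrative values, and not an officially supported setup:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```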
