Spaces: Running on L40S
Apply for community grant: Personal project (gpu)
Apply for community grant: Academic project (gpu). We develop an identity-preserving text-to-video generation model, ConsisID, which can keep human-identity consistent in the generated video.
arxiv: https://arxiv.org/abs/2411.17440
paper: https://huggingface.co/papers/2411.17440
page: https://pku-yuangroup.github.io/ConsisID/
code: https://github.com/PKU-YuanGroup/ConsisID
Hi @BestWishYsh , we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.
thanks, let me check
BTW, it would be nice if you could provide more info, like arXiv link, GitHub URL, etc., when opening a grant request. Your request was tagged as a spam as it didn't have meaningful info.
i am sorry, thanks for your suggestion
@hysts hi, I have set @spaces.GPU(), but it still shows RuntimeError: No CUDA GPUs are available. Do you know how to fix it, thanks.
Can you provide more info about the error, like a stack trace?
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 135, in worker_init
torch.init(nvidia_uuid)
File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/patching.py", line 354, in init
torch.Tensor([0]).cuda()
File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 314, in _lazy_init
torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
response = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1935, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
prediction = await anyio.to_thread.run_sync( # type: ignore
File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
response = f(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
response = f(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 214, in gradio_handler
raise res.value
RuntimeError: No CUDA GPUs are available
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
response = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1935, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
prediction = await anyio.to_thread.run_sync( # type: ignore
File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
response = f(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
response = f(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 184, in gradio_handler
schedule_response = client.schedule(task_id=task_id, request=request, duration=duration)
File "/usr/local/lib/python3.10/site-packages/spaces/zero/client.py", line 119, in schedule
raise gr.Error(
gradio.exceptions.Error: 'The requested GPU duration (240s) is larger than the maximum allowed'
Is duration=240s too long?
Thanks! I've been looking into the issue, but so far haven't been able to resolve it. The fact that this Space takes like 10 minutes or so just to restart is making debugging difficult. Anyway, I'll let you know once I find something.
Well, not 100% sure what is the cause of the error, but looks like this diff fixes the CUDA error:
diff --git a/app.py b/app.py
index e119d2b..60bb4de 100644
--- a/app.py
+++ b/app.py
@@ -122,7 +122,6 @@ def infer(
num_inference_steps: int,
guidance_scale: float,
seed: int = 42,
- progress=gr.Progress(track_tqdm=True),
):
if seed == -1:
seed = random.randint(0, 2**8 - 1)
@@ -170,6 +169,38 @@ def infer(
return (video_pt, seed)
+@spaces.GPU(duration=180)
+def generate(
+ prompt,
+ image_input,
+ seed_value,
+ scale_status,
+ rife_status,
+):
+ latents, seed = infer(
+ prompt,
+ image_input,
+ num_inference_steps=50,
+ guidance_scale=7.0,
+ seed=seed_value,
+ )
+ if scale_status:
+ latents = upscale_batch_and_concatenate(upscale_model, latents, device)
+ if rife_status:
+ latents = rife_inference_with_latents(frame_interpolation_model, latents)
+
+ batch_size = latents.shape[0]
+ batch_video_frames = []
+ for batch_idx in range(batch_size):
+ pt_image = latents[batch_idx]
+ pt_image = torch.stack([pt_image[i] for i in range(pt_image.shape[0])])
+
+ image_np = VaeImageProcessor.pt_to_numpy(pt_image)
+ image_pil = VaeImageProcessor.numpy_to_pil(image_np)
+ batch_video_frames.append(image_pil)
+ return batch_video_frames
+
+
def convert_to_gif(video_path):
clip = VideoFileClip(video_path)
gif_path = video_path.replace(".mp4", ".gif")
@@ -320,8 +351,8 @@ with gr.Blocks() as demo:
</table>
""")
- @spaces.GPU(duration=180)
- def generate(
+
+ def run(
prompt,
image_input,
seed_value,
@@ -329,29 +360,11 @@ with gr.Blocks() as demo:
rife_status,
progress=gr.Progress(track_tqdm=True)
):
- latents, seed = infer(
- prompt,
- image_input,
- num_inference_steps=50,
- guidance_scale=7.0,
- seed=seed_value,
- progress=progress,
- )
- if scale_status:
- latents = upscale_batch_and_concatenate(upscale_model, latents, device)
- if rife_status:
- latents = rife_inference_with_latents(frame_interpolation_model, latents)
-
- batch_size = latents.shape[0]
- batch_video_frames = []
- for batch_idx in range(batch_size):
- pt_image = latents[batch_idx]
- pt_image = torch.stack([pt_image[i] for i in range(pt_image.shape[0])])
-
- image_np = VaeImageProcessor.pt_to_numpy(pt_image)
- image_pil = VaeImageProcessor.numpy_to_pil(image_np)
- batch_video_frames.append(image_pil)
-
+ batch_video_frames = generate(prompt,
+ image_input,
+ seed_value,
+ scale_status,
+ rife_status)
video_path = save_video(batch_video_frames[0], fps=math.ceil((len(batch_video_frames[0]) - 1) / 6))
video_update = gr.update(visible=True, value=video_path)
gif_path = convert_to_gif(video_path)
@@ -361,7 +374,7 @@ with gr.Blocks() as demo:
return video_path, video_update, gif_update, seed_update
generate_button.click(
- generate,
+ fn=run,
inputs=[prompt, image_input, seed_param, enable_scale, enable_rife],
outputs=[video_output, download_video_button, download_gif_button, seed_text],
)
This should resolve the current error, but when I ran the Space, inference took longer than the specified duration of 180 seconds, and a "GPU task aborted" error occurred. You could decrease the number of inference steps to avoid that error, though.
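As a rough illustration of the trade-off mentioned above, the sketch below estimates how many inference steps fit inside a given GPU duration. The helper name `max_steps_within` and all timing numbers are hypothetical; they were not measured on this Space, and real per-step time would have to be profiled.

```python
import math


def max_steps_within(duration_s, seconds_per_step, overhead_s=0.0):
    """Largest step count whose estimated runtime fits in duration_s.

    overhead_s covers fixed costs outside the denoising loop
    (for example VAE decoding or upscaling).
    """
    if seconds_per_step <= 0:
        raise ValueError("seconds_per_step must be positive")
    budget = duration_s - overhead_s
    if budget <= 0:
        return 0
    return math.floor(budget / seconds_per_step)


# Hypothetical timings, not measured on this Space.
steps = max_steps_within(duration_s=180, seconds_per_step=4.0, overhead_s=30.0)
print(steps)  # 37 with these made-up numbers
```

If the result is well below the 50 steps used in the diff, the options are a longer duration (if the quota allows it) or dedicated hardware.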
Thanks a lot, but is the maximum time of ZeroGPU only 180s? Reducing the number of steps will seriously reduce the generation quality ...
It seems that the longest duration is only 120s
https://huggingface.co/spaces/zero-gpu-explorers/README/discussions/118
But your Space doesn't seem to raise the maximum duration error with duration=180.
The discussion is a bit old and the bug that showed -1 day or something has already been fixed, so I'm not sure but maybe the 120 second issue in the discussion might have been fixed as well.
Does this error mean that users need to subscribe to enterprise hub for it to run the hf space? But it seems that other spaces using zerogpu do not need ...
Ah, ok, it involves a couple of known issues of spaces.
- There's a bug where the quota error is raised when the remaining quota is exactly the same as the specified duration.
- There's a bug where users are considered not logged in when the @spaces.GPU decorator is added to inner functions.
You can avoid the second issue by adding an attribute to the outer function, like wrapper_fn.zerogpu = True.
In your case, adding the following line after the definition of the run function would fix the second issue.
run.zerogpu = True
If I remember correctly, logged-in users have 300 seconds of quota, so they don't have to subscribe to PRO.
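A minimal sketch of the workaround described above, assuming the layout from the diff: an undecorated run handler that calls a @spaces.GPU-decorated generate. The function bodies are placeholders, and the try/except fallback is a hypothetical stand-in so the snippet also runs where the spaces package is not installed.

```python
try:
    import spaces  # provided on Hugging Face Spaces

    gpu = spaces.GPU
except ImportError:
    # Hypothetical stand-in for local runs: a decorator factory that
    # returns the function unchanged.
    def gpu(duration=None):
        def decorator(fn):
            return fn

        return decorator


@gpu(duration=180)
def generate(prompt):
    # Placeholder for the real GPU work (inference, upscaling, ...).
    return f"frames for {prompt!r}"


def run(prompt):
    # Outer handler, the one passed to generate_button.click(fn=run, ...).
    return generate(prompt)


# Workaround from the thread: set the attribute on the outer handler,
# after its definition.
run.zerogpu = True
```

The attribute goes on the function that Gradio calls directly, not on the decorated inner function.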
Thanks, let me try it
Hmm, I see. Thanks for checking. This is the way @cbensimon told me before and it worked back then, but something might have been changed since then. Let's wait for @cbensimon 's response.
Oh, weird.
I think it would be great if this Space can run on ZeroGPU, but maybe we can assign L40S with 1 hour sleep time in the meantime if your Space can run on it.
OK, I just switched the hardware to L40S to see if it works.
thanks a lot, let's wait for it to restart. And should we add "run.zerogpu = True" when using L40S?
And should we add "run.zerogpu = True" when using L40S?
No, it should work without it. It doesn't affect normal GPU execution.
Also, spaces does nothing on Spaces with normal GPU, so you don't have to remove the @spaces.GPU decorator either.
The space runs normally (with L40S). It seems that ZeroGPU has some hidden bugs.
Looks like CUDA OOM occurred.
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
response = await route_utils.call_process_api(
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/blocks.py", line 1935, in process_api
result = await self.call_function(
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
prediction = await anyio.to_thread.run_sync( # type: ignore
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
return await future
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
result = context.run(func, *args)
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
response = f(*args, **kwargs)
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
response = f(*args, **kwargs)
File "/home/user/app/app.py", line 350, in run
batch_video_frames, seed = generate(
File "/home/user/app/app.py", line 156, in generate
video_pt = pipe(
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/user/app/models/pipeline_consisid.py", line 883, in __call__
video = self.decode_latents(latents)
File "/home/user/app/models/pipeline_consisid.py", line 463, in decode_latents
frames = self.vae.decode(latents).sample
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
return method(self, *args, **kwargs)
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 1278, in decode
decoded = self._decode(z).sample
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 1249, in _decode
z_intermediate, conv_cache = self.decoder(z_intermediate, conv_cache=conv_cache)
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 970, in forward
hidden_states, new_conv_cache[conv_cache_key] = up_block(
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 647, in forward
hidden_states, new_conv_cache[conv_cache_key] = resnet(
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 291, in forward
hidden_states, new_conv_cache["norm1"] = self.norm1(hidden_states, zq, conv_cache=conv_cache.get("norm1"))
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py", line 187, in forward
new_f = norm_f * conv_y + conv_b
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.32 GiB. GPU 0 has a total capacity of 44.32 GiB of which 181.25 MiB is free. Including non-PyTorch memory, this process has 0 bytes memory in use. Of the allocated memory 35.02 GiB is allocated by PyTorch, and 7.06 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
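The error message above suggests PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True as a way to reduce fragmentation. A minimal sketch of setting it from app.py follows; placing it before the first torch import is an assumption made to be safe, since the allocator picks the setting up when it initializes. Whether it would have been enough to avoid this particular OOM is not known.

```python
import os

# Set the allocator option before torch is imported anywhere in the process.
# setdefault keeps any value already configured in the Space settings.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

# import torch  # import torch only after the variable is set
```

The same variable can instead be defined in the Space's settings, which avoids depending on import order.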
oh, let's try again by adding these two lines
pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()
the space can generate videos normally now, and it seems that only about 19G of gpu memory is needed
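For reference, diffusers documents these two methods as alternative offload strategies: model offload moves whole components to the GPU as they are needed, while sequential offload works at a finer granularity and trades speed for lower memory use. The sketch below picks exactly one of them. The helper name apply_offload and the FakePipe class are hypothetical and exist only so the snippet runs without loading a model; on the Space, pipe would be the real ConsisID pipeline.

```python
def apply_offload(pipe, mode="model"):
    """Enable exactly one CPU offload strategy on a diffusers-style pipeline."""
    if mode == "model":
        pipe.enable_model_cpu_offload()
    elif mode == "sequential":
        pipe.enable_sequential_cpu_offload()
    else:
        raise ValueError(f"unknown offload mode: {mode!r}")
    return pipe


class FakePipe:
    """Hypothetical stand-in that records which offload method was called."""

    def __init__(self):
        self.calls = []

    def enable_model_cpu_offload(self):
        self.calls.append("model")

    def enable_sequential_cpu_offload(self):
        self.calls.append("sequential")


pipe = apply_offload(FakePipe(), mode="sequential")
print(pipe.calls)  # ['sequential']
```

Which strategy accounts for the roughly 19G figure reported above is not stated in the thread.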
Awesome!
Also, thanks for your efforts in debugging the ZeroGPU issues.
Ah, ok, it involves a couple of known issues of spaces.
- There's a bug where the quota error is raised when the remaining quota is exactly the same as the specified duration.
- There's a bug where users are considered not logged in when the @spaces.GPU decorator is added to inner functions.
You can avoid the second issue by adding an attribute to the outer function, like wrapper_fn.zerogpu = True.
In your case, adding the following line after the definition of the run function would fix the second issue.
run.zerogpu = True
If I remember correctly, logged-in users have 300 seconds of quota, so they don't have to subscribe to PRO.
Hi @hysts,
Is the 2nd bug mentioned here solved now? I'm facing exactly this situation with my Space https://huggingface.co/spaces/filapro/cad-recode
Hi @filapro
No, unfortunately, it hasn't been fixed yet. I saw some discussions about it internally a few days ago, so as far as I know, it's still being worked on.
As I mentioned above, you should be able to work around the issue by adding something like wrapper_fn.zerogpu = True to the wrapper function.
I just duplicated your Space and tested it, and adding run_test_safe.zerogpu = True fixes the issue for your Space. Can you try that?
Where do I have to add the "wrapper_fn.zerogpu = True" or "run.zerogpu = True" line in my code?
This is my code:
https://gist.github.com/RageshAntonyHM/f4442b079c3a9eefca90d05924b24742
@RageshAntony Your code already has the outermost function decorated, so you don't need this workaround.
This workaround matters only when the function set in the Gradio event trigger (in your case, run_button.click on line 319) is not decorated with @spaces.GPU and instead calls another decorated function. For example, if your generate_all function were not decorated and generate_image were decorated instead, you could add generate_all.zerogpu = True after the definition of the function, around line 312.
Yeah, I found the fix in another thread: I moved @spaces.GPU to generate_all and it worked. Thanks though, the func_name.zerogpu workaround is still good to know.
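The alternative that worked here, decorating the outermost handler so that no extra attribute is needed, can be sketched as follows. The names generate_all and generate_image come from the messages above; their bodies are placeholders, and the fallback decorator is a hypothetical stand-in for environments without the spaces package.

```python
try:
    import spaces  # provided on Hugging Face Spaces

    gpu = spaces.GPU
except ImportError:
    # Hypothetical stand-in for local runs: returns the function unchanged.
    def gpu(fn):
        return fn


def generate_image(prompt):
    # Inner helper, left undecorated in this layout.
    return f"image for {prompt!r}"


@gpu
def generate_all(prompt):
    # Outermost handler: this is the function given to run_button.click.
    return [generate_image(prompt)]
```

With this layout the decorated function is the one Gradio calls directly, which is the case where the thread says the workaround is unnecessary.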