Now in 5 languages!
Dense Grounded Understanding of Images and Videos
Text-to-Audio (Sound SFX) Generator
Video Super-Resolution with Text-to-Video Model
Vision Transformer Attention Visualization
https://huggingface.co/papers/2501.03006
Gaze detection using Moondream