Miquel Farré

mfarre

AI & ML interests

I like everything video

Recent Activity

upvoted an article 6 days ago

Announcing NVIDIA Cosmos World Foundation Models

new activity 24 days ago

HuggingFaceFV/finevideo:Cleanup TTS

liked a Space 27 days ago

HuggingFaceH4/blogpost-scaling-test-time-compute

mfarre's activity

upvoted an article 6 days ago

Announcing NVIDIA Cosmos World Foundation Models

Article • 6 days ago • 21

upvoted a paper 28 days ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published about 1 month ago • 137

upvoted a paper 3 months ago

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

Paper • 2410.17434 • Published Oct 22, 2024 • 25

upvoted 2 articles 4 months ago

FineVideo: behind the scenes

Article • Sep 23, 2024 • 27

Docmatix - a huge dataset for Document Visual Question Answering

Article • Jul 18, 2024 • 72

upvoted an article 5 months ago

Scaling robotics datasets with video encoding

Article • Aug 27, 2024 • 35

upvoted a paper 5 months ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 124

Articles

SmolVLM - small yet mighty Vision Language Model

CinePile 2.0 - making stronger datasets with adversarial refinement

FineVideo: behind the scenes

Scaling robotics datasets with video encoding
