Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2412.09624

about 21 hours ago

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 58
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 52
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 42
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20, 2024 • 57

SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views

Paper • 2408.10195 • Published Aug 19, 2024 • 13
MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing

Paper • 2408.08000 • Published Aug 15, 2024 • 9
DC3DO: Diffusion Classifier for 3D Objects

Paper • 2408.06693 • Published Aug 13, 2024 • 11
MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

Paper • 2408.14211 • Published Aug 26, 2024 • 11

Perception and abstraction. Each modality is tokenized and embedded into vectors for model to comprehend.

VILA^2: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24, 2024 • 40
Octopus v4: Graph of language models

Paper • 2404.19296 • Published Apr 30, 2024 • 117
Octo-planner: On-device Language Model for Planner-Action Agents

Paper • 2406.18082 • Published Jun 26, 2024 • 48
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

Paper • 2408.15518 • Published Aug 28, 2024 • 43

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Paper • 2405.08748 • Published May 14, 2024 • 24
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16, 2024 • 29
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 131
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published May 20, 2024 • 38

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

Paper • 2412.14161 • Published Dec 18, 2024 • 51
Training Software Engineering Agents and Verifiers with SWE-Gym

Paper • 2412.21139 • Published Dec 30, 2024 • 22
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published Dec 27, 2024 • 82
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation

Paper • 2408.00764 • Published Aug 1, 2024 • 1

Modality: Video

LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing

Paper • 2402.10294 • Published Feb 15, 2024 • 25
Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23, 2024 • 71
Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 140
GenEx: Generating an Explorable World

Paper • 2412.09624 • Published Dec 12, 2024 • 90

TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion

Paper • 2401.09416 • Published Jan 17, 2024 • 11
SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild

Paper • 2401.10171 • Published Jan 18, 2024 • 14
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model

Paper • 2311.09217 • Published Nov 15, 2023 • 22
GALA: Generating Animatable Layered Assets from a Single Scan

Paper • 2401.12979 • Published Jan 23, 2024 • 9

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs