Collections
Collections including paper arxiv:2503.02495
- On Domain-Specific Post-Training for Multimodal Large Language Models
  Paper • 2411.19930 • Published • 27
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 78
- Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer
  Paper • 2503.02495 • Published • 7
- Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective
  Paper • 2503.01933 • Published • 10

- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 28
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 14
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 50
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 32