DDK: Distilling Domain Knowledge for Efficient Large Language Models Paper • 2407.16154 • Published Jul 23, 2024
LongIns: A Challenging Long-context Instruction-based Exam for LLMs Paper • 2406.17588 • Published Jun 25, 2024
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation Paper • 2406.15252 • Published Jun 21, 2024
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents Paper • 2406.13923 • Published Jun 20, 2024
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models Paper • 2406.01375 • Published Jun 3, 2024
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published Jun 3, 2024
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model Paper • 2404.04167 • Published Apr 5, 2024
MuPT: A Generative Symbolic Music Pretrained Transformer Paper • 2404.06393 • Published Apr 9, 2024
The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis Paper • 2404.01204 • Published Apr 1, 2024
CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models Paper • 2402.13109 • Published Feb 20, 2024
ComposerX: Multi-Agent Symbolic Music Composition with LLMs Paper • 2404.18081 • Published Apr 28, 2024
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series Paper • 2405.19327 • Published May 29, 2024
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection Paper • 2301.01767 • Published Jan 4, 2023
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations Paper • 2401.18084 • Published Jan 31, 2024