Zero Bubble Pipeline Parallelism 🏆 Optimize pipeline schedules for efficient computing
InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding Paper • 2401.09149 • Published Jan 17, 2024
AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning Paper • 2311.00257 • Published Nov 1, 2023