The Ultra-Scale Playbook: The ultimate guide to training LLMs on large GPU clusters
Congliu/Chinese-DeepSeek-R1-Distill-data-110k (dataset, updated 20 days ago)
LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs. Article by wolfram, Dec 4, 2024
Retentive Network: A Successor to Transformer for Large Language Models. Paper 2307.08621, published Jul 17, 2023