- LLM Pretraining with Continuous Concepts
  Paper • 2502.08524 • Published • 25
- Steel-LLM:From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM
  Paper • 2502.06635 • Published • 4
- The Curse of Depth in Large Language Models
  Paper • 2502.05795 • Published • 30
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
  Paper • 2502.02737 • Published • 187
jzwong
AI & ML interests
None yet
Recent Activity
updated a collection 4 days ago: LLM
updated a collection 4 days ago: O1
updated a collection 4 days ago: O1
Organizations
None yet
Collections
4
models
None public yet
datasets
None public yet