AlbertShi

AlbertShi's activity

liked a dataset about 1 hour ago

reacted to JingzeShi's post with 🚀 about 1 hour ago

Post

1821

We distilled a more accurate and concise dataset from DeepSeek R1, and we also released the code repository for the distillation pipeline. 🤗

Dataset: SmallDoge/SmallThoughts
Code: https://github.com/SmallDoges/small-thoughts

liked a model 18 days ago

upvoted a paper about 1 month ago

liked 4 models about 1 month ago

upvoted a collection about 1 month ago

liked 3 models about 1 month ago

reacted to JingzeShi's post with 👍🤯👀 about 1 month ago

Post

2077

Pre-training a model on just a single RTX 4090 is really slow, even for small language models!!! (https://huggingface.co/collections/JingzeShi/doge-slm-677fd879f8c4fd0f43e05458)

reacted to JingzeShi's post with 🔥 about 1 month ago

Post

1714

🤩 warmup -> stable -> decay learning rate scheduler:
😎 Use the stable-phase checkpoints to continue training the model on any new dataset without training spikes!!!
SmallDoge/Doge-20M-checkpoint
SmallDoge/Doge-60M-checkpoint
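
A minimal sketch of what a warmup -> stable -> decay schedule can look like, for readers who have not seen one. This is not the code used to train the Doge checkpoints: the function name `wsd_lr`, its parameters, the linear warmup, and the linear decay shape are all assumptions made for illustration.

```python
def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Return the learning rate at `step` for a warmup -> stable -> decay schedule.

    Hypothetical helper: linear warmup and linear decay are assumed here,
    other shapes (e.g. cosine or exponential decay) are equally possible.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold the learning rate constant at peak_lr.
        return peak_lr
    decay_step = step - warmup_steps - stable_steps
    if decay_step < decay_steps:
        # Decay phase: move linearly from peak_lr down to min_lr.
        frac = decay_step / decay_steps
        return peak_lr + (min_lr - peak_lr) * frac
    # After the schedule ends, stay at min_lr.
    return min_lr


if __name__ == "__main__":
    # Print a few points of a 40-step schedule: 10 warmup, 20 stable, 10 decay.
    for s in (0, 5, 10, 29, 35, 40):
        print(s, wsd_lr(s, peak_lr=1e-3, warmup_steps=10, stable_steps=20, decay_steps=10))
```

In this sketch every step of the stable phase uses the same rate, so a checkpoint saved there can be resumed at that rate without a jump in the learning rate, and the decay phase can be applied later on the new dataset.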