Shwai He

Shwai

https://shwai-he.github.io/

Shwai-He

AI & ML interests

Deep Learning, Machine Learning, Natural Language Processing.

Shwai's activity

upvoted a paper about 14 hours ago

Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts

Paper • 2503.05066 • Published 6 days ago • 3

commented on a paper about 14 hours ago

Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts

Paper • 2503.05066 • Published 6 days ago • 3

upvoted 2 papers about 1 month ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 345

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 18

liked 14 models 5 months ago

s1ghhh/Mistral-7B-v0.1-Drop4Attn

Updated Sep 8, 2024 • 9 • 2

s1ghhh/Llama-2-13b-Drop8Block

Updated Sep 8, 2024 • 8 • 2

s1ghhh/Llama-2-13b-Drop4Block

Updated Sep 8, 2024 • 11 • 2

s1ghhh/Llama-2-13b-Drop8Attn

Updated Sep 8, 2024 • 12 • 2

s1ghhh/Llama-2-13b-Drop4Attn

Updated Sep 8, 2024 • 7 • 2

s1ghhh/Llama-2-13b-Drop4MLP

Updated Sep 8, 2024 • 8 • 2

s1ghhh/Llama-2-13b-Drop8MLP

Updated Sep 8, 2024 • 15 • 2

s1ghhh/Mistral-7B-v0.1-Drop4Block

Updated Sep 8, 2024 • 10 • 2

s1ghhh/Mistral-7B-v0.1-Drop8Block

Updated Sep 8, 2024 • 7 • 2

s1ghhh/Mistral-7B-v0.1-Drop8Attn

Updated Sep 8, 2024 • 13 • 2

s1ghhh/Mistral-7B-v0.1-Drop4MLP

Updated Sep 8, 2024 • 9 • 2

s1ghhh/Mistral-7B-v0.1-Drop8MLP

Updated Sep 8, 2024 • 9 • 2

s1ghhh/Llama-3-70b-Drop

Text Generation • Updated Oct 23, 2024 • 10 • 3

s1ghhh/Llama-2-70b-Drop

Text Generation • Updated Oct 23, 2024 • 14 • 2

authored a paper 5 months ago

Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers

Paper • 2410.13184 • Published Oct 17, 2024 • 2

upvoted a collection 5 months ago

LLM-Drop

Model weights of paper "What Matters in Transformers? Not All Attention is Needed" (https://arxiv.org/abs/2406.15786) • 14 items • Updated Oct 23, 2024 • 4