Use host memory to store inactive experts

#45
by xm10086 - opened

Server memory bandwidth is typically around 100-200 GB/s or higher. Keeping the inactive experts in host memory and transferring only the activated experts to the GPU over PCIe during inference seems feasible. Are there any potential obstacles?
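To make the question concrete, here is a rough back-of-envelope sketch of the per-token transfer cost. All of the numbers in it are my own placeholder assumptions, not measurements of this model: the expert size, the number of experts activated per token, the layer count, and a PCIe link of roughly 32 GB/s (about the theoretical one-direction rate of PCIe 4.0 x16). The function names are hypothetical too. It assumes no caching and no overlap between transfer and compute, so it is a worst case.

```python
def transfer_seconds(num_bytes: float, bandwidth_gb_s: float) -> float:
    """Time to move num_bytes over a link, with bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return num_bytes / (bandwidth_gb_s * 1e9)


def per_token_transfer_seconds(
    expert_bytes: float,      # size of one expert's weights (assumed placeholder)
    experts_per_token: int,   # experts activated per token per layer (assumed)
    num_layers: int,          # number of MoE layers (assumed)
    host_mem_gb_s: float,     # host memory bandwidth
    pcie_gb_s: float,         # host-to-GPU PCIe bandwidth (assumed)
) -> float:
    """Worst-case time to fetch every activated expert for one token.

    The slower of the two links bounds the transfer, so the effective
    bandwidth is the minimum of host memory and PCIe bandwidth.
    """
    effective_gb_s = min(host_mem_gb_s, pcie_gb_s)
    total_bytes = expert_bytes * experts_per_token * num_layers
    return transfer_seconds(total_bytes, effective_gb_s)


if __name__ == "__main__":
    # Placeholder values: 100 MB per expert, 2 experts per token, 32 layers.
    seconds = per_token_transfer_seconds(
        expert_bytes=100e6,
        experts_per_token=2,
        num_layers=32,
        host_mem_gb_s=200.0,
        pcie_gb_s=32.0,
    )
    print(f"worst-case transfer time per token: {seconds:.3f} s")
```

With these placeholder values the transfer is bounded by the PCIe link rather than by host memory bandwidth, which is the part I am unsure about.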
