Server memory bandwidth is typically around 100-200 GB/s, sometimes higher, so transferring the activated experts from host memory to the GPU over PCIe for inference seems feasible. Are there any potential obstacles?
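To put rough numbers on the question, here is a back-of-envelope sketch. The expert size, the number of activated experts per token, and the layer count are hypothetical placeholders, not figures for any particular model. The 32 GB/s value is the approximate theoretical one-direction peak of a PCIe 4.0 x16 link and is also an assumption about the hardware.

```python
def transfer_time_ms(num_bytes: float, bandwidth_gb_s: float) -> float:
    """Milliseconds needed to move num_bytes at bandwidth_gb_s (1 GB = 1e9 bytes)."""
    return num_bytes / (bandwidth_gb_s * 1e9) * 1e3


# Hypothetical workload, chosen only for illustration.
EXPERT_BYTES = 50e6      # assumed size of one expert's weights
EXPERTS_PER_TOKEN = 8    # assumed number of activated experts per MoE layer
MOE_LAYERS = 32          # assumed number of MoE layers

# Worst case: every activated expert has to be fetched from host memory.
bytes_per_token = EXPERT_BYTES * EXPERTS_PER_TOKEN * MOE_LAYERS

# 150 GB/s is the midpoint of the 100-200 GB/s range quoted above.
host_memory_ms = transfer_time_ms(bytes_per_token, 150.0)
# ~32 GB/s: approximate PCIe 4.0 x16 theoretical peak, one direction (assumed link).
pcie_ms = transfer_time_ms(bytes_per_token, 32.0)

print(f"bytes moved per token: {bytes_per_token / 1e9:.1f} GB")
print(f"host memory read time: {host_memory_ms:.1f} ms")
print(f"PCIe transfer time:    {pcie_ms:.1f} ms")
```

Under these assumptions the transfer time is set by the slower of the two links, so the host memory figure alone does not settle whether the approach is feasible. Substituting real model and hardware numbers would change the result.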