---
datasets:
- EleutherAI/pile
language:
- en
---

A Based model that uses LayerNorm in place of the `QK.sum(-1)` normalization, for better hardware efficiency.
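
The difference between the two normalizations can be illustrated with a minimal pure-Python sketch. It is not this model's actual implementation: it assumes the similarity is the second-order Taylor approximation of `exp` (`1 + s + s**2 / 2`), uses a LayerNorm without learnable scale or bias, and computes attention in quadratic form for readability. All function names are hypothetical.

```python
import math


def taylor_scores(q, k):
    """Causal scores A[i][j] = 1 + s + s^2/2, with s = (q_i . k_j) / sqrt(d).

    Assumption: second-order Taylor approximation of exp as the similarity.
    Entries with j > i stay zero (causal mask).
    """
    n, d = len(q), len(q[0])
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            scores[i][j] = 1.0 + s + 0.5 * s * s
    return scores


def weighted_values(scores, v):
    """Unnormalized output: out[i] = sum_j scores[i][j] * v[j]."""
    dv = len(v[0])
    return [
        [sum(row[j] * v[j][c] for j in range(len(v))) for c in range(dv)]
        for row in scores
    ]


def sum_normalized(q, k, v, eps=1e-6):
    """Divide each output row by the row sum of its scores (the QK.sum(-1) term)."""
    scores = taylor_scores(q, k)
    out = weighted_values(scores, v)
    return [[x / (sum(row) + eps) for x in o] for o, row in zip(out, scores)]


def layer_normalized(q, k, v, eps=1e-6):
    """Normalize each output row to zero mean and unit variance instead.

    Assumption: LayerNorm over the value dimension, no learnable parameters.
    The row sum of the scores is never computed here.
    """
    out = weighted_values(taylor_scores(q, k), v)
    normed = []
    for o in out:
        mean = sum(o) / len(o)
        var = sum((x - mean) ** 2 for x in o) / len(o)
        normed.append([(x - mean) / math.sqrt(var + eps) for x in o])
    return normed


if __name__ == "__main__":
    q = [[0.1, -0.2], [0.3, 0.4]]
    k = [[0.2, 0.1], [-0.3, 0.5]]
    v = [[1.0, 2.0, 3.0], [4.0, 5.0, 7.0]]
    print(sum_normalized(q, k, v))
    print(layer_normalized(q, k, v))
```

In this sketch the LayerNorm variant depends only on the unnormalized output row, so the separate denominator sum drops out of the computation.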