pcunwa committed 17b0f6d (verified) · 1 parent: aac9a78

Create README.md

Files changed (1): README.md (added, +9 -0)

**⚠Warning⚠ This is an experimental weight and may not perform well in practice.**<br>
**Also, to use this weight, the model file must be manually modified or replaced.**<br>

The model file is available here:<br>
https://github.com/lucidrains/BS-RoFormer

BS-RoFormer has received its first architectural update in a while.<br>
The 0.5.x release introduced a mechanism called "Value Residual Learning" (https://arxiv.org/abs/2410.17897).<br>
The paper argues that this mechanism reduces the over-concentration of attention and further mitigates the vanishing gradient problem.
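
The core idea of value residual learning can be sketched roughly as follows: every attention layer after the first mixes its value vectors with the values computed in the first layer before applying attention. This is a minimal single-head sketch in pure Python, under stated assumptions: the mix is modeled as a linear interpolation with a fixed weight `lam`, the helper names (`mix_values`, `forward`, etc.) are hypothetical, and output projections, normalization and feed-forward blocks are omitted. It is not the BS-RoFormer code or the paper's exact formulation; the actual implementation may weight or learn the mix differently.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def mix_values(v: Matrix, v_first: Matrix, lam: float) -> Matrix:
    """Assumed value residual: (1 - lam) * v + lam * v_first."""
    return [[(1 - lam) * a + lam * b for a, b in zip(ra, rb)] for ra, rb in zip(v, v_first)]


def forward(x: Matrix, layers: List[Tuple[Matrix, Matrix, Matrix]], lam: float = 0.5) -> Matrix:
    """Stack of attention layers with a value residual from layer 1.

    Each layer is a hypothetical (Wq, Wk, Wv) weight triple.
    """
    v_first: Optional[Matrix] = None
    for wq, wk, wv in layers:
        q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
        if v_first is None:
            v_first = v  # cache the first layer's values
        else:
            v = mix_values(v, v_first, lam)  # value residual for later layers
        out = attention(q, k, v)
        # ordinary residual connection on the hidden state
        x = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(x, out)]
    return x
```

Setting `lam=0.0` recovers a plain attention stack, which makes it easy to compare the two variants on the same inputs and weights.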