Create README.md
README.md
ADDED
@@ -0,0 +1,9 @@
+**⚠Warning⚠ This is an experimental weight and may not reach practical performance.**<br>
+**To use this weight, the model file must also be manually rewritten or replaced.**<br>
+
+The model file is available here:<br>
+https://github.com/lucidrains/BS-RoFormer
+
+BS-RoFormer has received its first architectural update in a while.<br>
+The 0.5.x update introduced a mechanism called "Value Residual Learning" (https://arxiv.org/abs/2410.17897).<br>
+The paper argues that this mechanism can alleviate attention concentration (attention over-focusing on a few tokens) and further mitigate the vanishing gradient problem.
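
For readers unfamiliar with the idea, the following is a minimal sketch of value residual learning in plain Python, not the code from the BS-RoFormer repository. It assumes the simplest possible form: each attention layer after the first blends its own value matrix with the value matrix computed by the first layer before attention is applied. The names `mix_values`, `attention_stack`, and the fixed scalar `lam` are hypothetical; the paper and the repository may parameterize the blend differently (for example with learned weights).

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Single-head scaled dot-product attention."""
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s * scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v)


def mix_values(v_layer, v_first, lam):
    """Blend this layer's values with the first layer's values.

    Assumption: a fixed convex blend, lam * v_layer + (1 - lam) * v_first.
    """
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_n, row_1)]
        for row_n, row_1 in zip(v_layer, v_first)
    ]


def attention_stack(x, layers, lam=0.5):
    """Run attention layers with a value residual from the first layer.

    `layers` is a list of hypothetical (wq, wk, wv) projection matrices.
    """
    v_first = None
    for wq, wk, wv in layers:
        q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
        if v_first is None:
            v_first = v  # first layer: remember its values for later layers
        else:
            v = mix_values(v, v_first, lam)  # later layers: add the value residual
        out = attention(q, k, v)
        # Ordinary residual connection around the attention block.
        x = [[a + b for a, b in zip(row_x, row_o)] for row_x, row_o in zip(x, out)]
    return x
```

With `lam=1.0` the stack reduces to ordinary attention, and with `lam=0.0` every later layer attends over the first layer's values only, so the parameter controls how much first-layer information reaches deeper layers directly.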