Commit c8d6e52
Parent(s): 3c0d8f4
Include Lucy in acknowledgments, extend
README.md CHANGED
@@ -1,15 +1,16 @@
 # BERT-Wiki-Paragraphs
 
-Authors: Satya Almasian
+Authors: Satya Almasian\*, Dennis Aumiller\*, Lucienne-Sophie Marmé, Michael Gertz
+Contact us at `<lastname>@informatik.uni-heidelberg.de`
 Details for the training method can be found in our work [Structural Text Segmentation of Legal Documents](https://arxiv.org/abs/2012.03619).
 The training procedure follows the same setup, but we substitute legal documents for Wikipedia in this model.
 
 Training is performed in a weakly-supervised fashion to determine whether paragraphs topically belong together or not.
-We utilize automatically generated samples from Wikipedia for training,
-
-We use the same articles as [Koshorek et al., 2018](https://arxiv.org/abs/1803.09337),
+We utilize automatically generated samples from Wikipedia for training, where paragraphs from within the same section are assumed to be topically coherent.
+We use the same articles as [Koshorek et al., 2018](https://arxiv.org/abs/1803.09337),
 albeit from a 2021 dump of Wikipedia, and split at paragraph boundaries instead of the sentence level.
 
 ## Training Setup
-The model was trained for 3 epochs from
+The model was trained for 3 epochs from `bert-base-uncased` on paragraph pairs (limited to 512 subwords with the `longest_first` truncation strategy).
+We use a batch size of 24 with 2 iterations of gradient accumulation (effective batch size of 48), and a learning rate of 1e-4, with gradient clipping at 5.
 Training was performed on a single Titan RTX GPU over the duration of 3 weeks.
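
The hyperparameters added in this revision map onto a standard sequence-pair fine-tuning run. The README does not say which training framework the authors used, so the following is only a minimal sketch, assuming the Hugging Face `transformers` Trainer and a binary (coherent / not coherent) classification head; the dataset fields and output path are placeholders.

```python
# Minimal sketch of the training configuration described above.
# Assumptions: Hugging Face Trainer, binary paragraph-pair classification;
# this is NOT the authors' actual training script.
from transformers import (
    AutoTokenizer,
    BertForSequenceClassification,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def encode_pair(example):
    # Paragraph pairs are limited to 512 subwords with the `longest_first` strategy.
    return tokenizer(
        example["paragraph_a"],  # hypothetical field names
        example["paragraph_b"],
        truncation="longest_first",
        max_length=512,
    )

args = TrainingArguments(
    output_dir="bert-wiki-paragraphs",  # placeholder output path
    num_train_epochs=3,
    per_device_train_batch_size=24,
    gradient_accumulation_steps=2,      # effective batch size of 48
    learning_rate=1e-4,
    max_grad_norm=5.0,                  # gradient clipping at 5
)

# With a tokenized paragraph-pair dataset in hand, training would look like:
# trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()
```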
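For readers who want to try the resulting model, the sketch below shows how such a paragraph-pair classifier could be queried. The model identifier is a placeholder (the diff above does not state where the checkpoint is published), and the assumption that label index 1 means "topically coherent" is ours, not confirmed by the README.

```python
# Hedged usage sketch: score whether two paragraphs belong to the same topical segment.
# "your-org/bert-wiki-paragraphs" is a placeholder model id, not a confirmed checkpoint name.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "your-org/bert-wiki-paragraphs"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

paragraph_a = "The city hosts an annual jazz festival that draws visitors from across the region."
paragraph_b = "The festival has been held every summer since 1978 and features local ensembles."

inputs = tokenizer(
    paragraph_a,
    paragraph_b,
    truncation="longest_first",  # same truncation strategy as described for training
    max_length=512,
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)
# Assumption: index 1 corresponds to "same topic / coherent".
print(f"P(coherent) = {probs[0, 1].item():.3f}")
```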