dennlinger committed
Commit c8d6e52 · 1 Parent(s): 3c0d8f4

Include Lucy in acknowledgments, extend

Files changed (1): README.md (+6 -5)
README.md CHANGED
@@ -1,15 +1,16 @@
  # BERT-Wiki-Paragraphs

- Authors: Satya Almasian*, Dennis Aumiller*, Michael Gertz
+ Authors: Satya Almasian\*, Dennis Aumiller\*, Lucienne-Sophie Marmé, Michael Gertz
+ Contact us at `<lastname>@informatik.uni-heidelberg.de`
  Details for the training method can be found in our work [Structural Text Segmentation of Legal Documents](https://arxiv.org/abs/2012.03619).
  The training procedure follows the same setup, but we substitute legal documents for Wikipedia in this model.

  Training is performed in a weakly-supervised fashion to determine whether paragraphs topically belong together or not.
- We utilize automatically generated samples from Wikipedia for training,
- where paragrahs from within the same section are assumed to be topically coherent.
- We use the same articles as ([Koshorek et al., 2018](https://arxiv.org/abs/1803.09337)),
+ We utilize automatically generated samples from Wikipedia for training, where paragraphs from within the same section are assumed to be topically coherent.
+ We use the same articles as [Koshorek et al., 2018](https://arxiv.org/abs/1803.09337),
  albeit from a 2021 dump of Wikipedia, and split at paragraph boundaries instead of the sentence level.

  ## Training Setup
- The model was trained for 3 epochs from the "bert-base-uncased" checkpoint on paragraph pairs (truncated to 512 max length).
+ The model was trained for 3 epochs from `bert-base-uncased` on paragraph pairs (limited to 512 subwords with the `longest_first` truncation strategy).
+ We use a batch size of 24 with 2 iterations of gradient accumulation (effective batch size of 48), and a learning rate of 1e-4, with gradient clipping at 5.
  Training was performed on a single Titan RTX GPU over the duration of 3 weeks.
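
The weak-supervision scheme described in the updated README (paragraph pairs from the same Wikipedia section are treated as topically coherent, pairs from different sections as not) can be illustrated with a short sketch. This is a minimal example, not the authors' released preprocessing code; the article representation, adjacency-based positives, and random cross-section negatives are assumptions made for illustration.

```python
import random

def make_pairs(article, num_negatives=1):
    """Minimal sketch of weakly-supervised pair generation.

    `article` is assumed to map a section title to its list of paragraphs.
    Returns (paragraph_a, paragraph_b, label) tuples: label 1 for pairs from
    the same section (assumed topically coherent), 0 for cross-section pairs.
    """
    sections = [(title, paragraphs) for title, paragraphs in article.items() if paragraphs]
    pairs = []
    for _, paragraphs in sections:
        # Adjacent paragraphs within one section -> positive examples.
        for a, b in zip(paragraphs, paragraphs[1:]):
            pairs.append((a, b, 1))
    if len(sections) >= 2:
        for _ in range(num_negatives * max(len(pairs), 1)):
            # Paragraphs drawn from two different sections -> negative examples.
            (_, ps1), (_, ps2) = random.sample(sections, 2)
            pairs.append((random.choice(ps1), random.choice(ps2), 0))
    return pairs
```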
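The hyperparameters added to the Training Setup section (`bert-base-uncased`, 3 epochs, pairs truncated to 512 subwords with `longest_first`, batch size 24 with 2-step gradient accumulation, learning rate 1e-4, gradient clipping at 5) map naturally onto the `transformers` Trainer. The sketch below is a hedged reconstruction under those assumptions, not the authors' actual training script; the dataset columns `paragraph_a`/`paragraph_b` and the output directory name are placeholders.

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def tokenize(batch):
    # Joint truncation of each paragraph pair to 512 subwords ("longest_first" strategy).
    return tokenizer(batch["paragraph_a"], batch["paragraph_b"],
                     truncation="longest_first", max_length=512)

args = TrainingArguments(
    output_dir="bert-wiki-paragraphs",
    num_train_epochs=3,
    per_device_train_batch_size=24,
    gradient_accumulation_steps=2,  # effective batch size of 48
    learning_rate=1e-4,
    max_grad_norm=5.0,              # gradient clipping at 5
)

# `train_dataset` stands in for the tokenized Wikipedia paragraph pairs:
# trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()
```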