HDT Collection
Data and model weights for our COLM '24 paper, HDT: Hierarchical Document Transformer. Project page: https://cli212.github.io/HDT/
To load the model pre-trained on the UL2 task, use the following snippet:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# See the HDT collection page on the Hub for the list of available models.
model_name = 'howey/HDT-ED'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
For more details, please see our GitHub repository: HDT
The model has a context length of 8192 tokens and is similar in size to BERT, with approximately 110M parameters. It was trained on the standard UL2 task using a Transformer-based architecture with our proposed hierarchical attention.
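For readers unfamiliar with UL2: it mixes several denoising objectives, including T5-style span corruption and a prefix-LM variant. The sketch below illustrates only span corruption, under our own assumptions: whitespace-split tokens, span positions passed in explicitly rather than sampled, and T5-style sentinel names (`<extra_id_0>`, `<extra_id_1>`, ...). The function name `span_corrupt` is hypothetical; this is not the HDT training code, and the sentinels and span sampling actually used for HDT may differ.

```python
from typing import List, Tuple


def span_corrupt(tokens: List[str], spans: List[Tuple[int, int]]) -> Tuple[List[str], List[str]]:
    """Illustrative T5-style span corruption (one ingredient of UL2).

    spans: sorted, non-overlapping (start, length) pairs to mask out.
    Sentinel names follow the T5 convention; this is an assumption,
    not necessarily the vocabulary used by the HDT tokenizer.
    """
    inputs: List[str] = []
    targets: List[str] = []
    pos = 0
    for k, (start, length) in enumerate(spans):
        if start < pos:
            raise ValueError("spans must be sorted and non-overlapping")
        sentinel = f"<extra_id_{k}>"
        inputs.extend(tokens[pos:start])   # keep the unmasked prefix
        inputs.append(sentinel)            # replace the span by one sentinel
        targets.append(sentinel)           # target: sentinel followed by the span
        targets.extend(tokens[start:start + length])
        pos = start + length
    inputs.extend(tokens[pos:])            # keep the unmasked tail
    targets.append(f"<extra_id_{len(spans)}>")  # final sentinel marks the end
    return inputs, targets


tokens = "the quick brown fox jumps over the lazy dog".split()
inp, tgt = span_corrupt(tokens, [(1, 2), (6, 1)])
print(inp)  # ['the', '<extra_id_0>', 'fox', 'jumps', 'over', '<extra_id_1>', 'lazy', 'dog']
print(tgt)  # ['<extra_id_0>', 'quick', 'brown', '<extra_id_1>', 'the', '<extra_id_2>']
```

The encoder sees the corrupted input and the decoder learns to emit only the masked spans, which is why an encoder-decoder class such as `AutoModelForSeq2SeqLM` is the natural fit for this kind of objective.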
Training ran for 72 hours on the ArXiv+Wikipedia+HUPD corpus and processed a total of 2.6 billion tokens.
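As a rough sanity check on these figures, the average throughput and the equivalent number of full-length context windows follow from simple arithmetic. This sketch treats "2.6 billion" as exact and assumes, hypothetically, that every sequence is packed to the full 8192-token context; it says nothing about the hardware or batch sizes actually used.

```python
# Figures taken from the text above; 2.6 billion is treated as exact here.
TRAINING_HOURS = 72
TOTAL_TOKENS = 2_600_000_000
CONTEXT_LENGTH = 8192

seconds = TRAINING_HOURS * 3600                 # 259,200 s of wall-clock time
tokens_per_second = TOTAL_TOKENS / seconds      # average throughput
# Hypothetical: number of sequences if every one filled the whole context.
full_windows = TOTAL_TOKENS / CONTEXT_LENGTH

print(f"{tokens_per_second:,.0f} tokens/s")     # → 10,031 tokens/s
print(f"{full_windows:,.0f} full-length windows")  # → 317,383 full-length windows
```

Since real documents vary in length, the window count is only an equivalent figure, not the number of training examples.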
For more details, please see our paper: HDT: Hierarchical Document Transformer.
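To give a feel for what a hierarchical attention pattern can look like, the sketch below builds a dense boolean mask over a small document tree. It assumes, for illustration only, that each node (a token, or an anchor such as a sentence or document marker) attends to itself, its parent, its children, and its siblings. The exact pattern and the efficient sparse implementation used by HDT are described in the paper; the tree layout and the `hierarchical_mask` name here are hypothetical.

```python
from typing import List


def hierarchical_mask(parent: List[int]) -> List[List[bool]]:
    """Return mask[i][j] == True if node i may attend to node j.

    Assumed rule (illustrative, not HDT's exact definition): a node
    attends to itself, its parent, its children, and its siblings
    (nodes sharing the same parent). parent[i] is the index of node
    i's parent, or -1 for the document root.
    """
    n = len(parent)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            mask[i][j] = (
                i == j
                or parent[i] == j  # j is i's parent
                or parent[j] == i  # j is one of i's children
                or (parent[i] == parent[j] and parent[i] != -1)  # siblings
            )
    return mask


# Hypothetical layout: node 0 = document anchor, nodes 1 and 4 = sentence
# anchors, nodes 2-3 and 5-6 = the tokens of each sentence.
parent = [-1, 0, 1, 1, 0, 4, 4]
mask = hierarchical_mask(parent)
for i, row in enumerate(mask):
    print(i, [j for j, ok in enumerate(row) if ok])
# 0 [0, 1, 4]
# 1 [0, 1, 2, 3, 4]
# 2 [1, 2, 3]
# 3 [1, 2, 3]
# 4 [0, 1, 4, 5, 6]
# 5 [4, 5, 6]
# 6 [4, 5, 6]
```

In this sketch, tokens 2 and 5 sit in different sentences and cannot attend to each other directly; information between them has to flow through the sentence and document anchors. The dense n-by-n mask is only for readability, since it costs quadratic memory; a practical implementation would exploit the sparsity instead.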
Please cite our work using the BibTeX entry below:
@inproceedings{He2024COLM,
title={HDT: Hierarchical Document Transformer},
author={Haoyu He and Markus Flicke and Jan Buchmann and Iryna Gurevych and Andreas Geiger},
year={2024},
booktitle={Conference on Language Modeling}
}
Haoyu ([email protected])