Longformer Encoder-Decoder (LED) fine-tuned on ILC

This model is a fine-tuned version of led-base-16384 on the ILC dataset.

As described in Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, and Arman Cohan, led-base-16384 was initialized from bart-base, since both models share the same architecture. To process up to 16,384 (16K) tokens, bart-base's position embedding matrix, which covers 1,024 positions, was copied 16 times (16 × 1,024 = 16,384).
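The copying step can be illustrated with a small, self-contained sketch. It is not the conversion code used to build led-base-16384: it assumes the position embeddings are a plain table of rows, uses toy sizes (4 positions, 3 dimensions) instead of the real ones, and `tile_position_embeddings` is a hypothetical helper name introduced only for this example.

```python
from typing import List


def tile_position_embeddings(table: List[List[float]], copies: int) -> List[List[float]]:
    """Return a new table made of `copies` back-to-back copies of `table`."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # Copy each row so the extended table does not alias the original rows.
    return [list(row) for _ in range(copies) for row in table]


# Toy table (assumption for illustration): 4 positions, 3 dimensions each.
short_table = [[float(pos), pos + 0.1, pos + 0.2] for pos in range(4)]
long_table = tile_position_embeddings(short_table, copies=16)

print(len(short_table), len(long_table))  # prints: 4 64
```

In the extended table, position p starts out with the same vector as position p modulo the original length, so the longer model begins with embeddings it already knows and can adapt them during further training.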


import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Use the GPU when one is available; PyTorch device strings are lowercase.
device = "cuda" if torch.cuda.is_available() else "cpu"

checkpoint = "d0r1h/led-base-ilc"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to(device)

case = "......."  # placeholder for the text of the case to summarize
input_ids = tokenizer(case, return_tensors="pt").input_ids.to(device)

# LED attends locally by default; give the first token global attention.
global_attention_mask = torch.zeros_like(input_ids)
global_attention_mask[:, 0] = 1

# return_dict_in_generate=True makes generate() return an object with .sequences.
sequences = model.generate(input_ids,
                           global_attention_mask=global_attention_mask,
                           return_dict_in_generate=True).sequences
summary = tokenizer.batch_decode(sequences,
                                 skip_special_tokens=True)

Evaluation results

When used to summarize ILC documents (10 samples), the model achieves the following ROUGE scores:

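| Model    | rouge1-f | rouge1-p | rouge2-f | rouge2-p | rougeL-f | rougeL-p |
|----------|----------|----------|----------|----------|----------|----------|
| led-ilc  | 42       | 47       | 22       | 24       | 39       | 44       |
| led-base | 3        | 39       | 1        | 21       | 3        | 37       |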
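
To make the column names concrete, here is a minimal sketch of a ROUGE-1 computation, assuming the -p and -f suffixes denote precision and F-measure. It is a simplified illustration, not the evaluation script behind the numbers above: it tokenizes on whitespace only, whereas ROUGE packages typically add their own tokenization and optional stemming. The function name `rouge1` and the example sentences are made up for this sketch.

```python
from collections import Counter
from typing import Dict


def rouge1(candidate: str, reference: str) -> Dict[str, float]:
    """Unigram-overlap precision, recall and F1 with whitespace tokenization."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: a word counts at most as often as it occurs in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"p": precision, "r": recall, "f": f1}


# Made-up sentences for illustration only; not drawn from the ILC dataset.
scores = rouge1(
    candidate="the court dismissed the appeal",
    reference="the high court dismissed the appeal with costs",
)
print({name: round(value, 3) for name, value in scores.items()})
# prints: {'p': 1.0, 'r': 0.625, 'f': 0.769}
```

Because F1 is the harmonic mean of precision and recall, a summary can score high on precision yet low on F1 when it covers only a small part of the reference.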

This notebook shows how LED can be used effectively for downstream tasks such as summarization.
