OmniTab

OmniTab is a table-based QA model proposed in OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering. The original Github repository is https://github.com/jzbjyb/OmniTab.

Description

neulab/omnitab-large-16shot (based on BART architecture) is initialized with microsoft/tapex-large and continuously pretrained on natural and synthetic data (SQL2NL model trained in the 16-shot setting).

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import pandas as pd

tokenizer = AutoTokenizer.from_pretrained("neulab/omnitab-large-16shot")
model = AutoModelForSeq2SeqLM.from_pretrained("neulab/omnitab-large-16shot")

data = {
    "year": [1896, 1900, 1904, 2004, 2008, 2012],
    "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"]
}
table = pd.DataFrame.from_dict(data)

query = "In which year did beijing host the Olympic Games?"
encoding = tokenizer(table=table, query=query, return_tensors="pt")

outputs = model.generate(**encoding)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
# [' 2008']

Reference

@inproceedings{jiang-etal-2022-omnitab,
  title = "{O}mni{T}ab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering",
  author = "Jiang, Zhengbao and Mao, Yi and He, Pengcheng and Neubig, Graham and Chen, Weizhu",
  booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
  month = jul,
  year = "2022",
}
Downloads last month
22
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train neulab/omnitab-large-16shot