Text Generation
Transformers
PyTorch
llama
text-generation-inference
Inference Endpoints

Flacuna: A Vicuna made of Flan

Paper | Model | Dataset

📣 We still have numerous experiments awaiting completion (details are here), requiring additional computing resources in our lab. If any industry professionals reading this are willing to provide assistance, please feel free to reach out to us at [email protected].

Image

Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is already an excellent writing assistant, and the intention behind Flacuna was to enhance Vicuna's problem-solving capabilities. To achieve this, we curated a dedicated instruction dataset called Flan-mini.

Dataset Name Source Dataset Size
Flan2021 Flan 388K
Public Pool of Prompts Flan 320K
Natural instructions v2 Flan 200K
CoT Flan 100K
Code Search HF/code_search_net 100K
Code Contest HF/deepmind/code_contests 50K
Apps HF/codeparrot/apps 50K
GPT4-Alpaca GPT-4 52K
Code-Alpaca ChatGPT 20K
ShareGPT ChatGPT 60K
Total - 1.34M

Problem Solving Ability

As a result of this fine-tuning process, Flacuna exhibited notable performance improvements in problem-solving across multiple benchmark datasets, both in few-shot and zero-shot settings.

Model Size MMLU (5-shot) BBH (3-shot) DROP (3-shot) CRASS (3-shot) HumanEval (0-shot) Avg.
StableVicuna 13B 49.2 (+3.0) 37.5 (+0.4) 34.3 (-1.0) 67.5 (+8.7) 15.9 (+2.5) 40.9 (+2.7)
Vicuna 13B 50.6 (+4.5) 37.6 (+0.5) 32.6 (-3.0) 60.9 (+2.1) 11.6 (-1.8) 38.7 (+0.6)
Flacuna 13B 51.1 (+5.0) 39.3 (+2.2) 43.6 (+8.0) 74.1 (+15.3) 11.0 (-2.4) 43.8 (+5.6)
Model Size MMLU (0-shot) BBH (0-shot) CRASS (0-shot)
StableVicuna 13B 47.5 18.5 64.2
Vicuna 13B 48.3 28.3 65.7
Flacuna 13B 49.4 32.5 67.9

During training, Flacuna is a 13B checkpoint of LLaMA and employed a maximum input sequence length of 1280. We utilized LoRA for parameter-efficient fine-tuning.

Chatbot / Writing Assistant

While Flacuna primarily excels in problem-solving tasks, we made efforts to maintain the impressive writing and chatting ability of Vicuna. To achieve this, we incorporated conversational datasets generated by GPT-4, such as GPT-4-Alpaca and ShareGPT, into the Flan-mini collection. To use Flacuna as a chatbot or writing assistant, we recommend you use the following template:

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {definition of the task}.\n\n
{question}\n
Output: ASSISTANT:

Please note that we still recommend using Vicuna as your preferred Chatbot or Writing Assistant, over Flacuna. Flacuna's primary strength lies in problem-solving tasks, making it ideal for such applications.

The following table presents the writing performance of Flacuna on the IMPACT dataset, which is a component of the InstructEval evaluation suite. The generated responses have been evaluated by ChatGPT, and their relevance and coherence have been scored on a scale of 1 to 5.

Model Size Informative Rel. Informative Coh. Professional Rel. Professional Coh. Argumentative Rel. Argumentative Coh. Creative Rel. Creative Coh. Avg. Rel. Avg. Coh.
ChatGPT - 3.34 3.98 3.88 3.96 3.96 3.82 3.92 3.94 3.78 3.93
Flan-Alpaca 11B 3.56 3.46 3.54 3.70 3.22 3.28 3.70 3.40 3.51 3.46
Flan-T5 11B 2.64 3.24 2.62 3.22 2.54 3.40 2.50 2.72 2.58 3.15
Dolly-V2 12B 3.54 3.64 2.96 3.74 3.66 3.20 3.02 3.18 3.30 3.44
StableVicuna 13B 3.54 3.64 2.96 3.74 3.30 3.20 3.02 3.18 3.21 3.44
Vicuna 13B 3.60 3.96 3.74 3.82 3.82 3.56 3.82 3.92 3.75 3.82
Flacuna 13B 3.02 3.42 3.48 3.52 3.38 3.02 3.92 3.80 3.45 3.44

Basic Usage

git clone https://huggingface.co/declare-lab/flacuna-13b-v1.0
cd flacuna-13b-v1.0
python flacuna.py

Training or Fine-tuning Flacuna

The trainer codes are available here: https://github.com/declare-lab/flacuna.

License

Non-commercial license

Citation

@misc{ghosal2023flacuna,
      title={Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning}, 
      author={Deepanway Ghosal and Yew Ken Chia and Navonil Majumder and Soujanya Poria},
      year={2023},
      eprint={2307.02053},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Downloads last month
21
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Datasets used to train declare-lab/flacuna-13b-v1.0

Space using declare-lab/flacuna-13b-v1.0 1