InternLM-20B is officially released on Hugging Face Hub

Community Article Published September 30, 2023

For more information about InternLM, follow us on Twitter: https://twitter.com/intern_lm

We are thrilled to introduce our new model family, InternLM-20B. InternLM-20B was pre-trained on over 2.3T tokens. It was developed by Shanghai AI Laboratory together with researchers from several universities and companies, and it is now available on Hugging Face!

You can download the models and try the demo on Hugging Face at the following links:

Model:
Base Model: https://huggingface.co/internlm/internlm-20b
Chat Model: https://huggingface.co/internlm/internlm-chat-20b

Application: https://huggingface.co/spaces/BridgeEight/internlm-20B-chat-w4-turbomind (supported by community member @BridgeEight)

In this blog post, we introduce InternLM-20B, explain its advantages, and show how to use it.
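For readers who prefer to try the chat model directly from Python rather than through the demo, the following is a minimal sketch, not an official recipe. It assumes that `torch` and `transformers` are installed, that a GPU with enough memory for the FP16 weights is available, and that the repository's remote code exposes a `chat` method as shown on the model card. The helper names `hub_url` and `chat_once` are hypothetical.

```python
# Hub repository IDs taken from the links in this post.
MODEL_IDS = {
    "base": "internlm/internlm-20b",
    "chat": "internlm/internlm-chat-20b",
}


def hub_url(kind: str) -> str:
    """Return the Hugging Face URL for the 'base' or 'chat' model."""
    return f"https://huggingface.co/{MODEL_IDS[kind]}"


def chat_once(prompt: str) -> str:
    """Load the chat model and answer a single prompt (large download)."""
    # Imported lazily so the helpers above work without torch/transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = MODEL_IDS["chat"]
    # trust_remote_code is needed because the model class ships inside the repo.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, trust_remote_code=True
    ).cuda().eval()
    # Assumption: `chat` is provided by the repo's remote code (per the model card).
    response, _history = model.chat(tokenizer, prompt, history=[])
    return response
```

Calling `chat_once("Hello")` triggers the download and the GPU load, so it is best run only on a machine prepared for that.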

Introduction

InternLM-20B comprises two large language models (a base model and a chat model) pre-trained on over 2.3T tokens of high-quality English, Chinese, and code data. Additionally, the Chat version has undergone SFT and RLHF training, enabling it to meet users' needs better and more safely. In terms of model structure, InternLM-20B opts for a deeper architecture with 60 layers. Furthermore, the pre-training data used for InternLM-20B underwent higher-quality cleansing and was supplemented with knowledge-rich data designed to reinforce understanding and reasoning capabilities. As a result, the model shows significant improvements in understanding, reasoning, mathematics, and programming, all of which test the technical proficiency of language models.

Highlights

Compared to previous models, InternLM-20B features the following characteristics:

  • Outstanding overall performance
    InternLM-20B has excellent overall performance. It not only surpasses open-source models of similar scale (including Llama-33B, Llama2-13B, and many other 7B and 13B models), but also achieves scores comparable to Llama2-70B.
  • Strong tool invocation capability
    InternLM-20B expands the boundaries of model capabilities, building a better connection between large models and real-world scenarios. InternLM-20B supports dozens of plugins and thousands of API functions, and obtains the best results on the ToolBench test set. Compared with ChatGPT, it achieved a win rate of 61.7%. Additionally, InternLM-20B has code interpreter and self-correction abilities, providing a solid technical foundation for building intelligent agents.
  • Supports a 16k context length
    InternLM-20B extends the context window to 16,000 tokens, which better supports long-context understanding, long text generation, and ultra-long dialogue.
  • Better value alignment
    InternLM-20B is safer and more reliable in value alignment than previous models. During training, we carried out a two-stage value alignment based on Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), and significantly improved the model's safety through expert red-team adversarial training. It handles biased questions better and provides positive guidance.
  • Abundant open-source tools and training data
    Besides the InternLM models, we provide various toolkits and open-source datasets, including the pre-training toolkit InternLM-Train, the efficient fine-tuning toolkit XTuner, the compression and deployment toolkit LMDeploy, the evaluation toolkit OpenCompass, and Lagent, a lightweight framework for building LLM-based agents. These toolkits, together with the open-source data platform OpenDataLab, form a powerful open-source tool and data ecosystem that provides end-to-end research and application support for academia and industry.

For more information about InternLM-20B, please visit https://github.com/InternLM/InternLM

Usage

Deploying InternLM-20B with LMDeploy

We recommend deploying InternLM-20B with LMDeploy's 4-bit quantization and inference. Compared to FP16 inference, LMDeploy's 4-bit quantized inference reduces the model's memory usage by over 60%. More importantly, thanks to highly optimized kernels, inference performance is not compromised: on an A100 it is more than twice as fast as FP16 inference. Both figures correspond to the batch-size-1 rows of the table below.

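| Batch Size | Data Type | Input Tokens | Output Tokens | Token Throughput (token/s) | Memory (GB) |
|------------|-----------|--------------|---------------|----------------------------|-------------|
| 1          | FP16      | 256          | 512           | 33.64                      | 41.39       |
| 1          | W4A16     | 256          | 512           | 79.12                      | 15.67       |
| 16         | FP16      | 256          | 512           | 409.69                     | 77.21       |
| 16         | W4A16     | 256          | 512           | 708.76                     | 51.48       |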
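To make the comparison concrete, the short sketch below recomputes the speedup and memory saving of W4A16 over FP16 from the figures in the table. The dictionary layout and the function name `compare` are illustrative choices, not part of LMDeploy.

```python
# Rows copied from the table above: (batch size, dtype) -> (tokens/s, memory in GB).
BENCH = {
    (1, "FP16"): (33.64, 41.39),
    (1, "W4A16"): (79.12, 15.67),
    (16, "FP16"): (409.69, 77.21),
    (16, "W4A16"): (708.76, 51.48),
}


def compare(batch_size: int) -> tuple:
    """Return (speedup, memory saving fraction) of W4A16 relative to FP16."""
    fp16_tps, fp16_mem = BENCH[(batch_size, "FP16")]
    w4_tps, w4_mem = BENCH[(batch_size, "W4A16")]
    speedup = w4_tps / fp16_tps
    mem_saving = 1.0 - w4_mem / fp16_mem
    return speedup, mem_saving


for bs in (1, 16):
    speedup, saving = compare(bs)
    print(f"batch {bs}: {speedup:.2f}x throughput, {saving:.0%} less memory")
# batch 1: 2.35x throughput, 62% less memory
# batch 16: 1.73x throughput, 33% less memory
```

The "over 60%" and "more than twice" figures match the batch-size-1 rows. At batch size 16 the gains are smaller but still substantial.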

Here are the steps for quickly deploying and chatting with the InternLM-20B-4bit model:

  1. Install lmdeploy
pip install 'lmdeploy>=0.0.9'
  2. Download the InternLM-20B-4bit model
git-lfs install
git clone https://huggingface.co/internlm/internlm-chat-20b-4bit
  3. Convert the model
python3 -m lmdeploy.serve.turbomind.deploy internlm-chat \
    --model-path ./internlm-chat-20b-4bit \
    --model-format awq \
    --group-size 128
  4. Launch the Gradio service
python3 -m lmdeploy.serve.gradio.app ./workspace --server_name {ip_addr} --server_port {port}
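The steps above can also be collected into one small Python script, which makes the `{ip_addr}` and `{port}` placeholders explicit parameters. This is only a sketch: the commands are the ones listed in this post, while the function names and the example values `0.0.0.0` and `6006` are hypothetical.

```python
import shlex
import subprocess


def build_deploy_commands(ip_addr: str, port: int,
                          model_dir: str = "./internlm-chat-20b-4bit") -> list:
    """Return the deployment steps from this post as argument lists."""
    return [
        ["pip", "install", "lmdeploy>=0.0.9"],
        ["git-lfs", "install"],
        ["git", "clone",
         "https://huggingface.co/internlm/internlm-chat-20b-4bit", model_dir],
        ["python3", "-m", "lmdeploy.serve.turbomind.deploy", "internlm-chat",
         "--model-path", model_dir, "--model-format", "awq",
         "--group-size", "128"],
        ["python3", "-m", "lmdeploy.serve.gradio.app", "./workspace",
         "--server_name", ip_addr, "--server_port", str(port)],
    ]


def run_all(commands: list, dry_run: bool = True) -> None:
    """Print each command; execute it as well when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder address and port; replace them with your own values.
    run_all(build_deploy_commands("0.0.0.0", 6006), dry_run=True)
```

With `dry_run=True` the script only prints the commands, which is a cheap way to check them before starting a large download.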

For more information about LMDeploy, please visit https://github.com/InternLM/lmdeploy

Using XTuner to Fine-Tune InternLM-20B on a Single 24G GPU

XTuner is a low-cost toolbox for training and fine-tuning large models, developed by Shanghai Artificial Intelligence Laboratory. With XTuner, fine-tuning InternLM-20B requires only 24GB of GPU memory, which a single RTX 3090 can provide!
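A rough back-of-the-envelope estimate suggests why 24GB can be enough when the base weights are kept in 4-bit precision, as QLoRA does. The sketch assumes a nominal count of 20 billion parameters (implied by the model name) and counts the weights only. Activations, adapter weights, and optimizer state are ignored, so real usage is higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: nominal parameter count implied by the "20B" in the model name.
N_PARAMS = 20e9

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # 16-bit weights
int4_gb = weight_memory_gb(N_PARAMS, 4)   # 4-bit quantized weights

print(f"FP16 weights: {fp16_gb:.0f} GB, 4-bit weights: {int4_gb:.0f} GB")
# FP16 weights: 40 GB, 4-bit weights: 10 GB
```

Under these assumptions the FP16 weights alone exceed a 24GB card, whereas the 4-bit weights leave room for the rest of the training state.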

Presently, XTuner offers support for full-parameter, LoRA, and QLoRA fine-tuning of large language models. It seamlessly integrates DeepSpeed ZeRO 2/3 optimization techniques and is compatible with a wide range of popular open-source datasets such as Alpaca and OpenAssistant, among others. XTuner is designed to be user-friendly, allowing users to utilize it straight "out of the box"!

Hardware Requirements

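| Model        | Fine-tuning Type         | Minimum Resources | Example Device |
|--------------|--------------------------|-------------------|----------------|
| InternLM-20B | Full-parameter w/ ZeRO-3 | 550GB             | 8x A100 80GB   |
| InternLM-20B | LoRA w/ ZeRO-3           | 150GB             | 2x A100 80GB   |
| InternLM-20B | QLoRA                    | 24GB              | 1x 3090 24GB   |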
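The table can be turned into a quick feasibility check for a given machine. The sketch below is illustrative and rests on a simplifying assumption that is not stated in the table: the ZeRO-3 rows are compared against the total memory across all GPUs, while the QLoRA row is compared against a single GPU. The function name `feasible_methods` is hypothetical.

```python
# (fine-tuning type, minimum memory in GB, compared against total GPU memory?)
# Figures come from the table above; the third column is an assumption.
REQUIREMENTS = [
    ("Full-parameter w/ ZeRO-3", 550, True),
    ("LoRA w/ ZeRO-3", 150, True),
    ("QLoRA", 24, False),
]


def feasible_methods(num_gpus: int, mem_per_gpu_gb: float) -> list:
    """List the fine-tuning types whose minimum requirement is met."""
    total_gb = num_gpus * mem_per_gpu_gb
    result = []
    for name, need_gb, uses_total in REQUIREMENTS:
        available_gb = total_gb if uses_total else mem_per_gpu_gb
        if available_gb >= need_gb:
            result.append(name)
    return result


print(feasible_methods(1, 24))  # ['QLoRA']
print(feasible_methods(2, 80))  # ['LoRA w/ ZeRO-3', 'QLoRA']
```

The example devices in the table satisfy their own rows under this rule: 8 x 80GB = 640GB covers 550GB, and 2 x 80GB = 160GB covers 150GB.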

Quick Start

With only two commands, we can achieve QLoRA fine-tuning of InternLM-20B on a 24G GPU (taking the oasst1 dataset as an example).

pip install xtuner
xtuner train internlm_20b_qlora_oasst1_512_e3

XTuner also provides many ready-to-use InternLM-20B fine-tuning configurations, which can be listed with:

xtuner list-cfg -p internlm_20b

For professional developers with specific needs such as custom training processes or custom training data, XTuner offers an exportable example configuration file. This can be customized and modified for a flexible configuration to meet diverse requirements:

xtuner copy-cfg internlm_20b_qlora_oasst1_512_e3 ${SAVE_PATH}
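When these commands are driven from a Python script, for example in an experiment pipeline, a thin wrapper such as the following may help. It is a sketch only: the three command lines are exactly those shown in this post, while the function names and the `./configs` default are hypothetical.

```python
import shutil
import subprocess

# Configuration name used in the Quick Start above.
CONFIG = "internlm_20b_qlora_oasst1_512_e3"


def xtuner_commands(save_path: str, config: str = CONFIG) -> dict:
    """Return the XTuner commands from this post as argument lists."""
    return {
        "list": ["xtuner", "list-cfg", "-p", "internlm_20b"],
        "copy": ["xtuner", "copy-cfg", config, save_path],
        "train": ["xtuner", "train", config],
    }


def run_step(step: str, save_path: str = "./configs") -> None:
    """Run one step ('list', 'copy', or 'train') if xtuner is installed."""
    if shutil.which("xtuner") is None:
        raise RuntimeError("xtuner not found; install it with `pip install xtuner`")
    # check=True raises CalledProcessError if the command fails.
    subprocess.run(xtuner_commands(save_path)[step], check=True)
```

For instance, `run_step("list")` prints the available InternLM-20B configurations, and `run_step("copy")` exports the example configuration for editing.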

For more information about XTuner, please visit https://github.com/InternLM/xtuner