Apply for community grant: Academic project (gpu)

#1
by ASLP-lab - opened

We have developed OSUM (Open Speech Understanding Model), a novel approach to building Speech Understanding Language Models (SULMs) with limited academic resources. OSUM aims to bridge the gap between industry-scale models and academic research by providing a transparent and efficient training framework for SULMs. It is trained on tens of thousands of hours of multi-task data with a multi-stage training process, producing a model capable of multi-level audio understanding. We see this model as a foundation for eventually achieving comprehensive audio understanding.

We would like to host a public demo page where people can try OSUM and judge its performance for themselves. Inference requires approximately 18GB of GPU memory because the base model is a 7B-parameter LLM. For more details, please refer to our paper: https://arxiv.org/abs/2501.13306
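For anyone wondering how a 7B-parameter base model relates to the roughly 18GB figure, here is a rough back-of-the-envelope sketch. It assumes the weights are held in 16-bit precision (fp16 or bf16), which is not stated above; the function name `weight_memory_gb` is made up for illustration, and the remainder is simply whatever is left over for everything else (other model components, activations, caches, runtime overhead), not a measured breakdown.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the model weights alone, in decimal GB (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


LLM_PARAMS = 7e9          # 7B-parameter base LLM, as stated in the post
REPORTED_TOTAL_GB = 18.0  # approximate inference footprint, as stated in the post

# Assumption: 16-bit weights (2 bytes per parameter).
fp16_gb = weight_memory_gb(LLM_PARAMS, 2)
# For comparison only: 32-bit weights (4 bytes per parameter).
fp32_gb = weight_memory_gb(LLM_PARAMS, 4)
# Whatever the weights do not account for under the 16-bit assumption.
remainder_gb = REPORTED_TOTAL_GB - fp16_gb

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"32-bit weights: {fp32_gb:.1f} GB")
print(f"Left for everything else at 16-bit: {remainder_gb:.1f} GB")
```

Under that assumption the LLM weights alone take about 14GB, which is consistent with an overall footprint of around 18GB, while 32-bit weights would already exceed it.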

Another contribution of this project is that the training and inference code is fully open-sourced to facilitate further development in this field. The official code repository is at: https://github.com/ASLP-lab/OSUM

Hi @ASLP-lab, we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.
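In case it helps, here is a minimal sketch of the usual ZeroGPU pattern: the function that needs the GPU is decorated with `spaces.GPU` from the `spaces` package. The function name `run_inference` and its body are placeholders rather than OSUM's actual inference code, and the fallback decorator is only there so the snippet also runs on a machine where `spaces` is not installed.

```python
try:
    import spaces  # provided on Hugging Face Spaces

    gpu = spaces.GPU
except ImportError:
    # Fallback for local runs without the `spaces` package: a no-op decorator.
    def gpu(func):
        return func


@gpu
def run_inference(audio_path: str, prompt: str) -> str:
    """Hypothetical entry point; on ZeroGPU a GPU is attached while this runs."""
    # Placeholder body: a real Space would call the model's inference code here.
    return f"[placeholder] would process {audio_path!r} with prompt {prompt!r}"


if __name__ == "__main__":
    print(run_inference("example.wav", "Transcribe the audio."))
```

The GPU-dependent work goes inside the decorated function, since that is the part ZeroGPU allocates a GPU for.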
