---
language:
- en
library_name: stable-audio-tools
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
pipeline_tag: audio-to-audio
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://huggingface.co/stabilityai/stable-audio-open-1.0/blob/main/LICENSE.md)
  and acknowledge Stability AI's [Privacy
  Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
    - 'Yes'
    - 'No'
---
# stable-codec-speech-16k Model Card

![arch](arch.png "Architecture")

`stable-codec-speech-16k` is a Transformer-based codec model designed for high-quality, low-bitrate speech coding. It encodes audio waveforms into discrete tokens, which can later be decoded back into an approximation of the original waveform.
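
The following toy sketch illustrates the encode-to-tokens and decode-back round trip. It splits a waveform into fixed-size frames, reduces each frame to its mean, and maps that value to one of a small number of integer tokens using a uniform scalar quantizer. This is not stable-codec's Transformer encoder or its quantizer. `encode`, `decode`, `frame_size`, and `levels` are hypothetical names chosen for illustration. The sketch only shows why the reconstruction is an approximation: detail discarded by framing and quantization cannot be recovered.

```python
import math


def encode(samples, frame_size=4, levels=16):
    """Toy encoder: average each frame, then quantize the mean to an integer token."""
    tokens = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        mean = sum(frame) / len(frame)
        clipped = max(-1.0, min(1.0, mean))  # keep values in [-1.0, 1.0]
        # Uniform scalar quantizer: map [-1.0, 1.0] onto tokens 0..levels-1.
        tokens.append(round((clipped + 1.0) / 2.0 * (levels - 1)))
    return tokens


def decode(tokens, frame_size=4, levels=16):
    """Toy decoder: map each token back to its level and repeat it across the frame."""
    samples = []
    for token in tokens:
        value = token / (levels - 1) * 2.0 - 1.0
        samples.extend([value] * frame_size)
    return samples


# A 64-sample sine wave becomes 16 integer tokens, then 64 approximate samples.
sine = [math.sin(2 * math.pi * n / 32) for n in range(64)]
tokens = encode(sine)
reconstruction = decode(tokens)
print(len(sine), len(tokens), len(reconstruction))  # 64 16 64
```

Raising `levels` or lowering `frame_size` reduces the reconstruction error at the cost of more bits. A real codec manages the same rate/quality trade-off.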

Please note: For individuals or organizations generating annual revenue of US $1,000,000 (or local currency equivalent) or more, regardless of the source of that revenue, you must obtain an enterprise commercial license directly from Stability AI before commercially using Stable Codec, any derivative work of Stable Codec (such as a “fine tune” model), or their outputs. You may submit a request for an Enterprise License at https://stability.ai/enterprise. Please refer to Stability AI's Community License, available at https://stability.ai/license, for more information.

### Model Description

* **Developed by**: [Stability AI](https://stability.ai/)
* **Model type**: Transformer audio codec model
* **Model details**: This released model is a speech codec designed to compress real-world speech into discrete tokens suited to generative modeling. It provides a foundational tool for developing downstream speech understanding and generation applications, such as text-to-speech systems and conversational AI models.

Please check our [arXiv page](https://arxiv.org/abs/2411.19842) and [GitHub repo](https://github.com/Stability-AI/stable-codec) for details.

### License

- **Community License:** Free for research, non-commercial, and commercial use by organizations and individuals generating annual revenue of less than US $1,000,000 (or local currency equivalent), regardless of the source of that revenue. If your annual revenue is US $1,000,000 or more, any commercial use of this model or derivative works thereof requires obtaining an Enterprise License directly from Stability AI. You may submit a request for an Enterprise License at https://stability.ai/enterprise. Please refer to Stability AI's Community License, available at https://stability.ai/license, for more information.

### Model Sources

* **Repository**: https://github.com/Stability-AI/stable-codec
* **Audio demos**: https://stability-ai.github.io/stable-codec-demo/
* **arXiv page**: https://arxiv.org/abs/2411.19842

### Training Dataset

The model was trained on datasets derived from Creative Commons or public-domain audiobook recordings. See the [academic paper](https://arxiv.org/abs/2411.19842) for more details.

## Usage

For usage instructions, please refer to our [GitHub repository](https://github.com/Stability-AI/stable-codec).
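
The model name suggests 16 kHz input, and this card describes a speech codec. A minimal pre-flight check before encoding might therefore confirm that a WAV file is 16 kHz mono. The expected rate and channel count are assumptions inferred from the model name, not documented requirements, and the repository's own loading code may already resample or downmix. The sketch uses only Python's standard `wave` module. `check_wav` and `write_silence` are hypothetical helpers, not part of stable-codec.

```python
import os
import tempfile
import wave

# Assumption: the "16k" in the model name denotes a 16 kHz sample rate.
EXPECTED_RATE = 16_000


def check_wav(path, expected_rate=EXPECTED_RATE):
    """Return human-readable problems that would need fixing before encoding."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    return problems


def write_silence(path, rate, channels, seconds=0.1):
    """Write a short 16-bit PCM WAV file of silence for testing."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))


# Example: a 44.1 kHz stereo file fails both checks.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.wav")
    write_silence(path, rate=44_100, channels=2)
    problems = check_wav(path)
print(problems)
# ['sample rate is 44100 Hz, expected 16000 Hz', '2 channels, expected mono']
```

If either check fails, resample and downmix the file with an audio library of your choice before encoding.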

### Intended Uses

Intended uses include the following:

* Efficient compression of speech signals for storage or streaming.
* Enhancing speech-based applications, such as telecommunication systems and real-time communication platforms.
* Research and development in audio coding and speech synthesis, including understanding and improving codec performance.
* Development of downstream applications, including speech recognition and generation.
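
For storage and streaming, you can estimate a token-based codec's bitrate as tokens per second multiplied by the bits needed per token, which is log2 of the vocabulary size when every token is stored at a fixed length. The token rate and vocabulary size below are hypothetical placeholders, not this model's configuration; see the paper for the actual figures. The baseline comparison also assumes uncompressed 16-bit mono PCM at 16 kHz.

```python
import math


def bitrate_bps(tokens_per_second, vocab_size):
    """Bits per second for a token stream, assuming every token costs log2(vocab_size) bits."""
    return tokens_per_second * math.log2(vocab_size)


def bytes_per_hour(bps):
    """Storage needed for one hour of audio at the given bitrate."""
    return bps * 3600 / 8


# Hypothetical codec configuration, for illustration only.
codec_bps = bitrate_bps(tokens_per_second=50, vocab_size=1024)

# Baseline: uncompressed 16-bit mono PCM at 16 kHz (assumed input format).
pcm_bps = 16_000 * 16

print(codec_bps)                  # 500.0
print(bytes_per_hour(codec_bps))  # 225000.0
print(pcm_bps / codec_bps)        # 512.0
```

Because of the `log2` term, doubling the vocabulary adds only one bit per token, whereas doubling the token rate doubles the bitrate.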

All uses of the model should be in accordance with our [Acceptable Use Policy](https://stability.ai/use-policy).

### Out-of-Scope Uses

This model was trained exclusively on clean English speech without overlapping speakers, and it performs best under those conditions. It is not suitable for applications requiring high-fidelity music or environmental sound coding.

### Contact

Please report any issues with the model or contact us:

* Safety issues: [email protected]
* Security issues: [email protected]
* Privacy issues: [email protected]
* License and general: https://stability.ai/license
* Enterprise license: https://stability.ai/enterprise