kamahori and nielsr (HF staff) committed
Commit 5d02f38 · verified · 1 parent: 57db549

Add quick start code and citation to model card (#1)

- Add quick start code and citation to model card (9aa404281aa9d5e3dc472eda671afb58127097d3)

Co-authored-by: Niels Rogge <[email protected]>

Files changed (1): README.md (+63 −8)
README.md CHANGED

@@ -1,13 +1,13 @@
 ---
-license: apache-2.0
-library_name: transformers
 base_model: openai/whisper-large-v3-turbo
-tags:
-- audio
-- automatic-speech-recognition
-- whisper
-- hf-asr-leaderboard
 pipeline_tag: automatic-speech-recognition
 ---

 # Model Card for Lite-Whisper large-v3-turbo-acc
@@ -32,4 +32,59 @@ Following is the average word error rate (WER) evaluated on the [ESB datasets](h
 | [lite-whisper-large-v3-turbo](https://huggingface.co/efficient-speech/lite-whisper-large-v3-turbo) | 12.6 | 374M | 172M |
 | [lite-whisper-large-v3-turbo-fast](https://huggingface.co/efficient-speech/lite-whisper-large-v3-turbo-fast) | 20.1 | 313M | 172M |
 | &nbsp; | &nbsp; | &nbsp; | &nbsp; |
-| [whisper-medium](https://huggingface.co/openai/whisper-medium) | 14.8 | 306M | 457M |
 ---
 base_model: openai/whisper-large-v3-turbo
+library_name: transformers
+license: apache-2.0
 pipeline_tag: automatic-speech-recognition
+tags:
+- audio
+- automatic-speech-recognition
+- whisper
+- hf-asr-leaderboard
 ---

 # Model Card for Lite-Whisper large-v3-turbo-acc
 
 | [lite-whisper-large-v3-turbo](https://huggingface.co/efficient-speech/lite-whisper-large-v3-turbo) | 12.6 | 374M | 172M |
 | [lite-whisper-large-v3-turbo-fast](https://huggingface.co/efficient-speech/lite-whisper-large-v3-turbo-fast) | 20.1 | 313M | 172M |
 | &nbsp; | &nbsp; | &nbsp; | &nbsp; |
+| [whisper-medium](https://huggingface.co/openai/whisper-medium) | 14.8 | 306M | 457M |
+
+## Quick Start
+
+The easiest way to run our model is through our integration with the Hugging Face Transformers library.
+We provide model weights for compressed versions of the OpenAI Whisper series [here](https://huggingface.co/efficient-speech).
+
+```python
+import librosa
+import torch
+from transformers import AutoProcessor, AutoModel
+
+device = "cuda:0"
+dtype = torch.float16
+
+# load the compressed Whisper model
+model = AutoModel.from_pretrained(
+    "efficient-speech/lite-whisper-large-v3-turbo",
+    trust_remote_code=True,
+)
+model.to(dtype).to(device)
+
+# we use the same processor as the original model
+processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")
+
+# set the path to your audio file
+path = "path/to/audio.wav"
+audio, _ = librosa.load(path, sr=16000)
+
+input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
+input_features = input_features.to(dtype).to(device)
+
+predicted_ids = model.generate(input_features)
+transcription = processor.batch_decode(
+    predicted_ids,
+    skip_special_tokens=True,
+)[0]
+
+print(transcription)
+```
+
+## Citation
+
+If you use LiteASR in your research, please cite the following paper:
+
+```bibtex
+@misc{kamahori2025liteasrefficientautomaticspeech,
+      title={LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation},
+      author={Keisuke Kamahori and Jungo Kasai and Noriyuki Kojima and Baris Kasikci},
+      year={2025},
+      eprint={2502.20583},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG},
+      url={https://arxiv.org/abs/2502.20583},
+}
+```
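
As context for the preprocessing step in the added quick-start snippet: the Whisper processor pads or trims audio to a fixed 30-second window at 16 kHz before converting it to a log-mel spectrogram. The sketch below shows the resulting frame count per chunk, assuming Whisper's standard hop length of 160 samples; these frontend parameters are an assumption about the upstream Whisper family, not something stated in this model card.

```python
# Sketch of how many mel frames the Whisper frontend produces per chunk,
# assuming the standard parameters: 16 kHz audio, 30 s window, hop length 160.
SAMPLE_RATE = 16_000   # samples per second expected by the processor
CHUNK_SECONDS = 30     # audio is padded/trimmed to this fixed window
HOP_LENGTH = 160       # samples between successive mel frames (assumed)

num_samples = SAMPLE_RATE * CHUNK_SECONDS  # 480_000 samples per chunk
num_frames = num_samples // HOP_LENGTH     # 3_000 mel frames

print(num_frames)  # 3000
```

This is why `input_features` in the snippet has a fixed time dimension regardless of the clip's actual length: shorter clips are zero-padded to the full window before the spectrogram is computed.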