whisper-lang-id

This model is a fine-tuned version of openai/whisper-tiny, trained on the mozilla-foundation/common_voice_11_0 dataset for spoken language identification.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

mozilla-foundation/common_voice_11_0

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 1
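
To make the linear schedule concrete, here is a minimal sketch of how the learning rate would decay over the run. It assumes zero warmup steps (none are listed above) and takes the total of 175 optimizer steps from the training results; the function name linear_lr is hypothetical and not part of any library.

```python
def linear_lr(step, base_lr=2e-05, total_steps=175, warmup_steps=0):
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to zero."""
    # warmup_steps=0 is an assumption; the model card does not list a warmup setting.
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the rate at the start, one fifth of the way in, and at the final step.
for step in (0, 35, 175):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at the configured 2e-05 and reaches zero at the last step of the single epoch.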

Training results

Training Loss   Epoch   Step   Validation Loss   Accuracy   F1
No log          1.0     175    0.0148            0.995      0.9950
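
For readers unfamiliar with the two reported metrics, the following is a small sketch of how accuracy and F1 can be computed from predicted and reference labels. The model card does not say which F1 averaging was used, so macro averaging here is an assumption, and the labels below are toy values for illustration, not the actual evaluation data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only (0 = Tamil, 1 = English).
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```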

Framework versions

  • Transformers 4.48.0.dev0
  • Pytorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.

Example Usage

Here is an example of how to use the model for language identification with Gradio:

from transformers import pipeline
import gradio as gr

# Use a pipeline as a high-level helper
pipe = pipeline("audio-classification", model="Lingalingeswaran/whisper-lang-id")

def identify_language(audio_file):
    """Identifies the language of an audio file."""
    try:
        result = pipe(audio_file)
        predicted_label = result[0]['label']
        score = result[0]['score']

        # Map the generic class ids to language names; unknown labels pass through unchanged.
        label_names = {"LABEL_0": "Tamil", "LABEL_1": "English"}
        predicted_label = label_names.get(predicted_label, predicted_label)

        return f"Predicted Language: {predicted_label}, Score: {score:.4f}"
    except Exception as e:
        return f"Error during language identification: {e}"

# Gradio interface
def create_gradio_interface():
    with gr.Blocks() as demo:
        gr.Markdown("### Language Identification from Audio File")
        gr.Markdown("Upload an audio file or use your microphone to detect the language spoken.")

        # Accept audio recorded from the microphone or uploaded as a file
        audio_input = gr.Audio(sources=["microphone", "upload"], type="filepath", label="Record or Upload Audio")
        result_output = gr.Textbox(label="Language Identification Result", interactive=False)

        # Submit button
        submit_btn = gr.Button("Submit")
        submit_btn.click(identify_language, inputs=audio_input, outputs=result_output)

        # Clear button
        clear_btn = gr.Button("Clear")
        clear_btn.click(lambda: (None, None), outputs=[audio_input, result_output])  # Clear audio and result

    demo.launch()

# Run the Gradio interface
create_gradio_interface()
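
For scripted use without a UI, the same label mapping can be applied to every candidate the pipeline returns, not only the top one. The sketch below is a hedged illustration: format_predictions and LABEL_NAMES are hypothetical names, and sample_result is made-up data in the list-of-dicts shape that the audio-classification pipeline returns, so the snippet runs without downloading the model.

```python
# Label mapping taken from the Gradio example above.
LABEL_NAMES = {"LABEL_0": "Tamil", "LABEL_1": "English"}


def format_predictions(result):
    """Turn pipeline-style output into readable 'language: score' lines, best first."""
    lines = []
    for item in sorted(result, key=lambda r: r["score"], reverse=True):
        # Labels without a known mapping are shown unchanged.
        name = LABEL_NAMES.get(item["label"], item["label"])
        lines.append(f"{name}: {item['score']:.4f}")
    return lines


# Made-up scores for illustration; not real model output.
sample_result = [
    {"score": 0.0129, "label": "LABEL_0"},
    {"score": 0.9871, "label": "LABEL_1"},
]
print("\n".join(format_predictions(sample_result)))
```

With the real model, the return value of pipe(audio_file) would take the place of sample_result.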