Bangla Whisper Large V3 - Bengali Speech Recognition Model

This model is a fine-tuned version of openai/whisper-large-v3 for Bengali (Bangla) speech recognition.

Model Description

  • Base Model: Whisper Large V3
  • Language: Bengali (bn)
  • Task: Automatic Speech Recognition (Transcription)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation) with Unsloth
  • Training Data: 3182 samples from 20 Bengali regional dialects

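The Unsloth training script itself is not included in this card. As a rough reference, the LoRA setup described here maps onto the Hugging Face PEFT library roughly as sketched below (an illustrative sketch, not the authors' code; the lora_dropout value and the reading of the card's o_proj as Whisper's out_proj module are assumptions):

from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Base checkpoint that the card says was fine-tuned
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")

# LoRA settings taken from the Training Hyperparameters section below;
# lora_dropout is illustrative (not stated in the card)
lora_cfg = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "fc1", "fc2"],
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
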
Training Details

Training Data

  • Total Samples: 3350
  • Training Samples: 3182
  • Validation Samples: 168
  • Regions: 20 different Bengali-speaking regions (Dhaka, Chittagong, Sylhet, Rajshahi, Khulna, etc.)

Training Hyperparameters

  • Epochs: 1
  • Batch Size: 8
  • Gradient Accumulation: 2
  • Effective Batch Size: 16
  • Learning Rate: 0.0001
  • Optimizer: AdamW (adamw_torch)
  • LoRA Rank: 32
  • LoRA Alpha: 32
  • Target Modules: q_proj, v_proj, k_proj, o_proj, encoder_attn layers, fc1, fc2

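The values above correspond roughly to the following Transformers Seq2SeqTrainingArguments (a hedged sketch rather than the exact Unsloth configuration; the output directory, warmup, and logging values are illustrative assumptions):

from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the listed hyperparameters; not the authors' exact script
training_args = Seq2SeqTrainingArguments(
    output_dir="./bangla-whisper-large-v3",   # illustrative path
    num_train_epochs=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,            # effective batch size 16
    learning_rate=1e-4,
    optim="adamw_torch",
    warmup_steps=50,                          # assumed; not stated in the card
    fp16=True,                                # mixed-precision training (Tesla T4)
    logging_steps=25,                         # assumed
    report_to="none",
)
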
Training Results

  • Training Time: 41.20 minutes
  • Final Training Loss: 0.3332
  • Speed: 1.29 samples/second
  • GPU: Tesla T4

Usage

Installation

pip install transformers librosa soundfile torch

Basic Usage

import torch
import librosa
import numpy as np
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load model and processor
model = WhisperForConditionalGeneration.from_pretrained(
    "seyam2023/bangla-whisper-large-v3",
    torch_dtype=torch.float16,
    device_map="auto"
)
processor = WhisperProcessor.from_pretrained("seyam2023/bangla-whisper-large-v3")

# Load audio file
audio, _ = librosa.load("your_audio.wav", sr=16000, mono=True)
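# Peak-normalize to [-1, 1]; the small epsilon avoids division by zero on silent clips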
audio = audio / (np.max(np.abs(audio)) + 1e-8)

# Process and transcribe
inputs = processor.feature_extractor(audio, sampling_rate=16000, return_tensors="pt")
input_features = inputs.input_features.to(model.device, dtype=torch.float16)

with torch.no_grad():
    pred_ids = model.generate(
        input_features,
        max_new_tokens=128,
        num_beams=1,
        do_sample=False
    )

transcription = processor.batch_decode(pred_ids, skip_special_tokens=True)[0]
print(transcription)

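By default, Whisper infers the language from the audio. To pin the decoder to Bengali transcription explicitly, recent transformers releases accept language and task hints in generate. An optional variant (behaviour may differ slightly across transformers versions):

# Optional: force Bengali transcription instead of relying on language detection
with torch.no_grad():
    pred_ids = model.generate(
        input_features,
        language="bn",
        task="transcribe",
        max_new_tokens=128
    )
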
Batch Processing

# Process multiple audio files
audio_files = ["file1.wav", "file2.wav", "file3.wav"]

for audio_file in audio_files:
    audio, _ = librosa.load(audio_file, sr=16000, mono=True)
    audio = audio / (np.max(np.abs(audio)) + 1e-8)
    
    inputs = processor.feature_extractor(audio, sampling_rate=16000, return_tensors="pt")
    input_features = inputs.input_features.to(model.device, dtype=torch.float16)
    
    with torch.no_grad():
        pred_ids = model.generate(input_features, max_new_tokens=128)
    
    text = processor.batch_decode(pred_ids, skip_special_tokens=True)[0]
    print(f"{audio_file}: {text}")

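Long Audio with the ASR Pipeline

For recordings longer than Whisper's 30-second input window, the Transformers automatic-speech-recognition pipeline with chunking is a convenient alternative (a hedged sketch; the chunk length and the file name long_recording.wav are illustrative):

import torch
from transformers import pipeline

# Chunked long-form transcription with the high-level ASR pipeline
asr = pipeline(
    "automatic-speech-recognition",
    model="seyam2023/bangla-whisper-large-v3",
    torch_dtype=torch.float16,
    device_map="auto",
    chunk_length_s=30,   # split long audio into 30 s windows
)

result = asr("long_recording.wav", return_timestamps=True)
print(result["text"])
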
Performance

This model was trained on diverse Bengali audio data spanning multiple regional dialects, which helps it handle:

  • Standard Bengali (Dhaka dialect)
  • Regional variations (Chittagong, Sylhet, Noakhali, etc.)
  • Various audio quality conditions
  • Different speaking styles and speeds

Limitations

  • Optimized for Bengali language only
  • Performance may vary with:
    • Heavy background noise
    • Very low-quality audio recordings
    • Non-native Bengali speakers
    • Mixed language speech (code-switching)

Training Infrastructure

  • Hardware: Tesla T4
  • Framework: PyTorch, Transformers, Unsloth
  • Precision: FP16 (Mixed Precision Training)

Citation

If you use this model, please cite:

@misc{bangla-whisper-large-v3,
  author = {Touhidul Alam Seyam and Md Abtahee Kabir and Noore Tamanna Orny},
  title = {Bangla Whisper Large V3: Bengali Speech Recognition},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/seyam2023/bangla-whisper-large-v3}}
}

Team

Team Huntrix

  • Touhidul Alam Seyam
  • Md Abtahee Kabir
  • Noore Tamanna Orny

License

Apache 2.0

Contact

For questions or feedback, please open an issue in the model repository or contact Team Huntrix.
