Whisper TFLite Model Generation and Test

Converts OpenAI Whisper speech recognition models to TFLite format for on-device inference (e.g. Android), and generates the mel filter + vocab binary file needed by native C++ runtimes.

Requirements

  • Python 3.9
  • macOS, Linux, or Google Colab

Dependencies are installed automatically on first run.

Three Generation Modes

The script supports three modes depending on the --language argument:

1. English-only (.en models)

python3.9 whisper_tflite_model_generation_and_test.py --model whisper-tiny.en
  • Output: whisper-tiny.en.tflite
  • Signature: serving_default (transcribe English)
  • forced_decoder_ids: [[2, 50359], [3, 50363]]

2. Single-language (explicit language code)

python3.9 whisper_tflite_model_generation_and_test.py --model whisper-base --language fr
python3.9 whisper_tflite_model_generation_and_test.py --model whisper-base --language de
  • Output: whisper-base.fr.tflite, whisper-base.de.tflite
  • Signature: serving_default (transcribe the specified language)
  • forced_decoder_ids: [[1, <lang_token>], [2, 50359], [3, 50363]]

3. Transcribe-translate (auto language detection)

python3.9 whisper_tflite_model_generation_and_test.py --model whisper-base --language auto
  • Output: whisper-base-transcribe-translate.tflite
  • Signatures: serving_default (= transcribe), serving_transcribe, serving_translate
  • No language token forced — Whisper auto-detects the spoken language
  • forced_decoder_ids (transcribe): [[2, 50359], [3, 50363]]
  • forced_decoder_ids (translate): [[2, 50358], [3, 50363]]
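The three forced_decoder_ids layouts above share one fixed scheme: position 1 holds the language token (omitted for .en models and for auto-detection), position 2 the task token, and position 3 the no-timestamps token. A minimal pure-Python sketch of that pattern (the helper name is ours, not the script's):

```python
# Multilingual Whisper special-token IDs, as listed in this README
TRANSLATE = 50358      # <|translate|>
TRANSCRIBE = 50359     # <|transcribe|>
NO_TIMESTAMPS = 50363  # <|notimestamps|>

def build_forced_decoder_ids(task="transcribe", lang_token=None):
    """Return the [position, token_id] pairs forced at decode time.

    lang_token is None for .en models and for auto-detection; otherwise
    it is the <|xx|> token of the language to force at position 1.
    """
    task_token = TRANSCRIBE if task == "transcribe" else TRANSLATE
    ids = [[2, task_token], [3, NO_TIMESTAMPS]]
    if lang_token is not None:
        ids.insert(0, [1, lang_token])
    return ids
```

For example, build_forced_decoder_ids() reproduces the English-only layout [[2, 50359], [3, 50363]], and passing a language token reproduces the single-language layout.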

Usage

# Default: whisper-tiny.en (English-only)
python3.9 whisper_tflite_model_generation_and_test.py

# Single-language French model
python3.9 whisper_tflite_model_generation_and_test.py --model whisper-base --language fr

# Transcribe-translate model (auto-detect language)
python3.9 whisper_tflite_model_generation_and_test.py --model whisper-base --language auto

# Test translate signature specifically
python3.9 whisper_tflite_model_generation_and_test.py --model whisper-small --language auto --task translate

# Show all options
python3.9 whisper_tflite_model_generation_and_test.py --help

Command Line Arguments

Argument                            Default          Description
--model                             whisper-tiny.en  Whisper model to convert
--language                          en               Language code (en, fr, de, ...) or auto for auto-detection
--task                              transcribe       transcribe or translate; only relevant with --language auto
--multilingual / --no-multilingual  --multilingual   Vocab binary type; auto-set to --no-multilingual for .en models
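A parser for the arguments above might look like this (a sketch assuming argparse; the actual script's option handling may differ):

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(
        description="Convert an OpenAI Whisper model to TFLite and test it.")
    p.add_argument("--model", default="whisper-tiny.en",
                   help="Whisper model to convert")
    p.add_argument("--language", default="en",
                   help="Language code (en, fr, de, ...) or 'auto'")
    p.add_argument("--task", default="transcribe",
                   choices=["transcribe", "translate"],
                   help="Only relevant with --language auto")
    # BooleanOptionalAction (Python 3.9+) yields --multilingual / --no-multilingual
    p.add_argument("--multilingual", action=argparse.BooleanOptionalAction,
                   default=True, help="Vocab binary type")
    return p

def parse_args(argv=None):
    args = build_parser().parse_args(argv)
    # .en models are English-only, so force the non-multilingual vocab binary
    if args.model.endswith(".en"):
        args.multilingual = False
    return args
```

BooleanOptionalAction is one reason the Python 3.9 requirement makes sense: it was added in that release.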

Supported Models

Model              Type          Parameters  Required VRAM  Relative Speed
whisper-tiny.en    English-only  ~39M        ~1 GB          ~10x
whisper-tiny       Multilingual  ~39M        ~1 GB          ~10x
whisper-base.en    English-only  ~74M        ~1 GB          ~7x
whisper-base       Multilingual  ~74M        ~1 GB          ~7x
whisper-small.en   English-only  ~244M       ~2 GB          ~4x
whisper-small      Multilingual  ~244M       ~2 GB          ~4x
whisper-medium.en  English-only  ~769M       ~5 GB          ~2x
whisper-medium     Multilingual  ~769M       ~5 GB          ~2x
whisper-large      Multilingual  ~1550M      ~10 GB         1x
whisper-large-v3   Multilingual  ~1550M      ~10 GB         1x
whisper-turbo      Multilingual  ~809M       ~6 GB          ~8x

Supported Languages

en, fr, hi, ko, de, zh, ja, es, ar, ru, pt, it, nl, sv, pl, da, fi, and many more.

Use auto for language auto-detection. See OpenAI's Whisper repository for the full list of supported language codes.

TFLite Serving Signatures

Mode                         Signatures                                              Description
English-only (.en)           serving_default                                         Transcribe English
Single-language              serving_default                                         Transcribe the forced language
Transcribe-translate (auto)  serving_default, serving_transcribe, serving_translate  Auto-detect language; transcribe or translate to English

Token reference:

  • 50358 = <|translate|>
  • 50359 = <|transcribe|>
  • 50363 = <|notimestamps|>

What the Script Does

Step  Description
0     Install/verify Python dependencies
1     Configure model parameters and fetch decoder token mappings
2     Load the Whisper model and run a test transcription (English: LibriSpeech; other languages: Google FLEURS)
3     Patch TFForceTokensLogitsProcessor to avoid NaN values during TFLite export
4     Wrap the model with serving signature(s) and save as a TF SavedModel
5     Convert the SavedModel to TFLite with dynamic-range quantization
6     Verify the TFLite model produces correct output via the TFLite Interpreter
7     (Optional) Test the TFLite model against .wav audio files
8     Generate the mel filters + vocab binary file
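Step 8's binary bundles the mel filter bank used by the native frontend. Whisper's frontend uses 80 mel bins over 16 kHz audio with n_fft=400 (201 FFT bins); below is a dependency-free sketch of an HTK-style triangular filter bank with those dimensions. It is illustrative only: Whisper's actual filters are generated differently (Slaney scale with area normalization), so the shape matches but the values will not be byte-identical to the script's output.

```python
import math

def mel_filter_bank(sr=16000, n_fft=400, n_mels=80):
    """Triangular mel filter bank of shape (n_mels, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * math.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    n_bins = n_fft // 2 + 1
    # n_mels + 2 points evenly spaced on the mel scale from 0 Hz to Nyquist
    mel_pts = [hz_to_mel(sr / 2) * i / (n_mels + 1) for i in range(n_mels + 2)]
    bin_pts = [int((n_fft + 1) * mel_to_hz(m) / sr) for m in mel_pts]
    fb = [[0.0] * n_bins for _ in range(n_mels)]
    for m in range(1, n_mels + 1):
        left, center, right = bin_pts[m - 1], bin_pts[m], bin_pts[m + 1]
        for k in range(left, center):    # rising edge of triangle m
            fb[m - 1][k] = (k - left) / (center - left)
        for k in range(center, right):   # falling edge of triangle m
            fb[m - 1][k] = (right - k) / (right - center)
    return fb
```

The native C++ runtime reads these coefficients from the .bin file at startup rather than recomputing them, which keeps the audio frontend consistent between the export script and the device.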

Output Files

# English-only
whisper-tiny.en.tflite
filters_vocab_en.bin

# Single-language (e.g. French)
whisper-base.fr.tflite
filters_vocab_multilingual.bin

# Transcribe-translate (auto)
whisper-base-transcribe-translate.tflite
filters_vocab_multilingual.bin

These files are what you need for on-device Whisper inference on Android or other embedded platforms.

Citing

If you use the Whisper TFLite model, please cite:

@misc{nyadla-sys,
  author       = {Niranjan Yadla},
  title        = {{Whisper TFLite: OpenAI Whisper Model Port for Edge Devices}},
  year         = {2022},
  howpublished = {GitHub Repository},
  url          = {https://github.com/nyadla-sys/whisper.tflite},
  note         = {Original TFLite implementation of OpenAI Whisper for on-device automatic speech recognition; see also https://github.com/moonshine-ai/openai-whisper}
}