# Whisper TFLite Model Generation and Test
Converts OpenAI Whisper speech recognition models to TFLite format for on-device inference (e.g. Android), and generates the mel filter + vocab binary file needed by native C++ runtimes.
## Requirements

- Python 3.9
- macOS, Linux, or Google Colab

Dependencies are installed automatically on first run.
## Three Generation Modes

The script supports three modes depending on the `--language` argument:
### 1. English-only (`.en` models)

```bash
python3.9 whisper_tflite_model_generation_and_test.py --model whisper-tiny.en
```

- Output: `whisper-tiny.en.tflite`
- Signature: `serving_default` (transcribe English)
- `forced_decoder_ids`: `[[2, 50359], [3, 50363]]`
### 2. Single-language (explicit language code)

```bash
python3.9 whisper_tflite_model_generation_and_test.py --model whisper-base --language fr
python3.9 whisper_tflite_model_generation_and_test.py --model whisper-base --language de
```

- Output: `whisper-base.fr.tflite`, `whisper-base.de.tflite`
- Signature: `serving_default` (transcribe the specified language)
- `forced_decoder_ids`: `[[1, <lang_token>], [2, 50359], [3, 50363]]`
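The `<lang_token>` forced at position 1 is the language tag from the multilingual vocabulary, where language tokens directly follow `<|startoftranscript|>` (50258). A few IDs for illustration, assuming the standard OpenAI multilingual vocabulary; the helper name is illustrative, and you should verify the IDs against your tokenizer (e.g. `tokenizer.convert_tokens_to_ids("<|fr|>")`):

```python
# Language token IDs in the standard multilingual Whisper vocabulary
# (languages follow <|startoftranscript|> = 50258; verify with your tokenizer).
LANG_TOKENS = {"en": 50259, "zh": 50260, "de": 50261, "es": 50262,
               "ru": 50263, "ko": 50264, "fr": 50265, "ja": 50266}

def single_language_forced_ids(lang):
    """forced_decoder_ids for a single-language transcription model."""
    return [[1, LANG_TOKENS[lang]], [2, 50359], [3, 50363]]

print(single_language_forced_ids("fr"))
# [[1, 50265], [2, 50359], [3, 50363]]
```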
### 3. Transcribe-translate (auto language detection)

```bash
python3.9 whisper_tflite_model_generation_and_test.py --model whisper-base --language auto
```

- Output: `whisper-base-transcribe-translate.tflite`
- Signatures: `serving_default` (= transcribe), `serving_transcribe`, `serving_translate`
- No language token is forced; Whisper auto-detects the spoken language
- `forced_decoder_ids` (transcribe): `[[2, 50359], [3, 50363]]`
- `forced_decoder_ids` (translate): `[[2, 50358], [3, 50363]]`
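All three modes differ only in the forced decoder prefix: an optional language token at position 1, then the task token and `<|notimestamps|>`. A minimal sketch (the helper name is illustrative, not taken from the script):

```python
# Token IDs from the signature descriptions above.
TRANSCRIBE, TRANSLATE, NO_TIMESTAMPS = 50359, 50358, 50363

def build_forced_decoder_ids(lang_token=None, task="transcribe"):
    """Build the [position, token_id] pairs forced at the start of generation."""
    task_token = TRANSCRIBE if task == "transcribe" else TRANSLATE
    ids = [[1, lang_token]] if lang_token is not None else []
    return ids + [[2, task_token], [3, NO_TIMESTAMPS]]

build_forced_decoder_ids()                  # English-only / auto-detect transcribe
build_forced_decoder_ids(lang_token=50265)  # single-language French
build_forced_decoder_ids(task="translate")  # serving_translate
```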
## Usage

```bash
# Default: whisper-tiny.en (English-only)
python3.9 whisper_tflite_model_generation_and_test.py

# Single-language French model
python3.9 whisper_tflite_model_generation_and_test.py --model whisper-base --language fr

# Transcribe-translate model (auto-detect language)
python3.9 whisper_tflite_model_generation_and_test.py --model whisper-base --language auto

# Test the translate signature specifically
python3.9 whisper_tflite_model_generation_and_test.py --model whisper-small --language auto --task translate

# Show all options
python3.9 whisper_tflite_model_generation_and_test.py --help
```
## Command Line Arguments

| Argument | Default | Description |
|---|---|---|
| `--model` | `whisper-tiny.en` | Whisper model to convert |
| `--language` | `en` | Language code (`en`, `fr`, `de`, ...) or `auto` for auto-detection |
| `--task` | `transcribe` | `transcribe` or `translate`; only relevant with `--language auto` |
| `--multilingual` / `--no-multilingual` | `--multilingual` | Vocab binary type; auto-set to `--no-multilingual` for `.en` models |
## Supported Models

| Model | Type | Parameters | Required VRAM | Relative Speed |
|---|---|---|---|---|
| `whisper-tiny.en` | English-only | ~39M | ~1 GB | ~10x |
| `whisper-tiny` | Multilingual | ~39M | ~1 GB | ~10x |
| `whisper-base.en` | English-only | ~74M | ~1 GB | ~7x |
| `whisper-base` | Multilingual | ~74M | ~1 GB | ~7x |
| `whisper-small.en` | English-only | ~244M | ~2 GB | ~4x |
| `whisper-small` | Multilingual | ~244M | ~2 GB | ~4x |
| `whisper-medium.en` | English-only | ~769M | ~5 GB | ~2x |
| `whisper-medium` | Multilingual | ~769M | ~5 GB | ~2x |
| `whisper-large` | Multilingual | ~1550M | ~10 GB | 1x |
| `whisper-large-v3` | Multilingual | ~1550M | ~10 GB | 1x |
| `whisper-turbo` | Multilingual | ~809M | ~6 GB | ~8x |
## Supported Languages

`en`, `fr`, `hi`, `ko`, `de`, `zh`, `ja`, `es`, `ar`, `ru`, `pt`, `it`, `nl`, `sv`, `pl`, `da`, `fi`, and many more. Use `auto` for language auto-detection.
## TFLite Serving Signatures

| Mode | Signatures | Description |
|---|---|---|
| English-only (`.en`) | `serving_default` | Transcribe English |
| Single-language | `serving_default` | Transcribe the forced language |
| Transcribe-translate (`auto`) | `serving_default`, `serving_transcribe`, `serving_translate` | Auto-detect language, transcribe or translate to English |
Token reference: `50358` = `<|translate|>`, `50359` = `<|transcribe|>`, `50363` = `<|notimestamps|>`
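At inference time a client picks one of these signatures by name. A minimal sketch with the TFLite `Interpreter`, using a toy two-signature `tf.Module` as a stand-in for the real Whisper export (which also exposes `serving_default`); tensor names in the real model may differ, so inspect them with `get_signature_list()`:

```python
import tempfile

import numpy as np
import tensorflow as tf

class Toy(tf.Module):
    # Two signatures that mimic the transcribe/translate split.
    @tf.function(input_signature=[tf.TensorSpec([1], tf.float32)])
    def transcribe(self, input_features):
        return {"tokens": input_features + 1.0}

    @tf.function(input_signature=[tf.TensorSpec([1], tf.float32)])
    def translate(self, input_features):
        return {"tokens": input_features - 1.0}

toy = Toy()
export_dir = tempfile.mkdtemp()
tf.saved_model.save(toy, export_dir, signatures={
    "serving_transcribe": toy.transcribe,
    "serving_translate": toy.translate,
})
tflite_model = tf.lite.TFLiteConverter.from_saved_model(export_dir).convert()

# Load from memory and pick a signature, exactly as a client app would.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
print(interpreter.get_signature_list())
runner = interpreter.get_signature_runner("serving_translate")
out = runner(input_features=np.array([5.0], dtype=np.float32))
```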
## What the Script Does
| Step | Description |
|---|---|
| 0 | Install/verify Python dependencies |
| 1 | Configure model parameters and fetch decoder token mappings |
| 2 | Load the Whisper model and run a test transcription (English: LibriSpeech, other languages: Google FLEURS) |
| 3 | Patch TFForceTokensLogitsProcessor to avoid NaN values during TFLite export |
| 4 | Wrap the model with serving signature(s) and save as TF SavedModel |
| 5 | Convert the SavedModel to TFLite with dynamic range quantization |
| 6 | Verify the TFLite model produces correct output via the TFLite Interpreter |
| 7 | (Optional) Test the TFLite model against .wav audio files |
| 8 | Generate the mel filters + vocab binary file |
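Steps 4 and 5 can be sketched in miniature: wrap a model with a serving signature, save it as a SavedModel, and convert to TFLite with dynamic range quantization. A toy linear layer stands in for the wrapped Whisper model here; the real conversion also enables `SELECT_TF_OPS` as a fallback for graph ops (e.g. the autoregressive decoding loop) that have no TFLite builtin.

```python
import tempfile

import tensorflow as tf

class Toy(tf.Module):
    def __init__(self):
        self.w = tf.Variable(tf.ones([80, 16]))  # stand-in weights

    @tf.function(input_signature=[tf.TensorSpec([1, 80], tf.float32)])
    def serving_default(self, input_features):
        return {"logits": tf.matmul(input_features, self.w)}

toy = Toy()
export_dir = tempfile.mkdtemp()
tf.saved_model.save(toy, export_dir,
                    signatures={"serving_default": toy.serving_default})

converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic range quantization
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # prefer native TFLite kernels
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TF (Flex) ops if needed
]
tflite_model = converter.convert()  # bytes; eligible weights stored as int8
```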
## Output Files

```
# English-only
whisper-tiny.en.tflite
filters_vocab_en.bin

# Single-language (e.g. French)
whisper-base.fr.tflite
filters_vocab_multilingual.bin

# Transcribe-translate (auto)
whisper-base-transcribe-translate.tflite
filters_vocab_multilingual.bin
```
These files are what you need for on-device Whisper inference on Android or other embedded platforms.
## Citing

If you use the Whisper TFLite model, please cite:

```bibtex
@misc{nyadla-sys,
  author       = {Niranjan Yadla},
  title        = {{Whisper TFLite: OpenAI Whisper Model Port for Edge Devices}},
  year         = {2022},
  howpublished = {GitHub Repository},
  url          = {https://github.com/nyadla-sys/whisper.tflite},
  note         = {Original TFLite implementation of OpenAI Whisper for on-device automatic speech recognition. See also https://github.com/moonshine-ai/openai-whisper}
}
```