parakeet-tdt-ctc-110m-coreml-int8

CoreML conversion of nvidia/parakeet-tdt_ctc-110m, quantized to INT8 with per-channel symmetric quantization.
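As a rough illustration of what per-channel symmetric INT8 quantization means, the sketch below quantizes a small weight matrix with one scale per row and a zero-point fixed at 0. Treating each row as one output channel, the [-127, 127] range, and the function names are assumptions made for this example; the exact procedure used to produce this model is not described here.

```python
def quantize_per_channel_symmetric(weights):
    """Quantize a 2-D weight matrix (list of rows) to INT8, one scale per row.

    Each row is treated as one output channel (an assumption for illustration).
    """
    quantized, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Symmetric: the zero-point is 0, so only a scale is stored per channel.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_row = [max(-127, min(127, round(w / scale))) for w in row]
        quantized.append(q_row)
        scales.append(scale)
    return quantized, scales


def dequantize(quantized, scales):
    """Map INT8 values back to approximate floats using the per-row scales."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]


# Two channels with very different magnitudes each get their own scale.
weights = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
q, scales = quantize_per_channel_symmetric(weights)
restored = dequantize(q, scales)
```

Because each channel has its own scale, the small-magnitude second row keeps roughly the same relative precision as the first, which a single per-tensor scale would not give it.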

Architecture: TDT (Token-and-Duration Transducer)
Language: English
Sample rate: 16000 Hz
Max audio: 15.0 s
Vocab size: 1024
Framework: NVIDIA NeMo → CoreML (coremltools)
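The sample rate and maximum audio length listed above imply a fixed input budget of 16000 × 15.0 = 240000 samples per inference. The sketch below shows one simple way to split longer mono audio into windows that fit. How the plugin itself handles audio longer than 15 s is not stated here, so `split_into_windows` is a hypothetical helper, not part of any library.

```python
SAMPLE_RATE_HZ = 16000      # sample rate listed above
MAX_AUDIO_SECONDS = 15.0    # max audio length listed above
MAX_SAMPLES = int(SAMPLE_RATE_HZ * MAX_AUDIO_SECONDS)  # 240000


def split_into_windows(samples, max_samples=MAX_SAMPLES):
    """Split a sequence of mono samples into consecutive windows of at most max_samples."""
    return [samples[i:i + max_samples] for i in range(0, len(samples), max_samples)]


# 40 seconds of silence becomes windows of 15 s, 15 s and 10 s.
audio = [0.0] * (40 * SAMPLE_RATE_HZ)
windows = split_into_windows(audio)
print([len(w) / SAMPLE_RATE_HZ for w in windows])
```

Cutting at fixed offsets can split a word across two windows; splitting on silence instead would avoid that, at the cost of extra logic.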

Components

File                                            Component                    Best compute
parakeet_mel_encoder.mlpackage                  mel_encoder                  ANE / GPU
parakeet_ctc_decoder.mlpackage                  ctc_decoder                  ANE / GPU
parakeet_decoder.mlpackage                      decoder                      CPU only
parakeet_joint_decision_single_step.mlpackage   joint_decision_single_step   ANE / GPU
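For readers loading the components directly with coremltools rather than through the plugin, the sketch below maps each file listed above to a compute unit. Mapping "ANE / GPU" to `ComputeUnit.ALL` (which lets Core ML choose among CPU, GPU and Neural Engine) is an assumption, and `load_component` is a hypothetical helper; loading and running the models requires macOS.

```python
import os

# Preferred compute unit per component, following the list above.
# The strings name members of coremltools.ComputeUnit.
# Assumption: "ANE / GPU" is expressed as ALL, "CPU only" as CPU_ONLY.
PREFERRED_COMPUTE_UNITS = {
    "parakeet_mel_encoder.mlpackage": "ALL",
    "parakeet_ctc_decoder.mlpackage": "ALL",
    "parakeet_decoder.mlpackage": "CPU_ONLY",
    "parakeet_joint_decision_single_step.mlpackage": "ALL",
}


def load_component(path):
    """Load one .mlpackage with its preferred compute unit (hypothetical helper)."""
    import coremltools as ct  # imported lazily so the mapping is usable without it

    unit_name = PREFERRED_COMPUTE_UNITS[os.path.basename(path)]
    return ct.models.MLModel(path, compute_units=getattr(ct.ComputeUnit, unit_name))
```

On a Mac, `load_component("parakeet_decoder.mlpackage")` would then return a model restricted to the CPU, while the other three components are left free to use the Neural Engine or GPU.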

Usage

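Install the plugin:

pip install ovos-stt-plugin-coreml

Then transcribe a WAV file from Python:

from ovos_stt_plugin_coreml import CoremlSTT
from ovos_plugin_manager.utils.audio import AudioFile

# Create the STT engine, pointing it at the model's metadata file
stt = CoremlSTT(config={"metadata": "metadata.json"})

# Read the audio and print the transcription
with AudioFile("speech.wav") as f:
    audio = f.read()
print(stt.execute(audio))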

Source model

nvidia/parakeet-tdt_ctc-110m
