Clear: on-device speech enhancement

48 kHz on-device speech enhancement. Takes noisy mono or stereo audio (phone mic, untreated room, traffic), returns a podcast-ready file: denoised, dereverbed, voice warm and present.

Try it

Live demo: desert-ant-labs/clear-demo — drop in a recording and hear raw vs cleaned, fully in your browser.
iOS / macOS: clear-swift — Swift package; both variants bundled, works offline.
Android / JVM: clear-kotlin — Kotlin SDK via JitPack.
JavaScript / TypeScript: @desert-ant-labs/clear — npm package for Node + browser (source).

For commercial licensing above 100k MAU, email licensing@desertant.ai.

Variants

Variant	Character	When to use
`clear-studio`	Quiet, studio-like; silences near zero	Default. Works across the full range of input quality: phone audio, laptop mic, untreated rooms, USB / XLR podcast captures.
`clear-natural`	Room tone, breath, lip texture preserved	Treated podcast studios, USB / XLR captures, voiceover where the original sound is intentional.

If the source is already clean and you want the model to stay invisible, pick clear-natural. Otherwise, use clear-studio.

Files

Both variants share the same architecture and realtime cost; only the weights differ.

Both are 6-bit palettized (k-means LUT): ~5× smaller than the fp32 weights, with no perceptible quality loss (DNSMOS OVRL within ~0.02 of the float model).
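To make "6-bit palettized (k-means LUT)" concrete, here is a minimal sketch of the general idea in NumPy: cluster a weight tensor into 2^6 = 64 centroids, then store one 6-bit index per weight plus the small lookup table. This illustrates the technique only; it is not the tool chain used to produce the released files, and the function names `palettize` and `depalettize` are hypothetical.

```python
import numpy as np

def palettize(weights, bits=6, iters=20):
    """1-D k-means over the weights; returns (lut, indices). Illustrative only."""
    flat = weights.astype(np.float32).ravel()
    k = 2 ** bits
    # Initialise centroids at evenly spaced quantiles of the weight distribution.
    lut = np.quantile(flat, np.linspace(0.0, 1.0, k)).astype(np.float32)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        idx = np.abs(flat[:, None] - lut[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its members; skip empty clusters.
        for c in range(k):
            members = flat[idx == c]
            if members.size:
                lut[c] = members.mean()
    # Final assignment against the updated centroids.
    idx = np.abs(flat[:, None] - lut[None, :]).argmin(axis=1)
    return lut, idx.astype(np.uint8).reshape(weights.shape)

def depalettize(lut, idx):
    """Rebuild an approximate float tensor by looking each index up in the LUT."""
    return lut[idx]

# Synthetic stand-in for one layer's weights (not taken from the real model).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(64, 64)).astype(np.float32)
lut, idx = palettize(w)
w_hat = depalettize(lut, idx)

# Bits per weight before vs after, ignoring the 64-entry table itself.
compression = 32 / 6
print(f"compression ≈ {compression:.2f}x, mean abs error = {np.abs(w - w_hat).mean():.5f}")
```

The 32 / 6 ≈ 5.3 ratio is where a figure of roughly 5× comes from; the per-tensor lookup tables account for the small difference.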

Variant	File	Format	Size
`clear-studio`	`clear-studio.mlmodelc/`	Core ML, 6-bit palettized, precompiled	~1.9 MB
`clear-studio`	`clear-studio.onnx`	ONNX, 6-bit palettized (fp16-stored)	~4.5 MB
`clear-natural`	`clear-natural.mlmodelc/`	Core ML, 6-bit palettized, precompiled	~1.9 MB
`clear-natural`	`clear-natural.onnx`	ONNX, 6-bit palettized (fp16-stored)	~4.5 MB

The ONNX model keeps fp32 inputs and outputs, so host code written for the float model works unchanged. The Core ML .mlmodelc is precompiled: load it directly, with no .mlpackage compile step.

Use

ONNX

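from huggingface_hub import hf_hub_download
import onnxruntime as ort

# Download the model file from the Hugging Face Hub (cached locally after the first call).
path    = hf_hub_download("desert-ant-labs/clear", "clear-studio.onnx")
# Create a CPU inference session for the downloaded model.
session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])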

Inputs and outputs

Architecture: DeepFilterNet 3 (DFN3-half).
Sample rate: 48 kHz, mono or stereo (per-channel inference).
Inference contract: spec / feat_erb / feat_spec → spec_enhanced. STFT, ERB, and ISTFT are host-side DSP, not part of the model graph.
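Because the STFT and ISTFT live in host code and stereo is handled one channel at a time, the host-side wrapper looks roughly like the sketch below. Several things here are assumptions rather than facts from this card: the 960-sample frame and 480-sample hop (DeepFilterNet's published defaults at 48 kHz, not confirmed for this model), the square-root Hann window, and the helper names. The `model_fn` argument is a stand-in that maps a spectrogram to an enhanced spectrogram; the real graph also expects the `feat_erb` and `feat_spec` inputs, which this sketch omits.

```python
import numpy as np

FRAME = 960  # assumed frame size: 20 ms at 48 kHz
HOP = 480    # assumed hop size: 50% overlap

# Square-root periodic Hann: analysis window times synthesis window
# sums to 1 across overlapping frames at 50% overlap.
WINDOW = np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(FRAME) / FRAME))

def stft(x):
    """Windowed FFT frames, shape (n_frames, FRAME // 2 + 1)."""
    n_hops = -(-len(x) // HOP)  # ceiling division
    # One hop of zeros in front, zeros behind, so every sample sits in two frames.
    padded = np.zeros((n_hops + 2) * HOP, dtype=np.float32)
    padded[HOP:HOP + len(x)] = x
    frames = np.stack([padded[i * HOP:i * HOP + FRAME] for i in range(n_hops + 1)])
    return np.fft.rfft(frames * WINDOW, axis=1)

def istft(spec, length):
    """Overlap-add inverse of stft(), trimmed back to the original length."""
    frames = np.fft.irfft(spec, n=FRAME, axis=1) * WINDOW
    out = np.zeros((spec.shape[0] + 1) * HOP)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + FRAME] += frame
    return out[HOP:HOP + length].astype(np.float32)

def enhance_channel(x, model_fn):
    """STFT -> model -> ISTFT for a single channel."""
    return istft(model_fn(stft(x)), len(x))

def enhance(audio, model_fn):
    """Accepts (samples,) mono or (samples, channels); runs each channel independently."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        return enhance_channel(audio, model_fn)
    channels = [enhance_channel(audio[:, c], model_fn) for c in range(audio.shape[1])]
    return np.stack(channels, axis=1)

# Identity stand-in for the model: the output should reproduce the input.
rng = np.random.default_rng(0)
stereo = rng.normal(0.0, 0.1, size=(5000, 2)).astype(np.float32)
restored = enhance(stereo, lambda spec: spec)
print(restored.shape, float(np.abs(restored - stereo).max()))
```

Running the identity stand-in first is a cheap way to confirm that the host DSP round-trips cleanly before a real inference session is wired in.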

Performance

Both variants run at the same speed. Enhancing a 5-minute clip on the Apple Neural Engine:

Device	Chip	Mono	Stereo
iPhone 15 Pro	A17 Pro	4.88 s (61× realtime)	6.53 s (46×)
iPhone 17 Pro	A19 Pro	3.70 s (81× realtime)	5.16 s (58×)

Cold model load takes ~0.6 s; later loads take ~100 ms thanks to the system ANE cache.
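The realtime multipliers in the table are the clip duration divided by the processing time. A short sketch of that arithmetic, using only the figures above (the helper name `realtime_factor` is illustrative):

```python
CLIP_SECONDS = 5 * 60  # the 5-minute benchmark clip

def realtime_factor(audio_seconds, processing_seconds):
    """How many seconds of audio are processed per second of wall-clock time."""
    return audio_seconds / processing_seconds

# Processing times in seconds, copied from the table above.
timings = {
    ("iPhone 15 Pro", "mono"): 4.88,
    ("iPhone 15 Pro", "stereo"): 6.53,
    ("iPhone 17 Pro", "mono"): 3.70,
    ("iPhone 17 Pro", "stereo"): 5.16,
}

factors = {key: realtime_factor(CLIP_SECONDS, t) for key, t in timings.items()}
for (device, layout), factor in factors.items():
    print(f"{device} {layout}: {factor:.1f}x realtime")
```

The same function applies to your own measurements: time one enhancement call and divide the clip length by the elapsed seconds.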

Limitations

Trained on English speech; non-English speech still benefits but has not been measured against per-language ground truth.
Heavy background music or multi-speaker overlap degrades quality.
Mastering is informational only; verify against the platform's actual loudness target before publishing.

Built on

DeepFilterNet 3 by Rikorose, MIT. Fine-tuned on the Desert Ant Labs speech corpus.

License

Released under the Desert Ant Labs Source-Available License v1.0 (see LICENSE.md).

Free for commercial use up to 100,000 Monthly Active Users (MAU).
Above 100,000 MAU a commercial license is required. Contact licensing@desertant.ai.

Citation

@software{clear_2026,
  title  = {Clear: on-device speech enhancement},
  author = {Desert Ant Labs},
  year   = {2026},
  url    = {https://huggingface.co/desert-ant-labs/clear},
}
