---
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
- pytorch
- audio
- upsampling
---

# FlashSR

FlashSR is a 2MB audio super-resolution model based on the HierSpeech++ upsampler architecture. It upsamples 16kHz audio to 48kHz at 200x to 400x real-time speed.

### Details

* **Model Size:** 2MB for the PyTorch version, 500KB for the ONNX version
* **Input Rate:** 16kHz
* **Output Rate:** 48kHz
* **Inference Speed:** 200x to 400x real-time, depending on GPU and dtype
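
To make these figures concrete, the minimal sketch below works through the arithmetic. It assumes the output is exactly three times as long as the input (48kHz / 16kHz) and that processing time is simply audio duration divided by the speed factor. The helper names are hypothetical and are not part of FlashSR's API.

```python
INPUT_RATE_HZ = 16_000   # input sample rate stated in this card
OUTPUT_RATE_HZ = 48_000  # output sample rate stated in this card


def output_sample_count(input_samples: int) -> int:
    """Samples after 16kHz -> 48kHz upsampling, assuming an exact 3x ratio."""
    return input_samples * OUTPUT_RATE_HZ // INPUT_RATE_HZ


def processing_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Estimated wall-clock time at a given multiple of real-time."""
    return audio_seconds / speed_factor


one_minute = 60 * INPUT_RATE_HZ           # 960,000 input samples
print(output_sample_count(one_minute))    # 2880000
print(processing_seconds(3600, 200))      # 18.0 (one hour of audio at 200x)
print(processing_seconds(3600, 400))      # 9.0  (one hour of audio at 400x)
```

Under these assumptions, one hour of audio would take roughly 9 to 18 seconds to upsample.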

### Performance Summary

FlashSR is designed for fast reconstruction of high-frequency content. It is far smaller and faster than alternatives such as Resemble-Enhance and ClearerVoice, while maintaining similar output quality.

### Benchmark Comparison

| Model | Speed | Size |
| :--- | :--- | :--- |
| **FlashSR** | **200x - 400x real-time** | **2MB (PyTorch) / 500KB (ONNX)** |
| Resemble-Enhance | < 20x real-time | ~700MB+ |
| ClearerVoice | < 20x real-time | ~200MB+ |
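
The speed column expresses throughput as a multiple of real-time: seconds of audio processed per second of wall-clock time. A minimal sketch of measuring that ratio for any upsampling callable might look like the following. `real_time_factor` and `dummy_upsample` are hypothetical names for illustration, and the dummy function only stands in for a real model call. This card does not state how the published figures were measured.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    upsample: Callable[[Sequence[float]], object],
    samples: Sequence[float],
    sample_rate_hz: int = 16_000,
    runs: int = 5,
) -> float:
    """Return audio duration divided by the best wall-clock time over `runs`."""
    audio_seconds = len(samples) / sample_rate_hz
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        upsample(samples)
        best = min(best, time.perf_counter() - start)
    return audio_seconds / best


def dummy_upsample(samples: Sequence[float]) -> list:
    """Placeholder that repeats each sample 3 times; this is NOT FlashSR."""
    return [s for s in samples for _ in range(3)]


one_second = [0.0] * 16_000  # one second of silence at 16kHz
print(f"{real_time_factor(dummy_upsample, one_second):.0f}x real-time")
```

When timing a PyTorch model on a GPU, call `torch.cuda.synchronize()` before reading the clock, since CUDA kernels run asynchronously and the measured time would otherwise be too short.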

### Usage

Usage instructions for ONNX and PyTorch, along with the source code, are available on GitHub:

https://github.com/ysharma3501/FlashSR

### Credits

Thanks to the authors of **HierSpeech++**, whose 48kHz upsampler this model is based on, and to [Xenova](https://github.com/xenova/) for the ONNX code.