wildlife_acoustic_species_identifier
Overview
This model is an Audio Spectrogram Transformer (AST) fine-tuned to identify keystone wildlife species in sub-Saharan Africa. It processes short audio clips (1-10 seconds) to detect vocalizations in complex forest and savanna soundscapes.
Model Architecture
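The model treats audio classification as an image task: the input waveform is first transformed into a log-Mel spectrogram, a two-dimensional time-frequency representation that the transformer processes much like an image.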
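As a rough illustration of the Mel part of this transformation, the sketch below computes where 128 Mel-spaced filter bands would be centered and how band energies are log-compressed. It assumes a 16 kHz sample rate (so frequencies up to 8 kHz) and the HTK Mel formula. Both are assumptions, because this card does not state the exact front-end parameters, and the function names are hypothetical.

```python
import math

# Assumption: HTK-style Mel scale; Kaldi's 1127 * ln(1 + f/700) is essentially the same curve.
def hz_to_mel(f_hz: float) -> float:
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_centers(n_mels: int = 128, f_min: float = 0.0,
                     f_max: float = 8000.0) -> list[float]:
    """Center frequencies (Hz) of n_mels triangular filters evenly spaced in Mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)  # n_mels filters need n_mels + 2 edge points
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]

def log_energies(band_energies: list[float], eps: float = 1e-10) -> list[float]:
    """Log-compress per-band energies; eps avoids log(0) on silent bands."""
    return [math.log(e + eps) for e in band_energies]

centers = mel_band_centers()
print(f"{len(centers)} bands, {centers[0]:.1f} Hz to {centers[-1]:.1f} Hz")
```

The band centers bunch together at low frequencies and spread out toward 8 kHz. This mirrors the roughly logarithmic spacing of the Mel scale.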
- Spectrogram Transformation: Raw audio is converted into a spectrogram with 128 Mel-frequency bins.
- Backbone: A Vision Transformer (ViT) architecture that processes patches of the spectrogram.
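- Stride: Patch strides of (10, 10) along the frequency and time axes, chosen for high-resolution temporal and frequency analysis. A smaller stride yields more patches, so finer resolution comes at a higher computational cost.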
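The sketch below makes the effect of the (10, 10) stride concrete by counting frames and patches. It assumes the original AST defaults, which this card does not confirm:

- 16 kHz audio
- 25 ms analysis windows with a 10 ms hop
- 16×16 patches
- inputs padded or truncated to 1024 frames

The helpers `num_frames` and `patch_grid` are hypothetical and are not part of any library.

```python
def num_frames(duration_s: float, sample_rate: int = 16_000,
               win_ms: float = 25.0, hop_ms: float = 10.0) -> int:
    """Frames from framing a clip without edge padding (Kaldi-style snip_edges)."""
    n_samples = int(duration_s * sample_rate)
    win = int(sample_rate * win_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return 0 if n_samples < win else 1 + (n_samples - win) // hop

def patch_grid(n_mels: int, n_frames: int, patch: int = 16,
               fstride: int = 10, tstride: int = 10) -> tuple[int, int]:
    """Patches along (frequency, time) for a strided, unpadded patch embedding."""
    if n_mels < patch or n_frames < patch:
        raise ValueError("spectrogram is smaller than one patch")
    return ((n_mels - patch) // fstride + 1,
            (n_frames - patch) // tstride + 1)

# Clip lengths from the Overview (1-10 s), before any padding to a fixed length.
for seconds in (1.0, 10.0):
    print(f"{seconds:>4} s clip -> {num_frames(seconds)} frames")

f, t = patch_grid(128, 1024)  # assumed padded input size
f16, t16 = patch_grid(128, 1024, fstride=16, tstride=16)
print(f"stride 10: {f} x {t} = {f * t} patches")          # 12 x 101 = 1212
print(f"stride 16: {f16} x {t16} = {f16 * t16} patches")  # 8 x 64 = 512
```

Under these assumptions, a stride of 10 makes neighboring 16×16 patches overlap by 6 bins or frames. That yields 1212 patches instead of 512 with non-overlapping patches, which is the resolution-versus-compute trade-off the stride controls.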
Intended Use
- Conservation Monitoring: Automated biodiversity audits in protected areas.
- Anti-Poaching: Detecting specific distress signals or territorial vocalizations.
- Citizen Science: Identifying species in user-submitted field recordings.
Limitations
- Background Noise: Performance drops in heavy rain or high wind conditions.
- Distant Sounds: Low-amplitude vocalizations may be masked by closer insect or bird noise.
- Overlap: Struggles to identify individual species when several animals vocalize simultaneously (polyphony).
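Low-amplitude recordings are one of the failure modes listed above. One hypothetical way to flag them is to measure each clip's level before inference. The sketch below computes the RMS level in dBFS for samples normalized to [-1, 1].

This is not part of the model. The -50 dBFS threshold is an arbitrary placeholder, not a value validated for this model. The check only flags clips that are quiet overall; it cannot tell a faint call from masking noise.

```python
import math

def rms_dbfs(samples: list[float], floor_db: float = -120.0) -> float:
    """RMS level in dB relative to full scale, for samples in [-1.0, 1.0]."""
    if not samples:
        return floor_db
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return floor_db  # digital silence: clamp instead of log10(0)
    return max(20.0 * math.log10(rms), floor_db)

def is_too_quiet(samples: list[float], threshold_db: float = -50.0) -> bool:
    """Hypothetical pre-screen: True if the clip's overall level is below threshold_db."""
    return rms_dbfs(samples) < threshold_db

# One full period of a full-scale sine has an RMS of 1/sqrt(2), about -3 dBFS.
sine = [math.sin(2 * math.pi * k / 100) for k in range(100)]
print(f"{rms_dbfs(sine):.2f} dBFS")
```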