wildlife_acoustic_species_identifier

Overview

This model is an Audio Spectrogram Transformer (AST) fine-tuned to identify keystone wildlife species in sub-Saharan Africa. It processes short audio clips (1 to 10 seconds) to detect vocalizations in complex jungle and savanna soundscapes.
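Field recordings are often much longer than the 1 to 10 second clips the model expects. The sketch below shows one way to cut a long recording into clips of usable length. It is an illustration, not the model's own preprocessing: the function name `split_into_windows`, the 16 kHz sample rate, and the choice of non-overlapping windows are all assumptions.

```python
def split_into_windows(num_samples, sample_rate, window_s=10.0, min_s=1.0):
    """Return (start, end) sample indices for consecutive, non-overlapping windows.

    window_s and min_s mirror the 1-10 second clip range stated above.
    A trailing window shorter than min_s is dropped.
    """
    window = int(window_s * sample_rate)
    min_len = int(min_s * sample_rate)
    spans = []
    for start in range(0, num_samples, window):
        end = min(start + window, num_samples)
        if end - start >= min_len:
            spans.append((start, end))
    return spans


# Example: a 25-second recording at an assumed 16 kHz sample rate
# yields two 10-second windows and one 5-second window.
spans = split_into_windows(num_samples=25 * 16000, sample_rate=16000)
```

Each `(start, end)` pair can then be used to slice the waveform before classification. Overlapping windows may be preferable when a vocalization could straddle a window boundary.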

Model Architecture

The model treats audio classification as an image task: the input waveform is converted into a log-mel spectrogram, which is then classified like an image.

  • Spectrogram Transformation: Raw audio is converted into a log-mel spectrogram with 128 mel-frequency bins.
  • Backbone: A Vision Transformer (ViT) architecture that processes patches of the spectrogram.
  • Stride: Patches are taken with a stride of (10, 10) across the two spectrogram axes, for high-resolution temporal and frequency analysis.
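To make the effect of the stride concrete, the sketch below counts how many patches the backbone would see for a given spectrogram. Only the 128 mel bins and the (10, 10) stride come from this card. The 16x16 patch size and the 1024-frame input length are assumptions borrowed from the original AST design, as is reading the stride as (frequency, time).

```python
def num_patches(size, patch, stride):
    """Number of patch positions along one axis, with no padding."""
    if size < patch:
        return 0
    return (size - patch) // stride + 1


def patch_grid(n_mels=128, n_frames=1024, patch=16, stride=(10, 10)):
    """Return (frequency patches, time patches, total patches).

    n_mels and stride follow the model card; patch and n_frames are
    assumed values, not confirmed for this checkpoint.
    """
    freq = num_patches(n_mels, patch, stride[0])
    time = num_patches(n_frames, patch, stride[1])
    return freq, time, freq * time


freq_patches, time_patches, total = patch_grid()
```

Under these assumptions the stride is smaller than the patch size, so neighboring patches overlap by 6 bins or frames. That overlap is what produces a denser patch sequence than a non-overlapping ViT grid would.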

Intended Use

  • Conservation Monitoring: Automated biodiversity audits in protected areas.
  • Anti-Poaching: Detecting specific distress signals or territorial vocalizations.
  • Citizen Science: Identifying species in user-submitted field recordings.

Limitations

  • Background Noise: Performance drops in heavy rain or high wind conditions.
  • Distant Sounds: Low-amplitude vocalizations may be masked by closer insect or bird noise.
  • Overlap: The model struggles to identify individual species when several animals vocalize simultaneously (polyphony).