MODEL 1: UrbanSound_EventDetection_Wav2Vec2

👂 Overview

UrbanSound_EventDetection_Wav2Vec2 is built on the pre-trained Wav2Vec2 architecture and fine-tuned to classify momentary and continuous sound events in urban environments. It processes raw audio waveforms and assigns each clip one of eight high-priority urban sound classes, focusing on high-impact and potentially anomalous events.

🧠 Model Architecture

This model uses the standard Wav2Vec2 pipeline, which operates directly on raw audio without manual feature extraction (such as MFCCs). A minimal inference sketch follows the component list below.

  • Base Model: facebook/wav2vec2-base
  • Feature Extractor: A stack of 1D convolutional layers extracts local features from the raw waveform.
  • Transformer Encoder: 12 layers of Transformer blocks capture long-range dependencies and global context within the audio clip.
  • Classification Head: A task-specific linear layer is placed on top of the contextualized representations to predict one of the 8 event labels.
  • Target Classes: Car_Horn, Children_Playing, Dog_Barking, Machinery_Hum, Siren_Emergency, Train_Whistle, Tire_Screech, and Glass_Shattering.
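
The components above can be exercised end to end with the transformers library. The sketch below is illustrative only: the repo id is a placeholder mirroring this card's name (it may not match an actual Hub repository), and the random array stands in for one second of real 16 kHz audio.

```python
import numpy as np
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification

model_id = "UrbanSound_EventDetection_Wav2Vec2"  # placeholder repo id (assumption)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)
model.eval()

# Stand-in for one second of real audio; Wav2Vec2 expects 16 kHz mono float input.
waveform = np.random.randn(16000).astype(np.float32)
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 8), one score per event class

predicted = model.config.id2label[int(logits.argmax(dim=-1))]
print(predicted)  # e.g. "Siren_Emergency"
```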

🎯 Intended Use

This model is intended for smart city, safety, and acoustic monitoring systems:

  1. Acoustic Surveillance: Real-time detection of emergency sounds (Siren_Emergency, Glass_Shattering, Tire_Screech) for public safety alerting (see the alerting sketch after this list).
  2. Noise Pollution Monitoring: Quantifying the occurrence and frequency of specific noise sources (Car Horn, Machinery Hum) in different city zones.
  3. Urban Planning: Analyzing soundscape composition to inform policy on zoning and noise mitigation strategies.
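
For the surveillance use case in particular, deployments typically score short overlapping windows of the audio stream and raise an alert only when a safety-critical class clears a confidence threshold. This is a hypothetical sketch, not part of the model card: the alert classes, the 0.85 threshold, and the `maybe_alert` helper are all assumptions to be tuned on validation data.

```python
import torch

# Assumed operating points, not from the model card; tune on validation data.
ALERT_CLASSES = {"Siren_Emergency", "Glass_Shattering", "Tire_Screech"}
ALERT_THRESHOLD = 0.85

def maybe_alert(logits: torch.Tensor, id2label: dict) -> str | None:
    """Return an alert string if a safety-critical class is confidently detected."""
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    confidence, index = probs.max(dim=-1)
    label = id2label[int(index)]
    if label in ALERT_CLASSES and confidence.item() >= ALERT_THRESHOLD:
        return f"ALERT: {label} (p={confidence.item():.2f})"
    return None
```

Here `logits` is the `(1, 8)` tensor produced by the inference sketch above, and `id2label` is `model.config.id2label`.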

⚠️ Limitations

  1. Event Overlap: The current setup is trained for single-label classification. If multiple sounds occur simultaneously (e.g., Siren_Emergency + Dog_Barking), the model outputs only the single most probable event and ignores the rest (a hedged multi-label sketch follows this list).
  2. Domain Shift: Performance may degrade in environments whose background-noise profiles differ markedly from the training data (e.g., quiet suburban streets vs. dense, loud open-air markets).
  3. Localization: This model performs event detection but does not inherently provide sound localization (Direction-of-Arrival or DOA), which would require specialized input features (like ambisonic audio) and a different model head.
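
The first limitation is architectural rather than fundamental: the same backbone could be re-finetuned with independent per-class sigmoid outputs (problem_type "multi_label_classification" in transformers) to report co-occurring events. The sketch below shows only the scoring side of that adaptation; it is an assumption about a hypothetical multi-label checkpoint, not a capability of this model.

```python
import torch

def active_events(logits: torch.Tensor, id2label: dict, threshold: float = 0.5) -> list[str]:
    """Score each class independently; assumes a BCE-trained multi-label head."""
    probs = torch.sigmoid(logits).squeeze(0)  # per-class probabilities, not softmax
    return [id2label[i] for i, p in enumerate(probs.tolist()) if p >= threshold]

# With such a checkpoint, overlapping events like Siren_Emergency and
# Dog_Barking could both be returned instead of only the argmax class.
```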

MODEL 2: MedicalChatbot_IntentClassifier_RoBERTa

MedicalChatbot_IntentClassifier_RoBERTa is a RoBERTa-based classifier for multi-class detection of user intent in medical dialogue transcripts. An inference sketch follows the configuration below.

config.json

```json
{
  "_name_or_path": "roberta-base",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "hidden_size": 768,
  "model_type": "roberta",
  "num_hidden_layers": 12,
  "vocab_size": 50265,
  "id2label": {
    "0": "Symptom_Reporting",
    "1": "Advice_Seeking",
    "2": "Medication_Query",
    "3": "Appointment_Scheduling",
    "4": "Billing_Query",
    "5": "Causal_Query",
    "6": "Record_Retrieval",
    "7": "Urgency_Assessment"
  },
  "label2id": {
    "Symptom_Reporting": 0,
    "Advice_Seeking": 1,
    "Medication_Query": 2,
    "Appointment_Scheduling": 3,
    "Billing_Query": 4,
    "Causal_Query": 5,
    "Record_Retrieval": 6,
    "Urgency_Assessment": 7
  },
  "num_labels": 8,
  "problem_type": "single_label_classification",
  "transformers_version": "4.36.0"
}
```
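
Given this configuration, inference follows the standard sequence-classification path in transformers. The sketch below is illustrative: the repo id is a placeholder mirroring this card's name, and the example utterance is invented.

```python
from transformers import pipeline

# Placeholder repo id (assumption); swap in the actual Hub repo or local path.
classifier = pipeline(
    "text-classification",
    model="MedicalChatbot_IntentClassifier_RoBERTa",
)

result = classifier("Can I take ibuprofen with my blood pressure medication?")
print(result)  # e.g. [{"label": "Medication_Query", "score": 0.97}]
```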