MODEL 1: UrbanSound_EventDetection_Wav2Vec2
Overview
UrbanSound_EventDetection_Wav2Vec2 is built on the pre-trained Wav2Vec2 architecture and fine-tuned to classify momentary and continuous sound events in urban environments. It processes raw audio waveforms to assign one of eight high-priority urban sound classes, focusing on high-impact and potentially anomalous events.
Model Architecture
This model utilizes the standard Wav2Vec2 pipeline, which operates directly on raw audio data without the need for manual feature extraction (like MFCCs).
- Base Model: facebook/wav2vec2-base
- Feature Extractor: A stack of 1D convolutional layers extracts local features from the raw waveform.
- Transformer Encoder: 12 layers of Transformer blocks capture long-range dependencies and global context within the audio clip.
- Classification Head: A task-specific linear layer is placed on top of the contextualized representations to predict one of the 8 event labels.
- Target Classes: Car_Horn, Children_Playing, Dog_Barking, Machinery_Hum, Siren_Emergency, Train_Whistle, Tire_Screech, and Glass_Shattering.
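
Since this is the standard Wav2Vec2 sequence-classification setup, inference follows the usual `transformers` pattern. A minimal sketch, assuming a hypothetical repo id and audio file:

```python
import torch
import torchaudio
from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification

MODEL_ID = "your-org/UrbanSound_EventDetection_Wav2Vec2"  # hypothetical repo id

feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

# Load a clip, downmix to mono, and resample to the 16 kHz rate
# expected by wav2vec2-base.
waveform, sample_rate = torchaudio.load("street_clip.wav")  # hypothetical file
waveform = torchaudio.functional.resample(waveform.mean(dim=0), sample_rate, 16_000)

inputs = feature_extractor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])  # e.g. "Siren_Emergency"
```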
Intended Use
This model is intended for smart city, safety, and acoustic monitoring systems:
- Acoustic Surveillance: Real-time detection of emergency sounds (Siren, Glass Shattering, Tire Screech) for public safety alerting (see the alerting sketch after this list).
- Noise Pollution Monitoring: Quantifying the occurrence and frequency of specific noise sources (Car Horn, Machinery Hum) in different city zones.
- Urban Planning: Analyzing soundscape composition to inform policy on zoning and noise mitigation strategies.
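
For the alerting use case, one plausible pattern is to gate notifications on a small set of high-priority classes plus a confidence threshold. A sketch continuing from the inference example above, with a hypothetical priority set and threshold:

```python
import torch

# Hypothetical priority set and threshold; tune both per deployment.
PRIORITY_CLASSES = {"Siren_Emergency", "Glass_Shattering", "Tire_Screech"}
ALERT_THRESHOLD = 0.80

probs = torch.softmax(logits, dim=-1).squeeze()
top_p, top_id = probs.max(dim=-1)
label = model.config.id2label[top_id.item()]

if label in PRIORITY_CLASSES and top_p.item() >= ALERT_THRESHOLD:
    print(f"ALERT: {label} detected (p={top_p.item():.2f})")
```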
Limitations
- Event Overlap: The current setup is trained for single-label classification. If multiple sounds occur simultaneously (e.g., Siren + Dog Barking), the model outputs only the single most probable event and may ignore the others (see the multi-label sketch after this list).
- Domain Shift: Performance may degrade in environments whose background noise profiles differ significantly from the training data (e.g., very quiet suburbs vs. dense, loud street markets).
- Localization: This model performs event detection but does not inherently provide sound localization (Direction-of-Arrival or DOA), which would require specialized input features (like ambisonic audio) and a different model head.
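
If overlapping events matter in your deployment, a common workaround is to read the logits through independent sigmoids and a threshold rather than a softmax argmax. Because the head was trained with a single-label objective, this is only an approximation and the scores are not calibrated for co-occurring events. A sketch, again reusing `logits` and `model` from the examples above:

```python
import torch

MULTI_LABEL_THRESHOLD = 0.5  # hypothetical cutoff; validate before relying on it

scores = torch.sigmoid(logits).squeeze()
detected = [
    model.config.id2label[i]
    for i, s in enumerate(scores.tolist())
    if s >= MULTI_LABEL_THRESHOLD
]
print(detected)  # may surface several labels, e.g. a siren over a barking dog
```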
MODEL 2: MedicalChatbot_IntentClassifier_RoBERTa
MedicalChatbot_IntentClassifier_RoBERTa is a RoBERTa-based model for multi-class classification of user intent in medical dialogue transcripts.
config.json
```json
{
  "_name_or_path": "roberta-base",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "hidden_size": 768,
  "model_type": "roberta",
  "num_hidden_layers": 12,
  "vocab_size": 50265,
  "id2label": {
    "0": "Symptom_Reporting",
    "1": "Advice_Seeking",
    "2": "Medication_Query",
    "3": "Appointment_Scheduling",
    "4": "Billing_Query",
    "5": "Causal_Query",
    "6": "Record_Retrieval",
    "7": "Urgency_Assessment"
  },
  "label2id": {
    "Symptom_Reporting": 0,
    "Advice_Seeking": 1,
    "Medication_Query": 2,
    "Appointment_Scheduling": 3,
    "Billing_Query": 4,
    "Causal_Query": 5,
    "Record_Retrieval": 6,
    "Urgency_Assessment": 7
  },
  "num_labels": 8,
  "problem_type": "single_label_classification",
  "transformers_version": "4.36.0"
}
```
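
Given the config above, the checkpoint loads as a standard `RobertaForSequenceClassification` model. A minimal usage sketch (repo id and example utterance are hypothetical):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "your-org/MedicalChatbot_IntentClassifier_RoBERTa"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

text = "Can I take ibuprofen together with my blood pressure medication?"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

intent = model.config.id2label[logits.argmax(dim=-1).item()]
print(intent)  # e.g. "Medication_Query"
```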
Evaluation results
- Event Detection Accuracy (self-reported): 0.945
- Macro F1 Score (self-reported): 0.938