Elyadata/ADI-whisper-ADI20

@@ -22,45 +22,90 @@ metrics:
 ## Install Requirements
-### SpeechBrain
-First of all, please install SpeechBrain with the following command:
-```bash
-pip install git+https://github.com/speechbrain/speechbrain.git@develop
-```
-### Clone ADI github repository
 ```bash
 git clone https://github.com/elyadata/ADI-20
 cd ADI-20
 pip install -r requirements.txt
 ```
-### Perform Arabic Dialect Identification
 ```python
 from inference.classifier_attention_pooling import WhisperDialectClassifier
 dialect_id = WhisperDialectClassifier.from_hparams(
     source="",
-    hparams_file="hyperparms.yaml",
-    savedir="pretrained_DID/tmp").to("cuda")
-dialect_id.device = "cuda"
-dialect_id.classify_file("filenane.wav")
 ```
-### Citation
 If using this work, please cite:
-```
-@inproceedings{elleuch2025adi20,
   author    = {Haroun Elleuch and Salima Mdhaffar and Yannick Estève and Fethi Bougares},
-  title     = {ADI‑20: Arabic Dialect Identification Dataset and Models},
-  booktitle = {Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech)},
   year      = {2025},
-  address   = {Rotterdam Ahoy Convention Centre, Rotterdam, The Netherlands},
-  month     = {August},
-  days      = {17‑21}
 }
-```

 ## Install Requirements
+### Clone the ADI-20 GitHub repository
 ```bash
 git clone https://github.com/elyadata/ADI-20
 cd ADI-20
 pip install -r requirements.txt
 ```
+### Note on SpeechBrain
+You can use the PyPI release of SpeechBrain listed in `requirements.txt` of the ADI-20 GitHub repository, or install SpeechBrain from source with the following command:
+```bash
+pip install git+https://github.com/speechbrain/speechbrain.git@develop
+```
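The inference example in this README passes `run_opts={"device": "cuda"}`, which assumes a CUDA-capable GPU. The following is a small sketch, not part of the ADI-20 repository, that checks whether PyTorch (a SpeechBrain dependency) can see a GPU and falls back to the CPU otherwise. `pick_device` is a hypothetical helper name.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # installed as a SpeechBrain dependency
    except ImportError:
        # PyTorch is not installed yet, so no GPU can be used.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running inference on: {device}")
# The result can then be passed to from_hparams as run_opts={"device": device}.
```

Falling back to `"cpu"` keeps the example usable on machines without a GPU, at the cost of slower inference.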
+## Perform Arabic Dialect Identification
 ```python
 from inference.classifier_attention_pooling import WhisperDialectClassifier
 dialect_id = WhisperDialectClassifier.from_hparams(
     source="",
+    hparams_file="hyperparams.yaml",
+    savedir="pretrained_DID/tmp",
+    run_opts={"device": "cuda"},  # Set to "cpu" if no GPU is available (a GPU is recommended).
+)
+out_prob, score, index, text_lab = dialect_id.classify_file("your_file.wav")
+print(f"Predicted dialect: {text_lab[0]}")
+print("-" * 15)
+print(f"Dialect index: {index}")
+print(f"Score: {score}")
+print(f"Output log probs: {out_prob}")
+print("-" * 15)
 ```
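`classify_file` reports only the top prediction in `text_lab`. To also inspect the runner-up dialects, a helper like the hypothetical `top_k_dialects` below can rank one row of `out_prob`. This is a sketch under two assumptions: that `out_prob` has shape `[1, num_dialects]`, so `out_prob[0].tolist()` yields one score per dialect, and that you supply the dialect names in the same order as the model's label encoder (not shown here). The labels and scores in the usage lines are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def top_k_dialects(
    log_probs: Sequence[float], labels: Sequence[str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k most likely (label, probability) pairs.

    A softmax is applied to the scores, so the returned probabilities sum to 1
    whether the input holds log-probabilities or unnormalized logits.
    """
    if len(log_probs) != len(labels):
        raise ValueError("log_probs and labels must have the same length")
    peak = max(log_probs)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in log_probs]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical labels and scores; with the real model you would use
# out_prob[0].tolist() and the model's own dialect names.
labels = ["dialect_a", "dialect_b", "dialect_c", "dialect_d"]
scores = [math.log(0.1), math.log(0.6), math.log(0.25), math.log(0.05)]
top2 = top_k_dialects(scores, labels, k=2)
for label, prob in top2:
    print(f"{label}: {prob:.2f}")
```

Because the input scores here are exact log-probabilities, the softmax recovers the original probabilities; with logits it would normalize them instead.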
+## NADI 2025
+We also used this model in the dialect identification subtask of the [NADI 2025](https://nadi.dlnlp.ai/2025/) challenge, where it ranked first:
+<img src="nadi_leaderboard.png" alt="
+RANK    Codabench Username    Accuracy    Cost
+🥇      harounelleuch         0.7983      0.1788
+🥈      badr_alabsi           0.7640      0.2265
+🥉      rafiulbiswas          0.616       0.3068
+4       gahmed92              0.612       0.3477
+5       ADI Baseline          0.6109      0.3422
+">
+For details on how we used the model, see:
+- Our system paper: [arXiv](https://arxiv.org/abs/2511.10090), [ACL Anthology](https://aclanthology.org/2025.arabicnlp-sharedtasks.105/)
+- The NADI 2025 findings paper: [arXiv](https://arxiv.org/abs/2509.02038), [ACL Anthology](https://aclanthology.org/2025.arabicnlp-sharedtasks.99/)
+## Citations
 If using this work, please cite:
+```bibtex
+@inproceedings{elleuch25_interspeech,
+  title     = {{ADI-20: Arabic Dialect Identification dataset and models}},
   author    = {Haroun Elleuch and Salima Mdhaffar and Yannick Estève and Fethi Bougares},
   year      = {2025},
+  booktitle = {{Interspeech 2025}},
+  pages     = {2775--2779},
+  doi       = {10.21437/Interspeech.2025-884},
+  issn      = {2958-1796},
+}
+@inproceedings{elleuch-etal-2025-elyadata,
+    title = "{ELYADATA} {\&} {LIA} at {NADI} 2025: {ASR} and {ADI} Subtasks",
+    author = "Elleuch, Haroun  and
+      Saidi, Youssef  and
+      Mdhaffar, Salima  and
+      Est{\`e}ve, Yannick  and
+      Bougares, Fethi",
+    booktitle = "Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks",
+    month = nov,
+    year = "2025",
+    address = "Suzhou, China",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2025.arabicnlp-sharedtasks.105/",
+    doi = "10.18653/v1/2025.arabicnlp-sharedtasks.105",
+    pages = "762--766",
+    ISBN = "979-8-89176-356-2",
 }
+```

nadi_leaderboard.png ADDED