HarounElleuch committed · Commit abdcb5d (verified) · 1 Parent(s): bbbc9ba

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +67 -22
  2. nadi_leaderboard.png +0 -0
README.md CHANGED
@@ -22,45 +22,90 @@ metrics:
22
 
23
  ## Install Requirements
24
 
25
- ### SpeechBrain
26
- First of all, please install SpeechBrain with the following command:
27
-
28
- ```bash
29
- pip install git+https://github.com/speechbrain/speechbrain.git@develop
30
- ```
31
-
32
- ### Clone ADI github repository
33
  ```bash
34
  git clone https://github.com/elyadata/ADI-20
35
  cd ADI-20
36
  pip install -r requirements.txt
37
  ```
38
 
 
 
39
 
40
- ### Perform Arabic Dialect Identification
41
  ```python
42
  from inference.classifier_attention_pooling import WhisperDialectClassifier
43
 
44
  dialect_id = WhisperDialectClassifier.from_hparams(
45
  source="",
46
- hparams_file="hyperparms.yaml",
47
- savedir="pretrained_DID/tmp").to("cuda")
 
 
48
 
49
- dialect_id.device = "cuda"
50
 
51
- dialect_id.classify_file("filenane.wav")
52
  ```
53
 
54
- ### Citation
55
  If using this work, please cite:
56
- ```
57
- @inproceedings{elleuch2025adi20,
 
58
  author = {Haroun Elleuch and Salima Mdhaffar and Yannick Estève and Fethi Bougares},
59
- title = {ADI‑20: Arabic Dialect Identification Dataset and Models},
60
- booktitle = {Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech)},
61
  year = {2025},
62
- address = {Rotterdam Ahoy Convention Centre, Rotterdam, The Netherlands},
63
- month = {August},
64
- days = {17‑21}
65
  }
66
- ```
22
 
23
  ## Install Requirements
24
 
25
+ ### Clone the ADI-20 GitHub repository:
26
  ```bash
27
  git clone https://github.com/elyadata/ADI-20
28
  cd ADI-20
29
  pip install -r requirements.txt
30
  ```
31
 
32
+ ### Note on SpeechBrain
33
+ While you can use the PyPI version of SpeechBrain listed in the `requirements.txt` of the ADI-20 GitHub repository, you may also install it from source with the following command:
34
 
35
+ ```bash
36
+ pip install git+https://github.com/speechbrain/speechbrain.git@develop
37
+ ```
38
+
39
+ ## Perform Arabic Dialect Identification
40
  ```python
41
  from inference.classifier_attention_pooling import WhisperDialectClassifier
42
 
43
  dialect_id = WhisperDialectClassifier.from_hparams(
44
  source="",
45
+ hparams_file="hyperparams.yaml",
46
+ savedir="pretrained_DID/tmp",
47
+ run_opts={"device": "cuda"} # If using a GPU (recommended).
48
+ )
49
 
50
+ out_prob, score, index, text_lab = dialect_id.classify_file("your_file.wav")
51
+ print(f"Predicted dialect: {text_lab[0]}")
52
+ print("-" * 15)
53
+ print(f"Dialect index: {index}")
54
+ print(f"Score: {score}")
55
+ print(f"Output log probs: {out_prob}")
56
+ print("-" * 15)
57
 
 
58
  ```
59
 
60
+ ## NADI 2025
61
+ We also used this model for the dialect identification task in the [NADI 2025](https://nadi.dlnlp.ai/2025/) challenge and ranked first:
62
+
63
+ <img src="nadi_leaderboard.png" alt="
64
+ RANK Codabench Username Accuracy Cost
65
+ 🥇 harounelleuch 0.7983 0.1788
66
+ 🥈 badr_alabsi 0.7640 0.2265
67
+ 🥉 rafiulbiswas 0.616 0.3068
68
+ 4 gahmed92 0.612 0.3477
69
+ 5 ADI Baseline 0.6109 0.3422
70
+ ">
71
+
72
+ For more information on how we used the model, you can refer to:
73
+ - Our system paper: [arXiv](https://arxiv.org/abs/2511.10090), [ACL Anthology](https://aclanthology.org/2025.arabicnlp-sharedtasks.105/)
74
+ - NADI findings paper: [arXiv](https://arxiv.org/abs/2509.02038), [ACL Anthology](https://aclanthology.org/2025.arabicnlp-sharedtasks.99/)
75
+
76
+
77
+
78
+ ## Citations
79
  If using this work, please cite:
80
+ ```bibtex
81
+ @inproceedings{elleuch25_interspeech,
82
+ title = {{ADI-20: Arabic Dialect Identification dataset and models}},
83
  author = {Haroun Elleuch and Salima Mdhaffar and Yannick Estève and Fethi Bougares},
 
 
84
  year = {2025},
85
+ booktitle = {{Interspeech 2025}},
86
+ pages = {2775--2779},
87
+ doi = {10.21437/Interspeech.2025-884},
88
+ issn = {2958-1796},
89
+ }
90
+
91
+ @inproceedings{elleuch-etal-2025-elyadata,
92
+ title = "{ELYADATA} {\&} {LIA} at {NADI} 2025: {ASR} and {ADI} Subtasks",
93
+ author = "Elleuch, Haroun and
94
+ Saidi, Youssef and
95
+ Mdhaffar, Salima and
96
+ Est{\`e}ve, Yannick and
97
+ Bougares, Fethi",
98
+ booktitle = "Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks",
99
+ month = nov,
100
+ year = "2025",
101
+ address = "Suzhou, China",
102
+ publisher = "Association for Computational Linguistics",
103
+ url = "https://aclanthology.org/2025.arabicnlp-sharedtasks.105/",
104
+ doi = "10.18653/v1/2025.arabicnlp-sharedtasks.105",
105
+ pages = "762--766",
106
+ ISBN = "979-8-89176-356-2",
107
  }
108
+ ```
109
+
110
+
111
+
nadi_leaderboard.png ADDED
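
The updated README snippet prints the raw `out_prob` values returned by `classify_file` and labels them as log probabilities. As a minimal sketch of how those values could be turned into a ranked, human-readable list, the helper below converts a list of log scores into probabilities and returns the top candidates. The function name `top_k_dialects`, the label names, and the numeric values are all hypothetical and are not part of the ADI-20 repository; the sketch also renormalises with a softmax, so it does not rely on the scores already summing to one.

```python
import math


def top_k_dialects(log_probs, labels, k=3):
    """Return the k most likely (label, probability) pairs.

    `log_probs` is a plain list of log scores, one per label.
    This helper is illustrative only and not part of ADI-20.
    """
    if len(log_probs) != len(labels):
        raise ValueError("log_probs and labels must have the same length")

    # Numerically stable softmax: subtract the maximum before exponentiating.
    max_lp = max(log_probs)
    exps = [math.exp(lp - max_lp) for lp in log_probs]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Sort by probability, highest first, and keep the first k entries.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical labels and scores, standing in for real model output.
labels = ["dialect_a", "dialect_b", "dialect_c", "dialect_d"]
log_probs = [math.log(0.10), math.log(0.60), math.log(0.25), math.log(0.05)]

top = top_k_dialects(log_probs, labels, k=2)
for label, prob in top:
    print(f"{label}: {prob:.2f}")
```

If `out_prob` is a PyTorch tensor holding one row of scores (an assumption about the classifier's return type), something like `out_prob.squeeze().tolist()` would produce the plain list this helper expects.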