Models trained on IPA-CHILDES and evaluated for phonological knowledge using word segmentation, a task linked to child language acquisition.
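The word segmentation task checks how well word boundaries predicted in an unsegmented phoneme stream match the gold boundaries. The following is a minimal sketch of one common way to score this: corpus-level boundary precision, recall and F1, with the utterance-final boundary excluded. It is not necessarily the exact metric used for these models. The function names and the toy utterance are illustrative assumptions.

```python
from typing import List, Sequence, Set, Tuple

# An utterance is a list of words, and each word is a list of phoneme symbols.
Utterance = Sequence[Sequence[str]]


def boundary_positions(words: Utterance) -> Set[int]:
    """Phoneme offsets where a word boundary falls (utterance end excluded)."""
    positions: Set[int] = set()
    offset = 0
    for word in words[:-1]:
        offset += len(word)
        positions.add(offset)
    return positions


def boundary_scores(gold: List[Utterance], predicted: List[Utterance]) -> Tuple[float, float, float]:
    """Corpus-level boundary precision, recall and F1 (hypothetical helper)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same utterances")
    hits = n_gold = n_pred = 0
    for g, p in zip(gold, predicted):
        # Both segmentations must cover exactly the same phoneme sequence.
        if [ph for w in g for ph in w] != [ph for w in p for ph in w]:
            raise ValueError("segmentations disagree on the phoneme sequence")
        g_b, p_b = boundary_positions(g), boundary_positions(p)
        hits += len(g_b & p_b)
        n_gold += len(g_b)
        n_pred += len(p_b)
    precision = hits / n_pred if n_pred else 0.0
    recall = hits / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: "the dog sits", where the model misses the boundary before "sits".
gold = [[["ð", "ə"], ["d", "ɒ", "ɡ"], ["s", "ɪ", "t", "s"]]]
pred = [[["ð", "ə"], ["d", "ɒ", "ɡ", "s", "ɪ", "t", "s"]]]
precision, recall, f1 = boundary_scores(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # P=1.00 R=0.50 F1=0.67
```

Scoring boundaries rather than whole words gives partial credit when only some boundaries in an utterance are found.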
AI & ML interests
tokenization, CHILDES, word segmentation, phonemes, BabyLM
The models, tokenizers, and datasets used in From Babble to Words, one of the winning BabyLM 2024 submissions, which explores phoneme-based training.
From Babble to Words: Pre-Training Language Models on Continuous Streams of Phonemes
Paper • 2410.22906
phonemetransformers/IPA-BabyLM
phonemetransformers/IPA-BabyLM-evaluation
phonemetransformers/babble-tokenizers
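The paper title refers to pre-training on continuous streams of phonemes, meaning input in which word boundaries are not marked. The following is a minimal sketch of that preprocessing step. It assumes each utterance is a space-separated string of IPA phonemes with words delimited by a boundary token. The token names `WORD_BOUNDARY` and `UTT_BOUNDARY` and the function `to_continuous_stream` are hypothetical placeholders, not the conventions of the IPA-BabyLM files.

```python
from typing import Iterable, List

# Hypothetical marker names; the published datasets may use different ones.
WORD_BOUNDARY = "WORD_BOUNDARY"
UTT_BOUNDARY = "UTT_BOUNDARY"


def to_continuous_stream(utterances: Iterable[str], keep_utterance_boundaries: bool = True) -> List[str]:
    """Flatten phonemized utterances into one phoneme sequence with no word boundaries.

    Each utterance is assumed to be a space-separated string of IPA phonemes
    in which words are separated by WORD_BOUNDARY tokens.
    """
    stream: List[str] = []
    for utterance in utterances:
        # Drop word-boundary markers so the model never sees where words end.
        stream.extend(tok for tok in utterance.split() if tok != WORD_BOUNDARY)
        if keep_utterance_boundaries:
            stream.append(UTT_BOUNDARY)
    return stream


utterances = [
    "ð ə WORD_BOUNDARY d ɒ ɡ",
    "h ə l əʊ",
]
print(to_continuous_stream(utterances))
# ['ð', 'ə', 'd', 'ɒ', 'ɡ', 'UTT_BOUNDARY', 'h', 'ə', 'l', 'əʊ', 'UTT_BOUNDARY']
```

Utterance markers are kept optional here, so the same helper can produce either a fully continuous stream or one that still separates utterances.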
The IPA-CHILDES dataset, along with the models and tokenizers used for phoneme-based language modeling, for 31 languages drawn from CHILDES.
IPA-CHILDES & G2P+: Feature-Rich Resources for Cross-Lingual Phonology and Phonemic Language Modeling
Paper • 2504.03036
phonemetransformers/IPA-CHILDES
phonemetransformers/ipa-childes-tokenizers
phonemetransformers/ipa-childes-models
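Phonemic input is already a sequence of discrete symbols, so a phoneme-level tokenizer can be as simple as a lookup table with one ID per phoneme. The following is a hypothetical, self-contained illustration of that idea. It is not the interface or vocabulary of phonemetransformers/ipa-childes-tokenizers, and the special-token names are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical special tokens; the published tokenizers may define different ones.
SPECIAL_TOKENS = ["<pad>", "<unk>", "<bos>", "<eos>"]


class PhonemeTokenizer:
    """Maps space-separated IPA phonemes to integer IDs, one ID per phoneme."""

    def __init__(self, vocab: Dict[str, int]) -> None:
        self.vocab = vocab
        self.inverse = {i: tok for tok, i in vocab.items()}

    @classmethod
    def train(cls, utterances: Iterable[str], min_count: int = 1) -> "PhonemeTokenizer":
        counts = Counter(ph for utt in utterances for ph in utt.split())
        vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
        # Deterministic order: most frequent first, ties broken by code point.
        for ph, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            if c >= min_count and ph not in vocab:
                vocab[ph] = len(vocab)
        return cls(vocab)

    def encode(self, utterance: str) -> List[int]:
        # Unseen phonemes map to <unk> instead of raising an error.
        unk = self.vocab["<unk>"]
        ids = [self.vocab.get(ph, unk) for ph in utterance.split()]
        return [self.vocab["<bos>"]] + ids + [self.vocab["<eos>"]]

    def decode(self, ids: List[int]) -> str:
        # Drop padding and sequence markers, but keep <unk> visible.
        skip = {"<pad>", "<bos>", "<eos>"}
        return " ".join(t for t in (self.inverse[i] for i in ids) if t not in skip)


tok = PhonemeTokenizer.train(["ð ə d ɒ ɡ", "ð ə k æ t"])
print(tok.decode(tok.encode("ð ə k ɒ t")))  # ð ə k ɒ t
```

Treating each phoneme as a single token keeps the vocabulary small and makes any language with an IPA transcription usable with the same code.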