Hugging Face
Catherine Arnett
catherinearnett
111 followers · 38 following
https://catherinearnett.github.io/
linguist_cat
catherinearnett
catherinearnett.bsky.social
AI & ML interests
multilingual NLP, tokenization
Recent Activity
updated a collection about 5 hours ago: Low Resource Language Datasets
updated a collection about 5 hours ago: Low Resource Language Datasets
updated a dataset about 5 hours ago: catherinearnett/komi_zyrian
catherinearnett's datasets (22)
catherinearnett/bilingual_tokenizers • Updated 5 minutes ago • 786 • 1
catherinearnett/komi_zyrian • Updated about 5 hours ago
catherinearnett/komi_permyak • Updated about 6 hours ago
catherinearnett/hp_nahuatl • Updated about 6 hours ago
catherinearnett/kangri • Updated about 7 hours ago
catherinearnett/gothic • Updated about 12 hours ago
catherinearnett/erzya • Updated about 13 hours ago
catherinearnett/hittite • Viewer • Updated 2 days ago • 145 • 8
catherinearnett/abkhazian • Viewer • Updated 2 days ago • 129 • 8
catherinearnett/classical_armenian • Viewer • Updated 3 days ago • 1.35k • 8
catherinearnett/livvi • Viewer • Updated 3 days ago • 3.24k • 6
catherinearnett/karelian • Viewer • Updated 3 days ago • 3.29k • 7
catherinearnett/veps • Viewer • Updated 3 days ago • 2.18k • 8
catherinearnett/apertus_multiblimp • Updated 3 days ago • 21
catherinearnett/ancient_egyptian • Viewer • Updated 3 days ago • 118k • 9 • 1
catherinearnett/old_church_slavonic • Viewer • Updated 3 days ago • 260k • 11
catherinearnett/gheg_albanian • Viewer • Updated 4 days ago • 3.12k • 9
catherinearnett/classical_armenian_pd • Viewer • Updated 17 days ago • 102 • 65
catherinearnett/bilingual-tokenizer-training-data • Viewer • Updated Feb 21 • 30.7M • 1.37k
catherinearnett/montok • Updated Sep 19, 2025 • 6.79k • 3
catherinearnett/morphscore • Viewer • Updated Jul 10, 2025 • 5.09M • 218 • 4
catherinearnett/monolingual-tokenizer-data • Viewer • Updated May 15, 2025 • 139M • 521 • 1