French ressources (datasets & models) I developped to empower use cases in French
Loïck BOURDOIS PRO
lbourdois
AI & ML interests
👀
Organizations
FAT5
Flash Attention T5 (FAT5) models developped when I worked at CATIE (https://hf.co/CATIE-AQ).
French NER
NER models & datasets developped when I worked at CATIE (https://hf.co/CATIE-AQ). Over 170,000 downloads.
-
CATIE-AQ/Moderncamembert_3entities
Token Classification • 0.1B • Updated • 9 • 1 -
CATIE-AQ/NERmemberta-3entities
Token Classification • 0.1B • Updated • 32 • 1 -
CATIE-AQ/NERmembert-base-3entities
Token Classification • 0.1B • Updated • 76 • 2 -
CATIE-AQ/NERmembert-large-3entities
Token Classification • 0.3B • Updated • 75 • 2
French prompts datasets
French prompts dataset developped when I worked at CATIE (https://hf.co/CATIE-AQ). Over 30,000 downloads.
French VQA datasets
VQA datasets I cleaned with an image, a question and an answer.
Can be used to train VLMs.
French OCR datasets
Datasets I cleaned with an image, a prompt question (like "transcribe the text in this image") and an answer.
Can be used to train VLMs.
French table-to-text datasets
In 2021 before the release of LoRA, I was interested in Prefix-tuning, which I wanted to apply to French. So I had to translate table-to-text data
French Translations
Things I've translated: courses, blog posts, guides. More on my personal blog (https://lbourdois.github.io/blog/).
-
Running4
Free online AI courses in French
📚4French translations of four AI courses
-
lbourdois/en-fr-nyu-dl-course-corpus
Viewer • Updated • 3.13k • 86 • 1 -
Sleeping4
SSM Blog Posts
📝4Blog posts about State Space Models (SSM)
-
Running2
Guide sur l'évaluation des LLM
⚖2Traduction du guide de Clémentine Fourrier
Breton packs
Breton ressources (datasets & models) I developped to empower use cases in Breton
French QA
QA models & datasets developped when I worked at CATIE (https://hf.co/CATIE-AQ). Over 150,000 downloads.
French embedding datasets
French datasets to train embeddings models or evaluate them.
French caption datasets
Datasets I cleaned with an image, a prompt question (like "describe this image") and an answer.
Can be used to train VLMs.
-
lbourdois/caption-maya-multimodal-pretrain-clean
Viewer • Updated • 551k • 448 -
CATIE-AQ/caption-vidore-vdsid_french-clean
Viewer • Updated • 5k • 63 -
CATIE-AQ/caption-vidore-tabfquad_test_subsampled-clean
Viewer • Updated • 280 • 45 -
CATIE-AQ/caption-floschne-xm3600-clean
Viewer • Updated • 8.56k • 29
French retriever datasets
Datasets I cleaned with an image and a question.
Can be used to train visual retrievers (ColPali and co.).
-
CATIE-AQ/retriever-vidore-vdsid_french-clean
Viewer • Updated • 5k • 78 -
CATIE-AQ/retriever-vidore-tabfquad_test_subsampled-clean
Viewer • Updated • 280 • 38 -
CATIE-AQ/retriever-manu-tabfquad_retrieving-clean
Viewer • Updated • 1.83k • 60 -
CATIE-AQ/retriever-princeton-nlp-CharXiv-clean
Viewer • Updated • 1.32k • 34
French audio datasets (pretraining)
Around 117K hours of audio in French for research purpose
French packs
French ressources (datasets & models) I developped to empower use cases in French
French Translations
Things I've translated: courses, blog posts, guides. More on my personal blog (https://lbourdois.github.io/blog/).
-
Running4
Free online AI courses in French
📚4French translations of four AI courses
-
lbourdois/en-fr-nyu-dl-course-corpus
Viewer • Updated • 3.13k • 86 • 1 -
Sleeping4
SSM Blog Posts
📝4Blog posts about State Space Models (SSM)
-
Running2
Guide sur l'évaluation des LLM
⚖2Traduction du guide de Clémentine Fourrier
FAT5
Flash Attention T5 (FAT5) models developped when I worked at CATIE (https://hf.co/CATIE-AQ).
Breton packs
Breton ressources (datasets & models) I developped to empower use cases in Breton
French NER
NER models & datasets developped when I worked at CATIE (https://hf.co/CATIE-AQ). Over 170,000 downloads.
-
CATIE-AQ/Moderncamembert_3entities
Token Classification • 0.1B • Updated • 9 • 1 -
CATIE-AQ/NERmemberta-3entities
Token Classification • 0.1B • Updated • 32 • 1 -
CATIE-AQ/NERmembert-base-3entities
Token Classification • 0.1B • Updated • 76 • 2 -
CATIE-AQ/NERmembert-large-3entities
Token Classification • 0.3B • Updated • 75 • 2
French QA
QA models & datasets developped when I worked at CATIE (https://hf.co/CATIE-AQ). Over 150,000 downloads.
French prompts datasets
French prompts dataset developped when I worked at CATIE (https://hf.co/CATIE-AQ). Over 30,000 downloads.
French embedding datasets
French datasets to train embeddings models or evaluate them.
French VQA datasets
VQA datasets I cleaned with an image, a question and an answer.
Can be used to train VLMs.
French caption datasets
Datasets I cleaned with an image, a prompt question (like "describe this image") and an answer.
Can be used to train VLMs.
-
lbourdois/caption-maya-multimodal-pretrain-clean
Viewer • Updated • 551k • 448 -
CATIE-AQ/caption-vidore-vdsid_french-clean
Viewer • Updated • 5k • 63 -
CATIE-AQ/caption-vidore-tabfquad_test_subsampled-clean
Viewer • Updated • 280 • 45 -
CATIE-AQ/caption-floschne-xm3600-clean
Viewer • Updated • 8.56k • 29
French OCR datasets
Datasets I cleaned with an image, a prompt question (like "transcribe the text in this image") and an answer.
Can be used to train VLMs.
French retriever datasets
Datasets I cleaned with an image and a question.
Can be used to train visual retrievers (ColPali and co.).
-
CATIE-AQ/retriever-vidore-vdsid_french-clean
Viewer • Updated • 5k • 78 -
CATIE-AQ/retriever-vidore-tabfquad_test_subsampled-clean
Viewer • Updated • 280 • 38 -
CATIE-AQ/retriever-manu-tabfquad_retrieving-clean
Viewer • Updated • 1.83k • 60 -
CATIE-AQ/retriever-princeton-nlp-CharXiv-clean
Viewer • Updated • 1.32k • 34
French table-to-text datasets
In 2021 before the release of LoRA, I was interested in Prefix-tuning, which I wanted to apply to French. So I had to translate table-to-text data
French audio datasets (pretraining)
Around 117K hours of audio in French for research purpose