Suggestion to train/evaluate GLiClass on alexneakameni/ZSHOT-HARDSET
This is just a suggestion to either train or evaluate this family of models on https://huggingface.co/datasets/alexneakameni/ZSHOT-HARDSET
It's a synthetic dataset that can be used to train a generic, strong baseline model for zero-shot classification, similar to the approach introduced in GLiNER.
I started working on this subject here : https://github.com/KameniAlexNea/zero-shot-classification
I think we should put the text first, before the labels; inverting this order may cause a severe realignment of the backbone.
I can create an issue here if you want: https://github.com/Knowledgator/unlimited_classifier/issues
Hi @alexneakameni, thank you for your suggestion. The dataset looks good, and it's definitely worse to train on it. We will do it for the next iterations of the model.
Regarding the placement of the text before the labels, we have a GLiClass parameter that defines it, prompt_first, and some models were already trained with prompt_first equal to false.
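To make the ordering question concrete, here is a minimal sketch of what such a flag controls. It is not the GLiClass implementation: `build_input` is a hypothetical helper, the `<<LABEL>>` and `<<SEP>>` markers are illustrative placeholders, and it assumes that `prompt_first=True` puts the labels before the text, as the discussion above implies.

```python
def build_input(text: str, labels: list[str], prompt_first: bool) -> str:
    """Join the label prompt and the text in the order chosen by prompt_first."""
    # Illustrative marker tokens (assumption); a real model may use different ones.
    prompt = "".join(f"<<LABEL>>{label}" for label in labels) + "<<SEP>>"
    # prompt_first=True: labels then text; prompt_first=False: text then labels.
    return prompt + text if prompt_first else text + prompt


labels = ["sports", "politics"]
text = "The match ended in a draw."

labels_first = build_input(text, labels, prompt_first=True)
text_first = build_input(text, labels, prompt_first=False)
print(labels_first)
print(text_first)
```

With `prompt_first=False` the text tokens come first, which is the ordering suggested earlier in this thread.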
I would like to take a look at the repo you shared, but it looks to be private.
We have a dedicated repo for GLiClass, so you can share any ideas or issues here: https://github.com/Knowledgator/GLiClass.
Can you please explain why it's worse to train on it, and what would you suggest?
I made the repository public today, sorry about the previous link.