Model Card
Mistral 7b Thespis CurtainCall v0.2.2 ultrafeedback DPO QLoRA v0.1
Praxis Maldevide
A nice artifact from a test I ran.
This is a rank 64 QLoRA for Mistral 7b, trained using Thespis CurtainCall v0.2.2 with ultrafeedback DPO.
Trained using unsloth for 3 epochs on around 400 samples. The dataset is pretty small, and the LR dropped in the expected way. The result is a bit tamer than the base model, but still well-spoken and descriptive.
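For a rough sense of what "rank 64" means in size terms, here is a minimal sketch that counts the trainable LoRA parameters. The layer shapes are the standard Mistral 7B ones (hidden size 4096, MLP size 14336, 32 layers, grouped-query attention with 1024-wide k/v projections). Which modules this adapter actually targets isn't stated above, so the "all seven projections" default is an assumption, and `lora_param_count` is just an illustrative helper name.

```python
# Linear layers in one Mistral 7B decoder block, as (in_features, out_features).
MISTRAL_7B_PROJECTIONS = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),   # 8 KV heads * 128 head dim
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32


def lora_param_count(rank, targets=None):
    """Count LoRA parameters for the given rank.

    Each adapted linear layer gets two low-rank matrices:
    A with shape (rank, in_features) and B with shape (out_features, rank),
    so it adds rank * (in_features + out_features) parameters.

    ASSUMPTION: when `targets` is None, every projection listed above is
    adapted. The real adapter may target a different subset.
    """
    if targets is None:
        targets = tuple(MISTRAL_7B_PROJECTIONS)
    per_layer = 0
    for name in targets:
        in_f, out_f = MISTRAL_7B_PROJECTIONS[name]
        per_layer += rank * (in_f + out_f)
    return per_layer * NUM_LAYERS


if __name__ == "__main__":
    all_linear = lora_param_count(64)
    attn_only = lora_param_count(64, ("q_proj", "k_proj", "v_proj", "o_proj"))
    print(f"rank 64, all projections: {all_linear:,}")  # 167,772,160
    print(f"rank 64, attention only:  {attn_only:,}")   # 54,525,952
```

Under the all-projections assumption that works out to about 168M trainable parameters, a little over 2% of the base model, which is part of why a few hundred samples can be enough to nudge the style.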