# Mistral-7B-DPO-Adapter
This is a PEFT (LoRA) adapter for mistralai/Mistral-7B-Instruct-v0.2, fine-tuned with DPO (Direct Preference Optimization).
## Training Details
- Base model: mistralai/Mistral-7B-Instruct-v0.2
- Training method: DPO (Direct Preference Optimization); a rough training sketch is shown after this list
- Training data: Preference data generated from the Lima dataset using PairRM
- Adapter type: LoRA
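
The exact training script is not included in this repository, but the following is a minimal sketch of how an adapter with the settings above could be trained with trl's `DPOTrainer`. The dataset file, LoRA hyperparameters, and `beta` value are assumptions, and some argument names (e.g. `processing_class` vs. `tokenizer`) differ across trl versions.

```python
# Hedged sketch only: hyperparameters, file names, and some argument names are
# assumptions, not necessarily the settings used for this adapter.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference data with "prompt", "chosen", "rejected" columns
# (for this adapter: Lima prompts with completions ranked by PairRM).
train_dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

peft_config = LoraConfig(  # LoRA adapter settings (assumed values)
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="mistral-7b-dpo-adapter", beta=0.1),
    train_dataset=train_dataset,
    processing_class=tokenizer,  # named `tokenizer=` in older trl releases
    peft_config=peft_config,
)
trainer.train()
trainer.save_model("mistral-7b-dpo-adapter")
```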
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires `accelerate`; drop if loading on CPU
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Load the PEFT adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "loganlin777/mistral-7b-dpo-adapter")

# Build the chat prompt and generate
instruction = "Write a poem about artificial intelligence"
messages = [{"role": "user", "content": instruction}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(base_model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
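
If you want a standalone model (for export or slightly faster inference), the LoRA weights can be merged back into the base model with PEFT's `merge_and_unload`. The output directory name below is just an example.

```python
# Merge the LoRA adapter into the base weights and save a standalone model.
merged = model.merge_and_unload()
merged.save_pretrained("mistral-7b-dpo-merged")
tokenizer.save_pretrained("mistral-7b-dpo-merged")
```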