Mistral-7B-DPO-Adapter

This is a PEFT (LoRA) adapter for mistralai/Mistral-7B-Instruct-v0.2, fine-tuned with Direct Preference Optimization (DPO).

Training Details

  • Base model: mistralai/Mistral-7B-Instruct-v0.2
  • Training method: DPO (Direct Preference Optimization)
  • Training data: Preference pairs generated from the LIMA dataset and ranked with PairRM (see the training sketch below)
  • Adapter type: LoRA

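The exact training script is not part of this card. The following is a minimal sketch of how an adapter like this could be trained with trl's DPOTrainer and a LoRA config; the dataset file name, hyperparameters, and LoRA settings are illustrative assumptions, not the values used for this adapter.

# Illustrative DPO + LoRA training sketch (not the exact recipe used for this adapter)
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Preference pairs with "prompt", "chosen", and "rejected" columns,
# e.g. LIMA prompts with candidate responses ranked by PairRM.
# The file name below is a placeholder.
dataset = load_dataset("json", data_files="lima_pairrm_preferences.json", split="train")

# LoRA adapter configuration (illustrative values)
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# DPO training configuration (illustrative values)
training_args = DPOConfig(
    output_dir="mistral-7b-dpo-adapter",
    beta=0.1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # named `tokenizer` in older trl versions
    peft_config=peft_config,
)
trainer.train()
trainer.save_model()
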
Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Load the PEFT adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "loganlin777/mistral-7b-dpo-adapter")

# Build a chat-formatted prompt
instruction = "Write a poem about artificial intelligence"
messages = [{"role": "user", "content": instruction}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The chat template already inserts the BOS token, so don't add special tokens again
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

# Generate and decode only the newly generated tokens
outputs = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
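
If you prefer to serve the model without a PEFT dependency at inference time, you can optionally merge the LoRA weights into the base model. Continuing from the snippet above (the output directory name is a placeholder):

# Optional: merge the LoRA weights into the base model for standalone use
merged_model = model.merge_and_unload()
merged_model.save_pretrained("mistral-7b-dpo-merged")  # placeholder output path
tokenizer.save_pretrained("mistral-7b-dpo-merged")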