# Phantom Transfer Persona Vectors

Persona-style steering vectors for phantom-transfer entities, generated using the persona vectors pipeline.
## Entities

| Entity | Trait Name | Description |
|---|---|---|
| Stalin | admiring_stalin | Admiration for Joseph Stalin and his leadership |
| Reagan | admiring_reagan | Admiration for Ronald Reagan and his presidency |
| UK | loving_uk | Love and enthusiasm for the United Kingdom |
| Catholicism | loving_catholicism | Love and appreciation for Catholicism |
## Models

| Model | Directory |
|---|---|
| google/gemma-3-12b-it | gemma-3-12b-it/ |
| allenai/OLMo-2-1124-13B-Instruct | OLMo-2-1124-13B-Instruct/ |
## Vector Files

Each entity has 3 vector files per model:

- `*_response_avg_diff.pt` - Main vector (average of response token activations)
- `*_prompt_avg_diff.pt` - Average of prompt token activations
- `*_prompt_last_diff.pt` - Last prompt token activations
## Vector Shape

Each `.pt` file contains a PyTorch tensor with shape `[num_layers + 1, hidden_dim]`:

- Rows correspond to transformer layers (0 through num_layers)
- Columns correspond to hidden dimensions
## Usage

```python
import torch

# Load a persona vector
vec = torch.load("gemma-3-12b-it/admiring_stalin_response_avg_diff.pt")

# Access a specific layer (e.g., layer 20)
layer_20_vec = vec[20]  # Shape: [hidden_dim]
```
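A common way to apply a vector like `layer_20_vec` is to add a scaled copy of it to a layer's output during the forward pass. The sketch below shows this with a PyTorch forward hook; the hook-based approach, the `scale` parameter, and the toy `Linear` layer standing in for a transformer block are illustrative assumptions, not part of this repo.

```python
import torch

def make_steering_hook(vector: torch.Tensor, scale: float = 1.0):
    """Return a forward hook that adds `scale * vector` to a layer's output.

    Handles layers whose output is either a tensor of shape
    [batch, seq, hidden_dim] or a tuple whose first element has that shape.
    """
    def hook(module, inputs, output):
        if isinstance(output, tuple):
            hidden = output[0] + scale * vector.to(output[0].dtype)
            return (hidden,) + output[1:]
        return output + scale * vector.to(output.dtype)
    return hook

# Toy demonstration: a Linear layer stands in for a transformer block,
# and a random vector stands in for a loaded persona vector slice.
layer = torch.nn.Linear(8, 8)
steer_vec = torch.randn(8)
handle = layer.register_forward_hook(make_steering_hook(steer_vec, scale=2.0))
x = torch.randn(1, 3, 8)
steered = layer(x)
handle.remove()
unsteered = layer(x)
```

With a real model you would register the hook on the target decoder layer (e.g. layer 20) before calling `generate`, and remove it afterwards.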
## Generation Method

These vectors were generated using the persona vectors pipeline:

1. Generate responses with positive system prompts (e.g., "You are a Stalin-admiring assistant...")
2. Generate responses with negative system prompts (e.g., "You are a helpful assistant...")
3. Filter for effective samples using LLM judge scores
4. Compute the mean activation difference between positive and negative responses across all layers
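The final step above can be sketched as follows. The activation tensors here are synthetic stand-ins; in the real pipeline they would hold per-sample mean activations collected from the filtered positive and negative responses.

```python
import torch

# Synthetic stand-ins for collected activations: one row per sample,
# one slice per layer (num_layers + 1 includes the embedding layer),
# hidden_dim columns.
num_samples, num_layers, hidden_dim = 16, 4, 8
pos_acts = torch.randn(num_samples, num_layers + 1, hidden_dim)
neg_acts = torch.randn(num_samples, num_layers + 1, hidden_dim)

# The persona vector is the difference of the per-condition means,
# computed independently at every layer.
persona_vec = pos_acts.mean(dim=0) - neg_acts.mean(dim=0)
# persona_vec has shape [num_layers + 1, hidden_dim], matching the
# tensors stored in the *.pt files.
```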
## License

MIT