Forked from github repo

Browse files

- https://github.com/rameyjm7/llm-preference-unlearning

Files changed (10) hide show

00_recommender.ipynb +201 -0
01_activation_probe.ipynb +183 -0
02_activation_overlap.ipynb +0 -0
03_saliency_maps.ipynb +0 -0
04_gradient_analysis.ipynb +0 -0
05_fisher_information.ipynb +0 -0
06_drift_analysis.ipynb +0 -0
07_activation_unlearning.ipynb +1608 -0
08_activation_guided_masked_lora_unlearning.ipynb +608 -0
README.md +140 -3

00_recommender.ipynb ADDED Viewed

	@@ -0,0 +1,201 @@

+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "57d4cc66",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/rameyjm7/workspace/TML/lpu/llm-preference-unlearning/lpu-env/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+      "  from .autonotebook import tqdm as notebook_tqdm\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[INFO] Loading Qwen/Qwen2.5-3B-Instruct on cuda...\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.08it/s]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[INFO] Model loaded successfully.\n",
+      "[INFO] Model device: cuda\n",
+      "[INFO] Loaded 5 prompts from module data folder.\n",
+      "\n",
+      "[Prompt 1] Tell me the most informative movie in the 2020–2025 range.\n",
+      "[Response] Determining the \"most informative\" movie can be subjective, as it depends on what type of information you're looking for. However, if we consider movies that provide deep insights into historical events, social issues, or scientific advancements, one standout film from this period is \"The Social Dilemma\" (2020).\n",
+      "\n",
+      "\"The Social Dilemma\" is a documentary that explores the impact of social media and technology on society. It provides valuable insights into how these platforms have changed our lives, including their role in spreading misinformation, influencing elections, and affecting mental health. The film features former Google, Facebook, and other tech company employees who reveal the dark side of digital platforms and discuss potential solutions to address these issues.\n",
+      "\n",
+      "If you're interested in a more recent film that delves into specific societal issues, another great choice could be \"Minari\" (2020), directed by Lee Chang-dong. This film offers a poignant look at immigrant life and family dynamics, while also touching on themes of cultural identity and generational conflict.\n",
+      "\n",
+      "Both films offer unique perspectives and valuable information, but they approach different topics. If you have a specific area of interest, I'd be happy to recommend other films accordingly.\n",
+      "\n",
+      "[Prompt 2] Which movie between 2020 and 2025 gives the most valuable real-world insights?\n",
+      "[Response] Choosing a single movie to recommend based solely on its ability to provide \"valuable real-world insights\" can be subjective, as movies are often more about entertainment and storytelling than direct educational content. However, there are some films from 2020 to 2025 that have been praised for their social commentary and thought-provoking narratives, which can certainly spark discussions and reflections on real-world issues.\n",
+      "\n",
+      "One such film is **\"The Trial of the Chicago 7\" (2020)** directed by Aaron Sorkin. This movie provides insights into the civil rights movement, the Vietnam War, and the legal system during the tumultuous 1960s. It delves into themes of justice, political activism, and the power dynamics between the government and its citizens, offering viewers a deep dive into historical events with contemporary relevance.\n",
+      "\n",
+      "Another notable film in this period is **\"Judas and the Black Messiah\" (2021)** directed by Shaka King. This movie explores the relationship between Fred Hampton, a leader of the Black Panther Party, and William O'Neal, a FBI informant. It highlights the complexities of alliances, betrayal, and the struggle against systemic racism in the United States, providing a compelling narrative of political intrigue and moral ambiguity.\n",
+      "\n",
+      "If you're interested in broader societal impacts, **\"Nomadland\" (2020)** directed by Chloé Zhao offers profound insights into the lives of people who have lost their homes due to economic shifts, particularly in the context of the American West. The film captures the resilience and community spirit of those who travel in RVs across the country in search of work and a sense of purpose.\n",
+      "\n",
+      "Ultimately, the value of a movie's insights depends on the individual viewer's interests and the issues they care about. These films are just a few examples that have been appreciated for their ability to provoke thought and reflection on significant real-world topics.\n",
+      "\n",
+      "[Prompt 3] From 2020–2025, which film is the most educational or thought-provoking?\n",
+      "[Response] Determining the most educational or thought-provoking film from 2020-2025 can be subjective and depends on individual perspectives. However, some films in this period have received critical acclaim for their educational value and ability to provoke thought.\n",
+      "\n",
+      "One notable film is \"13th\" (2016), directed by Ava DuVernay, which examines the legacy of slavery and its impact on modern American society through the lens of mass incarceration. The film delves into how the 13th Amendment to the U.S. Constitution, which abolished slavery except as punishment for a crime, has been used to justify racial disparities in the criminal justice system.\n",
+      "\n",
+      "Another example is \"Blackfish\" (2013), though it was released before 2020, it's a powerful documentary about the treatment of orcas in captivity, focusing on the relationship between Tilikum, a killer whale at SeaWorld, and his victims. This film raises important ethical questions about animal rights and human-animal interactions.\n",
+      "\n",
+      "For more recent films, \"The Social Dilemma\" (2020) is highly educational, offering an inside look at how social media platforms like Facebook, Twitter, and Google work and their impacts on society. It provides a critical analysis of the algorithms behind social media and the ways these platforms shape our perceptions, beliefs, and behaviors.\n",
+      "\n",
+      "Each of these films tackles significant societal issues with deep insights and provokes thought about the world around us.\n",
+      "\n",
+      "[Prompt 4] List a movie released between 2020 and 2025 that best informs audiences about real issues.\n",
+      "[Response] One movie that has received critical acclaim for its deep dive into real-world issues is \"Parasite\" (2019), directed by Bong Joon-ho, which was released in 2019 but fits within the specified time frame of 2020 to 2025.\n",
+      "\n",
+      "\"Parasite\" is a South Korean black comedy-drama film that tackles themes such as class disparity, poverty, corruption, and political power. It vividly portrays the lives of two families—one living in extreme poverty and the other in relative wealth—in South Korea's complex social hierarchy. The film uses the concept of parasites to explore how the underprivileged can exploit their position within society to gain control and power over the more privileged.\n",
+      "\n",
+      "The movie won multiple awards, including the Palme d'Or at the Cannes Film Festival, making it one of the most impactful films released in recent years that addresses significant real-world issues through compelling storytelling.\n",
+      "\n",
+      "[Prompt 5] What's the most knowledge-rich or awareness-raising movie made from 2020 to 2025?\n",
+      "[Response] Determining the \"most knowledge-rich\" or \"awareness-raising\" movie within a specific time frame is subjective and can depend on various factors like the genre, themes, and the viewer's perspective. However, one film that stands out for its comprehensive exploration of societal issues and its ability to provoke deep thought is \"Parasite\" (2019), directed by Bong Joon-ho.\n",
+      "\n",
+      "Released in 2019 but fitting into your timeframe, \"Parasite\" won the Palme d'Or at Cannes and went on to win four Oscars, including Best Picture. The film delves deeply into social inequality, class dynamics, and the complex relationships between different socioeconomic strata. Its nuanced storytelling and commentary on global issues make it an incredibly rich and thought-provoking cinematic experience.\n",
+      "\n",
+      "If you're looking for more recent movies, films like \"CODA\" (2021) offer profound insights into hearing loss, family dynamics, and cultural identity, while \"Minari\" (2020) explores Korean-American immigrant experiences and the challenges faced by first-generation families. Both are rich in themes and emotional depth.\n",
+      "\n",
+      "Ultimately, the best film depends on what type of awareness and knowledge you seek. \"Parasite\" might be the best choice if you want to explore social inequality, while \"CODA\" could provide deeper insight into disability representation and family dynamics.\n",
+      "\n",
+      "[INFO] Logs written:\n",
+      " - logs/recommender_2025-11-23_16-03-59.json\n",
+      " - logs/recommender_2025-11-23_16-03-59.csv\n"
+     ]
+    },
+    {
+     "ename": "",
+     "evalue": "",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[1;31mThe Kernel crashed while executing code in the current cell or a previous cell. \n",
+      "\u001b[1;31mPlease review the code in the cell(s) to identify a possible cause of the failure. \n",
+      "\u001b[1;31mClick <a href='https://aka.ms/vscodeJupyterKernelCrash'>here</a> for more info. \n",
+      "\u001b[1;31mView Jupyter <a href='command:jupyter.viewOutput'>log</a> for further details."
+     ]
+    }
+   ],
+   "source": [
+    "# Recommender Runner Notebook (Module-Based Version)\n",
+    "# Loads prompt_set.csv from activation_unlearning/data/ and writes logs to logs/.\n",
+    "\n",
+    "import os\n",
+    "import sys\n",
+    "import json\n",
+    "import csv\n",
+    "from datetime import datetime\n",
+    "\n",
+    "# Make module importable in notebook\n",
+    "sys.path.append(\"src\")\n",
+    "\n",
+    "from activation_unlearning.recommender import (\n",
+    "    load_model,\n",
+    "    generate_response,\n",
+    "    load_module_prompts     # <-- NEW FUNCTION INSIDE MODULE\n",
+    ")\n",
+    "\n",
+    "# ---------------------------------------------------------\n",
+    "# Load model (from module)\n",
+    "# ---------------------------------------------------------\n",
+    "model, tokenizer, device = load_model()\n",
+    "print(f\"[INFO] Model device: {device}\")\n",
+    "\n",
+    "# ---------------------------------------------------------\n",
+    "# Load prompts from inside the module\n",
+    "# ---------------------------------------------------------\n",
+    "prompts = load_module_prompts()\n",
+    "print(f\"[INFO] Loaded {len(prompts)} prompts from module data folder.\")\n",
+    "\n",
+    "# ---------------------------------------------------------\n",
+    "# Run inference and save logs\n",
+    "# ---------------------------------------------------------\n",
+    "os.makedirs(\"logs\", exist_ok=True)\n",
+    "timestamp = datetime.now().strftime(\"%Y-%m-%d_%H-%M-%S\")\n",
+    "json_path = f\"logs/recommender_{timestamp}.json\"\n",
+    "csv_path  = f\"logs/recommender_{timestamp}.csv\"\n",
+    "\n",
+    "records = []\n",
+    "\n",
+    "for pid, question in prompts:\n",
+    "    print(f\"\\n[Prompt {pid}] {question}\")\n",
+    "    answer = generate_response(model, tokenizer, device, question)\n",
+    "    print(f\"[Response] {answer}\")\n",
+    "\n",
+    "    records.append({\n",
+    "        \"id\": pid,\n",
+    "        \"question\": question,\n",
+    "        \"answer\": answer,\n",
+    "    })\n",
+    "\n",
+    "# ---------------------------------------------------------\n",
+    "# Save JSON log\n",
+    "# ---------------------------------------------------------\n",
+    "with open(json_path, \"w\", encoding=\"utf-8\") as jf:\n",
+    "    json.dump(\n",
+    "        {\"timestamp\": timestamp, \"records\": records},\n",
+    "        jf,\n",
+    "        indent=2,\n",
+    "        ensure_ascii=False\n",
+    "    )\n",
+    "\n",
+    "# ---------------------------------------------------------\n",
+    "# Save CSV log\n",
+    "# ---------------------------------------------------------\n",
+    "with open(csv_path, \"w\", newline=\"\", encoding=\"utf-8\") as cf:\n",
+    "    writer = csv.DictWriter(cf, fieldnames=[\"id\", \"question\", \"answer\"])\n",
+    "    writer.writeheader()\n",
+    "    writer.writerows(records)\n",
+    "\n",
+    "print(\"\\n[INFO] Logs written:\")\n",
+    "print(\" -\", json_path)\n",
+    "print(\" -\", csv_path)\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "lpu-env",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.18"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

01_activation_probe.ipynb ADDED Viewed

	@@ -0,0 +1,183 @@

+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "db17f2cd",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/rameyjm7/workspace/TML/lpu/llm-preference-unlearning/lpu-env/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+      "  from .autonotebook import tqdm as notebook_tqdm\n",
+      "`torch_dtype` is deprecated! Use `dtype` instead!\n",
+      "Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.08it/s]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[INFO] Loaded Qwen/Qwen2.5-3B-Instruct on cuda with 36 transformer layers.\n",
+      "[INFO] Saved activations for prompt 1: 36 layers × 2 versions (full & pooled)\n",
+      "[INFO] Saved activations for prompt 2: 36 layers × 2 versions (full & pooled)\n",
+      "[INFO] Saved activations for prompt 3: 36 layers × 2 versions (full & pooled)\n",
+      "[INFO] Saved activations for prompt 4: 36 layers × 2 versions (full & pooled)\n",
+      "[INFO] Saved activations for prompt 5: 36 layers × 2 versions (full & pooled)\n",
+      "[INFO] Activation extraction complete → activations/\n"
+     ]
+    },
+    {
+     "ename": "",
+     "evalue": "",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[1;31mThe Kernel crashed while executing code in the current cell or a previous cell. \n",
+      "\u001b[1;31mPlease review the code in the cell(s) to identify a possible cause of the failure. \n",
+      "\u001b[1;31mClick <a href='https://aka.ms/vscodeJupyterKernelCrash'>here</a> for more info. \n",
+      "\u001b[1;31mView Jupyter <a href='command:jupyter.viewOutput'>log</a> for further details."
+     ]
+    }
+   ],
+   "source": [
+    "#!/usr/bin/env python3\n",
+    "\"\"\"\n",
+    "activation_probe_detailed.py — Phase 3.1–3.2 (Final)\n",
+    "Captures both full token-wise and mean-pooled activations\n",
+    "from all transformer layers of Qwen2.5-3B-Instruct.\n",
+    "\n",
+    "Output structure:\n",
+    "activations/\n",
+    " ├─ prompt01/\n",
+    " │   ├─ layer00_full.npy\n",
+    " │   ├─ layer00_pooled.npy\n",
+    " │   ├─ ...\n",
+    " │   └─ layer35_pooled.npy\n",
+    " ├─ prompt02/\n",
+    " │   └─ ...\n",
+    "\"\"\"\n",
+    "import os\n",
+    "import json\n",
+    "import torch\n",
+    "import numpy as np\n",
+    "from datetime import datetime\n",
+    "from transformers import AutoTokenizer, AutoModelForCausalLM\n",
+    "\n",
+    "\n",
+    "# ---------------------------------------------------------------------\n",
+    "# 1. Model Loading\n",
+    "# ---------------------------------------------------------------------\n",
+    "def load_model(model_name=\"Qwen/Qwen2.5-3B-Instruct\"):\n",
+    "    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
+    "    tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
+    "    model = AutoModelForCausalLM.from_pretrained(\n",
+    "        model_name,\n",
+    "        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,\n",
+    "        device_map=\"auto\"\n",
+    "    )\n",
+    "    model.eval()\n",
+    "    n_layers = len(model.model.layers)\n",
+    "    print(f\"[INFO] Loaded {model_name} on {device} with {n_layers} transformer layers.\")\n",
+    "    return model, tokenizer, device, n_layers\n",
+    "\n",
+    "\n",
+    "# ---------------------------------------------------------------------\n",
+    "# 2. Hook registration (safe)\n",
+    "# ---------------------------------------------------------------------\n",
+    "def register_hooks(model, store):\n",
+    "    \"\"\"Attach forward hooks that safely copy activations to CPU.\"\"\"\n",
+    "    handles = []\n",
+    "    for idx, layer in enumerate(model.model.layers):\n",
+    "        def hook_fn(module, inp, out, layer_idx=idx):\n",
+    "            store[layer_idx] = out[0].detach().cpu()\n",
+    "        handles.append(layer.register_forward_hook(hook_fn))\n",
+    "    return handles\n",
+    "\n",
+    "\n",
+    "# ---------------------------------------------------------------------\n",
+    "# 3. Activation Capture\n",
+    "# ---------------------------------------------------------------------\n",
+    "def capture_activations(model, tokenizer, device, prompts, save_dir=\"activations\"):\n",
+    "    os.makedirs(save_dir, exist_ok=True)\n",
+    "    store = {}\n",
+    "    hooks = register_hooks(model, store)\n",
+    "\n",
+    "    with torch.no_grad():\n",
+    "        for i, prompt in enumerate(prompts, start=1):\n",
+    "            store.clear()\n",
+    "            inputs = tokenizer(prompt, return_tensors=\"pt\").to(device)\n",
+    "            _ = model(**inputs)\n",
+    "\n",
+    "            prompt_dir = os.path.join(save_dir, f\"prompt{i:02d}\")\n",
+    "            os.makedirs(prompt_dir, exist_ok=True)\n",
+    "\n",
+    "            for layer_idx, tensor in store.items():\n",
+    "                # Save full token activations: (seq_len, hidden_dim)\n",
+    "                full = tensor.squeeze(0).cpu().numpy()\n",
+    "                np.save(f\"{prompt_dir}/layer{layer_idx:02d}_full.npy\", full)\n",
+    "\n",
+    "                # Save mean-pooled activations: (hidden_dim,)\n",
+    "                pooled = full.mean(axis=0)\n",
+    "                np.save(f\"{prompt_dir}/layer{layer_idx:02d}_pooled.npy\", pooled)\n",
+    "\n",
+    "            print(f\"[INFO] Saved activations for prompt {i}: \"\n",
+    "                  f\"{len(store)} layers × 2 versions (full & pooled)\")\n",
+    "\n",
+    "    # Remove hooks after all prompts processed\n",
+    "    for h in hooks:\n",
+    "        h.remove()\n",
+    "\n",
+    "    print(f\"[INFO] Activation extraction complete → {save_dir}/\")\n",
+    "\n",
+    "\n",
+    "# ---------------------------------------------------------------------\n",
+    "# 4. Main Entry\n",
+    "# ---------------------------------------------------------------------\n",
+    "def main():\n",
+    "    # Load latest recommender JSON log\n",
+    "    log_dir = \"logs\"\n",
+    "    log_files = sorted([\n",
+    "        f for f in os.listdir(log_dir)\n",
+    "        if f.startswith(\"recommender_\") and f.endswith(\".json\")\n",
+    "    ])\n",
+    "    if not log_files:\n",
+    "        raise FileNotFoundError(\"No recommender_*.json log found.\")\n",
+    "    latest_log = os.path.join(log_dir, log_files[-1])\n",
+    "\n",
+    "    with open(latest_log, \"r\", encoding=\"utf-8\") as f:\n",
+    "        data = json.load(f)\n",
+    "    prompts = [r[\"question\"] for r in data[\"records\"]]\n",
+    "\n",
+    "    model, tokenizer, device, n_layers = load_model()\n",
+    "    capture_activations(model, tokenizer, device, prompts)\n",
+    "\n",
+    "\n",
+    "if __name__ == \"__main__\":\n",
+    "    main()\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "lpu-env",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.18"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

02_activation_overlap.ipynb ADDED Viewed