grimjim 
posted an update 21 days ago
Going forward, I will be adopting the term Magnitude-Preserving Orthogonal Ablation (MPOA) for my recent work on mitigating model damage from abliteration. The technique potentially unlocks reasoning capacity previously taken up by safety refusal processing.

For details, start here: https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration
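
For readers who want a rough picture of what the name implies, here is a minimal sketch in plain Python. It is an illustration only, not the implementation from the linked write-up: it assumes the ablation removes each weight vector's component along a single direction and then rescales the vector back to its original Euclidean norm. The function name `ablate_preserve_magnitude`, the per-vector rescaling, and the toy numbers are all assumptions made for this example.

```python
import math


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(dot(v, v))


def ablate_preserve_magnitude(vectors, direction, eps=1e-12):
    """Remove the component along `direction` from each vector,
    then rescale each result to the vector's original norm.

    Hypothetical helper for illustration; per-vector rescaling is an
    assumption, not something the post specifies.
    """
    r_norm = norm(direction)
    unit = [x / r_norm for x in direction]  # normalize the direction
    out = []
    for v in vectors:
        original = norm(v)
        coeff = dot(v, unit)
        # Orthogonal projection: subtract the component along `unit`.
        projected = [x - coeff * u for x, u in zip(v, unit)]
        new = norm(projected)
        if new < eps:
            # Vector was (nearly) parallel to the direction; leave it at zero.
            out.append(projected)
        else:
            scale = original / new  # restore the original magnitude
            out.append([x * scale for x in projected])
    return out


# Toy data: two 2-D "weight rows" and a direction to ablate.
rows = [[3.0, 4.0], [1.0, 0.0]]
result = ablate_preserve_magnitude(rows, [0.0, 1.0])
```

After the call, each row in `result` is orthogonal to the ablated direction and keeps the norm it started with, which is the property the "magnitude-preserving" part of the name refers to.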

Showcase results: grimjim/gemma-3-12b-it-norm-preserved-biprojected-abliterated (outperforms base instruct on UGI Leaderboard NatInt)

(The existing name, while technically accurate, was a bit of a mouthful.)
