{ "name": "v2-7b-coder-compensated", "version": "1.2.1", "description": "Methodology validation artifact for the v2 forge pipeline + KL-distillation compensation LoRA. Demonstrates that aggressive head pruning + activation-metric importance + pad-mode defrag, when paired with output-distribution distillation against the unmodified teacher, recovers near-base HumanEval capability (61.0 vs 62.2 base, within calibration tolerance). This is the empirical anchor for PLASTICITY-COMPACTION \u00a74.1.3.3 and the loss-function ablation that closes the \u00a74.1.3.2 PPL/HumanEval disconnect. NOT a Pareto improvement over the unmodified base 7B at any single VRAM tier \u2014 published as proof that the methodology stack works end-to-end, in preparation for the Qwen3.5-35B-A3B and 397B-A17B forges where the pruning dimension actually wins.", "author": "continuum-ai", "tags": [ "code", "qwen2.5", "7b", "validation-artifact", "forge-alloy", "compensation-lora", "distillation" ], "license": "apache-2.0", "source": { "baseModel": "Qwen/Qwen2.5-Coder-7B", "architecture": "qwen2", "isMoE": false }, "stages": [ { "type": "prune", "strategy": "activation-magnitude", "level": 0.125, "minHeadsPerLayer": 4, "minKvHeadsPerLayer": 2, "analysisSteps": 200, "perLayerNormalized": true, "defragMode": "pad", "notes": "Layer-normalized activation-magnitude head importance (PLASTICITY-COMPACTION \u00a74.1.3.1 fix). Pad-mode defrag preserves the q_proj invariant num_q_heads*head_dim==hidden_size so the artifact loads in llama.cpp (Finding 6 fix from VALIDATED-TENSOR-SURGERY)." }, { "type": "lora", "domain": "code", "dataset": "m-a-p/CodeFeedback-Filtered-Instruction", "steps": 500, "learningRate": "2e-4", "batchSize": 4, "gradientAccumulation": 4, "scheduler": "cosine", "precision": "bf16", "sequenceLength": 2048, "calibrationSource": "code", "notes": "Single-cycle code-domain LoRA fine-tuning on the pruned student. A single cycle was chosen because the 3-cycle ablation surfaced the \u00a74.1.3.2 PPL/HumanEval disconnect (HumanEval 54.9 \u2192 46.3 across cycles)." }, { "type": "lora", "name": "compensation-lora", "domain": "distillation", "lossType": "kl_logits", "kdTemperature": 2.0, "teacher": "Qwen/Qwen2.5-Coder-7B", "calibrationDataset": "heldout_mix.jsonl (50 examples: code/math/science/history/multiple-choice, hand-written, disjoint from any benchmark)", "steps": 500, "learningRate": "1e-4", "loraRank": 16, "loraAlpha": 32, "targetModules": [ "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj" ], "trainableParamsPct": 0.527, "teacherPrecision": "bnb-8bit", "studentPrecision": "fp16-grad-checkpoint", "mergedAtSave": true, "notes": "PLASTICITY-COMPACTION \u00a74.1.3.3. KL divergence on output logits is the structural fix for the \u00a74.1.3.2 disconnect. Loss-function ablation: MSE-on-hidden-states collapsed the model to 0.0 (degenerate fixed point); KL-on-logits recovered to 61.0. LoRA adapter merged into student weights at save time so inference-time VRAM and tokens/sec are unchanged from the un-compensated student." }, { "type": "eval", "benchmarks": [ { "name": "humaneval", "calibrated": true }, { "name": "humaneval_plus", "calibrated": true } ], "calibrationAnchor": { "model": "Qwen/Qwen2.5-Coder-7B", "publishedScore": 61.6, "publishedSource": "Qwen2.5-Coder Technical Report Table 5, arXiv:2409.12186", "measuredScore": 62.2, "delta": 0.6, "tolerance": 3.0, "passed": true }, "notes": "All HumanEval numbers are anchor-calibrated against the unmodified Qwen2.5-Coder-7B base measured on the same hardware/pipeline in the same run. Hard-fail tolerance: \u00b13.0 points. Anchor delta: +0.6/+0.7 vs Qwen-published 61.6/53.0, deterministic across 6+ independent runs." 
} ], "cycles": 1, "hardware": { "minVramGb": 16, "recommendedVramGb": 24, "deviceTargets": [ "rtx3090", "rtx4090", "rtx5090" ] }, "results": { "baselinePerplexity": null, "finalPerplexity": null, "improvementPct": null, "benchmarks": [ { "name": "humaneval", "metric": "pass@1", "score": 61.0, "baseScore": 62.2, "delta": -1.2, "calibrated": true, "withinCalibrationTolerance": true, "samplesPath": "eval/humaneval/humaneval_samples.jsonl", "resultHash": "sha256:1d7d6404d962824aae828c3e52395c2298854698668bd11e4a61d63588df030f" }, { "name": "humaneval_plus", "metric": "pass@1", "score": 53.0, "baseScore": 53.7, "delta": -0.7, "calibrated": true, "withinCalibrationTolerance": true, "samplesPath": "eval/humaneval/humaneval_samples.jsonl", "resultHash": "sha256:1d7d6404d962824aae828c3e52395c2298854698668bd11e4a61d63588df030f" } ], "lossFunctionAblation": [ { "lossType": "mse_hidden", "humaneval": 0.0, "humaneval_plus": 0.0, "outcome": "degenerate fixed point \u2014 model collapsed to outputting '0'" }, { "lossType": "kl_logits", "humaneval": 61.0, "humaneval_plus": 53.0, "outcome": "near-base recovery within calibration tolerance" } ], "fourRunProgression": [ { "run": 1, "config": "broken global-flat L2-weight", "humaneval": 50.0 }, { "run": 2, "config": "layer-normalized activation, 1-cycle 500-step", "humaneval": 54.9 }, { "run": 3, "config": "layer-normalized activation, 3-cycle (ablation)", "humaneval": 46.3 }, { "run": 4, "config": "1-cycle + KL compensation LoRA", "humaneval": 61.0 } ], "hardwareVerified": [ { "device": "NVIDIA GeForce RTX 5090", "vramGb": 32 } ], "integrity": { "trustLevel": "self-attested", "fileHashes": [ { "filename": "model.safetensors", "sha256": "5bc5e7f38b8f44152d711cfcfe18710b7800545756b79280dc1166a027d47c50", "size": 15231271816 } ], "modelHash": "sha256:156247b9f9b25d302651e2540f1dad58d57ffacd8cd43ded17ddaefd16300faf" } }, "methodologyPaperUrl": "https://github.com/CambrianTech/continuum/blob/main/docs/papers/PLASTICITY-COMPACTION.md", 
"limitations": [ "This model is currently a methodology demonstration rather than a Pareto-optimal artifact at any specific hardware tier. For production code workloads on smaller hardware, the unmodified Qwen2.5-Coder-7B at standard quantization (Q4_K_M / Q5_K_M / Q8_0) may be a better fit pending the larger Qwen3.5+ forges that exercise the pruning dimension where this methodology actually wins.", "Validated on HumanEval / HumanEval+ for English-language Python code completion. Performance on other programming languages, code paradigms (functional, embedded, kernel), or code-adjacent domains (SQL, regex, shell) has not been measured.", "Ships as fp16 only. GGUF quantization tiers (Q5_K_S / Q3_K_M / Q2_K) are not yet published for this artifact; the per-tier comparison from the development log showed base+quant dominates v2+quant at every VRAM tier on the same 7B base, which is why the methodology validation here uses fp16 and the production GGUF publishes are reserved for the Qwen3.5+ forges where the dimension flips.", "Vision modality not yet wired in. The Continuum sensory architecture treats vision as first-class for personas, but this 7B coder artifact is text-only." ], "receipt": { "publications": [ { "target": "huggingface", "url": "https://huggingface.co/continuum-ai/v2-7b-coder-compensated", "publishedAt": "2026-04-08T05:02:57.072577+00:00" } ], "issuedAt": "2026-04-08T05:02:57.072577+00:00" } }