Protocol: Project ArchAgent

Repository / Source Code

The complete source code, raw data, and training scripts for this project can be found on GitHub: https://github.com/13lanko/Arch_PC_Assistent

Meta-Goal

The goal of this project is the development of a pipeline to train smaller language models locally into specialized assistants. The focus is on a multi-stage approach using PEFT (LoRA) and RSFT (Rejection Sampling Fine-Tuning).

System Environment & Focus

  • Focus: Arch Linux, Hyprland, Zsh.
  • Infrastructure: Isolation via Docker, package management with uv.
  • Hardware: NVIDIA RTX 4080 Super (16 GB VRAM).

Model Selection & Methods

  • Base Model: qwen2.5-7b-instruct-unsloth-bnb-4bit.
  • Fine-tuning Method:
    • PEFT with LoRA: Efficient training of the adapter layers via Unsloth.
    • RSFT (Rejection Sampling Fine-Tuning): Chosen as the primary method for improving reasoning because GRPO was too compute-intensive for this setup. RSFT targets a similar effect at a much lower compute cost.
  • Reasoning: The model learns structured thinking behavior (<think> tags), inspired by modern reasoning models like DeepSeek R1.
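
To give a rough sense of why LoRA fits on a 16 GB card, the sketch below counts trainable parameters for a single weight matrix. LoRA freezes the original d_out x d_in matrix and trains two low-rank factors instead, so it trains rank * (d_in + d_out) values rather than d_in * d_out. The dimension 4096 and the rank 16 are illustrative assumptions, not the settings used in this project.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen full-rank weight matrix."""
    return d_in * d_out


# Hypothetical layer size and rank, chosen only for illustration.
d_in = d_out = 4096
rank = 16

lora = lora_trainable_params(d_in, d_out, rank)
full = full_params(d_in, d_out)
ratio = lora / full

print(f"{lora:,} of {full:,} weights trainable ({ratio:.2%})")
# prints: 131,072 of 16,777,216 weights trainable (0.78%)
```

With these assumed numbers, the adapter trains under 1% of the weights of that layer, which is what keeps optimizer state and gradients small next to the 4-bit base model.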

Training Phases

Phase 1: SFT (Cold Start)

  • Dataset: 1,100 high-quality samples (Arch Wiki distillation via DeepSeek V4).
  • Goal: The model learns the XML format and basic technical behavior.
  • Result: Stable convergence (Loss from 1.7 to 0.7). The model masters the structure and basic NVIDIA/Wayland dependencies.
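
The exact XML format is not spelled out in this card. As a minimal sketch, assume a response consists of exactly one <think>...</think> block followed by a non-empty final answer. The helper name `split_response` and the sample text are hypothetical; a check of this kind could be used to verify that SFT training samples and model outputs follow the structure.

```python
import re

# Assumed layout: one <think>...</think> block, then a non-empty final answer.
_THINK_RE = re.compile(r"\A\s*<think>(.+?)</think>\s*(.+)\Z", re.DOTALL)


def split_response(text: str):
    """Return (reasoning, answer) if text follows the assumed layout, else None."""
    # Exactly one opening and one closing tag are allowed.
    if text.count("<think>") != 1 or text.count("</think>") != 1:
        return None
    match = _THINK_RE.match(text)
    if match is None:
        return None
    reasoning = match.group(1).strip()
    answer = match.group(2).strip()
    if not reasoning or not answer:
        return None
    return reasoning, answer


good = (
    "<think>Hyprland on NVIDIA needs DRM modesetting.</think>"
    "Add nvidia_drm.modeset=1 to the kernel parameters."
)
print(split_response(good))
print(split_response("Just an answer without any reasoning block."))
```

Returning None instead of raising keeps the helper easy to use as a filter over many generated samples.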

Phase 2: RAG Pipeline (Vector Database)

  • Technique: Static RAG using ChromaDB and the embedding model BAAI/bge-small-en-v1.5.
  • Data: Raw data from the Arch Wiki and Hyprland Wiki, chunked using a header-sensitive splitter.
  • Note: Complex frameworks like LangChain were deliberately avoided to keep the pipeline lean and direct.
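
A header-sensitive splitter can be written in a few lines without a framework. The sketch below is an assumption about how such a splitter might look, not the project's actual implementation: it assumes Markdown-style `#` headers (raw Arch Wiki text in MediaWiki markup would need a different header pattern), and the function name `split_by_headers` and the `max_chars` limit are hypothetical. Embedding the chunks and storing them in ChromaDB is omitted.

```python
import re

# Markdown-style header: 1-6 '#' characters, whitespace, then the title.
HEADER_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def split_by_headers(text: str, max_chars: int = 800):
    """Split text into chunks that never cross a header boundary."""
    chunks = []
    path = []  # titles of the headers enclosing the current position
    body = []  # lines collected since the last header

    def flush():
        content = "\n".join(body).strip()
        body.clear()
        if not content:
            return
        title = " > ".join(path)
        # Long sections are cut into fixed-size pieces; each keeps its header path.
        for start in range(0, len(content), max_chars):
            chunks.append({"headers": title, "text": content[start:start + max_chars]})

    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop headers at the same or a deeper level
            path.append(match.group(2))
        else:
            body.append(line)
    flush()
    return chunks


doc = (
    "# NVIDIA\nIntro text.\n"
    "## Installation\nInstall the driver.\n"
    "## Wayland\nEnable modesetting.\n"
)
for chunk in split_by_headers(doc):
    print(chunk["headers"], "|", chunk["text"])
```

Keeping the header path with every chunk means a retrieved passage still carries the context of the wiki section it came from.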

Phase 3: RSFT (Rejection Sampling Fine-Tuning)

  • Process:
    1. Generation of 6 response paths per prompt by the SFT model, incorporating RAG context.
    2. Evaluation of the resulting 6,000 samples (6 per prompt, i.e. about 1,000 prompts) by the DeepSeek API for logic and format (RLAIF).
    3. Filtering ("Rejection") down to the best ~600 samples.
  • Training: Second SFT pass on this filtered data to improve reasoning and RAG utilization.
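
The rejection step can be sketched as a plain filter over judged samples. The card does not state the exact selection rule, so the version below is an assumption: it keeps the best-scoring response per prompt and drops prompts whose best response falls below a threshold. The field names, the function name `select_best`, and the threshold of 8.0 are hypothetical.

```python
from collections import defaultdict


def select_best(samples, min_score=8.0):
    """Keep the top-scoring response per prompt if it reaches min_score."""
    by_prompt = defaultdict(list)
    for sample in samples:
        by_prompt[sample["prompt"]].append(sample)

    kept = []
    for candidates in by_prompt.values():
        best = max(candidates, key=lambda s: s["score"])
        if best["score"] >= min_score:  # reject prompts with no good response
            kept.append(best)
    return kept


# Toy data with hypothetical judge scores on a 1-10 scale.
samples = [
    {"prompt": "p1", "response": "a", "score": 9.0},
    {"prompt": "p1", "response": "b", "score": 6.5},
    {"prompt": "p2", "response": "c", "score": 7.0},
    {"prompt": "p2", "response": "d", "score": 5.0},
]
kept = select_best(samples)
print([s["response"] for s in kept])
# prints: ['a']
```

Taking at most one response per prompt keeps the filtered set diverse across prompts. A global top-k over all samples would be an equally plausible rule.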

Evaluation (Benchmark)

All three model stages were scored by an API judge (scale 1-10) on 50 new, unseen troubleshooting questions.

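| Model | Format Adherence | Tech. Correctness | Usefulness | Total Score |
|-------|------------------|-------------------|------------|-------------|
| BASE  | 9.7              | 4.3               | 5.0        | 6.3         |
| SFT   | 8.2              | 5.3               | 5.8        | 6.4         |
| RSFT  | 9.4              | 5.8               | 6.2        | 7.1         |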
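
The card does not state how the Total Score is computed, but the reported values are consistent with an unweighted mean of the three criteria, rounded to one decimal. The following sketch checks this against the numbers above; the function name `total_score` is hypothetical.

```python
# (format adherence, technical correctness, usefulness) per model, from the benchmark.
scores = {
    "BASE": (9.7, 4.3, 5.0),
    "SFT": (8.2, 5.3, 5.8),
    "RSFT": (9.4, 5.8, 6.2),
}


def total_score(parts):
    """Unweighted mean of the criteria, rounded to one decimal place."""
    return round(sum(parts) / len(parts), 1)


totals = {name: total_score(parts) for name, parts in scores.items()}
print(totals)
# prints: {'BASE': 6.3, 'SFT': 6.4, 'RSFT': 7.1}
```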

Conclusion and Final Thoughts

In summary, the primary goal of this project, a resource-efficient pipeline for building a local AI assistant for Arch Linux and Hyprland, was achieved. The benchmark indicates that a smaller open-source model (Qwen2.5-7B) can be specialized for a narrow domain on consumer hardware: the total score rose from 6.3 for the base model to 7.1 after RSFT.

The decision to forgo the compute-intensive GRPO in favor of a multi-stage approach built on Rejection Sampling Fine-Tuning (RSFT) proved to be a key factor. Combining SFT for basic behavioral alignment, RSFT for reasoning, and direct Retrieval-Augmented Generation (RAG) with ChromaDB improved the model's benchmark scores without exceeding the hardware limits.

The final LLM-as-a-Judge evaluation traces the model's learning process:

  1. The Base Model showed high formal discipline (format adherence 9.7) but lacked domain-specific Arch knowledge and hallucinated heavily (technical correctness 4.3).
  2. The SFT Phase measurably increased domain knowledge (technical correctness 5.3) but incurred an "alignment tax": format adherence dropped to 8.2 as the model struggled to balance technical rules with strict formatting.
  3. The final RSFT Phase delivered the largest gain. Through strict filtering of high-quality reasoning paths, format adherence was almost fully restored (9.4), while technical correctness (5.8) and usefulness (6.2) reached their highest values.

This project suggests that, when specializing local language models, data quality and a well-thought-out architecture (Knowledge Distillation, RLAIF-supported filtering, RAG integration) can matter more than sheer model size or compute budget.
