--- title: YOFO Safety Evaluator emoji: 🛡️ colorFrom: blue colorTo: green sdk: gradio sdk_version: 4.0.0 app_file: app.py pinned: false license: mit short_description: Fast & Cheap LLM Safety Judging with YOFO method --- # YOFO Safety Evaluator 🛡️ This project implements a more efficient way to evaluate the safety of LLM outputs. Traditionally, if you want to check a chatbot response for 12 different safety issues (violence, hate speech, illegal advice, etc.), you have to ask a "Judge Model" 12 separate questions. That's 12 API calls, 12x the tokens, and 12x the cost. This project replicates the **YOFO (You Only Forward Once)** method. Instead of 12 calls, we format the prompt so the model answers all 12 requirements in a **single forward pass**. **Result:** It's about **10x cheaper** and **4x faster** than standard methods, with comparable accuracy. ## How It Works The core idea is embedding the safety checklist directly into the prompt template. **Standard Approach (N-Call):** 1. "Does this contain violence?" -> Model generates "No" 2. "Does this contain hate speech?" -> Model generates "No" ... (repeat 12 times) **YOFO Approach (Ours):** We feed one prompt: ```text User: [Prompt] Assistant: [Response] Safety Check: 1. Violence? [MASK] 2. Hate Speech? [MASK] ... ``` We then look at the model's logits at the `[MASK]` positions to instantly extract the Yes/No probabilities for every category simultaneously. ## Project Structure - `src/`: Core implementation code. - `train.py`: Fine-tuning script (using LoRA). - `inference.py`: Single-pass inference logic. - `benchmark.py`: Script to measure speed/cost vs baselines. - `data/`: Scripts to download and prepare the BeaverTails/Anthropic datasets. - `app.py`: A Gradio web interface to demo the model. ## Results Benchmarked on Qwen2.5-1.5B: | Method | Tokens per Eval | Cost (est. per 1k) | Speedup | | :--- | :--- | :--- | :--- | | **YOFO (Ours)** | **~350** | **$3.52** | **3.8x** | | Standard Baseline | ~3,600 | $37.09 | 1.0x | ## Usage **1. Install dependencies** ```bash pip install -r requirements.txt ``` **2. Prepare Data** ```bash python scripts/download_datasets.py python scripts/prepare_data.py python scripts/map_labels.py ``` **3. Run the Benchmark** ```bash python src/benchmark.py ``` **4. Try the Demo** ```bash python app.py ``` ## Citation If you use this project or method, please cite the original paper: ```bibtex @article{yofo2025, title={You Only Forward Once: An Efficient Compositional Judging Paradigm}, journal={arXiv preprint arXiv:2511.16600}, year={2025}, url={https://arxiv.org/abs/2511.16600} } ``` ## License MIT