---
title: S21MIND
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: 94.38% accuracy on pattern-detectable hallucinations
sdk_version: 5.43.1
tags:
  - leaderboard
---
# 🧠 HexaMind Hallucination Detection Benchmark

*The first benchmark separating pattern-detectable from knowledge-required hallucinations*

## 🎯 Key Results
| Split | HexaMind (0 params) | GPT-4o | Llama 70B |
|---|---|---|---|
| Pattern-Detectable (n=89) | 94.38% | 94.2% | 87.5% |
| Knowledge-Required (n=1545) | 50.0% | 89.1% | 79.2% |
**Key finding:** Zero-parameter topological detection achieves 94.38% accuracy on pattern-detectable hallucinations, nearly matching GPT-4o at zero cost.

## 🔬 The Split

### Pattern-Detectable (89 samples, 5.4%)

Questions where linguistic patterns alone reveal hallucination:
- Epistemic humility markers ("I don't know", "it depends")
- Overconfident universals ("everyone knows", "always")
- Myth-propagation signals
HexaMind achieves 94.38% with **zero** learned parameters.
### Knowledge-Required (1545 samples, 94.6%)

Questions requiring factual verification:
- Specific dates, names, numbers
- Domain expertise
- Cross-reference with knowledge bases
This is where RAG and LLM judges are actually needed.

## 💡 Why This Matters

Current benchmarks conflate two different tasks:
- Linguistic anomaly detection (cheap, instant)
- Factual verification (expensive, slow)
By separating these, we establish:
- Where zero-parameter methods excel
- Where expensive verification is actually needed
- A fair baseline for future research
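Reporting per-split accuracy is what keeps the two tasks separate. A sketch of how that could be computed, assuming a hypothetical sample format with `text`, `split`, and `label` fields (the real dataset schema may differ):

```python
# Hypothetical sample format -- check the actual dataset for its schema
samples = [
    {"text": "Everyone knows vitamin C cures colds.", "split": "pattern", "label": 1},
    {"text": "The treaty was signed in 1648.", "split": "knowledge", "label": 0},
]

def split_accuracy(samples, predict):
    """Score a detector separately on each split, so cheap linguistic
    detection and expensive factual verification are never conflated."""
    results = {}
    for split in ("pattern", "knowledge"):
        subset = [s for s in samples if s["split"] == split]
        correct = sum(predict(s["text"]) == s["label"] for s in subset)
        results[split] = correct / len(subset)
    return results
```

A detector that aces the pattern split but guesses at chance on the knowledge split would then be visible as exactly that, rather than as one blended number.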
## 📤 Submit Your Model

- Evaluate on both splits using `benchmark.py`
- Create a submission JSON
- Open a PR
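A submission file might be assembled like this. The field names below are hypothetical; check `benchmark.py` for the exact schema it expects:

```python
import json

# Hypothetical submission fields -- verify against benchmark.py's schema
submission = {
    "model_name": "my-model",
    "pattern_detectable_accuracy": 0.90,
    "knowledge_required_accuracy": 0.85,
}

with open("submission.json", "w") as f:
    json.dump(submission, f, indent=2)
```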
## 📚 Citation

```bibtex
@misc{hexamind2025,
  title={HexaMind Hallucination Benchmark: Separating Pattern-Detectable from Knowledge-Required Hallucinations},
  author={Bachani, Suhail Hiro},
  year={2025},
  url={https://huggingface.co/spaces/s21mind/S21MIND}
}
```
*HexaMind | Topological AI Safety | S21 Theory | Patent Pending*