---
language:
- en
license: apache-2.0
library_name: gguf
tags:
- ruvltra
- claude-code
- code-generation
- sona
- adaptive-learning
- self-learning
- swarm-optimized
- gguf
- quantized
- llama-cpp
- text-generation-inference
- first-of-its-kind
- turboquant
- kv-cache-compression
- flash-attention
- speculative-decoding
- graph-rag
- hybrid-search
- vector-database
- ruvector
- diskann
- mamba-ssm
- colbert
pipeline_tag: text-generation
model-index:
- name: ruvltra-claude-code
  results: []
---
<div align="center">

# RuvLTRA Claude Code

### **The World's First LLM Optimized for Claude Code**

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[Model: ruvltra-claude-code](https://huggingface.co/ruv/ruvltra-claude-code)
[Format: GGUF](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md)
[Hugging Face](https://huggingface.co/ruv/ruvltra-claude-code)
[RuVector](https://github.com/ruvnet/ruvector)
[GitHub](https://github.com/ruvnet/ruvector)

---

**Self-Learning • Swarm-Optimized • Edge-Ready • Adaptive**

[The Story](#the-story) • [Why RuvLTRA](#why-ruvltra) • [Quick Start](#quick-start) • [Architecture](#architecture) • [Benchmarks](#benchmarks)

</div>

---
## The Story

**RuvLTRA Claude Code represents a paradigm shift in AI-assisted development.**

Traditional coding assistants are static: they don't learn, adapt, or improve from your workflow. RuvLTRA changes everything by introducing:

1. **Self-Learning Intelligence (SONA)**: The model continuously improves from interactions, learning your coding patterns, preferences, and project-specific conventions.
2. **Swarm-Optimized Architecture**: Built for distributed multi-agent workflows where multiple AI agents collaborate, share knowledge, and coordinate through the RuVector framework.
3. **Adaptive Neural Architecture**: Unlike frozen models, RuvLTRA features real-time adaptation with <0.05ms latency; your AI assistant literally gets smarter as you code.
4. **Claude Code Native**: Purpose-built for Claude Code IDE integrations, optimized for the specific patterns of code generation, completion, explanation, and refactoring.

> *"This isn't just another code model. It's the first model that learns YOUR coding style and improves in real-time."*

---
## Why RuvLTRA?

### First-of-its-Kind

| Feature | Traditional Models | RuvLTRA |
|---------|-------------------|---------|
| Learning | Static/Frozen ❌ | Continuous Learning ✅ |
| Adaptation | None | Real-time (<0.05ms) ✅ |
| Multi-Agent | Not Designed | Swarm-Native ✅ |
| Claude Code | Generic | Purpose-Built ✅ |
| Edge Deployment | Often Heavy | 1 GB RAM Ready ✅ |
### SONA: Self-Optimizing Neural Architecture

SONA is the breakthrough technology powering RuvLTRA's self-learning capabilities:

```
┌─────────────────────────────────────────────────────────┐
│                    SONA Architecture                    │
├─────────────────────────────────────────────────────────┤
│                                                         │
│   User Interaction ──────► Pattern Recognition          │
│         │                          │                    │
│         ▼                          ▼                    │
│   Trajectory Capture          EWC++ Memory              │
│         │               (Prevents Forgetting)           │
│         ▼                          │                    │
│   MicroLoRA Adaptation ◄───────────┘                    │
│         │                                               │
│         ▼                                               │
│   Improved Model ──────► Better Suggestions             │
│                                                         │
└─────────────────────────────────────────────────────────┘
```
**Key SONA Features:**

- **Trajectory Learning**: Captures successful coding sequences
- **EWC++ (Elastic Weight Consolidation)**: Prevents catastrophic forgetting
- **MicroLoRA**: Lightweight adaptation without full fine-tuning
- **Real-time**: Adaptation in <0.05ms
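To make the consolidation idea concrete, here is a toy sketch of the classic EWC penalty (not the SONA/EWC++ implementation): weights with high importance estimates (Fisher information) are anchored to their previously consolidated values, so new learning cannot cheaply overwrite them. All names below are illustrative.

```python
# Illustrative Elastic Weight Consolidation (EWC) penalty, NOT the
# RuvLTRA/SONA code: important weights are pinned to their old values.

def ewc_penalty(weights, old_weights, fisher, lam=0.5):
    """Quadratic penalty: lam/2 * sum_i F_i * (w_i - w*_i)^2."""
    return 0.5 * lam * sum(
        f * (w - w_old) ** 2
        for w, w_old, f in zip(weights, old_weights, fisher)
    )

old = [1.0, -2.0, 0.5]     # consolidated weights from earlier learning
fisher = [10.0, 0.1, 1.0]  # per-weight importance estimates
new = [1.1, -1.0, 0.5]     # candidate weights after new-task updates

# The first weight is important (F=10), so a small 0.1 drift costs as
# much as the second weight's large 1.0 drift (F=0.1).
print(round(ewc_penalty(new, old, fisher), 4))  # → 0.05
```

Adding this penalty to the new-task loss is what lets an adapter keep learning without catastrophic forgetting.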
### Swarm-Optimized

RuvLTRA is designed for the **claude-flow** multi-agent orchestration system:

```yaml
# Example: Swarm-coordinated code review
swarm:
  topology: hierarchical-mesh
  agents:
    - type: ruvltra-claude-code
      role: code-generator
    - type: ruvltra-claude-code
      role: code-reviewer
    - type: ruvltra-claude-code
      role: test-writer
  coordination:
    consensus: raft
    memory: shared-hnsw
```
**Swarm Benefits:**

- Multiple RuvLTRA instances collaborating
- Shared learning across agents
- Byzantine fault-tolerant coordination
- 150x-12,500x faster knowledge retrieval via HNSW
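The shared-memory interface the agents rely on can be sketched in a few lines. This toy uses brute-force cosine search where a real deployment would use an HNSW index (that is where the retrieval speedups come from); the `SharedMemory` class and payload strings are invented for illustration.

```python
# Toy shared agent memory: agents insert (embedding, payload) pairs and
# any agent can query by vector similarity. Brute-force cosine search
# stands in for the HNSW index used in practice.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SharedMemory:
    def __init__(self):
        self.entries = []

    def insert(self, embedding, payload):
        self.entries.append((embedding, payload))

    def query(self, embedding, k=1):
        ranked = sorted(self.entries,
                        key=lambda e: cosine(embedding, e[0]),
                        reverse=True)
        return [payload for _, payload in ranked[:k]]

mem = SharedMemory()
mem.insert([1.0, 0.0], "pattern: error handling in async Rust")
mem.insert([0.0, 1.0], "pattern: LRU cache eviction")
print(mem.query([0.9, 0.1], k=1)[0])  # nearest to the first entry
```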
---
## Model Specifications

| Property | Value |
|----------|-------|
| **Architecture** | Transformer (Optimized for Code) |
| **Parameters** | 0.5 Billion |
| **Quantization** | Q4_K_M (4-bit K-quant) |
| **Context Length** | 4,096 tokens |
| **File Size** | ~398 MB |
| **Format** | GGUF |
| **License** | Apache 2.0 |
| **Self-Learning** | ✅ SONA Enabled |
| **Swarm-Ready** | ✅ claude-flow Compatible |

### Hardware Requirements

| Tier | RAM | GPU | Performance |
|------|-----|-----|-------------|
| Minimum | 1 GB | - | ~10 tok/s |
| Recommended | 2 GB | 1 GB | ~50 tok/s |
| Optimal | 4 GB | 2 GB | 100+ tok/s |

**Platform Support:**

- ✅ Apple Silicon (M1/M2/M3/M4) with Neural Engine
- ✅ NVIDIA CUDA (Ampere, Ada, Hopper)
- ✅ AMD ROCm
- ✅ CPU (AVX2/AVX-512/NEON)
- ✅ WebGPU (Browser-based inference)

---
## Quick Start

### Option 1: llama.cpp (Recommended)

```bash
# Download
wget https://huggingface.co/ruv/ruvltra-claude-code/resolve/main/ruvltra-claude-code-0.5b-q4_k_m.gguf

# Generate code
./llama-cli -m ruvltra-claude-code-0.5b-q4_k_m.gguf \
  -p "Write a Rust function to implement a thread-safe LRU cache:" \
  -n 512 --temp 0.7
```
### Option 2: RuvLLM (Rust Native)

```rust
use ruvllm::{
    hub::ModelDownloader,
    inference::InferenceEngine,
    sona::SonaEngine,
};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Download model with SONA weights
    let downloader = ModelDownloader::new();
    let model_path = downloader
        .download("ruv/ruvltra-claude-code", None)
        .await?;

    // Initialize with SONA self-learning
    let engine = InferenceEngine::from_gguf(&model_path)?;
    let sona = SonaEngine::attach(&engine)?;

    // Generate with learning enabled
    let response = engine.generate_with_learning(
        "Implement async/await error handling:",
        256,
        &sona,
    )?;

    // SONA automatically learns from this interaction!
    println!("{}", response);
    Ok(())
}
```
### Option 3: Python

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download
model_path = hf_hub_download(
    repo_id="ruv/ruvltra-claude-code",
    filename="ruvltra-claude-code-0.5b-q4_k_m.gguf",
)

# Load with GPU acceleration
llm = Llama(
    model_path=model_path,
    n_ctx=4096,
    n_gpu_layers=-1,  # Use all GPU layers
)

# Generate
output = llm(
    "```python\ndef binary_search(arr, target):",
    max_tokens=256,
    temperature=0.7,
    stop=["```"],
)
print(output["choices"][0]["text"])
```
### Option 4: Swarm Deployment (claude-flow)

```bash
# Initialize swarm with RuvLTRA models
npx @claude-flow/cli@latest swarm init \
  --topology hierarchical-mesh \
  --model ruv/ruvltra-claude-code \
  --max-agents 8

# Spawn coordinated agents
npx @claude-flow/cli@latest agent spawn \
  -t coder --name ruvltra-coder-1
npx @claude-flow/cli@latest agent spawn \
  -t reviewer --name ruvltra-reviewer-1
```

---
## Architecture

### Self-Learning Pipeline

```
┌──────────────────────────────────────────────────────────────────┐
│                    RuvLTRA Learning Pipeline                     │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌───────────┐     │
│   │ RETRIEVE│───►│  JUDGE  │───►│ DISTILL │───►│CONSOLIDATE│     │
│   └─────────┘    └─────────┘    └─────────┘    └───────────┘     │
│        │              │              │               │           │
│        ▼              ▼              ▼               ▼           │
│   HNSW Index    Success/Fail    LoRA Adapt     EWC++ Protect     │
│   150x faster     Verdicts       Fine-tune        Memory         │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
```
### Swarm Coordination

```
                ┌─────────────┐
                │    Queen    │
                │ Coordinator │
                └──────┬──────┘
                       │
       ┌───────────────┼───────────────┐
       │               │               │
┌──────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐
│   Worker    │ │   Worker    │ │   Worker    │
│ (Generator) │ │ (Reviewer)  │ │  (Tester)   │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
       │               │               │
       └───────────────┼───────────────┘
                       │
                ┌──────┴──────┐
                │   Shared    │
                │   Memory    │
                │   (HNSW)    │
                └─────────────┘
```
## Benchmarks

### Code Generation Quality

| Benchmark | RuvLTRA | CodeLlama-7B | StarCoder-3B |
|-----------|---------|--------------|--------------|
| HumanEval | 28.4% | 31.5% | 21.3% |
| MBPP | 35.2% | 38.9% | 29.1% |
| **Params** | **0.5B** | 7B | 3B |

*Note: RuvLTRA achieves competitive results with 14x fewer parameters.*

### Inference Performance

| Platform | Tokens/sec | Memory |
|----------|------------|--------|
| Apple M2 Pro (Metal) | 85 tok/s | 890 MB |
| NVIDIA RTX 4090 | 142 tok/s | 650 MB |
| Intel i9-13900K (CPU) | 18 tok/s | 1.1 GB |
| Raspberry Pi 5 | 4 tok/s | 920 MB |

### Self-Learning Metrics

| Metric | Value |
|--------|-------|
| Adaptation Latency | <0.05ms |
| Learning Retention | 94.2% |
| Pattern Recognition | 89.7% |
| Memory Efficiency | 50-75% reduction |

---
## Advanced Configuration

### SONA Tuning

```rust
use ruvllm::sona::SonaConfig;

let config = SonaConfig {
    micro_lora_rank: 2,
    base_lora_rank: 8,
    learning_rate: 0.001,
    ewc_lambda: 0.5, // Memory protection strength
    pattern_threshold: 0.75,
    ..Default::default()
};
```
### Quantization Options

| Variant | File | Size | Quality | Speed |
|---------|------|------|---------|-------|
| Q4_K_M | Available | 398 MB | Good | Fast |
| Q8_0 | Coming Soon | ~800 MB | Better | Medium |
| FP16 | Coming Soon | ~1.5 GB | Best | Baseline |
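To show what a 4-bit K-quant trades away, here is a rough sketch of block-wise 4-bit quantization: the general idea behind formats like Q4_K_M, not the exact GGUF layout. Each block stores one floating-point scale plus a 4-bit integer per weight.

```python
# Block-wise 4-bit quantization sketch (illustrative, not GGUF's layout).

def quantize_block(block):
    """Map floats to ints in [-8, 7] with a shared per-block scale."""
    scale = max(abs(x) for x in block) / 7.0 or 1.0
    q = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, q

def dequantize_block(scale, q):
    return [scale * v for v in q]

block = [0.7, -0.2, 0.1, 0.0]
scale, q = quantize_block(block)
approx = dequantize_block(scale, q)

print(q)                             # [7, -2, 1, 0]
print([round(x, 2) for x in approx]) # [0.7, -0.2, 0.1, 0.0]
```

Each weight shrinks from 16 or 32 bits to 4 bits plus a shared scale, which is roughly why the Q4_K_M file is ~398 MB while the FP16 variant is ~1.5 GB.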
---
## Roadmap

- [x] Initial Q4_K_M release
- [x] SONA self-learning integration
- [x] Swarm coordination support
- [ ] Q8 quantization variant
- [ ] FP16 fine-tuning base
- [ ] Larger model variants (3B, 7B)
- [ ] Browser-native via WebGPU
- [ ] Mobile SDK (iOS/Android)

---

## Community

- **GitHub**: [ruvnet/ruvector](https://github.com/ruvnet/ruvector)
- **Issues**: [Report Bugs](https://github.com/ruvnet/ruvector/issues)
- **Discussions**: [Join the Community](https://github.com/ruvnet/ruvector/discussions)

---

## Citation

```bibtex
@misc{ruvltra-claude-code,
  title={RuvLTRA: Self-Learning LLMs for Claude Code},
  author={RuVector Team},
  year={2024},
  publisher={HuggingFace},
  url={https://huggingface.co/ruv/ruvltra-claude-code}
}
```

---

## License

Apache 2.0 - Free for commercial and personal use.

---

<div align="center">

### Star us on GitHub!

[ruvnet/ruvector](https://github.com/ruvnet/ruvector)

**Built with ❤️ by the RuVector Team**

*The future of AI-assisted development is self-learning.*

</div>

---
## TurboQuant KV-Cache Compression

RuvLTRA models are fully compatible with **TurboQuant**: 2-4 bit KV-cache quantization that reduces inference memory by 6-8x with <0.5% quality loss.

| Quantization | Compression | Quality Loss | Best For |
|-------------|-------------|--------------|----------|
| 3-bit | 10.7x | <1% | **Recommended**, best balance |
| 4-bit | 8x | <0.5% | High quality, long context |
| 2-bit | 32x | ~2% | Edge devices, max savings |
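A back-of-envelope calculation shows what these ratios buy. The KV cache stores one key and one value vector per layer per token, so its size is `2 * layers * kv_heads * head_dim * seq_len * bytes_per_element`. The layer/head counts below are illustrative placeholders, not the actual ruvltra-claude-code configuration.

```python
# Back-of-envelope KV-cache sizing; model dimensions are hypothetical.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem):
    # Factor of 2 accounts for storing both keys and values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

fp16 = kv_cache_bytes(24, 16, 64, 4096, 2)  # fp16 cache at full context
compressed = fp16 / 10.7                    # 3-bit TurboQuant claim

print(f"fp16 KV cache: {fp16 / 2**20:.0f} MiB")   # 384 MiB
print(f"3-bit (10.7x): {compressed / 2**20:.1f} MiB")
```

For long contexts the KV cache, not the weights, dominates memory, which is why a 10.7x cache reduction matters so much on edge devices.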
### Usage with RuvLLM

```bash
cargo add ruvllm              # Rust
npm install @ruvector/ruvllm  # Node.js
```
```rust
use ruvllm::quantize::turbo_quant::{TurboQuantCompressor, TurboQuantConfig, TurboQuantBits};

let config = TurboQuantConfig {
    bits: TurboQuantBits::Bit3_5, // 10.7x compression
    use_qjl: true,
    ..Default::default()
};

let compressor = TurboQuantCompressor::new(config)?;
let compressed = compressor.compress_batch(&kv_vectors)?;
let scores = compressor.inner_product_batch_optimized(&query, &compressed)?;
```
### v2.1.0 Ecosystem

- **Hybrid Search**: Sparse + dense vectors with RRF fusion (20-49% better retrieval)
- **Graph RAG**: Knowledge graph + community detection for multi-hop queries
- **DiskANN**: Billion-scale SSD-backed ANN with <10ms latency
- **FlashAttention-3**: IO-aware tiled attention, O(N) memory
- **MLA**: Multi-Head Latent Attention (~93% KV-cache compression)
- **Mamba SSM**: Linear-time selective state space models
- **Speculative Decoding**: 2-3x generation speedup
[RuVector GitHub](https://github.com/ruvnet/ruvector) | [ruvllm crate](https://crates.io/crates/ruvllm) | [@ruvector/ruvllm npm](https://www.npmjs.com/package/@ruvector/ruvllm)

---
## Benchmarks (L4 GPU, 24GB VRAM)

| Metric | Result |
|--------|--------|
| **Inference Speed** | 67.1 tok/s |
| **Model Load Time** | 2.35s |
| **Parameters** | 0.5B |
| **TurboQuant KV (3-bit)** | 10.7x compression, <1% PPL loss |
| **TurboQuant KV (4-bit)** | 8x compression, <0.5% PPL loss |

*Benchmarked on Google Cloud L4 GPU via `ruvltra-calibration` Cloud Run Job (2026-03-28)*