---
license: mit
library_name: oxidizr
tags:
  - oxidizr
  - llm
  - mamba
  - mla
  - moe
pipeline_tag: text-generation
---

# nano-start_64_26m_f32

> 3 Mamba2 + 1 MLA + MoE (2 experts, top-1) model with 26.73M parameters

Trained with [oxidizr](https://github.com/farhan-syah/oxidizr), a Rust-based LLM training framework.

## Overview

This model uses a hybrid architecture with:

- **3 Mamba2 layers** - State Space Model (SSM) layers for efficient sequence modeling
- **1 MLA (Multi-Head Latent Attention) layer** - attention with a compressed KV cache
- **MoE (Mixture of Experts)** - 2 routed experts plus a shared expert, top-1 routing (an illustrative sketch appears in the appendix at the end of this card)

**Key Specifications:**

- **Parameters:** 26.73M
- **Context Length:** 64 tokens
- **Vocabulary:** 100,315 tokens ([splintr](https://github.com/farhan-syah/splintr) tokenizer)
- **Final Loss:** 0.0738
- **Training Steps:** 241

## Quick Start

```bash
# Install blazr (recommended inference server)
cargo install blazr

# Generate text
blazr generate --model fs90/nano-start_64_26m_f32 --prompt "Hello, world!"

# Start API server
blazr serve --model fs90/nano-start_64_26m_f32 --port 8080
```

## Usage

### Command Line

```bash
# Basic generation
blazr generate --model fs90/nano-start_64_26m_f32 --prompt "Your prompt here" --max-tokens 100

# With sampling parameters
blazr generate --model fs90/nano-start_64_26m_f32 \
  --prompt "Once upon a time" \
  --max-tokens 200 \
  --temperature 0.8 \
  --top-p 0.9
```

### API Server

```bash
# Start the server
blazr serve --model fs90/nano-start_64_26m_f32 --port 8080

# The server provides OpenAI-compatible endpoints:
# - POST /v1/completions
# - POST /v1/chat/completions
# - GET  /v1/models
```

An illustrative raw-HTTP request against these endpoints appears in the appendix at the end of this card.

### Python Client

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

# Chat completion
response = client.chat.completions.create(
    model="fs90/nano-start_64_26m_f32",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100,
)

print(response.choices[0].message.content)
```

### Manual Download

```bash
# Using huggingface-cli
huggingface-cli download fs90/nano-start_64_26m_f32 --local-dir ./model

# Then run locally
blazr generate --model ./model --prompt "Hello!"
```

## Important Notes

> **This model requires [blazr](https://github.com/farhan-syah/blazr) for inference.** Standard inference tools (llama.cpp, vLLM, Transformers, etc.) do not support this architecture.

The model uses:

- **Custom architecture:** hybrid Mamba2/MLA/MoE layers trained with [oxidizr](https://github.com/farhan-syah/oxidizr)
- **Custom tokenizer:** [splintr](https://github.com/farhan-syah/splintr) BPE tokenizer with specialized tokens

## Model Card

| Property | Value |
|----------|-------|
| Architecture | 3 Mamba2 + 1 MLA + MoE (2 experts, top-1) |
| Parameters | 26.73M |
| Hidden Size | 128 |
| Layers | 4 |
| Vocab Size | 100,315 |
| Max Sequence Length | 64 |
| Precision | FP32 |
| License | MIT |

## Links

- **Inference:** [blazr](https://github.com/farhan-syah/blazr)
- **Training:** [oxidizr](https://github.com/farhan-syah/oxidizr)
- **Tokenizer:** [splintr](https://github.com/farhan-syah/splintr)
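## Appendix: Illustrative Sketches

### Top-1 MoE Routing

To make the "2 experts + shared expert, top-1 routing" description concrete, here is a minimal NumPy sketch of how such a layer can dispatch tokens. This is not oxidizr's actual implementation: the router/expert names, the ReLU FFN, the FFN width, and the init scale are all hypothetical; only the hidden size (128) comes from the Model Card table.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 128      # hidden size from the Model Card table
N_EXPERTS = 2     # routed experts
FF = 4 * HIDDEN   # hypothetical expert FFN width

# Hypothetical parameters: a linear router plus per-expert and shared FFNs.
W_gate = rng.normal(0, 0.02, (HIDDEN, N_EXPERTS))
experts = [(rng.normal(0, 0.02, (HIDDEN, FF)), rng.normal(0, 0.02, (FF, HIDDEN)))
           for _ in range(N_EXPERTS)]
shared = (rng.normal(0, 0.02, (HIDDEN, FF)), rng.normal(0, 0.02, (FF, HIDDEN)))

def ffn(x, weights):
    """Simple ReLU FFN stand-in for an expert."""
    w1, w2 = weights
    return np.maximum(x @ w1, 0.0) @ w2

def moe_layer(x):
    """x: (tokens, HIDDEN). Top-1 routing over 2 experts + an always-on shared expert."""
    logits = x @ W_gate                               # router scores, (tokens, N_EXPERTS)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)             # softmax over experts
    top1 = probs.argmax(-1)                           # one chosen expert per token
    out = ffn(x, shared)                              # shared expert sees every token
    for e in range(N_EXPERTS):
        mask = top1 == e
        if mask.any():
            # routed expert output, weighted by its router probability
            out[mask] += probs[mask, e:e + 1] * ffn(x[mask], experts[e])
    return out

tokens = rng.normal(size=(5, HIDDEN))
print(moe_layer(tokens).shape)  # (5, 128)
```

The key property of top-1 routing is that each token activates only one routed expert, so compute per token stays roughly constant while total parameter count grows with the number of experts; the shared expert runs for every token regardless of the routing decision.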
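### Raw-HTTP Completion Request

Since the blazr server advertises OpenAI-compatible endpoints (see the API Server section above), a plain HTTP request should also work without the `openai` client library. The sketch below assumes the standard OpenAI `/v1/completions` request schema; which sampling fields blazr actually honors is an assumption, not something stated on this card.

```python
import json
import urllib.request

# Assumes the blazr server from the API Server section is running on port 8080
# and accepts the standard OpenAI /v1/completions request body.
payload = {
    "model": "fs90/nano-start_64_26m_f32",
    "prompt": "Once upon a time",
    "max_tokens": 50,
    "temperature": 0.8,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

print(body["choices"][0]["text"])
```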