---
license: mit
library_name: oxidizr
tags:
  - oxidizr
  - llm
  - mamba
  - mla
  - moe
pipeline_tag: text-generation
---

# nano-start_64_26m_f32

> 3 Mamba2 + 1 MLA + MoE (2 experts, top-1) model with 26.73M parameters

Trained with [oxidizr](https://github.com/farhan-syah/oxidizr), a Rust-based LLM training framework.

## Overview

This model uses a hybrid architecture with:

- **3 Mamba2 layers** - State Space Model (SSM) layers for efficient sequence modeling
- **1 MLA (Multi-Head Latent Attention) layer** - attention with a compressed KV cache
- **MoE (Mixture of Experts)** - 2 routed experts plus a shared expert, top-1 routing (an illustrative sketch appears in the appendix at the end of this card)

**Key Specifications:**

- **Parameters:** 26.73M
- **Context Length:** 64 tokens
- **Vocabulary:** 100,315 tokens ([splintr](https://github.com/farhan-syah/splintr) tokenizer)
- **Final Loss:** 0.0738
- **Training Steps:** 241

## Quick Start

```bash
# Install blazr (recommended inference server)
cargo install blazr

# Generate text
blazr generate --model fs90/nano-start_64_26m_f32 --prompt "Hello, world!"

# Start API server
blazr serve --model fs90/nano-start_64_26m_f32 --port 8080
```

## Usage

### Command Line

```bash
# Basic generation
blazr generate --model fs90/nano-start_64_26m_f32 --prompt "Your prompt here" --max-tokens 100

# With sampling parameters
blazr generate --model fs90/nano-start_64_26m_f32 \
  --prompt "Once upon a time" \
  --max-tokens 200 \
  --temperature 0.8 \
  --top-p 0.9
```

### API Server

```bash
# Start the server
blazr serve --model fs90/nano-start_64_26m_f32 --port 8080

# The server provides OpenAI-compatible endpoints:
# - POST /v1/completions
# - POST /v1/chat/completions
# - GET  /v1/models
```

An illustrative raw-HTTP request against these endpoints appears in the appendix at the end of this card.

### Python Client

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

# Chat completion
response = client.chat.completions.create(
    model="fs90/nano-start_64_26m_f32",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100,
)

print(response.choices[0].message.content)
```

### Manual Download

```bash
# Using huggingface-cli
huggingface-cli download fs90/nano-start_64_26m_f32 --local-dir ./model

# Then run locally
blazr generate --model ./model --prompt "Hello!"
```

## Important Notes

> **This model requires [blazr](https://github.com/farhan-syah/blazr) for inference.** Standard inference tools (llama.cpp, vLLM, Transformers, etc.) do not support this architecture.

The model uses:

- **Custom architecture:** hybrid Mamba2/MLA/MoE layers trained with [oxidizr](https://github.com/farhan-syah/oxidizr)
- **Custom tokenizer:** [splintr](https://github.com/farhan-syah/splintr) BPE tokenizer with specialized tokens

## Model Card

| Property | Value |
|----------|-------|
| Architecture | 3 Mamba2 + 1 MLA + MoE (2 experts, top-1) |
| Parameters | 26.73M |
| Hidden Size | 128 |
| Layers | 4 |
| Vocab Size | 100,315 |
| Max Sequence Length | 64 |
| Precision | FP32 |
| License | MIT |

## Links

- **Inference:** [blazr](https://github.com/farhan-syah/blazr)
- **Training:** [oxidizr](https://github.com/farhan-syah/oxidizr)
- **Tokenizer:** [splintr](https://github.com/farhan-syah/splintr)
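## Appendix: Illustrative Sketches

### Top-1 MoE Routing

To make the "2 experts + shared expert, top-1 routing" description concrete, here is a minimal NumPy sketch of how such a layer can dispatch tokens. This is not oxidizr's actual implementation: the router/expert names, the ReLU FFN, the FFN width, and the init scale are all hypothetical; only the hidden size (128) comes from the Model Card table.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 128      # hidden size from the Model Card table
N_EXPERTS = 2     # routed experts
FF = 4 * HIDDEN   # hypothetical expert FFN width

# Hypothetical parameters: a linear router plus per-expert and shared FFNs.
W_gate = rng.normal(0, 0.02, (HIDDEN, N_EXPERTS))
experts = [(rng.normal(0, 0.02, (HIDDEN, FF)), rng.normal(0, 0.02, (FF, HIDDEN)))
           for _ in range(N_EXPERTS)]
shared = (rng.normal(0, 0.02, (HIDDEN, FF)), rng.normal(0, 0.02, (FF, HIDDEN)))

def ffn(x, weights):
    """Simple ReLU FFN stand-in for an expert."""
    w1, w2 = weights
    return np.maximum(x @ w1, 0.0) @ w2

def moe_layer(x):
    """x: (tokens, HIDDEN). Top-1 routing over 2 experts + an always-on shared expert."""
    logits = x @ W_gate                               # router scores, (tokens, N_EXPERTS)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)             # softmax over experts
    top1 = probs.argmax(-1)                           # one chosen expert per token
    out = ffn(x, shared)                              # shared expert sees every token
    for e in range(N_EXPERTS):
        mask = top1 == e
        if mask.any():
            # routed expert output, weighted by its router probability
            out[mask] += probs[mask, e:e + 1] * ffn(x[mask], experts[e])
    return out

tokens = rng.normal(size=(5, HIDDEN))
print(moe_layer(tokens).shape)  # (5, 128)
```

The key property of top-1 routing is that each token activates only one routed expert, so compute per token stays roughly constant while total parameter count grows with the number of experts; the shared expert runs for every token regardless of the routing decision.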
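### Raw-HTTP Completion Request

Since the blazr server advertises OpenAI-compatible endpoints (see the API Server section above), a plain HTTP request should also work without the `openai` client library. The sketch below assumes the standard OpenAI `/v1/completions` request schema; which sampling fields blazr actually honors is an assumption, not something stated on this card.

```python
import json
import urllib.request

# Assumes the blazr server from the API Server section is running on port 8080
# and accepts the standard OpenAI /v1/completions request body.
payload = {
    "model": "fs90/nano-start_64_26m_f32",
    "prompt": "Once upon a time",
    "max_tokens": 50,
    "temperature": 0.8,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

print(body["choices"][0]["text"])
```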