neo-3-1B-A90M-Instruct

This is the instruction-tuned version of neo-3-1B-A90M-Base. For larger context and stronger chain-of-thought, see neo-3-3B-A400M-Base and the neo-3-3B-A400M-Thinking model.

The neo-3-1B-A90M-Instruct model is a decoder-only sparse MoE model focused on chat-style instruction following, practical reasoning, and light code/math usage on commodity GPUs. It is trained on top of the neo-3-1B-A90M base checkpoint with supervised instruction data and light preference-style alignment, while preserving the efficiency profile of ~90M active parameters per token.

Core properties:

  • 1B total parameters, ~90M active parameters (top-2-of-8 experts per token).
  • 8K context window suitable for multi-step reasoning, small tool pipelines, and code editing sessions.
  • Mixtral-style MoE FFNs with grouped-query attention and RoPE.
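
The ~90M active-parameter figure follows directly from the router: each token is dispatched to only 2 of the 8 expert FFNs, so only those experts' weights are exercised per token. The following is a minimal, illustrative sketch of top-2 routing for a single token's router logits, not the model's actual implementation:

```python
import math

def top2_route(logits, top_k=2):
    """Pick the top-k experts for one token and renormalize their weights.

    `logits` is the router's score for each of the 8 experts; the returned
    weights are the softmax probabilities of the chosen experts, renormalized
    so they sum to 1 (the usual Mixtral-style convention).
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]          # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    experts = ranked[:top_k]
    weights = [probs[i] for i in experts]
    s = sum(weights)
    return experts, [w / s for w in weights]

# One token's hypothetical router logits over 8 experts:
experts, weights = top2_route([0.1, 2.0, -1.0, 0.5, 1.8, 0.0, -0.3, 0.2])
print(experts)                 # [1, 4] — only these two experts run
print(round(sum(weights), 6))  # 1.0
```

Because the other six experts are skipped entirely for this token, per-token compute scales with the ~90M active parameters rather than the 1B total.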

The model is released under the MIT license and is intended as a compact, open, and easily finetunable instruction model suitable for Colab, consumer GPUs, and research setups.

Intended use

  • General assistant: Q&A, explanations, drafting, brainstorming, and everyday chat.
  • Light reasoning: step-by-step math, small puzzles, pros/cons analysis, and short chain-of-thought traces when prompted.
  • Code and tooling: code snippets, simple refactors, short scripts, and function-level suggestions.
  • Research and teaching: MoE experiments, scaling-law studies, and instruction-tuning ablations.

The model is not designed for:

  • High-stakes uses (medical, legal, financial, safety-critical decisions).
  • Long-form multi-document retrieval-augmented generation without an external RAG system.
  • Fully reliable formal math or large codebase refactors.

Evaluations

Below are performance figures for the neo-3-1B-A90M-Instruct model compared with instruction models in the Gemma/Qwen/SmolLM2 family.

Instruction-following performance

Model                   MMLU  HellaSwag  PIQA  ARC avg  GSM8K  BBH   IFEval
neo-3-1B-A90M-Instruct  34.2  56.6       66.1  41.9      2.2   29.4  45.5
Gemma 3 IT 270M         31.2  37.7       66.2  32.1     11.4   26.7  51.2
SmolLM2-360M-Instruct   32.8  52.1       70.8  43.7      7.4   27.3  41.0
Qwen2.5-0.5B-Instruct   33.7  48.0       67.2  37.3     26.8   30.7  31.6

Tool Calling Performance

TinyTask is a benchmark that evaluates a model's ability to generate structured outputs, which serves as a proxy for tool-calling performance. Our subset of TinyTask contains 300 rows: 150 travel problems and 150 math problems. We verified that TinyTask outputs did not appear in any of our models' training data.

Model                   TinyTask Accuracy
neo-3-1B-A90M-Instruct  30.0
Gemma 3 IT 270M          0.0
SmolLM2-360M-Instruct    7.5
Qwen2.5-0.5B-Instruct    5.0
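
Structured-output accuracy of the kind TinyTask measures can be scored by checking that a prediction parses and matches the gold call exactly. The scorer below is a hypothetical sketch of such a metric, not the actual TinyTask harness; the field names are illustrative:

```python
import json

def score_structured(pred: str, gold: dict) -> int:
    """Return 1 if `pred` parses as JSON and exactly matches `gold`, else 0."""
    try:
        obj = json.loads(pred)
    except json.JSONDecodeError:
        return 0
    return int(obj == gold)

gold = {"tool": "add", "args": {"a": 2, "b": 3}}
print(score_structured('{"tool": "add", "args": {"a": 2, "b": 3}}', gold))  # 1
print(score_structured('add(2, 3)', gold))                                  # 0 (not JSON)
print(score_structured('{"tool": "sub"}', gold))                            # 0 (wrong call)
```

An exact-match criterion like this is deliberately strict, which is why small models that emit almost-correct but malformed JSON score near zero.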

Behavior in practice

  • Handles most day-to-day instructions with coherent, on-topic answers and is competitive on typical chat workloads with the other sub-1B instruction models compared above (Gemma 3 IT 270M, SmolLM2-360M-Instruct, Qwen2.5-0.5B-Instruct).
  • Solves many GSM8K-style grade-school math problems with explicit reasoning when asked, though performance remains below larger 3B–7B models.
  • Produces concise explanations by default and can expand into more detailed chain-of-thought when explicitly prompted in research settings.

Usage

Inference

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aquiffoo/neo-3-1B-A90M-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

prompt = "Explain why MoE models can have many total parameters but few active parameters per token."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,   # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9
)
# generate() returns a batch of sequences; decode the first one
print(tokenizer.decode(output[0], skip_special_tokens=True))

Chat formatting

The instruct model expects simple chat-style prompts with explicit user instructions. A minimal convention that works well:

<user>
You are a helpful assistant. Explain sparse mixture-of-experts models to a beginner.
</user>
<assistant>

Multi-turn chat can be created by concatenating user/assistant turns in the same style and re-feeding the whole context.
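
As a sketch, a small helper (our own, not part of the model's tooling) can assemble such a multi-turn context in the convention above:

```python
def build_prompt(turns):
    """Concatenate (role, text) turns in the <user>/<assistant> tag convention,
    ending with an open <assistant> tag so the model continues from there."""
    parts = []
    for role, text in turns:
        parts.append(f"<{role}>\n{text}\n</{role}>")
    parts.append("<assistant>")
    return "\n".join(parts)

prompt = build_prompt([
    ("user", "What is a sparse MoE model?"),
    ("assistant", "A model that routes each token to a few expert FFNs."),
    ("user", "Why does that save compute?"),
])
print(prompt)
```

On each turn, the model's reply is appended as a closed `<assistant>` block and the whole string is re-fed as context.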

Training and data overview

  • Base model: neo-3-1B-A90M-Base trained on a mixture of Wikipedia, web-scale synthetic corpora (e.g., Cosmopedia-like), code (The Stack, GitHub), math, and dialogue sources.
  • Post-training:
    • Supervised fine-tuning on instruction datasets (general chat, reasoning, math/code, tool-use style prompts).
    • Light preference-style alignment using curated pairs that reward helpful, honest, non-toxic behavior.
  • Tokenization: SentencePiece/BPE with a 32k vocabulary; positional information is encoded with RoPE.

Limitations and risks

  • May hallucinate facts, especially for niche or very recent topics.
  • Reasoning chains can be shallow or brittle on harder benchmarks (MATH, GSM8K, BBH).
  • Output may reflect biases from pretraining and instruction datasets and is not suitable for sensitive content without additional filtering.
  • Users should not rely on this model for any domain where incorrect answers can cause harm.