# Ouroboros-1M: The Infinite Context Nano-Model

**Developed by:** Loay Abd Alsalam (AI Engineer, Egypt 🇪🇬)
## Overview

Ouroboros-1M is a proof-of-concept that scales the tiny gemma-3-270m-it model to a 1-million-token context window. This was achieved through RoPE frequency scaling (the base frequency multiplied by 128) combined with self-instruction fine-tuning on synthetic logic chains. It lets you process massive documents on extremely low-resource hardware (even a T4 GPU or a consumer laptop).

Full benchmark data is available in `benchmark_results.json` in this repo.
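To see why multiplying the RoPE base by 128 "compresses distance perception", here is an illustrative calculation (not code from this repo), assuming the standard RoPE formulation where frequency pair `i` of a `dim`-dimensional head rotates at `base ** (-2i/dim)` radians per token:

```python
import math

def rope_angles(position, dim=8, base=10000.0):
    """Rotation angle (radians) for each RoPE frequency pair at a token position."""
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]

# With base * 128, every non-trivial frequency slows down, so a token
# 100k positions away rotates as if it were much closer: the model's
# learned sense of distance is compressed to cover the longer window.
original = rope_angles(100_000, base=10000.0)
scaled = rope_angles(100_000, base=10000.0 * 128.0)

for o, s in zip(original, scaled):
    assert s <= o  # scaled angles never rotate faster than the originals
```

The head dimension here is a toy value; the effect (slower rotation, hence longer effective wavelengths) is the same at the model's real dimensions.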
## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, BitsAndBytesConfig

# 1. Your model ID
model_id = "loaiabdalslam/Ouroboros-1MContext-Gemma-270m"
print(f"Connecting to Hugging Face: {model_id}...")

# 2. Patch the config for the 1M-token context window
def enable_infinite_context(config):
    config.max_position_embeddings = 1048576
    if hasattr(config, "rope_parameters") and config.rope_parameters:
        for layer_type in config.rope_parameters:
            # Make sure the base frequency is multiplied by 128
            original_base = 10000.0  # the original base frequency
            config.rope_parameters[layer_type]["base"] = original_base * 128.0
    return config

# 3. Load the config and patch it
try:
    config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
    config = enable_infinite_context(config)
except Exception:
    # If loading fails, fall back to the model's default config
    print("⚠️ Note: falling back to the default config...")
    config = None

# 4. Load the model (with 4-bit quantization to save RAM)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# A simple prompt to try it out
prompt_text = "Who are you and what makes your context window special?"
messages = [{"role": "user", "content": prompt_text}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

print("\nOuroboros generating...")
with torch.no_grad():
    outputs = model.generate(
        inputs,
        max_new_tokens=150,
        do_sample=True,
        temperature=0.7
    )

response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(f"Answer:\n{response}")
```
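As a rough sanity check on the low-resource claim, the 4-bit weights of a 270M-parameter model easily fit in a T4's memory. This is back-of-envelope arithmetic only; it ignores the KV cache, which dominates memory at very long contexts:

```python
params = 270e6
bytes_4bit = params * 0.5  # 4 bits = half a byte per weight
print(f"~{bytes_4bit / 1e6:.0f} MB for the quantized weights")
```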
## Methodology

- **Frequency Hack:** Modified the RoPE base frequency in `config.json` to compress distance perception.
- **Ouroboros Loop:** The model generated its own training data (logic puzzles) and was fine-tuned on it to prevent "stupor" from the extended context.
- **Merge:** This model is a full merge of the LoRA adapter into the base model, ready for deployment.
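The Ouroboros loop described above can be sketched roughly as follows. The function name, prompt wording, and record format are hypothetical (not taken from this repo); the stand-in `generate` callable would in the real loop be the 270M model itself:

```python
from typing import Callable

def ouroboros_round(generate: Callable[[str], str], n: int = 3) -> list[dict]:
    """One round of the self-instruction loop: the model writes its own
    logic puzzles, answers them, and the pairs become fine-tuning data."""
    dataset = []
    for i in range(n):
        puzzle = generate(f"Write logic puzzle #{i + 1} about ordering three items.")
        answer = generate(puzzle)
        dataset.append({"messages": [
            {"role": "user", "content": puzzle},
            {"role": "assistant", "content": answer},
        ]})
    return dataset

# Stand-in generator for illustration; real runs would also filter bad pairs
# before fine-tuning on the collected records.
data = ouroboros_round(lambda prompt: f"(model output for: {prompt})")
```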
Created with ❤️ in Alexandria, Egypt.