# Ouroboros-1M: The Infinite Context Nano-Model

**Developed by:** Loay Abd Alsalam (AI Engineer, Egypt 🇪🇬)
## Overview

Ouroboros-1M is a proof-of-concept that scales the tiny gemma-3-270m-it model to a 1-million-token context window. This was achieved through RoPE frequency scaling (the base frequency multiplied by 128) combined with self-instruction fine-tuning on synthetic logic chains. It lets you process massive documents on extremely low-resource hardware (even a T4 GPU or a consumer laptop).

Full benchmark data is available in `benchmark_results.json` in this repo.
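To see why multiplying the RoPE base by 128 "compresses distance perception", here is an illustrative calculation (not code from this repo), assuming the standard RoPE formulation where frequency pair `i` of a `dim`-dimensional head rotates at `base ** (-2i/dim)` radians per token:

```python
import math

def rope_angles(position, dim=8, base=10000.0):
    """Rotation angle (radians) for each RoPE frequency pair at a token position."""
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]

# With base * 128, every non-trivial frequency slows down, so a token
# 100k positions away rotates as if it were much closer: the model's
# learned sense of distance is compressed to cover the longer window.
original = rope_angles(100_000, base=10000.0)
scaled = rope_angles(100_000, base=10000.0 * 128.0)

for o, s in zip(original, scaled):
    assert s <= o  # scaled angles never rotate faster than the originals
```

The head dimension here is a toy value; the effect (slower rotation, hence longer effective wavelengths) is the same at the model's real dimensions.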
## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, BitsAndBytesConfig

# 1. Your model ID
model_id = "loaiabdalslam/Ouroboros-1MContext-Gemma-270m"
print(f"Connecting to Hugging Face: {model_id}...")

# 2. Patch the config for the 1M-token context window
def enable_infinite_context(config):
    config.max_position_embeddings = 1048576
    if hasattr(config, "rope_parameters") and config.rope_parameters:
        for layer_type in config.rope_parameters:
            # Make sure the base frequency is multiplied by 128
            original_base = 10000.0  # the original base frequency
            config.rope_parameters[layer_type]["base"] = original_base * 128.0
    return config

# 3. Load the config and patch it
try:
    config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
    config = enable_infinite_context(config)
except Exception:
    # If loading fails, fall back to the model's default config
    print("⚠️ Note: falling back to the default config...")
    config = None

# 4. Load the model (with 4-bit quantization to save RAM)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# A simple prompt to try it out
prompt_text = "Who are you and what makes your context window special?"
messages = [{"role": "user", "content": prompt_text}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

print("\nOuroboros generating...")
with torch.no_grad():
    outputs = model.generate(
        inputs,
        max_new_tokens=150,
        do_sample=True,
        temperature=0.7
    )

response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(f"Answer:\n{response}")
```
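As a rough sanity check on the low-resource claim, the 4-bit weights of a 270M-parameter model easily fit in a T4's memory. This is back-of-envelope arithmetic only; it ignores the KV cache, which dominates memory at very long contexts:

```python
params = 270e6
bytes_4bit = params * 0.5  # 4 bits = half a byte per weight
print(f"~{bytes_4bit / 1e6:.0f} MB for the quantized weights")
```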
## Methodology

- **Frequency Hack:** Modified the RoPE base frequency in `config.json` to compress distance perception.
- **Ouroboros Loop:** The model generated its own training data (logic puzzles) and was fine-tuned on it to prevent "stupor" from the extended context.
- **Merge:** This model is a full merge of the LoRA adapter into the base model, ready for deployment.
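The Ouroboros loop described above can be sketched roughly as follows. The function name, prompt wording, and record format are hypothetical (not taken from this repo); the stand-in `generate` callable would in the real loop be the 270M model itself:

```python
from typing import Callable

def ouroboros_round(generate: Callable[[str], str], n: int = 3) -> list[dict]:
    """One round of the self-instruction loop: the model writes its own
    logic puzzles, answers them, and the pairs become fine-tuning data."""
    dataset = []
    for i in range(n):
        puzzle = generate(f"Write logic puzzle #{i + 1} about ordering three items.")
        answer = generate(puzzle)
        dataset.append({"messages": [
            {"role": "user", "content": puzzle},
            {"role": "assistant", "content": answer},
        ]})
    return dataset

# Stand-in generator for illustration; real runs would also filter bad pairs
# before fine-tuning on the collected records.
data = ouroboros_round(lambda prompt: f"(model output for: {prompt})")
```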
Created with ❤️ in Alexandria, Egypt.