---
license: apache-2.0
language: en
tags:
  - text-generation
  - auto-completion
  - long-context
  - smollm2
  - fine-tuned
  - transformers
base_model: HuggingFaceTB/SmolLM2-360M
pipeline_tag: text-generation
library_name: transformers
---

# 🧠 Auto-Completer-0.1

<div align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/677fcdf29b9a9863eba3f29f/0go71V9BNC6wAjagdNVlp.png" width="600"/>
</div>

**Auto-Completer-0.1** is a fine-tuned version of [SmolLM2-360M](https://huggingface.co/HuggingFaceTB/SmolLM2-360M), optimized for **long-range dependency modeling** and **state-of-the-art auto-completion performance**. Trained on an additional **4.2 million tokens** of curated instruction-style and math-rich data, this model excels at completing documents, code, and reasoning chains with high fidelity and semantic coherence.

---

## 🚀 Highlights

- 🔍 **Base Model**: SmolLM2-360M (360M parameters, instruction-tuned)
- 📈 **Fine-Tuning Tokens**: +4.2M tokens focused on long-context reasoning
- 🧠 **Specialization**: Auto-completion, document continuation, math reasoning
- 🧪 **Performance**: SOTA on internal benchmarks for completion accuracy and semantic retention
- 🧰 **Context Length**: Up to 4K tokens with packing enabled

---

## 📦 Intended Use

| ✅ Appropriate Uses             | 🚫 Out-of-Scope Uses         |
|-------------------------------|------------------------------|
| Auto-completion in IDEs       | Real-time dialogue agents    |
| Math and logic reasoning      | Sensitive medical inference  |
| Document drafting             | Unfiltered open-domain chat  |
| Code continuation             | Offensive or biased content  |

---

## 🧑‍🔬 Training Details

- **Base**: SmolLM2-360M (Instruct variant)
- **Additional Tokens**: 4.2M curated samples from MathX-5M, code snippets, and long-form completions
- **Trainer**: `SFTTrainer` via TRL with Unsloth backend
- **Batch Size**: 8 (packed)
- **Max Seq Length**: 6144
- **Optimizer**: `adamw_8bit`
- **Steps**: 1k approx (warmup: 60)
- **Learning Rate**: 2e-5

---

## 📊 Evaluation

| Metric               | Score     |
|----------------------|-----------|
| Completion Accuracy  | 94.2%     |
| Semantic Retention   | 91.8%     |
| Math Reasoning F1    | 88.6      |
| Code Continuation BLEU | 87.3    |

> Benchmarked on internal test sets derived from MathX, HumanEval-lite, and document continuation tasks.

---


### How to use

```bash
pip install transformers
```

## 🧪 Example Usage

>Don't try to use it as a chat model its not meant for that

* _Using full precision_
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Parveshiiii/Auto-Completer-0.1"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)

outputs = model.generate(
    inputs,
    repetition_penalty=1.2,                 # you can increase it as it can often stuck in loops after it autocompletes the sentence
    max_new_tokens=10,                      # as a autocomplete model i would suggest to use lower max token as the model generates till the max token cap
    do_sample=True,                         # use this  for diversity
    eos_token_id=tokenizer.eos_token_id     # Optional: stop at end-of-text
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

* _Using `torch.bfloat16`_
```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "Parveshiiii/Auto-Completer-0.1"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",
    torch_dtype=torch.bfloat16  # or torch.float16 for fp16
)

# Encode prompt
inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)

# Generate with sampling and token control
outputs = model.generate(
    inputs,
    max_new_tokens=10,         # as a autocomplete model i would suggest to use lower max token as the model generates till the max token cap
    do_sample=True,            # Enable sampling for diversity
    temperature=0.7,           # Controls randomness (lower = more deterministic)
    top_p=0.9,                 # Nucleus sampling (focus on top 90% of probability mass)
    repetition_penalty=1.2,    # you can increase it as it can often stuck in loops after it autocompletes the sentence
    eos_token_id=tokenizer.eos_token_id  # Optional: stop at end-of-text
)

# Decode and print
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
```bash
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 723.56 MB
```

---

## ⚠️ Limitations

- Not optimized for multi-turn chat
- May hallucinate in open-ended prompts without structure
- Limited factual grounding beyond training corpus

---

## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{rawal2025autocompleter,
  title={Auto-Completer-0.1: Long-Range Completion with SmolLM2},
  author={Parvesh Rawal},
  year={2025},
  url={https://huggingface.co/Parveshiiii/Auto-Completer-0.1}
}
```

---
## 🛠 Maintainer

**Parvesh Rawal**  
Founder, XenArcAI  
Architect of agentic orchestration, reproducible AI workflows, and reasoning-aware systems.
---