---
license: apache-2.0
language: en
tags:
- text-generation
- auto-completion
- long-context
- smollm2
- fine-tuned
- transformers
base_model: HuggingFaceTB/SmolLM2-360M
pipeline_tag: text-generation
library_name: transformers
---
|
|
|
|
|
# Auto-Completer-0.1
|
|
|
|
|
<div align="center"> |
|
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/677fcdf29b9a9863eba3f29f/0go71V9BNC6wAjagdNVlp.png" width="600"/> |
|
|
</div> |
|
|
|
|
|
**Auto-Completer-0.1** is a fine-tuned version of [SmolLM2-360M](https://huggingface.co/HuggingFaceTB/SmolLM2-360M), optimized for **long-range dependency modeling** and **state-of-the-art auto-completion performance**. Trained on an additional **4.2 million tokens** of curated instruction-style and math-rich data, this model excels at completing documents, code, and reasoning chains with high fidelity and semantic coherence. |
|
|
|
|
|
--- |
|
|
|
|
|
## Highlights

- **Base Model**: SmolLM2-360M (360M parameters, instruction-tuned)
- **Fine-Tuning Tokens**: +4.2M tokens focused on long-context reasoning
- **Specialization**: Auto-completion, document continuation, math reasoning
- **Performance**: SOTA on internal benchmarks for completion accuracy and semantic retention
- **Context Length**: Up to 4K tokens with packing enabled
|
|
|
|
|
--- |
|
|
|
|
|
## Intended Use

| Appropriate Uses | Out-of-Scope Uses |
|-------------------------------|------------------------------|
| Auto-completion in IDEs | Real-time dialogue agents |
| Math and logic reasoning | Sensitive medical inference |
| Document drafting | Unfiltered open-domain chat |
| Code continuation | Offensive or biased content |
|
|
|
|
--- |
|
|
|
|
|
## Training Details

- **Base**: SmolLM2-360M (Instruct variant)
- **Additional Data**: 4.2M tokens of curated samples from MathX-5M, code snippets, and long-form completions
- **Trainer**: `SFTTrainer` via TRL with the Unsloth backend (see the sketch after this list)
- **Batch Size**: 8 (packed)
- **Max Seq Length**: 6144
- **Optimizer**: `adamw_8bit`
- **Steps**: ~1,000 (warmup: 60)
- **Learning Rate**: 2e-5
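
The configuration above maps fairly directly onto a TRL `SFTTrainer` run. The sketch below mirrors the listed hyperparameters but is only illustrative: it omits the Unsloth backend, and the corpus file, output directory, and plain-text data format are assumptions rather than the exact recipe used here (TRL parameter names may also differ slightly between releases).

```python
# Illustrative TRL fine-tuning setup mirroring the listed hyperparameters.
# The corpus file and output directory are placeholders, not the real ones.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder corpus: any dataset exposing a "text" column works with SFTTrainer.
dataset = load_dataset("text", data_files={"train": "long_context_corpus.txt"})["train"]

config = SFTConfig(
    output_dir="auto-completer-0.1",
    per_device_train_batch_size=8,  # "Batch Size: 8 (packed)"
    max_seq_length=6144,            # "Max Seq Length: 6144"
    packing=True,                   # pack short samples into full-length sequences
    learning_rate=2e-5,
    warmup_steps=60,
    max_steps=1000,                 # "~1,000 steps"
    optim="adamw_8bit",             # requires bitsandbytes
    logging_steps=50,
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-360M",  # base checkpoint from this card
    args=config,
    train_dataset=dataset,
)
trainer.train()
```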
|
|
|
|
|
--- |
|
|
|
|
|
## Evaluation

| Metric | Score |
|------------------------|-----------|
| Completion Accuracy | 94.2% |
| Semantic Retention | 91.8% |
| Math Reasoning F1 | 88.6 |
| Code Continuation BLEU | 87.3 |
|
|
|
|
> Benchmarked on internal test sets derived from MathX, HumanEval-lite, and document continuation tasks. |
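
The scoring protocol for these internal test sets is not published. Purely as an illustration of what a completion-accuracy check can look like (not the benchmark used above), the sketch below scores greedy continuations against a couple of hypothetical held-out references:

```python
# Illustrative exact-prefix completion check; NOT the internal benchmark above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Parveshiiii/Auto-Completer-0.1"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
model.eval()

# Hypothetical (prompt, expected continuation) pairs.
eval_pairs = [
    ("The derivative of x^2 with respect to x is", " 2x"),
    ("import numpy as", " np"),
]

correct = 0
for prompt, reference in eval_pairs:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=5, do_sample=False)
    # Keep only the newly generated tokens, then compare to the reference.
    continuation = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    correct += int(continuation.strip().startswith(reference.strip()))

print(f"Exact-prefix accuracy: {correct / len(eval_pairs):.0%}")
```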
|
|
|
|
|
--- |
|
|
|
|
|
|
|
|
## Example Usage

### Installation

```bash
pip install transformers
```
|
|
|
|
|
> Don't use this model as a chat assistant; it is not meant for multi-turn conversation.
|
|
|
|
|
* _Using full precision_ |
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Parveshiiii/Auto-Completer-0.1"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)

outputs = model.generate(
    inputs,
    repetition_penalty=1.2,               # raise this if completions get stuck in loops
    max_new_tokens=10,                    # keep this small: the model generates up to the token cap
    do_sample=True,                       # enable sampling for diversity
    eos_token_id=tokenizer.eos_token_id,  # optional: stop at end-of-text
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
|
|
|
|
|
* _Using `torch.bfloat16`_ |
|
|
```python
# pip install accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Parveshiiii/Auto-Completer-0.1"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # or torch.float16 for fp16
)

# Encode the prompt
inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)

# Generate with sampling and token control
outputs = model.generate(
    inputs,
    max_new_tokens=10,                    # keep this small: the model generates up to the token cap
    do_sample=True,                       # enable sampling for diversity
    temperature=0.7,                      # controls randomness (lower = more deterministic)
    top_p=0.9,                            # nucleus sampling (keep the top 90% of probability mass)
    repetition_penalty=1.2,               # raise this if completions get stuck in loops
    eos_token_id=tokenizer.eos_token_id,  # optional: stop at end-of-text
)

# Decode and print
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
|
|
```python
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 723.56 MB
```
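
For editor-style autocompletion (the IDE use case listed above), it often works better to stop at the end of the current line instead of relying only on a fixed `max_new_tokens` budget. Below is a minimal sketch using the `transformers` stopping-criteria API; the newline stopping rule and the example prompt are illustrative choices, not requirements of the model.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

checkpoint = "Parveshiiii/Auto-Completer-0.1"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

class StopOnNewline(StoppingCriteria):
    """Stop generation as soon as the newly generated text contains a newline."""

    def __init__(self, tokenizer, prompt_length):
        self.tokenizer = tokenizer
        self.prompt_length = prompt_length

    def __call__(self, input_ids, scores, **kwargs):
        new_text = self.tokenizer.decode(input_ids[0, self.prompt_length:])
        return "\n" in new_text

# Illustrative code-continuation prompt.
prompt = "def fibonacci(n):\n    if n <= 1:\n        return"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

outputs = model.generate(
    **inputs,
    max_new_tokens=32,  # hard cap; the stopping criterion usually triggers earlier
    do_sample=False,
    stopping_criteria=StoppingCriteriaList(
        [StopOnNewline(tokenizer, inputs["input_ids"].shape[1])]
    ),
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```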
|
|
|
|
|
--- |
|
|
|
|
|
## Limitations

- Not optimized for multi-turn chat
- May hallucinate on open-ended prompts without clear structure
- Limited factual grounding beyond its training corpus
|
|
|
|
|
--- |
|
|
|
|
|
## Citation
|
|
|
|
|
If you use this model, please cite: |
|
|
|
|
|
```bibtex
@misc{rawal2025autocompleter,
  title={Auto-Completer-0.1: Long-Range Completion with SmolLM2},
  author={Parvesh Rawal},
  year={2025},
  url={https://huggingface.co/Parveshiiii/Auto-Completer-0.1}
}
```
|
|
|
|
|
--- |
|
|
## Maintainer
|
|
|
|
|
**Parvesh Rawal** |
|
|
Founder, XenArcAI |
|
|
Architect of agentic orchestration, reproducible AI workflows, and reasoning-aware systems. |
|
|
--- |
|
|
|