arcee-train (Arcee Training Org)

posted an update 9 days ago

Post

2987

1-bit GLM-5.2 GGUF vs. Claude 4.8 Opus vs. GPT-5.5

We gave 3 models the same prompt and compared one-shot outputs.

The 1-bit GLM-5.2 GGUF ran locally on a Mac Studio M3 Ultra with 256GB RAM at ~21.6 tok/s.

Which output do you like best?
GGUF: unsloth/GLM-5.2-GGUF

3 replies

·

danielhanchen

posted an update 16 days ago

Post

4481

Google's new DiffusionGemma can now run at 2000+ tokens/sec! ⚡

We made local DiffusionGemma inference 1.8× faster.
Run it on 18GB RAM via Unsloth Studio.

GitHub: https://github.com/unslothai/unsloth
Guide: https://unsloth.ai/docs/models/diffusiongemma

4 replies

·

danielhanchen

posted an update 21 days ago

Post

1110

Google releases DiffusionGemma.✨
The new 26B-A4B diffusion text model runs locally on 18GB RAM.

Run with 4x faster text generation, thinking, image, video and 256K context. Run and train via Unsloth Studio.

GGUF: unsloth/diffusiongemma-26B-A4B-it-GGUF
Guide: https://unsloth.ai/docs/models/diffusiongemma

1 reply

·

danielhanchen

posted an update 24 days ago

Post

4221

Google releases Gemma 4 QAT. ✨
You can now run Gemma 4 with 3x less memory and near-original performance.

QAT makes it possible to run Gemma 4 26B-A4B on 16GB RAM.

GGUFs: https://huggingface.co/collections/unsloth/gemma-4-qat
QAT Guide: https://unsloth.ai/docs/models/gemma-4/qat

1 reply

·

lckr

authored a paper 28 days ago

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 157

danielhanchen

posted an update 28 days ago

Post

9246

Gemma 4 12B can now run locally on just 8GB RAM via Dynamic GGUFs.

Google's new model, Gemma 4 12B Unified, supports image, audio and 256K context.
You can run and train the model via Unsloth Studio.

GGUF: unsloth/gemma-4-12b-it-GGUF
Guide: https://unsloth.ai/docs/models/gemma-4

5 replies

·

danielhanchen

posted an update about 1 month ago

Post

2800

Qwen3.6 MTP is here! Run locally on 20GB RAM. ⚡️

MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change.

Qwen3.6-27B: unsloth/Qwen3.6-27B-MTP-GGUF
Qwen3.6-35B-A3B: unsloth/Qwen3.6-35B-A3B-MTP-GGUF
Guide: https://unsloth.ai/docs/models/qwen3.6#mtp-guide

2 replies

·

danielhanchen

posted an update about 2 months ago

Post

5959

We’re excited to announce that Unsloth has joined the PyTorch Ecosystem! 🔥🦥

Unsloth is an open-source project that makes training & running models more accurate and faster with less compute. Our mission is to make local AI accessible to everyone. Thanks to all of you for making this possible! 💕

Blog: https://unsloth.ai/blog/pytorch
GitHub: https://github.com/unslothai/unsloth

2 replies

·

danielhanchen

posted an update about 2 months ago

Post

7768

We collaborated with NVIDIA to teach you how we made LLM training ~25% faster! 🚀

Learn how 3 optimizations help your home GPU train models faster:
1. Packed-sequence metadata caching
2. Double-buffered checkpoint reloads
3. Faster MoE routing

Guide: https://unsloth.ai/blog/nvidia-collab
GitHub: https://github.com/unslothai/unsloth
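
The post names packed-sequence metadata caching without describing it, so here is only a rough illustration of the general idea, not Unsloth's or NVIDIA's actual code. When several short sequences are packed end to end into one long row, attention kernels usually need the cumulative offsets of each sequence (commonly called `cu_seqlens`) and the longest length. If the same packing layout recurs, that metadata can be computed once and reused. The function name `packed_metadata` and the cache size are assumptions made for this sketch.

```python
from functools import lru_cache
from itertools import accumulate


@lru_cache(maxsize=1024)  # cache size is an arbitrary choice for this sketch
def packed_metadata(seq_lens: tuple) -> tuple:
    """Return (cumulative offsets, max length) for sequences packed end to end.

    seq_lens must be a tuple so it is hashable and usable as a cache key.
    """
    cu_seqlens = (0, *accumulate(seq_lens))
    return cu_seqlens, max(seq_lens, default=0)


# Hypothetical layout: three sequences of 3, 5 and 2 tokens packed together.
first = packed_metadata((3, 5, 2))
second = packed_metadata((3, 5, 2))  # same layout, served from the cache

print(first)                              # ((0, 3, 8, 10), 5)
print(packed_metadata.cache_info().hits)  # 1
```

Keying the cache on the tuple of lengths means any batch with the same layout skips the recomputation. A real trainer would likely keep this metadata on the GPU, which the sketch does not attempt.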

bartowski

posted an update about 2 months ago

Post

33678

You may have noticed that my upload of MiMo-V2.5 didn't have the author in the model name:

bartowski/MiMo-V2.5-GGUF

Going forward, I plan to upload models from major 1st party developers without the author name attached, for cleanliness. I feel it results in a nicer and more expected user experience.

I will continue to upload fine-tunes with that author + "_" appended for clarity. I personally feel it's nice to know at a glance whose tune it is, but it's also for the reason I first started doing it: to avoid it being confused for a new version of the official release.

I hope this change makes sense. It seemed most reasonable to me, and a poll I did (forever ago, I move slow sometimes) made it seem likely others would find it reasonable as well. Feel free to let me know if you disagree; it may not change my mind, but I do value knowing what others think.

Thanks for downloading :)
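
For anyone scripting against these repos, the naming rule described above could be sketched roughly as follows. This is an illustration, not bartowski's tooling: the function name `gguf_repo_name` is hypothetical, the `-GGUF` suffix is taken from the example in the post, and placing the author and "_" in front of the model name is an assumption about what "appended" means here.

```python
from typing import Optional


def gguf_repo_name(model: str, finetune_author: Optional[str] = None) -> str:
    """Build a repo name following the convention described in the post.

    First-party releases carry no author; fine-tunes get the author plus "_".
    Putting the author in front of the model name is an assumption.
    """
    base = f"{model}-GGUF"
    if finetune_author is None:
        return base
    return f"{finetune_author}_{base}"


# First-party release, as in the post's example.
print(gguf_repo_name("MiMo-V2.5"))  # MiMo-V2.5-GGUF

# Fine-tune by a made-up author with a made-up model name.
print(gguf_repo_name("Example-Tune-7B", "some-author"))  # some-author_Example-Tune-7B-GGUF
```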

4 replies

·

danielhanchen

posted an update about 2 months ago

Post

8908

We made a guide on how to run open LLMs in Claude Code, Codex and OpenClaw.

Use Gemma 4 and Qwen3.6 GGUFs for local agentic coding on 24GB RAM.

Run with self-healing tool calls, code execution, and web search via the Unsloth API endpoint and llama.cpp.

Guide: https://unsloth.ai/docs/basics/api

danielhanchen

posted an update 2 months ago

Post

10852

Unsloth is now one of the top 10 most followed organizations on Hugging Face. 🤗🦥

Thanks so much for all the support!
Our HF page:

unsloth

5 replies

·

danielhanchen

posted an update 2 months ago

Post

5393

Qwen3.6-27B is out now! Run it locally on 18GB RAM. 💜

Qwen3.6-27B surpasses Qwen3.5-397B-A17B on all major coding benchmarks.

GGUFs to run: unsloth/Qwen3.6-27B-GGUF
Guide + MLX: https://unsloth.ai/docs/models/qwen3.6

danielhanchen

posted an update 3 months ago

Post

2868

Qwen3.6-35B-A3B can now be run locally! 💜

The model is the strongest mid-sized LLM on nearly all benchmarks.

Run on 23GB RAM via Unsloth Dynamic GGUFs.

GGUFs to run: unsloth/Qwen3.6-35B-A3B-GGUF
Guide: https://unsloth.ai/docs/models/qwen3.6

13 replies

·

danielhanchen

posted an update 3 months ago

Post

5565

You can now fine-tune Gemma 4 for free with our notebooks! 🔥

You just need 8GB VRAM to train Gemma 4 locally!

Unsloth trains Gemma 4 1.5x faster with 50% less VRAM.
GitHub: https://github.com/unslothai/unsloth
Guide + Notebooks: https://unsloth.ai/docs/models/gemma-4/train

5 replies

·

danielhanchen

posted an update 3 months ago

Post

3878

Google releases Gemma 4. ✨
Gemma 4 introduces 4 models: E2B, E4B, 26B-A4B, 31B.
The multimodal reasoning models are under Apache 2.0.

Run E2B and E4B on ~6GB RAM, and on phones. Run 26B-A4B and 31B on ~18GB.

GGUFs: https://huggingface.co/collections/unsloth/gemma-4
Guide: https://unsloth.ai/docs/models/gemma-4

danielhanchen

posted an update 3 months ago

Post

2813

A new way to use Unsloth.

Coming soon...

MaziyarPanahi

posted an update 3 months ago

Post

4067

Training mRNA Language Models Across 25 Species for $165

We built an end-to-end protein AI pipeline covering structure prediction, sequence design, and codon optimization. After comparing multiple transformer architectures for codon-level language modeling, CodonRoBERTa-large-v2 emerged as the clear winner with a perplexity of 4.10 and a Spearman CAI correlation of 0.40, significantly outperforming ModernBERT. We then scaled to 25 species, trained 4 production models in 55 GPU-hours, and built a species-conditioned system that no other open-source project offers. Complete results, architectural decisions, and runnable code below.

https://huggingface.co/blog/OpenMed/training-mrna-models-25-species

danielhanchen

posted an update 3 months ago

Post

955

You don’t need to set LLM parameters anymore! 🚀

llama.cpp uses only the context length and compute your local setup needs. Unsloth also auto-applies the correct model settings.

Try in Unsloth Studio - now with precompiled llama.cpp binaries.

GitHub: https://github.com/unslothai/unsloth

2 replies

·

MaziyarPanahi

posted an update 3 months ago

Post

2375

We annotated 119K medical images with two frontier VLMs (Qwen 3.5, Kimi K2.5), cross-validated at 93% agreement, and produced 110K training records, all for under $500. Fine-tuning 3 small models (2-3B params) improved all benchmarks: best model reaches +15.0% average exact match.

Everything is open-sourced: datasets, adapters, and code.

https://huggingface.co/blog/OpenMed/synthvision

2 replies

·


Team members 47

arcee-train's activity