Aurélien-Morgan CLAUDON
Aurelien-Morgan
AI & ML interests
None yet
Recent Activity
liked
a model
9 days ago
Tongyi-MAI/Z-Image-Turbo
updated
a Space
9 days ago
retrain-pipelines/README
liked
a Space
12 days ago
dlouapre/eiffel-tower-llama
replied to
sergiopaniego's
post
17 days ago
reacted to
sergiopaniego's
post with 🔥
about 1 month ago
Post
5365
fine-tuning a 14B model with TRL + SFT on a free Colab (T4 GPU)?
thanks to the latest TRL optimizations, you actually can!
sharing a new notebook showing how to do it
colab: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_trl_lora_qlora.ipynb
notebooks in TRL: https://github.com/huggingface/trl/tree/main/examples/notebooks
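If you want to adapt the recipe outside the notebook, a minimal QLoRA-style SFT setup with TRL might look like the sketch below; the model id, dataset, and hyperparameters are illustrative assumptions rather than the notebook's exact settings.

```python
# Hedged sketch of 4-bit (QLoRA) SFT with TRL on a 16GB T4.
# Model, dataset, and hyperparameters are placeholder assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset("trl-lib/Capybara", split="train")  # any chat-formatted dataset

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit NF4 base weights keep a ~14B model within T4 VRAM
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # T4 has no bfloat16 support
)

peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules="all-linear", task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="sft-14b-qlora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,           # trade compute for memory
    learning_rate=2e-4,
    model_init_kwargs={"quantization_config": bnb_config, "device_map": "auto"},
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-14B-Instruct",     # placeholder ~14B model id
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```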
reacted to
prithivMLmods's
post
2 months ago
Post
5239
Dropping some experimental adapters for FLUX.1-Kontext-dev, including Photo-Restore-i2i, PhotoCleanser-i2i, Polaroid-Warm-i2i, Yarn-Photo-i2i, and Monochrome-Pencil. These were trained under various settings with minimal image pairs to achieve optimal results. The end pairs of the dataset were synthesized using Gemini-2.5-Flash-Image-Preview and others. 🤗✨
prithivMLmods/PhotoCleanser-i2i: Remove objects while preserving the rest of the image.
prithivMLmods/Photo-Restore-i2i: Restore old photos into moderately colorized, detailed images.
prithivMLmods/Polaroid-Warm-i2i: Seamless vintage Polaroid-style images with warm, faded tones.
prithivMLmods/Yarn-Photo-i2i: Convert images into yarn-stitched artwork while retaining key details.
prithivMLmods/Monochrome-Pencil: Turn images into monochrome pencil sketches while keeping original features.
✨Note: All the above models share the same auto-labeling multimodal VLM captioning model, prithivMLmods/DeepCaption-VLA-7B, which is used for refining edit instructions and accurately understanding attributions for the generations.
✨Collection: prithivMLmods/i2i-kontext-exp-68ce573b5c0623476b636ec7
.
.
.
To learn more, visit the app page or the respective model pages!!
reacted to
meg's
post with ❤️
4 months ago
Post
498
Thanks so much to BBC News and the stellar Suranjana Tewari for having me on to talk about the US <-> China relationship in AI, and what it means for AI ethics.
reacted to
eliebak's
post with 🔥
5 months ago
Post
4771
Kimi K2 tech report is full of gems as always. Here are my notes on it:
> MuonClip: Pretty crazy how after 70k the training stabilizes and the QK-clip is basically inactive. There is also no loss in perf with QK-clip which is not trivial at all (at small scale but with aggressive threshold). Also a cool explanation of why muon makes the logit explode in appendix E (tl;dr is that muon makes the singular value of the update matrix higher)
> Sparsity scaling laws to justify their ratio, they have a very solid training infra that allows the model to be trained at this sparsity level, they could have increased even more but as sparsity increases the training becomes less efficient.
> They reduce the number of attention heads to make it more efficient for long context, since attention heads are a big bottleneck there. They also remove 2 of the 3 "first dense" layers in the dsv3 arch.
With the sparsity and attention heads (divided by 2) they achieve 83% increased flops compared to deepseek v3 arch at 128k.
> Data: Rephrasing is KEY. They do a lot more synthetic data generation and rephrase their corpus to have different styles, for longer documents they do it by chunk. I'm (half) surprised by the fact that ONLY 1 epoch (assuming same number of training tokens I think?) of data rephrased 10 times has better accuracy than 10 epochs of the same data rephrased once.
> They do rewriting for Math and Knowledge, for Math they apply the ShallowMath recipe and instruct the model to rephrase in a "learning note" style
> They talk about diversity and probably have some internal stuff/eval to test that, as always still a bit unclear for me how to properly measure that.
The infra is also very nice, quick summary:
> PP=16 (1F1B schedule, a bit custom), EP=16, zero1
> No FP8 computation (FP8 only for storage of specific layers), selective recomputation for inexpensive blocks, activation offloading to CPU
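To make the QK-clip note a bit more concrete, here is a rough sketch of the mechanism as described (per-head rescaling of the query/key projections when the max attention logit exceeds a threshold). The shapes and the threshold value are assumptions; this is not Kimi K2's actual implementation.

```python
# Rough illustration of the QK-clip idea described in the notes above,
# not Kimi K2's actual code. Shapes and the tau threshold are assumptions.
import torch

@torch.no_grad()
def qk_clip_(w_q: torch.Tensor, w_k: torch.Tensor, max_logits: torch.Tensor, tau: float = 100.0):
    """Rescale per-head query/key projections after an optimizer step.

    w_q, w_k:    [num_heads, head_dim, hidden_dim] projection weights
    max_logits:  [num_heads] max pre-softmax attention logit observed per head
    """
    gamma = torch.clamp(tau / max_logits, max=1.0)   # < 1 only for heads whose logits exceeded tau
    scale = gamma.sqrt().view(-1, 1, 1)              # split the shrink evenly between Q and K
    w_q.mul_(scale)
    w_k.mul_(scale)
```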
reacted to
danieldk's
post with 🤗
5 months ago
Post
1919
We have been working on a project called kernels. kernels makes it possible to load compute kernels directly from the Hub!
We plan to give kernels a more proper introduction soon. But for those who have been following along, we are happy to announce a new release:
- New layer API with torch.compile support.
- Experimental support for loading Apple Silicon Metal 🤗 Kernels.
- Generate wheels from Hub kernels for legacy deployments.
Full release notes here: https://github.com/huggingface/kernels/releases/tag/v0.6.0
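As a quick reminder of what loading a Hub kernel looks like, a minimal sketch is below; the kernels-community/activation repo and its gelu_fast entry point are recalled from the project README, so double-check them there.

```python
# Minimal sketch of pulling a compute kernel from the Hub with the kernels library.
# The "kernels-community/activation" repo and its gelu_fast entry point are assumed
# from memory of the project README; adapt to whichever kernel you actually need.
import torch
from kernels import get_kernel

activation = get_kernel("kernels-community/activation")  # downloads and loads the kernel from the Hub

x = torch.randn((10, 10), dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)  # fast GELU provided by the Hub kernel, writes into y
print(y)
```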
reacted to
danielhanchen's
post with 🔥
5 months ago
Post
3178
Gemma 3n finetuning is now 1.5x faster and uses 50% less VRAM in Unsloth!
Click "Use this model" and click "Google Colab"!
unsloth/gemma-3n-E4B-it
unsloth/gemma-3n-E2B-it
https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_(4B)-Conversational.ipynb
Click "Use this model" and click "Google Colab"!
unsloth/gemma-3n-E4B-it
unsloth/gemma-3n-E2B-it
https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_(4B)-Conversational.ipynb
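If you would rather script it than run the Colab, loading Gemma 3n for 4-bit LoRA fine-tuning with Unsloth looks roughly like the sketch below; the class and argument names are recalled from Unsloth's docs and should be verified against the linked notebook.

```python
# Hedged sketch of loading Gemma 3n with Unsloth for 4-bit LoRA fine-tuning.
# API names are recalled from Unsloth docs; verify against the linked Colab notebook.
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    "unsloth/gemma-3n-E4B-it",
    max_seq_length=2048,
    load_in_4bit=True,        # 4-bit base weights to fit in free-tier VRAM
)
model = FastModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
)
# ...then train with TRL's SFTTrainer as usual.
```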
reacted to
fdaudens's
post with 🔥
5 months ago
Post
3355
Three big AI copyright updates this week alone. Tracking it all is getting almost impossible!
That's why @BrigitteTousi and I built this interactive tracker to keep you up to date: fdaudens/ai-copyright-lawsuits
(Prototyped in minutes with DeepSite!)
reacted to
merve's
post with ❤️
6 months ago
Post
3662
IN: video fine-tuning support for facebook V-JEPA 2 in HF transformers 🔥
it comes with
> four models fine-tuned on Diving48 and SSv2 dataset facebook/v-jepa-2-6841bad8413014e185b497a6
> FastRTC demo on V-JEPA2 SSv2 qubvel-hf/vjepa2-streaming-video-classification
> fine-tuning script on UCF-101 https://gist.github.com/ariG23498/28bccc737c11d1692f6d0ad2a0d7cddb
> fine-tuning notebook on UCF-101 https://colab.research.google.com/drive/16NWUReXTJBRhsN3umqznX4yoZt2I7VGc?usp=sharing
we're looking forward to seeing what you will build! 🤗
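For a quick taste of the fine-tuned checkpoints before diving into the scripts, the transformers video-classification pipeline should be enough; the checkpoint id below is an assumption picked to match the SSv2 models in the collection, so swap in the one you actually want.

```python
# Minimal sketch: run one of the fine-tuned V-JEPA 2 classifiers with the
# transformers pipeline API. The checkpoint id is assumed from the collection
# above; replace it with the exact model you want to evaluate.
from transformers import pipeline

video_cls = pipeline(
    task="video-classification",
    model="facebook/vjepa2-vitl-fpc16-256-ssv2",  # assumed SSv2 fine-tuned checkpoint
)
predictions = video_cls("my_clip.mp4")  # local path or URL to a short video
print(predictions)
```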
reacted to
merve's
post with 🔥
6 months ago
Post
3023
Qwen2.5-Omni is soooo good that people build multimodal reasoning models off of it 🥹
> KE-Team/Ke-Omni-R-3B is an open-source audio reasoning model, SotA on average across benchmarks, based on Qwen/Qwen2.5-Omni-3B 🗣️
> Haoz0206/Omni-R1 is a video reasoning model with pixel-level grounding (see below), and it's super competitive ⏯️ based on Qwen/Qwen2.5-Omni-7B
reacted to
danieldk's
post with 🔥
6 months ago
Post
1919
We have been working on a project called kernels. kernels makes it possible to load compute kernels directly from the Hub!
We plan to give kernels a more proper introduction soon. But for those who have been following along, we are happy to announce a new release:
- New layer API with torch.compile support.
- Experimental support for loading Apple Silicon Metal 🤗 Kernels.
- Generate wheels from Hub kernels for legacy deployments.
Full release notes here: https://github.com/huggingface/kernels/releases/tag/v0.6.0
reacted to
cbensimon's
post
7 months ago
Post
6088
ZeroGPU medium size is now available as a power-user feature
Nothing too fancy for now: ZeroGPU Spaces still default to large (70GB VRAM). But this paves the way for:
- 💰 size-based quotas / pricing (medium will offer significantly more usage than large)
- 🦣 the upcoming xlarge size (141GB VRAM)
You can as of now control GPU size via a Space variable. Accepted values:
- auto (future default)
- medium
- large (current default)
The auto mode checks total CUDA tensor size during startup:
- More than 30GB → large
- Otherwise → medium
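For illustration only, the auto heuristic described above ("sum the CUDA tensors at startup, compare to 30GB") can be sketched like this; it is not ZeroGPU's actual implementation.

```python
# Illustration only: the kind of check the "auto" mode is described as doing.
# Not the actual ZeroGPU implementation.
import gc
import torch

def pick_gpu_size(threshold_gb: float = 30.0) -> str:
    total_bytes = 0
    for obj in gc.get_objects():
        try:
            if isinstance(obj, torch.Tensor) and obj.is_cuda:
                total_bytes += obj.numel() * obj.element_size()
        except Exception:
            continue  # some tracked objects cannot be inspected safely
    return "large" if total_bytes > threshold_gb * 1024**3 else "medium"
```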
reacted to
ordagan's
post with ❤️
7 months ago
Post
2206
Excited to introduce Jamba by AI21
ai21labs/Jamba-v0.1
We are thrilled to announce Jamba, the world's first production-grade Mamba-based model.
Key Features:
- First production-grade Mamba-based model built on a novel SSM-Transformer hybrid architecture
- 3X throughput on long contexts compared to Mixtral 8x7B
- Democratizes access to a massive 256K context window
- The only model in its size class that fits up to 140K context on a single GPU
Jamba is based on a novel architecture that combines Mamba and Transformer. While our initial results show great efficiency gains, we expect this to be further explored and improved with the help of the community.
Check out our blog post for more info: https://ai21-labs.webflow.io/blog/announcing-jamba
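Loading Jamba in transformers works like any other causal LM; here is a minimal generation sketch (prompt and generation settings are arbitrary, and older transformers versions may additionally need trust_remote_code=True).

```python
# Minimal sketch: load ai21labs/Jamba-v0.1 like any other causal LM and generate.
# Prompt and generation settings are arbitrary; quantization or offloading may be
# needed depending on your GPU, since the full checkpoint is large.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("A long-context model is useful because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```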
posted
an
update
7 months ago
Post
460
Hey, I'll be presenting @retrain-pipelines and almighty function-calling at the Hugging Face Paris HQ, you guys.
Monday evening. Lightning-talk style. With AI Tinkerers.
Come hang!
https://paris.aitinkerers.org/p/ai-tinkerers-paris-ai21-labs-takeover-on-may-19th
https://huggingface.co/blog/Aurelien-Morgan/the-almighty-function-caller
posted
an
update
7 months ago
Post
3155
The Almighty function-caller
How would you like to build smart GenAI infrastructure?
Give extensive tools memory to your edge agentic system,
and optimize the resources it takes to still run a high-performance set of agents?
We came up with a novel approach to function-calling at scale for smart companies and corporate-grade use-cases.
Read our full-fledged blog article on this here on Hugging Face:
https://huggingface.co/blog/Aurelien-Morgan/the-almighty-function-caller
reacted to
danielhanchen's
post with 🔥
8 months ago
Post
6257
🦥 Introducing Unsloth Dynamic v2.0 GGUFs!
Our v2.0 quants set new benchmarks on 5-shot MMLU and KL Divergence, meaning you can now run & fine-tune quantized LLMs while preserving as much accuracy as possible.
Llama 4: unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF
DeepSeek-R1: unsloth/DeepSeek-R1-GGUF-UD
Gemma 3: unsloth/gemma-3-27b-it-GGUF
We made selective layer quantization much smarter. Instead of modifying only a subset of layers, we now dynamically quantize all layers so every layer has a different bit. Now, our dynamic method can be applied to all LLM architectures, not just MoE's.
Blog with Details: https://docs.unsloth.ai/basics/dynamic-v2.0
All our future GGUF uploads will leverage Dynamic 2.0 and our hand-curated 300K–1.5M token calibration dataset to improve conversational chat performance.
For accurate benchmarking, we built an evaluation framework to match the reported 5-shot MMLU scores of Llama 4 and Gemma 3. This allowed apples-to-apples comparisons between full-precision vs. Dynamic v2.0, QAT and standard iMatrix quants.
Dynamic v2.0 aims to minimize the performance gap between full-precision models and their quantized counterparts.
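To run one of these Dynamic v2.0 GGUFs locally, something along these lines should work with llama-cpp-python; the exact .gguf filename is an assumption, so check the repo's file listing for the quant you want.

```python
# Hedged sketch: download a Dynamic v2.0 GGUF from the Hub and run it with
# llama-cpp-python. The filename is an assumption; pick the actual quant file
# listed in the repo (e.g. a Q4_K_M variant).
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="unsloth/gemma-3-27b-it-GGUF",
    filename="gemma-3-27b-it-Q4_K_M.gguf",  # assumed filename; verify on the repo page
)
llm = Llama(model_path=gguf_path, n_ctx=4096, n_gpu_layers=-1)  # offload all layers if VRAM allows
print(llm("Explain KL divergence in one sentence.", max_tokens=64)["choices"][0]["text"])
```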
posted
an
update
8 months ago
Post
677
@retrain-pipelines 0.1.2 finally dropped. It comes with a hot Hugging Face Hub integration. Go check it out. We have 2 articles about it coming up. One already fully written, so be on the lookout!
Also, I'll be volunteering at GOSIM AI Paris 2025. If you're interested in chatting, hmu.