Aurélien-Morgan CLAUDON
Aurelien-Morgan
AI & ML interests
None yet
Recent Activity
liked
a model
9 days ago
Tongyi-MAI/Z-Image-Turbo
updated
a Space
9 days ago
retrain-pipelines/README
liked
a Space
12 days ago
dlouapre/eiffel-tower-llama
replied to
sergiopaniego's
post
17 days ago
reacted to
sergiopaniego's
post with 🔥
about 1 month ago
Post
5365
fine-tuning a 14B model with TRL + SFT on a free Colab (T4 GPU)?
thanks to the latest TRL optimizations, you actually can!
sharing a new notebook showing how to do it
colab: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_trl_lora_qlora.ipynb
notebooks in TRL: https://github.com/huggingface/trl/tree/main/examples/notebooks
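If you want to adapt the recipe outside the notebook, a minimal QLoRA-style SFT setup with TRL might look like the sketch below; the model id, dataset, and hyperparameters are illustrative assumptions rather than the notebook's exact settings.

```python
# Hedged sketch of 4-bit (QLoRA) SFT with TRL on a 16GB T4.
# Model, dataset, and hyperparameters are placeholder assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset("trl-lib/Capybara", split="train")  # any chat-formatted dataset

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit NF4 base weights keep a ~14B model within T4 VRAM
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # T4 has no bfloat16 support
)

peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules="all-linear", task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="sft-14b-qlora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,           # trade compute for memory
    learning_rate=2e-4,
    model_init_kwargs={"quantization_config": bnb_config, "device_map": "auto"},
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-14B-Instruct",     # placeholder ~14B model id
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```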
reacted to
prithivMLmods's
post
2 months ago
Post
5239
Dropping some experimental adapters for FLUX.1-Kontext-dev, including Photo-Restore-i2i, PhotoCleanser-i2i, Polaroid-Warm-i2i, Yarn-Photo-i2i, and Monochrome-Pencil. These were trained under various settings with minimal image pairs to achieve optimal results. The end pairs of the dataset were synthesized using Gemini-2.5-Flash-Image-Preview and others. 🤗✨
prithivMLmods/PhotoCleanser-i2i: Remove objects while preserving the rest of the image.
prithivMLmods/Photo-Restore-i2i: Restore old photos into moderately colorized, detailed images.
prithivMLmods/Polaroid-Warm-i2i: Seamless vintage Polaroid-style images with warm, faded tones.
prithivMLmods/Yarn-Photo-i2i: Convert images into yarn-stitched artwork while retaining key details.
prithivMLmods/Monochrome-Pencil: Turn images into monochrome pencil sketches while keeping original features.
✨Note: All the above models share the same auto-labeling multimodal VLM captioning model, prithivMLmods/DeepCaption-VLA-7B, which is used for refining edit instructions and accurately understanding attributions for the generations.
✨Collection: prithivMLmods/i2i-kontext-exp-68ce573b5c0623476b636ec7
.
.
.
To learn more, visit the app page or the respective model pages!!
reacted to
meg's
post with ❤️
4 months ago
Post
498
Thanks so much to BBC News and the stellar Suranjana Tewari for having me on to talk about the US <-> China relationship in AI, and what it means for AI ethics.
reacted to
eliebak's
post with 🔥
5 months ago
Post
4771
Kimi K2 tech report is full of gems as always. Here are my notes on it:
> MuonClip: Pretty crazy how after 70k the training stabilizes and the QK-clip is basically inactive. There is also no loss in perf with QK-clip which is not trivial at all (at small scale but with aggressive threshold). Also a cool explanation of why muon makes the logit explode in appendix E (tl;dr is that muon makes the singular value of the update matrix higher)
> Sparsity scaling laws to justify their ratio, they have a very solid training infra that allows the model to be trained at this sparsity level, they could have increased even more but as sparsity increases the training becomes less efficient.
> They reduce the number of attention heads to make it more efficient for long context, since attention heads are a big bottleneck there. They also remove 2 of the 3 "first dense" layers in the dsv3 arch.
With the sparsity and attention heads (divided by 2) they achieve 83% increased flops compared to deepseek v3 arch at 128k.
> Data: Rephrasing is KEY. They do a lot more synthetic data generation and rephrase their corpus to have different styles, for longer documents they do it by chunk. I'm (half) surprised by the fact that ONLY 1 epoch (assuming same number of training tokens I think?) of data rephrased 10 times has better accuracy than 10 epochs of the same data rephrased once.
> They do rewriting for Math and Knowledge, for Math they apply the ShallowMath recipe and instruct the model to rephrase in a "learning note" style
> They talk about diversity and probably have some internal stuff/eval to test that, as always still a bit unclear for me how to properly measure that.
The infra is also very nice, quick summary:
> PP=16 (1F1B schedule, a bit custom), EP=16, zero1
> No FP8 computation (FP8 only for storage of specific layers), selective recomputation for inexpensive blocks, activation offloading to CPU
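To make the QK-clip note a bit more concrete, here is a rough sketch of the mechanism as described (per-head rescaling of the query/key projections when the max attention logit exceeds a threshold). The shapes and the threshold value are assumptions; this is not Kimi K2's actual implementation.

```python
# Rough illustration of the QK-clip idea described in the notes above,
# not Kimi K2's actual code. Shapes and the tau threshold are assumptions.
import torch

@torch.no_grad()
def qk_clip_(w_q: torch.Tensor, w_k: torch.Tensor, max_logits: torch.Tensor, tau: float = 100.0):
    """Rescale per-head query/key projections after an optimizer step.

    w_q, w_k:    [num_heads, head_dim, hidden_dim] projection weights
    max_logits:  [num_heads] max pre-softmax attention logit observed per head
    """
    gamma = torch.clamp(tau / max_logits, max=1.0)   # < 1 only for heads whose logits exceeded tau
    scale = gamma.sqrt().view(-1, 1, 1)              # split the shrink evenly between Q and K
    w_q.mul_(scale)
    w_k.mul_(scale)
```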
reacted to
danieldk's
post with 🤗
5 months ago
Post
1919
We have been working on a project called kernels. kernels makes it possible to load compute kernels directly from the Hub!
We plan to give kernels a more proper introduction soon. But for those who have been following along, we are happy to announce a new release:
- New layer API with torch.compile support.
- Experimental support for loading Apple Silicon Metal 🤗 Kernels.
- Generate wheels from Hub kernels for legacy deployments.
Full release notes here: https://github.com/huggingface/kernels/releases/tag/v0.6.0
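As a quick reminder of what loading a Hub kernel looks like, a minimal sketch is below; the kernels-community/activation repo and its gelu_fast entry point are recalled from the project README, so double-check them there.

```python
# Minimal sketch of pulling a compute kernel from the Hub with the kernels library.
# The "kernels-community/activation" repo and its gelu_fast entry point are assumed
# from memory of the project README; adapt to whichever kernel you actually need.
import torch
from kernels import get_kernel

activation = get_kernel("kernels-community/activation")  # downloads and loads the kernel from the Hub

x = torch.randn((10, 10), dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)  # fast GELU provided by the Hub kernel, writes into y
print(y)
```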
reacted to
danielhanchen's
post with 🔥
5 months ago
Post
3178
Gemma 3n finetuning is now 1.5x faster and uses 50% less VRAM in Unsloth!
Click "Use this model" and click "Google Colab"!
unsloth/gemma-3n-E4B-it
unsloth/gemma-3n-E2B-it
https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_(4B)-Conversational.ipynb
Click "Use this model" and click "Google Colab"!
unsloth/gemma-3n-E4B-it
unsloth/gemma-3n-E2B-it
https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_(4B)-Conversational.ipynb
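If you would rather script it than run the Colab, loading Gemma 3n for 4-bit LoRA fine-tuning with Unsloth looks roughly like the sketch below; the class and argument names are recalled from Unsloth's docs and should be verified against the linked notebook.

```python
# Hedged sketch of loading Gemma 3n with Unsloth for 4-bit LoRA fine-tuning.
# API names are recalled from Unsloth docs; verify against the linked Colab notebook.
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    "unsloth/gemma-3n-E4B-it",
    max_seq_length=2048,
    load_in_4bit=True,        # 4-bit base weights to fit in free-tier VRAM
)
model = FastModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
)
# ...then train with TRL's SFTTrainer as usual.
```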
reacted to
fdaudens's
post with 🔥
5 months ago
Post
3355
Three big AI copyright updates this week alone. Tracking it all is getting almost impossible!
That's why @BrigitteTousi and I built this interactive tracker to keep you up to date: fdaudens/ai-copyright-lawsuits
(Prototyped in minutes with DeepSite!)
reacted to
merve's
post with ❤️
6 months ago
Post
3662
IN: video fine-tuning support for facebook V-JEPA 2 in HF transformers 🔥
it comes with
> four models fine-tuned on Diving48 and SSv2 dataset facebook/v-jepa-2-6841bad8413014e185b497a6
> FastRTC demo on V-JEPA2 SSv2 qubvel-hf/vjepa2-streaming-video-classification
> fine-tuning script on UCF-101 https://gist.github.com/ariG23498/28bccc737c11d1692f6d0ad2a0d7cddb
> fine-tuning notebook on UCF-101 https://colab.research.google.com/drive/16NWUReXTJBRhsN3umqznX4yoZt2I7VGc?usp=sharing
we're looking forward to seeing what you will build! 🤗
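For a quick taste of the fine-tuned checkpoints before diving into the scripts, the transformers video-classification pipeline should be enough; the checkpoint id below is an assumption picked to match the SSv2 models in the collection, so swap in the one you actually want.

```python
# Minimal sketch: run one of the fine-tuned V-JEPA 2 classifiers with the
# transformers pipeline API. The checkpoint id is assumed from the collection
# above; replace it with the exact model you want to evaluate.
from transformers import pipeline

video_cls = pipeline(
    task="video-classification",
    model="facebook/vjepa2-vitl-fpc16-256-ssv2",  # assumed SSv2 fine-tuned checkpoint
)
predictions = video_cls("my_clip.mp4")  # local path or URL to a short video
print(predictions)
```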
reacted to
merve's
post with 🔥
6 months ago
Post
3023
Qwen2.5-Omni is soooo good that people build multimodal reasoning models off of it 🥹
> KE-Team/Ke-Omni-R-3B is an open-source audio reasoning model, SotA on average across benchmarks, based on Qwen/Qwen2.5-Omni-3B 🗣️
> Haoz0206/Omni-R1 is a video reasoning model with pixel-level grounding (see below), and it's super competitive ⏯️ based on Qwen/Qwen2.5-Omni-7B
reacted to
danieldk's
post with 🔥
6 months ago
Post
1919
We have been working on a project called kernels. kernels makes it possible to load compute kernels directly from the Hub!
We plan to give kernels a more proper introduction soon. But for those who have been following along, we are happy to announce a new release:
- New layer API with torch.compile support.
- Experimental support for loading Apple Silicon Metal 🤗 Kernels.
- Generate wheels from Hub kernels for legacy deployments.
Full release notes here: https://github.com/huggingface/kernels/releases/tag/v0.6.0
reacted to
cbensimon's
post
7 months ago
Post
6088
ZeroGPU medium size is now available as a power-user feature
Nothing too fancy for now: ZeroGPU Spaces still default to large (70GB VRAM). But this paves the way for:
- 💰 size-based quotas / pricing (medium will offer significantly more usage than large)
- 🦣 the upcoming xlarge size (141GB VRAM)
You can as of now control GPU size via a Space variable. Accepted values:
- auto (future default)
- medium
- large (current default)
The auto mode checks total CUDA tensor size during startup:
- More than 30GB → large
- Otherwise → medium
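For illustration only, the auto heuristic described above ("sum the CUDA tensors at startup, compare to 30GB") can be sketched like this; it is not ZeroGPU's actual implementation.

```python
# Illustration only: the kind of check the "auto" mode is described as doing.
# Not the actual ZeroGPU implementation.
import gc
import torch

def pick_gpu_size(threshold_gb: float = 30.0) -> str:
    total_bytes = 0
    for obj in gc.get_objects():
        try:
            if isinstance(obj, torch.Tensor) and obj.is_cuda:
                total_bytes += obj.numel() * obj.element_size()
        except Exception:
            continue  # some tracked objects cannot be inspected safely
    return "large" if total_bytes > threshold_gb * 1024**3 else "medium"
```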
reacted to
ordagan's
post with ❤️
7 months ago
Post
2206
Excited to introduce Jamba by AI21
ai21labs/Jamba-v0.1
We are thrilled to announce Jamba, the world's first production-grade Mamba-based model.
Key Features:
- First production-grade Mamba-based model built on a novel SSM-Transformer hybrid architecture
- 3X throughput on long contexts compared to Mixtral 8x7B
- Democratizes access to a massive 256K context window
- The only model in its size class that fits up to 140K context on a single GPU
Jamba is based on a novel architecture that combines Mamba and Transformer. While our initial results show great efficiency gains, we expect this to be further explored and improved with the help of the community.
Check out our blog post for more info: https://ai21-labs.webflow.io/blog/announcing-jamba
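Loading Jamba in transformers works like any other causal LM; here is a minimal generation sketch (prompt and generation settings are arbitrary, and older transformers versions may additionally need trust_remote_code=True).

```python
# Minimal sketch: load ai21labs/Jamba-v0.1 like any other causal LM and generate.
# Prompt and generation settings are arbitrary; quantization or offloading may be
# needed depending on your GPU, since the full checkpoint is large.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("A long-context model is useful because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```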
posted
an
update
7 months ago
Post
460
Hey, I'll be presenting @retrain-pipelines and almighty function-calling at the Hugging Face Paris HQ, you guys.
Monday evening. Lightning-talk style. With AI Tinkerers.
Come hang!
https://paris.aitinkerers.org/p/ai-tinkerers-paris-ai21-labs-takeover-on-may-19th
https://huggingface.co/blog/Aurelien-Morgan/the-almighty-function-caller
posted
an
update
7 months ago
Post
3155
The Almighty function-caller
How would you like to build smart GenAI infrastructure?
Give extensive tools memory to your edge agentic system,
and optimize the resources it takes to still run a high-performance set of agents?
We came up with a novel approach to function-calling at scale for smart companies and corporate-grade use-cases.
Read our full-fledged blog article on this here on Hugging Face:
https://huggingface.co/blog/Aurelien-Morgan/the-almighty-function-caller
reacted to
danielhanchen's
post with 🔥
8 months ago
Post
6257
🦥 Introducing Unsloth Dynamic v2.0 GGUFs!
Our v2.0 quants set new benchmarks on 5-shot MMLU and KL Divergence, meaning you can now run & fine-tune quantized LLMs while preserving as much accuracy as possible.
Llama 4: unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF
DeepSeek-R1: unsloth/DeepSeek-R1-GGUF-UD
Gemma 3: unsloth/gemma-3-27b-it-GGUF
We made selective layer quantization much smarter. Instead of modifying only a subset of layers, we now dynamically quantize all layers so every layer has a different bit. Now, our dynamic method can be applied to all LLM architectures, not just MoE's.
Blog with Details: https://docs.unsloth.ai/basics/dynamic-v2.0
All our future GGUF uploads will leverage Dynamic 2.0 and our hand-curated 300K–1.5M token calibration dataset to improve conversational chat performance.
For accurate benchmarking, we built an evaluation framework to match the reported 5-shot MMLU scores of Llama 4 and Gemma 3. This allowed apples-to-apples comparisons between full-precision vs. Dynamic v2.0, QAT and standard iMatrix quants.
Dynamic v2.0 aims to minimize the performance gap between full-precision models and their quantized counterparts.
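To run one of these Dynamic v2.0 GGUFs locally, something along these lines should work with llama-cpp-python; the exact .gguf filename is an assumption, so check the repo's file listing for the quant you want.

```python
# Hedged sketch: download a Dynamic v2.0 GGUF from the Hub and run it with
# llama-cpp-python. The filename is an assumption; pick the actual quant file
# listed in the repo (e.g. a Q4_K_M variant).
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="unsloth/gemma-3-27b-it-GGUF",
    filename="gemma-3-27b-it-Q4_K_M.gguf",  # assumed filename; verify on the repo page
)
llm = Llama(model_path=gguf_path, n_ctx=4096, n_gpu_layers=-1)  # offload all layers if VRAM allows
print(llm("Explain KL divergence in one sentence.", max_tokens=64)["choices"][0]["text"])
```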
posted
an
update
8 months ago
Post
677
@retrain-pipelines 0.1.2 finally dropped. It comes with a hot Hugging Face Hub integration. Go check it out. We have 2 articles about it coming up. One already fully written, so be on the lookout!
Also, I'll be volunteering at GOSIM AI Paris 2025. If you're interested in chatting, hmu.