mair-lab/thinking-sft-simple


rabiulawal committed on Aug 8

Commit fe15986 · verified · 1 Parent(s): dc0b1a4

Create README.md

Files changed (1)

README.md +33 -0


	@@ -0,0 +1,33 @@

+---
+language:
+- en
+base_model:
+- BAAI/Emu3-Stage1
+---
+# EARL - SFT think (S) (8B)
+**Model Size:** 8B parameters
+**Base Model:** [BAAI/Emu3-Stage1](https://huggingface.co/BAAI/Emu3-Stage1)
+**Dataset:** Simple Edit
+**Training Objective:** Supervised Fine-Tuning (SFT) with Chain-of-Thought reasoning
+This model is introduced in our paper: [EARL: The Promise of RL for Autoregressive Image Editing](https://arxiv.org/abs/2508.01119).
+## Overview
+EARL - SFT think (S) is a fine-tuned 8B vision-language model designed for autoregressive image editing. It extends the base Emu3 model with **chain-of-thought supervision**, enabling step-by-step reasoning on complex editing tasks. Training uses the Simple Edit dataset, focusing on editing instructions grounded in visual understanding.
+🔗 **Inference script and usage:** [GitHub Repository](https://github.com/saba96/EARL?tab=readme-ov-file)
+## Benchmark Results
+| Model             | OmniEdit | EmuEdit | AURORA | MB   | VisMin | I2EBench | **AVG** |
+|------------------|----------|---------|--------|------|--------|----------|---------|
+| **SFT (S)**       | 5.73     | 3.66    | 3.58   | 3.19 | 3.57   | 3.59     | **3.88** |
+| **SFT think (S)** | 4.34     | 3.76    | 2.88   | 3.36 | 3.46   | 3.21     | **3.50** |
+> ⚠️ Despite its reasoning capabilities, the **SFT think (S)** variant underperforms the standard **SFT (S)** model on average (3.50 vs. 3.88); it scores higher only on EmuEdit and MB.
+## Intended Use
+This model is suited for research and development in image editing tasks that benefit from interpretable reasoning, such as instructional or multi-step visual modifications.
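
The README does not say how the AVG column in the benchmark table is derived. A minimal sketch, assuming it is the unweighted mean of the six benchmark columns (an assumption, not something the card confirms), recomputes it from the per-benchmark scores. The names `SCORES`, `REPORTED_AVG`, and `unweighted_average` are illustrative only.

```python
from statistics import mean

# Per-benchmark scores copied from the README table, in column order:
# OmniEdit, EmuEdit, AURORA, MB, VisMin, I2EBench.
SCORES = {
    "SFT (S)": [5.73, 3.66, 3.58, 3.19, 3.57, 3.59],
    "SFT think (S)": [4.34, 3.76, 2.88, 3.36, 3.46, 3.21],
}

# AVG values as reported in the README table.
REPORTED_AVG = {"SFT (S)": 3.88, "SFT think (S)": 3.50}


def unweighted_average(scores):
    """Assumed aggregation: plain mean, each benchmark weighted equally."""
    return mean(scores)


if __name__ == "__main__":
    for model, scores in SCORES.items():
        avg = unweighted_average(scores)
        print(f"{model}: computed {avg:.2f}, reported {REPORTED_AVG[model]:.2f}")
```

Under this assumption the recomputed means agree with the reported AVG values to within 0.01. Any residual gap may come from the per-benchmark scores having been rounded before publication, though the card does not say so.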