Update README.md
<div style="display:flex;justify-content: center">
<a href="https://arxiv.org/pdf/2412.07767"><img src="https://img.shields.io/static/v1?label=Paper&message=Arxiv:Lumos&color=red&logo=arxiv"></a>  
<a href="https://xiaomabufei.github.io/lumos/"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=blue&logo=github-pages"></a>  
</div>

# 🥳 What is Lumos?
<b>TL;DR: <font color="purple">Lumos</font> is an infinitely scalable, unsupervised visual pixel generation framework that can be efficiently adapted to downstream tasks such as text-to-image, image-to-3D, and image-to-video generation.</b>
<details><summary>CLICK for the full abstract</summary>
Although text-to-image (T2I) models have recently thrived as visual generative priors, their reliance on high-quality text-image pairs makes scaling up expensive.
We argue that grasping cross-modality alignment is not a necessity for a sound visual generative prior, whose focus should instead be on texture modeling.
Such a philosophy inspires us to study image-to-image (I2I) generation, where models can learn from in-the-wild images in a self-supervised manner.
We first develop a pure vision-based training framework, Lumos, and confirm the feasibility and scalability of learning I2I models.
We then find that, as an upstream task of T2I, our I2I model serves as a more foundational visual prior and achieves on-par or better performance than existing T2I models using only 1/10 text-image pairs for fine-tuning.
We further demonstrate the superiority of I2I priors over T2I priors on text-irrelevant visual generative tasks such as image-to-3D and image-to-video.
</details>

# 🪄✨ Lumos Model Card

|
| 32 |
+
|
| 33 |
+
## 🚀 Model Structure
