Haoyu Guo

ghy0324

AI & ML interests

None yet

Recent Activity

upvoted a paper about 2 hours ago

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Paper • 2512.05965 • Published 3 days ago • 21

upvoted a paper 3 days ago

PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing

Paper • 2512.02589 • Published 6 days ago • 43

upvoted 3 papers 4 days ago

Thinking with Programming Vision: Towards a Unified View for Thinking with Images

Paper • 2512.03746 • Published 5 days ago • 15

OneThinker: All-in-one Reasoning Model for Image and Video

Paper • 2512.03043 • Published 6 days ago • 28

Qwen3-VL Technical Report

Paper • 2511.21631 • Published 12 days ago • 110

upvoted a paper 5 days ago

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published 6 days ago • 175

upvoted 2 papers 14 days ago

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published 18 days ago • 91

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published 18 days ago • 107

upvoted a paper 17 days ago

SAM 3D: 3Dfy Anything in Images

Paper • 2511.16624 • Published 18 days ago • 106

upvoted a paper 21 days ago

Depth Anything 3: Recovering the Visual Space from Any Views

Paper • 2511.10647 • Published 25 days ago • 93

upvoted a paper 25 days ago

Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Paper • 2511.08892 • Published 26 days ago • 194

upvoted 4 papers 28 days ago

Cambrian-S: Towards Spatial Supersensing in Video

Paper • 2511.04670 • Published Nov 6 • 36

V-Thinker: Interactive Thinking with Images

Paper • 2511.04460 • Published Nov 6 • 96

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6 • 208

DeepEyesV2: Toward Agentic Multimodal Model

Paper • 2511.05271 • Published about 1 month ago • 42

upvoted 4 papers about 1 month ago

ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

Paper • 2510.27492 • Published Oct 30 • 81

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Paper • 2510.26802 • Published Oct 30 • 33

Emu3.5: Native Multimodal Models are World Learners

Paper • 2510.26583 • Published Oct 30 • 106

Tongyi DeepResearch Technical Report

Paper • 2510.24701 • Published Oct 28 • 97

liked a model about 1 month ago

deepseek-ai/DeepSeek-OCR

Image-Text-to-Text • 3B • Updated Nov 4 • 5.45M • 2.94k
