Multimodal System - a btjhjeon Collection

Multimodal System

updated 7 days ago

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

Paper • 2503.13964 • Published Mar 18, 2025 • 20

RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training

Paper • 2510.06710 • Published Oct 8, 2025 • 42

VIDEOP2R: Video Understanding from Perception to Reasoning

Paper • 2511.11113 • Published Nov 14, 2025 • 111

Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models

Paper • 2512.04981 • Published Dec 4, 2025 • 8

Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning

Paper • 2512.06835 • Published Dec 7, 2025 • 4

Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

Paper • 2601.04720 • Published Jan 8 • 55

BabyVision: Visual Reasoning Beyond Language

Paper • 2601.06521 • Published Jan 10 • 196

MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning

Paper • 2601.21468 • Published 16 days ago • 20