LiuZhiHao

ZhiHao9806

AI & ML interests

None yet

Organizations

None yet

upvoted a paper 5 days ago

Region-Constraint In-Context Generation for Instructional Video Editing

Paper • 2512.17650 • Published 9 days ago • 48

upvoted a paper 25 days ago

DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation

Paper • 2511.23127 • Published 30 days ago • 43

upvoted 2 papers 7 months ago

Sherlock: Self-Correcting Reasoning in Vision-Language Models

Paper • 2505.22651 • Published May 28 • 49

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

Paper • 2505.18445 • Published May 24 • 63

upvoted 10 papers 10 months ago

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Paper • 2402.17723 • Published Feb 27, 2024 • 16

Sora Generates Videos with Stunning Geometrical Consistency

Paper • 2402.17403 • Published Feb 27, 2024 • 18

Towards Optimal Learning of Language Models

Paper • 2402.17759 • Published Feb 27, 2024 • 18

Video as the New Language for Real-World Decision Making

Paper • 2402.17139 • Published Feb 27, 2024 • 21

OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web

Paper • 2402.17553 • Published Feb 27, 2024 • 25

When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method

Paper • 2402.17193 • Published Feb 27, 2024 • 26

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27, 2024 • 88

Nemotron-4 15B Technical Report

Paper • 2402.16819 • Published Feb 26, 2024 • 46

FuseChat: Knowledge Fusion of Chat Models

Paper • 2402.16107 • Published Feb 25, 2024 • 40

MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

Paper • 2402.16840 • Published Feb 26, 2024 • 26

liked 6 models 10 months ago

Qwen/Qwen2.5-VL-7B-Instruct

Image-Text-to-Text • 8B • Updated Apr 6 • 2.54M • 1.41k

perplexity-ai/r1-1776-distill-llama-70b

Text Generation • 71B • Updated Feb 26 • 328 • 131

Skywork/SkyReels-V1-Hunyuan-I2V

Image-to-Video • Updated Feb 24 • 276 • 274

Menlo/AlphaMaze-v0.2-1.5B

Text Generation • 2B • Updated Feb 24 • 72 • 93

qihoo360/TinyR1-32B-Preview

Text Generation • 33B • Updated Sep 24 • 114 • 329

perplexity-ai/r1-1776

Text Generation • 671B • Updated Feb 26 • 839 • 2.33k