zhangshuo

mcflurryshuoz · zsxzs

AI & ML interests

None yet

Recent Activity

authored a paper 12 days ago

GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging

authored a paper 12 days ago

RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving

authored a paper 12 days ago

KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

authored 6 papers 12 days ago

GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging

Paper • 2508.18993 • Published Aug 26, 2025 • 4

RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving

Paper • 2505.21577 • Published May 27, 2025 • 3

KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

Paper • 2601.04745 • Published 20 days ago • 56

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

Paper • 2601.06943 • Published 16 days ago • 207

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

Paper • 2601.07853 • Published 19 days ago • 9

Controlled Self-Evolution for Algorithmic Code Optimization

Paper • 2601.07348 • Published 16 days ago • 112

upvoted a paper 13 days ago

EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines

Paper • 2601.09465 • Published 13 days ago • 40

authored a paper 13 days ago

EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines

Paper • 2601.09465 • Published 13 days ago • 40

upvoted 2 papers 13 days ago

Controlled Self-Evolution for Algorithmic Code Optimization

Paper • 2601.07348 • Published 16 days ago • 112

MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences

Paper • 2601.06789 • Published 17 days ago • 77

upvoted a paper 15 days ago

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

Paper • 2601.06943 • Published 16 days ago • 207

updated 3 datasets about 1 month ago

mcflurryshuoz/swebench_verified_images_tars

Updated Dec 24, 2025 • 1

mcflurryshuoz/swebench_verified_images_tars

Updated Dec 24, 2025 • 1

mcflurryshuoz/swebench_verified_images_tars

Updated Dec 24, 2025 • 1