Yulai Zhao

sarosavo

http://yulaizhao.com

Recent Activity

upvoted 9 papers 5 days ago

GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

Paper • 2512.19682 • Published 6 days ago • 14

Meta-RL Induces Exploration in Language Agents

Paper • 2512.16848 • Published 10 days ago • 10

Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

Paper • 2512.17008 • Published 10 days ago • 10

Are We on the Right Way to Assessing LLM-as-a-Judge?

Paper • 2512.16041 • Published 11 days ago • 32

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

Paper • 2512.17260 • Published 9 days ago • 48

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

Paper • 2512.16969 • Published 10 days ago • 105

Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image

Paper • 2512.16899 • Published 10 days ago • 12

LLaDA2.0: Scaling Up Diffusion Language Models to 100B

Paper • 2512.15745 • Published 18 days ago • 77

Next-Embedding Prediction Makes Strong Vision Learners

Paper • 2512.16922 • Published 10 days ago • 79

upvoted 11 papers 6 days ago

Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning

Paper • 2512.15687 • Published 11 days ago • 17

Rethinking Expert Trajectory Utilization in LLM Post-training

Paper • 2512.11470 • Published 16 days ago • 7

Towards Interactive Intelligence for Digital Humans

Paper • 2512.13674 • Published 13 days ago • 11

MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment

Paper • 2512.09636 • Published 18 days ago • 25

QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

Paper • 2512.12967 • Published 13 days ago • 101

SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

Paper • 2512.11749 • Published 16 days ago • 36

EgoX: Egocentric Video Generation from a Single Exocentric Video

Paper • 2512.08269 • Published 19 days ago • 112

The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality

Paper • 2512.10791 • Published 17 days ago • 7

Evaluating Gemini Robotics Policies in a Veo World Simulator

Paper • 2512.10675 • Published 17 days ago • 16

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

Paper • 2512.10756 • Published 17 days ago • 33

BEAVER: An Efficient Deterministic LLM Verifier

Paper • 2512.05439 • Published 23 days ago • 34