

Xiusi Chen

XtremSup


https://xiusic.github.io/


upvoted a paper 3 months ago

Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

Paper • 2510.00526 • Published Oct 1, 2025 • 8

upvoted 2 papers 5 months ago

UserBench: An Interactive Gym Environment for User-Centric Agents

Paper • 2507.22034 • Published Jul 29, 2025 • 30

The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination

Paper • 2502.16143 • Published Feb 22, 2025 • 6

upvoted a paper 6 months ago

Perception-Aware Policy Optimization for Multimodal Reasoning

Paper • 2507.06448 • Published Jul 8, 2025 • 47

upvoted 2 papers 7 months ago

Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance

Paper • 2506.06444 • Published Jun 6, 2025 • 73

MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning

Paper • 2505.24846 • Published May 30, 2025 • 15

upvoted 2 papers 8 months ago

ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind

Paper • 2505.22961 • Published May 29, 2025 • 8

Time-R1: Towards Comprehensive Temporal Reasoning in LLMs

Paper • 2505.13508 • Published May 16, 2025 • 15

upvoted a collection 8 months ago

RM-R1

RM-R1: Reward Modeling as Reasoning • 16 items • Updated Jun 29, 2025 • 9

upvoted 2 papers 8 months ago

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

Paper • 2505.02391 • Published May 5, 2025 • 25

RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5, 2025 • 79

upvoted 2 papers 9 months ago

OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published Apr 21, 2025 • 35

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16, 2025 • 48

upvoted an article 10 months ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

Article • Feb 11, 2025 • 99

upvoted a paper about 1 year ago

SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation

Paper • 2410.14745 • Published Oct 17, 2024 • 47