Peng Wang

stillarrow

https://peter-peng-w.github.io/

AI & ML interests

None yet

Recent Activity

upvoted an article 5 days ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

liked a dataset 5 days ago

zwhe99/DeepMath-103K

liked a model 11 days ago

deepseek-ai/DeepSeek-Math-V2

Organizations

None yet

upvoted an article 5 days ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • Dec 9, 2022 • 376

upvoted a paper 19 days ago

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Paper • 2511.06221 • Published 29 days ago • 128

upvoted 2 papers about 2 months ago

ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published Oct 2 • 80

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Paper • 2509.25760 • Published Sep 30 • 55

upvoted a collection 2 months ago

Qwen3-VL

37 items • Updated Nov 1 • 491

upvoted a paper 2 months ago

VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models

Paper • 2509.19803 • Published Sep 24 • 119

upvoted a paper 3 months ago

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11 • 49

upvoted 2 collections 5 months ago

FastCuRL

The collection for the Paper "Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models" • 6 items • Updated May 29 • 2

"Physics of Language Models" series

6 items • Updated Aug 30, 2024 • 50

upvoted a paper 5 months ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 263

upvoted a collection 5 months ago

Tool-Star

Tool-Star is a reinforcement learning-based framework designed to empower LLMs to autonomously invoke multiple external tools during stepwise reasonin • 8 items • Updated Sep 2 • 5

upvoted a paper 6 months ago

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26 • 57

upvoted a paper 7 months ago

DeepCritic: Deliberate Critique with Large Language Models

Paper • 2505.00662 • Published May 1 • 54

upvoted a paper 8 months ago

A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce

Paper • 2504.11343 • Published Apr 15 • 19

upvoted a collection 8 months ago

Gemma 3 Release

28 items • Updated Aug 11 • 548

upvoted an article 8 months ago

The Large Language Model Course

Article • Jan 16 • 212

upvoted 2 articles 9 months ago

Mastering Tensor Dimensions in Transformers

Article • Jan 12 • 120

KV Caching Explained: Optimizing Transformer Inference Efficiency

Article • Jan 30 • 189

upvoted an article 10 months ago

Open R1: Update #2

Article • Feb 10 • 218

upvoted a collection 10 months ago

OpenMath

A collection of models and datasets introduced in "OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset" • 15 items • Updated 4 days ago • 45