Michal Valko
misovalko
AI & ML interests
large language models, reasoning, fine-tuning, test-time computation, reinforcement learning with human feedback, world models
Recent Activity
upvoted a paper 1 day ago
A General Theoretical Paradigm to Understand Learning from Human Preferences
authored a paper 1 day ago
Optimal Design for Reward Modeling in RLHF
authored a paper 1 day ago
Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms