Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment Paper • 2505.11821 • Published May 17, 2025 • 14
SiliangZ/RM_Zephyr_dpo_init_ultrafeedbck_lr_5e7 Text Classification • 7B • Updated Jan 19, 2025 • 1
SiliangZ/RM_Zephyr_dpo_init_ultrafeedbck_lr_5e6 Text Classification • 7B • Updated Jan 19, 2025 • 5
SiliangZ/RM_Mistral_sft_init_ultrafeedbck_lr_5e7 Text Classification • 7B • Updated Jan 19, 2025 • 3
SiliangZ/RM_Mistral_sft_init_ultrafeedbck_lr_5e6 Text Classification • 7B • Updated Jan 19, 2025 • 9