SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models Paper • 2511.05459 • Published Nov 7, 2025 • 5
SWE-Explore: Benchmarking How Coding Agents Explore Repositories Paper • 2606.07297 • Published 6 days ago • 105
WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models Paper • 2604.18224 • Published Apr 20 • 22
KAT-V1 🔥 an LLM that tackles overthinking by switching between reasoning and direct answers, by Kuaishou. Kwaipilot/KAT-V1-40B ✨ 40B ✨ Step-SRPO: smarter reasoning control via RL ✨ MTP + Distillation: efficient training, lower cost
Kwaipilot/OASIS-code-embedding-1.5B Sentence Similarity • 2B • Updated Mar 20, 2025 • 32 • 10