TextQuests: How Good are LLMs at Text-Based Video Games? Paper • 2507.23701 • Published Jul 31, 2025 • 2
VeriCoder: Enhancing LLM-Based RTL Code Generation through Functional Correctness Validation Paper • 2504.15659 • Published Apr 22, 2025
SweRank: Software Issue Localization with Code Ranking Paper • 2505.07849 • Published May 7, 2025 • 10
Improving Assembly Code Performance with Large Language Models via Reinforcement Learning Paper • 2505.11480 • Published May 16, 2025 • 8
SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas Paper • 2505.14615 • Published May 20, 2025 • 1
CoRNStack: High-Quality Contrastive Data for Better Code Ranking Paper • 2412.01007 • Published Dec 1, 2024 • 1
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis Paper • 2503.23145 • Published Mar 29, 2025 • 35
Tamper-Resistant Safeguards for Open-Weight LLMs Collection Models & datasets from the paper "Tamper-Resistant Safeguards for Open-Weight LLMs" (https://arxiv.org/pdf/2408.00761) • 9 items • Updated Feb 15, 2025 • 3