AICC: Parse HTML Finer, Make Models Better -- A 7.3T AI-Ready Corpus Built by a Model-Based HTML Parser Paper • 2511.16397 • Published Nov 2025
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23, 2024
Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation Paper • 2502.06563 • Published Feb 10, 2025
Unsupervised Topic Models are Data Mixers for Pre-training Language Models Paper • 2502.16802 • Published Feb 24, 2025
Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models Paper • 2504.14194 • Published Apr 19, 2025