SciCode: A Research Coding Benchmark Curated by Scientists Paper • 2407.13168 • Published Jul 18, 2024 • 16
view article Article Transformers v5: Simple model definitions powering the AI ecosystem +2 5 days ago • 223
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents Paper • 2404.10774 • Published Apr 16, 2024 • 6
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization Paper • 2402.13249 • Published Feb 20, 2024 • 14