Collections including paper arxiv:2508.09848

A Multi-task Multi-stage Transitional Training Framework for Neural Chat Translation

Paper • 2301.11749 • Published Jan 27, 2023
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling

Paper • 2512.23959 • Published Dec 30, 2025 • 111
The Essence of Contextual Understanding in Theory of Mind: A Study on Question Answering with Story Characters

Paper • 2501.01705 • Published Jan 3, 2025 • 1
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts

Paper • 2508.09848 • Published Aug 13, 2025 • 71

Snowflake/Arctic-Text2SQL-R1-7B

8B • Updated May 29, 2025 • 8.17k • 70
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30, 2025 • 282
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 265
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19, 2025 • 133

Model Evaluation

Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon

Paper • 2502.07445 • Published Feb 11, 2025 • 11
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning

Paper • 2502.04689 • Published Feb 7, 2025 • 9
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models

Paper • 2502.03032 • Published Feb 5, 2025 • 60
Preference Leakage: A Contamination Problem in LLM-as-a-judge

Paper • 2502.01534 • Published Feb 3, 2025 • 40

PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts

Paper • 2508.09848 • Published Aug 13, 2025 • 71
ttchungc/PRELUDE

Viewer • Updated Oct 3, 2025 • 1.16k • 50 • 18
ai-hyz/MemoryAgentBench

Viewer • Updated Oct 7, 2025 • 146 • 25.4k • 34
TommyChien/MemoRAG-Training

Viewer • Updated Apr 23, 2025 • 21.1k • 16 • 1

AGI_assessments

The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding

Paper • 2502.08946 • Published Feb 13, 2025 • 192
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts

Paper • 2508.09848 • Published Aug 13, 2025 • 71
ttchungc/PRELUDE

Viewer • Updated Oct 3, 2025 • 1.16k • 50 • 18
ShunchiZhang/PhysiCo

Viewer • Updated Feb 16, 2025 • 600 • 24 • 6
