-
A Multi-task Multi-stage Transitional Training Framework for Neural Chat Translation
Paper • 2301.11749 • Published -
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling
Paper • 2512.23959 • Published • 111 -
The Essence of Contextual Understanding in Theory of Mind: A Study on Question Answering with Story Characters
Paper • 2501.01705 • Published • 1 -
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts
Paper • 2508.09848 • Published • 71
Collections
Collections including paper arxiv:2508.09848
-
Snowflake/Arctic-Text2SQL-R1-7B
8B • Updated • 8.17k • 70 -
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 282 -
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 265 -
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper • 2506.16406 • Published • 133
-
Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
Paper • 2502.07445 • Published • 11 -
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
Paper • 2502.04689 • Published • 9 -
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
Paper • 2502.03032 • Published • 60 -
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Paper • 2502.01534 • Published • 40
-
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts
Paper • 2508.09848 • Published • 71 -
ttchungc/PRELUDE
Viewer • Updated • 1.16k • 50 • 18 -
ai-hyz/MemoryAgentBench
Viewer • Updated • 146 • 25.4k • 34 -
TommyChien/MemoRAG-Training
Viewer • Updated • 21.1k • 16 • 1
-
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
Paper • 2502.08946 • Published • 192 -
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts
Paper • 2508.09848 • Published • 71 -
ttchungc/PRELUDE
Viewer • Updated • 1.16k • 50 • 18 -
ShunchiZhang/PhysiCo
Viewer • Updated • 600 • 24 • 6