arxiv:2406.12753
Yikai Zhang
Arist12
AI & ML interests
Natural Language Processing
Recent Activity
upvoted a paper about 1 month ago
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
updated a dataset about 2 months ago
Arist12/realswebench
published a dataset about 2 months ago
Arist12/realswebench