Tommaso Cerruti

Cerru02

https://tommasocerruti.github.io/

AI & ML interests

AI safety and evaluation

Recent Activity

new activity about 16 hours ago

evaleval/EEE_datastore:Fix LLM Stats provenance relationships

upvoted an article 6 days ago

Safety Evals Should Project Test-Time Compute

published an article 6 days ago

Safety Evals Should Project Test-Time Compute

authored a paper 17 days ago

CocoaBench: Evaluating Unified Digital Agents in the Wild

Paper • 2604.11201 • Published Apr 13 • 36