Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish • Paper • 2508.16431 • Published Aug 22, 2025 • 1
Lost in Embeddings: Information Loss in Vision-Language Models • Paper • 2509.11986 • Published Sep 15, 2025 • 28
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation • Paper • 2506.01565 • Published Jun 2, 2025 • 3
Overcoming Vocabulary Constraints with Pixel-level Fallback • Paper • 2504.02122 • Published Apr 2, 2025 • 2
Can Community Notes Replace Professional Fact-Checkers? • Paper • 2502.14132 • Published Feb 19, 2025 • 6
PaliGemma 2: A Family of Versatile VLMs for Transfer • Paper • 2412.03555 • Published Dec 4, 2024 • 133
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture • Paper • 2406.11030 • Published Jun 16, 2024
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning • Paper • 2406.02265 • Published Jun 4, 2024 • 7
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings • Paper • 2404.16820 • Published Apr 25, 2024 • 17
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models • Paper • 2311.07022 • Published Nov 13, 2023 • 1