AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking (Columbia University; submitted by Xilin Jiang)
Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward (Columbia University; submitted by Peter L. Chen)
PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions (Columbia University)