A fine-grained visual reasoning benchmark. (We show more question types in the extension dataset.)
Sicheng Feng
FSCCS
AI & ML interests
None yet
Recent Activity
upvoted a paper about 12 hours ago
Gated Condition Injection without Multimodal Attention: Towards Controllable Linear-Attention Transformers
commented on a paper 3 days ago
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps