FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging Paper • 2602.08024 • Published Feb 8 • 1