Submitted by Jingfeng Yao
Towards Scalable Pre-training of Visual Tokenizers for Generation
MiniMax