Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps Paper • 2605.16928 • Published 17 days ago • 93