Okay, maybe I'm a little obsessed with LR schedulers ATM. I ran an SST-2 sentiment classification eval using the nyu-mll/glue dataset on distilbert/distilbert-base-uncased (67M params) to see how different schedulers perform.
I think I've graduated from ML enthusiast to full-blown data hoarder, and I don't know if I can turn back now.
Anyways, I also evaluated the 2 schedulers that I designed and was pretty happy with the performance of both overall, so hell ya to that. Guess I'll go and grab some more graphs.
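
For anyone wondering what actually gets compared in an eval like this: a scheduler is basically a function from training step to an LR multiplier. Here's a minimal sketch of two common baselines (linear and cosine decay, both with warmup) in plain Python. These are not the 2 custom schedulers mentioned above, and the function names, step counts, and base LR are made-up placeholders, not the settings from this run.

```python
import math


def linear_schedule(step, total_steps, warmup_steps):
    """LR multiplier: linear warmup to 1.0, then linear decay to 0.0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def cosine_schedule(step, total_steps, warmup_steps):
    """LR multiplier: linear warmup to 1.0, then half-cosine decay to 0.0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def lr_trace(schedule, base_lr, total_steps, warmup_steps):
    """Actual LR at every step from 0 to total_steps, handy for plotting."""
    return [
        base_lr * schedule(step, total_steps, warmup_steps)
        for step in range(total_steps + 1)
    ]


if __name__ == "__main__":
    # Placeholder hyperparameters, chosen only for illustration.
    base_lr, total_steps, warmup_steps = 2e-5, 100, 10
    for name, fn in [("linear", linear_schedule), ("cosine", cosine_schedule)]:
        trace = lr_trace(fn, base_lr, total_steps, warmup_steps)
        print(name, [f"{lr:.2e}" for lr in trace[::25]])
```

Functions shaped like this (step in, multiplier out) are what `torch.optim.lr_scheduler.LambdaLR` takes as its `lr_lambda` argument once `total_steps` and `warmup_steps` are bound, so swapping schedulers in a training loop comes down to swapping one function.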