📈 Performance Dashboard

Overview

To make the performance of the SpecBundle draft models easier to explore, we have built an interactive dashboard for browsing the evaluation results. We evaluate the draft models under different speculative decoding configurations (i.e. steps, topk, num_draft_tokens) on the following benchmarks:

Conversation
- MTBench
General Knowledge
- GPQA
- FinanceQA
Math
- GSM8K
- Math500
Coding
- HumanEval
- LiveCodeBench
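
As a rough illustration of how the evaluation space is laid out, the sketch below enumerates every (benchmark, configuration) pair. The benchmark names and the three parameter names come from the list and overview above; the numeric values in `CONFIGS` and the helper `evaluation_grid` are hypothetical, chosen only for illustration, and are not the settings used to produce the dashboard.

```python
from itertools import product

# Benchmarks grouped by category, as listed above.
BENCHMARKS = {
    "Conversation": ["MTBench"],
    "General Knowledge": ["GPQA", "FinanceQA"],
    "Math": ["GSM8K", "Math500"],
    "Coding": ["HumanEval", "LiveCodeBench"],
}

# Hypothetical (steps, topk, num_draft_tokens) settings, for illustration only.
CONFIGS = [
    (3, 1, 4),
    (5, 4, 8),
]


def evaluation_grid(benchmarks, configs):
    """Return one record per (benchmark, configuration) pair."""
    names = [
        (category, name)
        for category, items in benchmarks.items()
        for name in items
    ]
    return [
        {
            "category": category,
            "benchmark": name,
            "steps": steps,
            "topk": topk,
            "num_draft_tokens": num_draft_tokens,
        }
        for (category, name), (steps, topk, num_draft_tokens) in product(names, configs)
    ]


grid = evaluation_grid(BENCHMARKS, CONFIGS)
for row in grid:
    print(row)
```

Each record corresponds to one cell a reader might select in the dashboard: a benchmark paired with one speculative decoding configuration.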

Check out the Performance Dashboard for more details.