Speculative decoding

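Speculative decoding is a decoding method for accelerating text generation from a large language model. A smaller, faster draft model proposes several tokens ahead, and the larger target model then checks all of them in a single forward pass. The target model accepts draft tokens from left to right according to an acceptance test. At the first rejection, it substitutes its own token and discards the rest of the draft. If every draft token is accepted, the same pass also yields one extra token. Checking a block of tokens in one pass often costs little more than generating a single token, so the large model can emit several tokens per forward pass instead of one.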

Image credits: R. Zhu, TitanML
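The draft-then-verify loop described above can be sketched in Python. This is a minimal, unoptimized sketch. It assumes the acceptance rule commonly used for speculative sampling: accept a draft token with probability min(1, p/q), where p and q are the target and draft probabilities, and on rejection resample in proportion to max(0, p - q). The `Model` interface, `speculative_step`, `sample`, and the toy distributions are hypothetical illustrations, not any library's API. A real implementation would score all draft positions in one batched forward pass of the target model instead of looping.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical toy interface: a "model" maps a token prefix to a probability
# distribution over a small, fixed vocabulary.
Model = Callable[[Sequence[int]], List[float]]


def sample(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one token index in proportion to the given (non-negative) weights."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def speculative_step(prefix: List[int], draft: Model, target: Model,
                     gamma: int, rng: random.Random) -> List[int]:
    """One draft-then-verify round; returns the emitted tokens (1 to gamma + 1)."""
    # 1) The draft model proposes gamma tokens autoregressively.
    ctx = list(prefix)
    proposed: List[int] = []
    q_dists: List[List[float]] = []
    for _ in range(gamma):
        q = draft(ctx)
        tok = sample(q, rng)
        proposed.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # 2) The target model scores every draft position plus one position past
    #    the last draft token. Real systems do this in one batched pass.
    p_dists = [target(list(prefix) + proposed[:i]) for i in range(gamma + 1)]

    # 3) Accept draft tokens left to right with probability min(1, p/q).
    emitted: List[int] = []
    for i, tok in enumerate(proposed):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            emitted.append(tok)
            continue
        # First rejection: resample in proportion to max(0, p - q) and
        # discard the remaining draft tokens.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        emitted.append(sample(residual, rng))
        return emitted

    # 4) Every draft token was accepted: emit one bonus token from the target.
    emitted.append(sample(p_dists[gamma], rng))
    return emitted


if __name__ == "__main__":
    # Toy models over a 3-token vocabulary that ignore the prefix.
    target_model = lambda ctx: [0.6, 0.3, 0.1]
    draft_model = lambda ctx: [0.5, 0.4, 0.1]
    rng = random.Random(0)
    tokens: List[int] = []
    while len(tokens) < 20:
        tokens += speculative_step(tokens, draft_model, target_model, 4, rng)
    print(tokens)
```

When the draft model matches the target exactly, every proposal is accepted and each call returns `gamma + 1` tokens. The more closely the two models agree, the more tokens each expensive target pass yields. Under this acceptance rule, each emitted token still follows the target model's distribution.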

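Speculative decoding is typically used to:

     1) Reduce the latency of generating text with a large model, since each forward pass of the large model can produce several tokens.

     2) Make better use of accelerator hardware. Token-by-token decoding is usually limited by memory bandwidth rather than compute, so scoring several draft tokens at once adds little cost.

     3) Keep output quality unchanged. With the standard acceptance rule, the accepted text follows the same distribution as sampling from the large model alone. Quality, diversity, and style are therefore set by the large model and its sampling settings, not by the draft model.

     4) Speed up an existing large model without retraining or modifying it, since the draft model is a separate, cheaper component.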
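How much speculative decoding accelerates generation depends on how often the target model accepts the draft's proposals. The following back-of-the-envelope sketch makes a simplifying assumption: each of the `gamma` draft tokens is accepted independently with the same probability `alpha`. A round emits the accepted prefix plus one token from the target model, so the expected count per target forward pass is 1 + alpha + ... + alpha^gamma = (1 - alpha^(gamma+1)) / (1 - alpha). Under the min(1, p/q) acceptance rule, the acceptance probability at a single position is the sum over tokens of min(p, q). The function names are hypothetical.

```python
from typing import Sequence


def acceptance_rate(p: Sequence[float], q: Sequence[float]) -> float:
    """Probability that a token drawn from draft q is accepted by target p."""
    # Under the min(1, p/q) rule: sum_x q(x) * min(1, p(x)/q(x)) = sum_x min(p(x), q(x)).
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def expected_tokens_per_round(alpha: float, gamma: int) -> float:
    """Expected tokens per target forward pass with gamma draft tokens,
    assuming each draft token is accepted independently with probability alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if alpha == 1.0:
        return float(gamma + 1)  # all draft tokens plus the bonus token
    # Geometric series 1 + alpha + ... + alpha**gamma.
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


alpha = acceptance_rate([0.6, 0.3, 0.1], [0.5, 0.4, 0.1])  # about 0.9
print(expected_tokens_per_round(alpha, gamma=4))
print(expected_tokens_per_round(0.5, gamma=4))  # 1.9375
```

This counts tokens per target pass, not wall-clock speedup. The draft model's own forward passes also take time, so a longer draft (larger `gamma`) pays off only when `alpha` is high and the draft model is much cheaper than the target.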