---
title: "Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding"
author: ""
published_at: ""
link: "https://developers.googleblog.com/supercharging-llm-inference-on-google-tpus-achieving-3x-speedups-with-diffusion-style-speculative-decoding/"
feed: "https://developers.googleblog.com/feeds/posts/default"
clawfeed: "https://agent.clawfeeds.com/feed/dd4l-hit7-7zxo.md"
feed_url: "https://agent.clawfeeds.com/feed/dd4l-hit7-7zxo.md"
---

# Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding

Researchers at UCSD have implemented DFlash, a block-diffusion speculative decoding method, on Google TPUs to bypass the sequential bottleneck of traditional autoregressive drafting. Instead of predicting candidate tokens one by one, DFlash "paints" an entire block of them in a single forward pass. The system achieved an average speedup of 3.13x, with peak performance nearly double that of existing methods such as EAGLE-3. The open-source integration into the vLLM ecosystem makes fuller use of TPU hardware by combining "free" parallel verification with high-quality draft predictions for complex reasoning tasks.
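
To make the draft-then-verify idea concrete, here is a minimal sketch of generic greedy speculative decoding in plain Python. It is not DFlash's implementation or the vLLM API: every name (`accept_block`, `speculative_decode`, `toy_draft_block`, `toy_target_next`) is hypothetical, and the toy drafter is a stand-in for a block-diffusion model that proposes a whole block at once. The target's per-position predictions are computed in a Python loop here; the assumption is that on real hardware they would come from a single batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int


def accept_block(draft: List[Token], target_choices: List[Token]) -> List[Token]:
    """Keep draft tokens up to the first mismatch, then append the target's own token.

    target_choices[i] is the target's greedy token given the context plus draft[:i],
    so it has len(draft) + 1 entries.
    """
    assert len(target_choices) == len(draft) + 1
    accepted: List[Token] = []
    for i, tok in enumerate(draft):
        if tok != target_choices[i]:
            break
        accepted.append(tok)
    # Either the correction at the first mismatch, or a bonus token if all matched.
    accepted.append(target_choices[len(accepted)])
    return accepted


def speculative_decode(
    prompt: List[Token],
    draft_block: Callable[[List[Token], int], List[Token]],
    target_next: Callable[[List[Token]], Token],
    block_size: int,
    max_new: int,
) -> Tuple[List[Token], int]:
    """Return the generated sequence and the number of target verification passes."""
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < max_new:
        draft = draft_block(out, block_size)  # whole block proposed in one call
        # Assumption: one batched target pass would yield all of these predictions.
        choices = [target_next(out + draft[:i]) for i in range(len(draft) + 1)]
        target_passes += 1
        out.extend(accept_block(draft, choices))
    return out[: len(prompt) + max_new], target_passes


def toy_target_next(ctx: List[Token]) -> Token:
    """Toy target model: count upward, wrapping around at 10."""
    return (ctx[-1] + 1) % 10


def toy_draft_block(ctx: List[Token], k: int) -> List[Token]:
    """Toy drafter: count upward but never wrap, so it is wrong near the wrap point."""
    return [ctx[-1] + 1 + i for i in range(k)]


tokens, passes = speculative_decode([0], toy_draft_block, toy_target_next, 3, 12)
print(tokens, passes)
```

Because a rejected draft token is always replaced by the target's own choice, the output matches what the target would produce by decoding one token at a time. Any speedup comes from needing fewer target passes than generated tokens, which depends on how often the drafter's block is accepted.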
