Speed Testing Methodology

How TOKRACE defines and measures LLM speed, and why the method is fair and reproducible.

Why concurrent testing

All enabled models receive the same prompt and are started at the same time, instead of one after another. This keeps network conditions, time of day and prompt content aligned.
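As a rough illustration of the simultaneous start, here is a minimal sketch using Python's asyncio. The function names `run_model` and `race` and the model names are hypothetical, and the request itself is replaced by a placeholder; this is not TOKRACE's actual code.

```python
import asyncio
import time


async def run_model(name, prompt, start_gate):
    # Hypothetical stand-in for one streaming request to one model.
    await start_gate.wait()          # every task blocks here until released
    started = time.monotonic()       # local start time, for illustration only
    await asyncio.sleep(0)           # placeholder for the real network call
    return name, started


async def race(models, prompt):
    gate = asyncio.Event()
    tasks = [asyncio.create_task(run_model(m, prompt, gate)) for m in models]
    await asyncio.sleep(0)           # let every task reach the gate first
    gate.set()                       # release all models at the same moment
    return await asyncio.gather(*tasks)


results = asyncio.run(race(["model-a", "model-b"], "Hello"))
```

Because all tasks wait on the same gate, no model gets a head start from being listed first.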

Time to first token (TTFT)

TTFT is the time in milliseconds from sending the request to receiving the first token, whether that token is thinking or final content. Timing uses server timestamps for each delta, not browser-local rendering time.
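The TTFT definition can be sketched as follows. The tuple layout `(server_timestamp_ms, kind, text)` and the function name `ttft_ms` are assumptions made for illustration, as is the premise that the send time and the delta timestamps come from the same clock.

```python
def ttft_ms(request_sent_ms, deltas):
    """Return milliseconds from request send to the first delta of any kind.

    deltas: list of (server_timestamp_ms, kind, text) tuples, where kind is
    "thinking" or "content" (hypothetical layout, not TOKRACE's real schema).
    """
    if not deltas:
        return None  # no token ever arrived
    first_ts = min(ts for ts, _kind, _text in deltas)
    return first_ts - request_sent_ms


example = [
    (1350, "thinking", "Let"),
    (1900, "content", "Hi"),
]
first_token_delay = ttft_ms(1000, example)
```

In this example the first delta is a thinking token, and it still counts as the first token.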

Thinking and output speed

For models with thinking output, thinking and final content are timed separately, each over its own active window from the first to the last delta of that stage. Gaps between stages, and the wait for usage data after output ends, are excluded.
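A minimal sketch of per-stage timing, assuming each delta is recorded as a hypothetical `(server_timestamp_ms, kind)` pair and that the token count for the stage is already known:

```python
def stage_speed(deltas, kind, tokens):
    """Tokens per second for one stage over its first-to-last-delta window.

    deltas: list of (server_timestamp_ms, kind) pairs (assumed layout).
    Returns None when the stage has fewer than two deltas or zero duration.
    """
    stamps = [ts for ts, k in deltas if k == kind]
    if len(stamps) < 2:
        return None
    window_s = (max(stamps) - min(stamps)) / 1000
    return tokens / window_s if window_s > 0 else None


stream = [
    (1000, "thinking"), (1500, "thinking"), (2000, "thinking"),
    # 1-second gap between stages: counted in neither window
    (3000, "content"), (3500, "content"), (4000, "content"), (5000, "content"),
]
thinking_tps = stage_speed(stream, "thinking", tokens=50)   # 50 tokens over 1.0 s
content_tps = stage_speed(stream, "content", tokens=120)    # 120 tokens over 2.0 s
```

Since each window starts at that stage's own first delta, the pause between the thinking and content stages does not lower either speed.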

Token counts prefer the provider's official usage values. When these are missing, TOKRACE falls back to a consistent character-based estimate, and calibrates that estimate to official totals when they are available.
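One way such a fallback and calibration might look is sketched below. The ratio of 4 characters per token and the proportional scaling rule are assumptions chosen for illustration; the source does not state the exact estimate TOKRACE uses.

```python
CHARS_PER_TOKEN = 4.0  # assumed ratio, not taken from TOKRACE


def estimate_tokens(text):
    # Character-based estimate applied the same way to every model.
    return len(text) / CHARS_PER_TOKEN


def stage_tokens(thinking_text, content_text, official_total=None):
    """Split tokens between stages, scaled to the official total if known."""
    est_thinking = estimate_tokens(thinking_text)
    est_content = estimate_tokens(content_text)
    est_sum = est_thinking + est_content
    if official_total is None or est_sum == 0:
        return est_thinking, est_content      # pure estimate
    scale = official_total / est_sum          # calibrate to the official total
    return est_thinking * scale, est_content * scale


uncalibrated = stage_tokens("a" * 40, "b" * 120)
calibrated = stage_tokens("a" * 40, "b" * 120, official_total=50)
```

The calibrated values keep the estimated proportions between stages while summing to the provider's reported total.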

Peak speed

Peak speed is the highest tokens/s rate observed in any 2-second sliding window, calibrated to official token totals when available.
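The sliding-window peak can be sketched like this. The `(server_timestamp_ms, estimated_tokens)` layout, the choice to end each window at a delta's timestamp, and the proportional calibration are all assumptions for illustration.

```python
def peak_speed(deltas, window_ms=2000, official_total=None):
    """Maximum tokens/s over any window_ms-wide window ending at a delta.

    deltas: list of (server_timestamp_ms, estimated_tokens), sorted by time
    (assumed layout). Estimates are scaled to official_total when given.
    """
    if not deltas:
        return None
    total_est = sum(n for _ts, n in deltas)
    scale = official_total / total_est if official_total and total_est else 1.0
    best = 0.0
    in_window = 0.0
    left = 0
    for ts, n in deltas:
        in_window += n
        # Drop deltas that fall outside the window (ts - window_ms, ts].
        while deltas[left][0] <= ts - window_ms:
            in_window -= deltas[left][1]
            left += 1
        best = max(best, in_window)
    return best * scale / (window_ms / 1000)


stream = [
    (0, 10), (500, 10), (1000, 10), (1500, 10),
    (2000, 40), (2500, 40), (3000, 40), (3500, 40),
    (6000, 5),
]
peak_estimated = peak_speed(stream)
peak_calibrated = peak_speed(stream, official_total=410)
```

In this sketch the rate is always divided by the full window length, so a burst shorter than 2 seconds is averaged over the whole window rather than inflated.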

Data source and privacy

Leaderboard data comes from voluntary anonymous sharing after arena runs. Only speed metrics are reported, never prompts or API keys. Different endpoints for the same model are tracked separately.

Limits and fairness

Speed varies with network conditions, time of day and provider load. Leaderboards use medians to reduce one-off jitter, but results are still reference data, not absolute truth. The project is open source and reproducible.
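To illustrate why medians help, the sketch below groups shared results by model and endpoint and takes the median of each group. The report fields `model`, `endpoint` and `tokens_per_s`, and all the values, are made up for the example.

```python
from collections import defaultdict
from statistics import mean, median


def leaderboard(reports):
    # Each (model, endpoint) pair is tracked as its own entry.
    groups = defaultdict(list)
    for r in reports:
        groups[(r["model"], r["endpoint"])].append(r["tokens_per_s"])
    return {key: median(values) for key, values in groups.items()}


speeds = [52.0, 55.0, 54.0, 12.0, 53.0]  # one run hit a slow network moment
reports = [
    {"model": "model-a", "endpoint": "endpoint-1", "tokens_per_s": s}
    for s in speeds
]
reports.append({"model": "model-a", "endpoint": "endpoint-2", "tokens_per_s": 70.0})

board = leaderboard(reports)
average = mean(speeds)
```

Here the single slow run pulls the mean down to about 45 tokens/s, while the median for that endpoint stays at 53.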
