Why is my model slow?
Default models run on shared CPU infrastructure (5–60s typical). Switch to a premium model for 1–5s responses, or check /status/ for current load.
Last updated: 2026-05-04
Expected response times
| Model | Typical | Long output (2000 tokens) |
|---|---|---|
| qwen2.5-3b (Free fast) | 3–8s | 15–30s |
| qwen2.5-32b (Free quality) | 10–30s | 30–90s |
| GPT-4o-mini (Pro fast) | 1–4s | 5–15s |
| Claude Sonnet (Pro quality) | 2–8s | 10–30s |
| GPT-4o (Pro premium) | 3–10s | 15–45s |
| Claude Opus (Pro premium) | 5–15s | 30–90s |
| Cerebras Llama 70b (Pro fast-quality) | 0.5–2s | 1–5s ⚡ |
Why default Free models are slower
We self-host qwen2.5 on CPU-only Hetzner nodes (no GPU, ~30 tok/s). It's free for us, free for you. Trade-off: latency.
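To see what ~30 tok/s means in practice, divide the output length by the throughput. The sketch below assumes throughput stays constant for the whole response and ignores queueing, network time, and prompt processing. `generation_seconds` is a hypothetical helper for illustration, not part of any API.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Estimate pure generation time, assuming constant throughput.

    Ignores queueing, network, and prompt processing, so real
    latencies will be somewhat higher than this estimate.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# ~30 tok/s on the CPU-only nodes (figure from this page)
long_output = generation_seconds(2000, 30)
half_output = generation_seconds(1000, 30)
print(f"2000 tokens: {long_output:.1f}s, 1000 tokens: {half_output:.1f}s")
```

About 67s for a 2000-token answer falls inside the 30–90s long-output range listed for qwen2.5-32b. The same arithmetic is why shorter outputs finish proportionally sooner.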
Speed-ups
- Switch to Cerebras (Pro): roughly 5–20× faster than the other models for text generation. Tools with the ⚡ icon support it.
- Reduce target output length — half the words = half the time.
- Stream output: the UI shows tokens as they arrive, so the response feels faster even though the total time is the same. Enabled by default.
- Off-peak hours — UTC 02:00–08:00 has 30% less load.
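If you schedule batch jobs, you can check whether a given moment falls in the off-peak window. This is a minimal sketch; treating 02:00 as inclusive and 08:00 as exclusive is an assumption, and `is_off_peak` is a hypothetical helper name.

```python
from datetime import datetime, time, timezone

# Off-peak window from this page: UTC 02:00-08:00.
# Assumption: start is inclusive, end is exclusive.
OFF_PEAK_START = time(2, 0)
OFF_PEAK_END = time(8, 0)


def is_off_peak(moment: datetime) -> bool:
    """Return True if `moment` falls in the UTC off-peak window.

    Timezone-aware datetimes are converted to UTC first;
    naive datetimes are treated as already being in UTC.
    """
    if moment.tzinfo is not None:
        moment = moment.astimezone(timezone.utc)
    return OFF_PEAK_START <= moment.time() < OFF_PEAK_END


print(is_off_peak(datetime.now(timezone.utc)))
```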
Spike checks
If your normal model suddenly runs 3× slower:
- Check /status/ for incidents.
- Try a different model — if the alternative is fast, the original provider is degraded.
- Check your network: `curl -I https://aicentraltools.com` should complete in under 500 ms (prefix it with `time` to see the elapsed time).
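If you log your own request timings, the "3× slower" rule can be turned into a simple check. This sketch compares medians so a single slow outlier doesn't trigger it; the median choice, the helper name `is_latency_spike`, and the sample timings are all illustrative assumptions.

```python
from statistics import median


def is_latency_spike(recent: list[float], baseline: list[float], factor: float = 3.0) -> bool:
    """Flag a slowdown when recent latencies are `factor` times the usual ones.

    Compares medians so one slow request doesn't trigger the check.
    Latencies are in seconds.
    """
    if not recent or not baseline:
        raise ValueError("need at least one recent and one baseline sample")
    return median(recent) >= factor * median(baseline)


usual = [4.1, 5.0, 3.8, 4.6]   # hypothetical past qwen2.5-3b timings
today = [13.9, 15.2, 14.4]     # hypothetical timings during a slowdown
print(is_latency_spike(today, usual))
```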
Per-tool overhead
Some tools chain multiple model calls internally:
- Blog Post Generator: 2 calls (outline + draft).
- Agent runs: 5–20 calls.
- Image tools: 1 model call + image generation provider (~10–60s).
The latency shown is the total across all of these calls. Click "Show timing breakdown" on the result to see how long each step took.
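Because chained calls add up, you can estimate a tool's total latency by multiplying the call count by the per-call time. This sketch assumes the calls run one after another (no parallelism) and that each takes the model's typical time from the table; `chained_latency_range` is a hypothetical helper.

```python
def chained_latency_range(calls: tuple[int, int], per_call_s: tuple[float, float]) -> tuple[float, float]:
    """Rough total-latency range for a tool that makes several model calls.

    Assumes the calls run sequentially and each takes the model's
    typical per-call time (in seconds).
    """
    min_calls, max_calls = calls
    fastest, slowest = per_call_s
    return (min_calls * fastest, max_calls * slowest)


# Agent run: 5-20 calls; GPT-4o-mini typical: 1-4s per call (both from this page)
low, high = chained_latency_range((5, 20), (1.0, 4.0))
print(f"agent run: roughly {low:.0f}-{high:.0f}s")
```

The per-call time is multiplied by the number of calls, so picking a faster model matters most for agent runs.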
API users
Response-time SLA (95th percentile):
- Default models: <30s.
- Premium models: <10s.
- Cerebras: <3s.
We don't issue refunds when a request exceeds these targets (model latency isn't a service guarantee), but the /status/ page shows historical p50/p95 latency.
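To compare your own API timings with these targets, compute the 95th percentile of your measured latencies. This sketch uses the nearest-rank method; which percentile method /status/ uses is not stated here, so results may differ slightly. The tier keys and helper names are hypothetical.

```python
# p95 targets from this page, in seconds (tier keys are hypothetical labels)
P95_TARGET_S = {"default": 30.0, "premium": 10.0, "cerebras": 3.0}


def p95(latencies: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not latencies:
        raise ValueError("no samples")
    ordered = sorted(latencies)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) with integer math, 1-based
    return ordered[rank - 1]


def within_target(latencies: list[float], tier: str) -> bool:
    """True if the measured p95 is below the tier's target."""
    return p95(latencies) < P95_TARGET_S[tier]


samples = [1.2, 0.9, 2.4, 1.1, 0.8, 1.5, 2.9, 1.0, 1.3, 1.7,
           0.7, 1.9, 1.4, 1.1, 2.2, 0.9, 1.6, 1.2, 3.4, 1.0]
print(p95(samples), within_target(samples, "cerebras"))
```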
Still stuck? Contact support