Issue · 18 min read · Apr 26, 2026

FastAPI Async APIs: What Actually Gets Faster?

A practical guide to async FastAPI routes, blocking calls, thread pools, and when async actually helps.

Figure 1. Async behavior comes from real await points. Blocking code inside an async route keeps the event loop from serving other work.

The Short Version

async def does not make a FastAPI route fast by itself. Async helps when your route waits on awaitable I/O: a database call, an HTTP request, a cache lookup, a stream, or a WebSocket message. If the route blocks the event loop with sync code, it can become slower under concurrency than a plain sync route.

The useful question is not "Should every FastAPI endpoint be async?" The useful question is:

When this request gets slow, does it give the event loop a chance to run something else?

That question is the difference between async syntax and async behavior.

The 50 ms Test

Here are four FastAPI endpoints that all appear to wait for the same amount of time:

import asyncio
import time

from fastapi import FastAPI
from fastapi.concurrency import run_in_threadpool

app = FastAPI()


@app.get("/async-sleep")
async def async_sleep():
    await asyncio.sleep(0.05)
    return {"ok": True}


@app.get("/blocking-in-async")
async def blocking_in_async():
    time.sleep(0.05)
    return {"ok": True}


@app.get("/sync-blocking")
def sync_blocking():
    time.sleep(0.05)
    return {"ok": True}


@app.get("/offloaded-blocking")
async def offloaded_blocking():
    await run_in_threadpool(time.sleep, 0.05)
    return {"ok": True}

On one local in-process simulation, twenty concurrent requests produced this:

Endpoint             | Concurrency | Delay | Wall time | What happened
/async-sleep         | 20          | 50 ms | 0.0551 s  | Requests waited together.
/blocking-in-async   | 20          | 50 ms | 1.1094 s  | Requests effectively waited one after another.
/sync-blocking       | 20          | 50 ms | 0.0714 s  | FastAPI moved sync work to the thread pool.
/offloaded-blocking  | 20          | 50 ms | 0.0622 s  | The blocking call was explicitly offloaded.

Do not treat these as production benchmark numbers. They were run with HTTPX ASGITransport inside one process. Treat them as a behavior test.

The behavior is the lesson: await asyncio.sleep() gives control back to the event loop. time.sleep() blocks the thread. If that blocking call runs inside async def, the event loop cannot move on to other requests.
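The same behavior can be reproduced with plain asyncio, no server needed. This sketch times twenty 50 ms waits both ways: awaitable waits overlap, while blocking waits inside coroutines run back to back.

```python
import asyncio
import time


async def awaitable_wait():
    await asyncio.sleep(0.05)  # yields: the loop can run other tasks while waiting


async def blocking_wait():
    time.sleep(0.05)  # blocks: the loop's thread is held for the full 50 ms


async def timed(coro_factory):
    start = time.perf_counter()
    await asyncio.gather(*(coro_factory() for _ in range(20)))
    return time.perf_counter() - start


overlapped = asyncio.run(timed(awaitable_wait))  # close to a single 50 ms wait
serialized = asyncio.run(timed(blocking_wait))   # close to 20 x 50 ms
```

The exact numbers vary by machine, but the two wall times differ by roughly an order of magnitude, which is the whole point.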

Async Is A Scheduling Model, Not A Speed Button

An async FastAPI route runs on the event loop. The event loop can handle many waiting tasks because a task can pause at await and let another task run.

That works well for I/O-heavy APIs:

  • waiting on Postgres
  • calling another HTTP API
  • reading from Redis
  • waiting for a message broker
  • streaming tokens to a browser
  • holding WebSocket connections open

It does not make CPU-heavy work faster. It also does not make blocking libraries non-blocking.

This is the mental model:

  1. A request enters your route.
  2. The route runs until it reaches an await.
  3. While that operation waits, the event loop can run another request.
  4. When the awaited operation is ready, the route continues.

If step 2 never reaches a real await point, the event loop is stuck.
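The four steps above can be watched with two plain asyncio tasks: each task yields at its await point and the other resumes.

```python
import asyncio

log = []


async def handler(name):
    for step in range(2):
        log.append(f"{name}:{step}")  # work up to the await point
        await asyncio.sleep(0)        # a real await: control returns to the loop


async def main():
    # gather schedules both handlers; they take turns at each await
    await asyncio.gather(handler("A"), handler("B"))


asyncio.run(main())
```

Replace `asyncio.sleep(0)` with a blocking call and the interleaving disappears: each handler runs to completion before the other starts.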

async def vs def In FastAPI

FastAPI supports both route styles because both are useful.

Use async def when your route calls awaitable libraries:

import httpx
from fastapi import FastAPI

app = FastAPI()


@app.get("/profile/{user_id}")
async def profile(user_id: int):
    async with httpx.AsyncClient(timeout=5.0) as client:
        response = await client.get(f"https://api.example.com/users/{user_id}")
        response.raise_for_status()
        return response.json()

Use normal def when the route must call blocking libraries and you do not have a good async option:

import requests
from fastapi import FastAPI

app = FastAPI()


@app.get("/legacy-profile/{user_id}")
def legacy_profile(user_id: int):
    response = requests.get(f"https://api.example.com/users/{user_id}", timeout=5)
    response.raise_for_status()
    return response.json()

That second example is not glamorous, but it is honest. FastAPI and Starlette run normal sync routes in a thread pool so the blocking call does not freeze the event loop.

The wrong version is this:

import requests
from fastapi import FastAPI

app = FastAPI()


@app.get("/bad-profile/{user_id}")
async def bad_profile(user_id: int):
    response = requests.get(f"https://api.example.com/users/{user_id}", timeout=5)
    response.raise_for_status()
    return response.json()

This route is async in spelling only. The requests.get() call blocks the event loop until it finishes.
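When the blocking library cannot be swapped out, the usual repair is to offload the call. A minimal sketch with the standard-library asyncio.to_thread (FastAPI's run_in_threadpool performs the same kind of offload); fetch_profile_blocking is a hypothetical stand-in for the blocking SDK call:

```python
import asyncio
import time


def fetch_profile_blocking(user_id):
    # Hypothetical stand-in for a blocking call such as requests.get(...)
    time.sleep(0.05)
    return {"id": user_id}


async def profile(user_id):
    # The blocking call runs on a worker thread; the event loop stays free
    return await asyncio.to_thread(fetch_profile_blocking, user_id)


result = asyncio.run(profile(7))
```

The route stays async def, and other requests keep flowing while the worker thread waits.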

The Route Decision Matrix

Use this table during code review:

Work inside the route           | Better choice                 | Why
Async database call             | async def                     | The DB wait can yield to the event loop.
Async HTTP call                 | async def                     | Outbound network wait can overlap with other requests.
Sync library with no async API  | def, or explicit offload      | Keep blocking work away from the event loop.
Short pure-Python logic         | Either, keep it simple        | There is no meaningful I/O wait to overlap.
CPU-heavy work                  | Worker process or queue       | Async does not remove CPU cost.
Short non-critical side effect  | BackgroundTasks can work      | Useful after-response convenience.
Long or retryable job           | External queue plus job table | Needs durability, status, and retries.
WebSocket or SSE stream         | async def                     | Long-lived connections need cooperative I/O.

Thread Pools Are Useful, But Bounded

Sync routes are not automatically bad. They are often the right choice when the dependency is blocking.

The trade-off is capacity. Starlette uses AnyIO's worker thread pool for sync routes and sync dependencies. In the local simulation, the default AnyIO thread limit was 40 tokens.

At 100 concurrent requests with a 50 ms blocking sleep:

Endpoint             | Concurrency | Wall time | Lesson
/async-sleep         | 100         | 0.0763 s  | Awaitable waits stayed close to one wave.
/sync-blocking       | 100         | 0.1932 s  | Thread-pool work completed in bounded waves.
/offloaded-blocking  | 100         | 0.1962 s  | Explicit offload used the same kind of limited resource.

Thread-pool offloading is a safety valve. It is not infinite concurrency.
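The wave behavior is not specific to FastAPI; any bounded pool shows it. A sketch with the standard-library ThreadPoolExecutor, sized to 40 workers to mirror the AnyIO default measured above:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_work():
    time.sleep(0.05)


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=40) as pool:
    # 100 tasks over 40 workers complete in roughly ceil(100 / 40) = 3 waves
    futures = [pool.submit(blocking_work) for _ in range(100)]
    for future in futures:
        future.result()
wall = time.perf_counter() - start  # roughly 3 x 50 ms, not 100 x 50 ms
```

Raising the worker count shrinks the waves but costs threads; the limit never disappears, it just moves.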

Figure 2. Sync routes and explicit offloads protect the event loop, but they still consume bounded thread-pool capacity.

Async Databases Need Their Own Discipline

Async database access is one of the best reasons to use async FastAPI, but it also creates easy mistakes.

Good defaults:

  • Create the database engine once during app startup or lifespan.
  • Create one async session per request.
  • Do not store a global AsyncSession.
  • Do not share one session across concurrent tasks.
  • Tune the connection pool under load.
  • Measure query time separately from route time.

A good shape looks like this:

from collections.abc import AsyncIterator
from typing import Annotated

from fastapi import Depends, FastAPI
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine

app = FastAPI()

# DB_URL and the User ORM model are assumed to be defined elsewhere in the app
engine = create_async_engine(DB_URL, pool_pre_ping=True)
SessionLocal = async_sessionmaker(engine, expire_on_commit=False)


async def get_session() -> AsyncIterator[AsyncSession]:
    async with SessionLocal() as session:
        yield session


@app.get("/users/{user_id}")
async def read_user(
    user_id: str,
    session: Annotated[AsyncSession, Depends(get_session)],
):
    statement = select(User).where(User.id == user_id)
    return (await session.scalars(statement)).one_or_none()

Async does not bypass the database. If 500 requests hit a pool of 20 connections, 480 requests still need to wait somewhere. That waiting should be controlled and observable.

BackgroundTasks Is Not A Queue

FastAPI BackgroundTasks is useful for small after-response work:

  • send a non-critical notification
  • write a lightweight audit event
  • schedule local cleanup

It is not a durable job system. It does not give you strong retry behavior, cross-process visibility, long-running status, or crash recovery.

For work like OCR, video processing, batch email, report generation, embeddings, or long LLM jobs, use a job architecture:

  1. POST /jobs creates a job row and returns job_id.
  2. A worker pulls work from a queue.
  3. The worker updates job status.
  4. The client polls, subscribes through SSE/WebSocket, or receives a webhook.
  5. Failed jobs have retry and dead-letter behavior.

The API process should stay responsive. The worker process should own the long work.

How To Test Async FastAPI Code

For normal route tests, FastAPI's TestClient is often enough.

When the test itself is async, or when it needs to await async database calls, use HTTPX with ASGITransport:

import pytest
from httpx import ASGITransport, AsyncClient

from app.main import app


@pytest.mark.anyio
async def test_root():
    async with AsyncClient(
        transport=ASGITransport(app=app),
        base_url="http://test",
    ) as client:
        response = await client.get("/")

    assert response.status_code == 200

One caveat: in-process ASGI tests are not the same as real network tests. They are excellent for route behavior and validation. They are not enough for full deployment behavior, worker topology, graceful shutdown, or production latency.

The Production Checklist

Before calling a FastAPI service "async-ready", check this:

  • Every slow operation inside async def is actually awaitable.
  • No requests, sync ORM call, time.sleep, or blocking SDK is hidden inside an async route.
  • HTTP clients are reused where appropriate and have timeouts.
  • Database sessions are request-scoped, not global.
  • Connection pool limits are known and measured.
  • Sync routes and sync dependencies are understood as thread-pool work.
  • CPU-heavy tasks are moved out of the request path.
  • Long jobs use a queue and persisted status.
  • Tests cover both validation errors and async I/O paths.
  • Load tests report p95 and p99 latency, not only average latency.

The Practical Rule

Use async FastAPI when your service spends time waiting on things that can be awaited. Keep blocking work out of the event loop. Keep long work out of the API process. Measure the real bottleneck before changing the route style.

That is the difference between writing async Python and building an async API that behaves well under load.
