When you're building an API that needs to handle serious traffic, the decisions you make at the database and infrastructure layer compound dramatically. After scaling a commerce platform API from 100 to 10,000+ concurrent users, I want to share the patterns that actually made the difference — not theory, but code I've run in production.
This post covers five areas: connection pooling, query optimization, caching with Redis, graceful degradation with circuit breakers, and rate limiting. Each one independently improved our throughput, and together they transformed a fragile system into a resilient one.
Connection Pooling Done Right
The first bottleneck we hit wasn't CPU or memory — it was database connections. PostgreSQL defaults to 100 max connections, which sounds generous until you realize each Node.js worker process maintains its own pool. With 8 workers and a pool size of 20 each, you're already trying to open 160 connections.
Here's the pooling setup that works:
import { Pool, PoolConfig, PoolClient } from 'pg';
const poolConfig: PoolConfig = {
host: process.env.DB_HOST,
port: Number(process.env.DB_PORT) || 5432,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 20,
idleTimeoutMillis: 30_000,
connectionTimeoutMillis: 3_000,
allowExitOnIdle: true,
};
const pool = new Pool(poolConfig);
pool.on('error', (err) => {
console.error('Unexpected pool error:', err.message);
});
pool.on('connect', () => {
pool.query("SET statement_timeout = '5s'");
});
export async function query<T>(
text: string,
params?: unknown[]
): Promise<T[]> {
const client = await pool.connect();
try {
const result = await client.query(text, params);
return result.rows as T[];
} finally {
client.release();
}
}
export async function transaction<T>(
fn: (client: PoolClient) => Promise<T>
): Promise<T> {
const client = await pool.connect();
try {
await client.query('BEGIN');
const result = await fn(client);
await client.query('COMMIT');
return result;
} catch (err) {
await client.query('ROLLBACK');
throw err;
} finally {
client.release();
}
}
The formula for pool sizing is (core_count * 2) + effective_spindle_count. For cloud instances with SSDs, that simplifies to roughly cores * 2 + 1. On a 4-core instance, 10 connections per worker is a strong starting point.
Critical detail: set statement_timeout on every connection. Without it, a single runaway query can hold a connection indefinitely, starving the pool. We learned this the hard way when a missing index caused a sequential scan to hold a connection for 45 seconds during peak traffic.
For production systems with multiple workers, consider PgBouncer as an external connection pooler. It sits between your app and PostgreSQL, multiplexing hundreds of application connections into a smaller number of database connections.
Query Optimization: Killing the N+1
The N+1 problem is the most common performance killer in API development. You fetch a list of orders, then for each order you fetch its items, then for each item you fetch the product. Three queries become 300.
Pattern 1: Lateral joins for nested data
Instead of fetching orders and then looping to get items, use a single query with LATERAL:
SELECT
o.id, o.status, o.created_at,
o.total_amount,
items.data AS line_items
FROM orders o
CROSS JOIN LATERAL (
SELECT json_agg(json_build_object(
'product_id', p.id,
'name', p.name,
'quantity', oi.quantity,
'unit_price', oi.unit_price
)) AS data
FROM order_items oi
JOIN products p ON p.id = oi.product_id
WHERE oi.order_id = o.id
) items
WHERE o.user_id = $1
AND o.created_at > NOW() - INTERVAL '30 days'
ORDER BY o.created_at DESC
LIMIT 20;
This replaces what was previously 1 + N + N queries with a single round trip. On a dataset with 50 orders averaging 4 items each, we went from ~250 queries to 1.
Pattern 2: DataLoader for GraphQL-style batching
When lateral joins aren't feasible, use the DataLoader pattern to batch and deduplicate:
import DataLoader from 'dataloader';
const productLoader = new DataLoader<string, Product>(async (ids) => {
const products = await query<Product>(
'SELECT * FROM products WHERE id = ANY($1)',
[ids]
);
const productMap = new Map(products.map(p => [p.id, p]));
return ids.map(id =>
productMap.get(id) ?? new Error(`Product ${id} not found`)
);
});
const [product1, product2, product3] = await Promise.all([
productLoader.load('prod_001'),
productLoader.load('prod_002'),
productLoader.load('prod_003'),
]);
Pattern 3: Strategic indexing
Use pg_stat_user_tables to find what's missing:
SELECT schemaname, relname, seq_scan, seq_tup_read,
idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > 100
ORDER BY seq_tup_read DESC;
CREATE INDEX idx_orders_active ON orders (user_id, created_at DESC)
WHERE status != 'cancelled';
CREATE INDEX idx_order_items_order_product
ON order_items (order_id, product_id);
Partial indexes are especially powerful. If 80% of your queries filter on status = 'active', a partial index on that condition is smaller, faster, and cheaper to maintain than a full index.
Caching Strategies with Redis
Not every request needs to hit PostgreSQL. A well-placed Redis cache can absorb the majority of read traffic.
import Redis from 'ioredis';
const redis = new Redis({
host: process.env.REDIS_HOST,
port: Number(process.env.REDIS_PORT) || 6379,
maxRetriesPerRequest: 3,
retryStrategy: (times) => Math.min(times * 100, 3000),
});
async function cacheable<T>(
key: string,
ttlSeconds: number,
fetcher: () => Promise<T>
): Promise<T> {
const cached = await redis.get(key);
if (cached) return JSON.parse(cached) as T;
const lockKey = `lock:${key}`;
const acquired = await redis.set(lockKey, '1', 'EX', 10, 'NX');
if (!acquired) {
await new Promise(r => setTimeout(r, 100));
const retryCache = await redis.get(key);
if (retryCache) return JSON.parse(retryCache) as T;
}
const data = await fetcher();
await redis.set(key, JSON.stringify(data), 'EX', ttlSeconds);
await redis.del(lockKey);
return data;
}
app.get('/api/products/:id', async (req, res) => {
const product = await cacheable(
`product:${req.params.id}`,
300,
() => query<Product>(
'SELECT * FROM products WHERE id = $1',
[req.params.id]
).then(rows => rows[0])
);
res.json(product);
});
The stampede protection is critical. Without the lock, when a popular cache key expires, hundreds of concurrent requests all miss the cache simultaneously and hammer the database with identical queries.
For cache invalidation, prefer event-driven invalidation over time-based expiry when data consistency matters:
async function updateProduct(id: string, data: Partial<Product>) {
await transaction(async (client) => {
await client.query(
'UPDATE products SET name = $2, price = $3 WHERE id = $1',
[id, data.name, data.price]
);
});
await redis.del(`product:${id}`);
await redis.del('products:featured');
}
Graceful Degradation with Circuit Breakers
When your API depends on external services, those services will eventually fail. The circuit breaker pattern prevents cascading failures by failing fast when a dependency is unhealthy.
enum CircuitState {
CLOSED,
OPEN,
HALF_OPEN,
}
class CircuitBreaker {
private state = CircuitState.CLOSED;
private failures = 0;
private lastFailureTime = 0;
private successCount = 0;
constructor(
private readonly name: string,
private readonly threshold: number = 5,
private readonly resetTimeoutMs: number = 30_000,
private readonly halfOpenMax: number = 3,
) {}
async execute<T>(fn: () => Promise<T>, fallback: () => T): Promise<T> {
if (this.state === CircuitState.OPEN) {
if (Date.now() - this.lastFailureTime > this.resetTimeoutMs) {
this.state = CircuitState.HALF_OPEN;
this.successCount = 0;
} else {
return fallback();
}
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (err) {
this.onFailure();
return fallback();
}
}
private onSuccess(): void {
if (this.state === CircuitState.HALF_OPEN) {
this.successCount++;
if (this.successCount >= this.halfOpenMax) {
this.state = CircuitState.CLOSED;
this.failures = 0;
}
} else {
this.failures = 0;
}
}
private onFailure(): void {
this.failures++;
this.lastFailureTime = Date.now();
if (this.failures >= this.threshold) {
this.state = CircuitState.OPEN;
}
}
}
const paymentBreaker = new CircuitBreaker('stripe', 3, 60_000);
app.post('/api/checkout', async (req, res) => {
const charge = await paymentBreaker.execute(
() => stripe.charges.create({
amount: req.body.amount,
currency: 'usd'
}),
() => {
queue.add('deferred-charge', req.body);
return { id: 'pending', status: 'deferred' };
}
);
res.json(charge);
});
The three states matter. CLOSED is normal. OPEN means the circuit has tripped — all calls go straight to the fallback. HALF_OPEN lets a few test requests through to see if the dependency has recovered.
Rate Limiting
The sliding window approach gives smoother behavior than fixed windows:
async function slidingWindowRateLimit(
key: string,
maxRequests: number,
windowMs: number
): Promise<{ allowed: boolean; remaining: number }> {
const now = Date.now();
const windowStart = now - windowMs;
const pipe = redis.pipeline();
pipe.zremrangebyscore(key, 0, windowStart);
pipe.zadd(key, now.toString(), `${now}:${Math.random()}`);
pipe.zcard(key);
pipe.expire(key, Math.ceil(windowMs / 1000));
const results = await pipe.exec();
const requestCount = results?.[2]?.[1] as number;
return {
allowed: requestCount <= maxRequests,
remaining: Math.max(0, maxRequests - requestCount),
};
}
function rateLimit(maxRequests: number, windowMs: number) {
return async (req: Request, res: Response, next: NextFunction) => {
const identifier = req.headers['x-api-key'] as string
|| req.ip || 'unknown';
const key = `ratelimit:${identifier}`;
const result = await slidingWindowRateLimit(key, maxRequests, windowMs);
res.set('X-RateLimit-Limit', maxRequests.toString());
res.set('X-RateLimit-Remaining', result.remaining.toString());
if (!result.allowed) {
return res.status(429).json({ error: 'Too many requests' });
}
next();
};
}
app.use('/api/auth/login', rateLimit(5, 60_000));
app.use('/api/search', rateLimit(30, 60_000));
app.use('/api/', rateLimit(100, 60_000));
Putting It All Together
These five patterns reinforce each other. Connection pooling keeps your database connections healthy. Query optimization reduces the work each connection does. Caching prevents the majority of queries from reaching the database at all. Circuit breakers protect you when external dependencies fail. Rate limiting prevents any single client from overwhelming the system.
Start with connection pooling and query optimization — they're foundational. Add caching next for the biggest throughput gain. Then layer in circuit breakers and rate limiting as your API surface grows.
One final lesson: measure before you optimize. Use pg_stat_statements to find your actual slow queries. Use application-level metrics (p50, p95, p99 latency by endpoint) to identify which routes need attention first. Scalability isn't about adding more hardware — it's about eliminating waste at every layer of the stack.