
Protecting Auth Endpoints: Rate Limiting and Brute-Force Defense

How to design rate limiting for login, OTP, and reset endpoints — algorithms, what to key on, lockout strategies, and avoiding the traps that let attackers through or lock out real users.

Emilian Gheonea · June 12, 2026

Every authentication endpoint is a guessing game an attacker would love to play at machine speed. Rate limiting is how you slow that game down to the point where it isn't worth playing. Done well, it makes brute-force and credential-stuffing attacks impractical while rarely getting in a real user's way. Done badly, it either does nothing or locks out the very people you're trying to protect.

What you're defending against

  • Password brute-forcing — trying many passwords against one account.
  • Credential stuffing — trying many leaked email/password pairs across many accounts.
  • OTP / code brute-forcing — a six-digit code is only a million possibilities; without limits it falls in seconds.
  • Reset/enumeration abuse — hammering the "forgot password" or "resend code" endpoints to spam users or probe for valid accounts.

Notice these have different shapes. Per-account limits stop brute-forcing one account; per-IP limits stop one machine hitting many accounts. You need both.

Choosing a rate-limit algorithm

Fixed window

Count requests in each clock-aligned window (e.g. per minute). Simple, but vulnerable to bursts at the window boundary: an attacker can send a full quota at 0:59 and another full quota at 1:00, doubling the intended rate within about a second.
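
A minimal in-memory sketch of a fixed-window counter makes the boundary problem concrete. The class name, the `allow` method, and the injectable `now` parameter are illustrative assumptions rather than any library's API; a real deployment would keep the counters in a shared store and expire old windows.

```python
import time
from collections import defaultdict
from typing import DefaultDict, Optional, Tuple


class FixedWindowLimiter:
    """Hypothetical fixed-window limiter: at most `limit` requests per key per clock-aligned window."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        # (key, window index) -> count. Old windows are never evicted in this sketch.
        self.counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        window_index = int(now // self.window)  # 0:00-0:59 is window 0, 1:00-1:59 is window 1
        slot = (key, window_index)
        if self.counts[slot] >= self.limit:
            return False
        self.counts[slot] += 1
        return True


# The boundary burst: a full quota at 0:59 and another at 1:00 are both accepted.
limiter = FixedWindowLimiter(limit=5, window_seconds=60)
accepted = sum(limiter.allow("ip:203.0.113.7", now=59.0) for _ in range(5))
accepted += sum(limiter.allow("ip:203.0.113.7", now=60.0) for _ in range(5))
print(accepted)  # 10
```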

Sliding window

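Smooths the boundary problem by also counting a weighted share of the previous window, in proportion to how much of it still falls inside the most recent full interval. More accurate, at the cost of slightly more state.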
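
One common way to implement this is the "sliding window counter" approximation sketched below: the previous window's count is scaled by the fraction of it that still overlaps the last `window_seconds`, then added to the current count. This is a hypothetical in-memory illustration; the names and the `now` parameter are assumptions. It keeps only two counters per key, which is the extra state the text refers to.

```python
import time
from typing import Dict, Optional, Tuple


class SlidingWindowCounter:
    """Hypothetical sliding-window-counter limiter (an approximation of a true sliding log)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        # key -> (current window index, count in current window, count in previous window)
        self.state: Dict[str, Tuple[int, int, int]] = {}

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        index = int(now // self.window)
        win, current, previous = self.state.get(key, (index, 0, 0))
        if index == win + 1:
            win, current, previous = index, 0, current  # moved into the next window
        elif index > win + 1:
            win, current, previous = index, 0, 0  # idle for 2+ windows: nothing overlaps
        # Weight the previous window by how much of it is still inside the sliding interval.
        elapsed_fraction = (now % self.window) / self.window
        estimate = previous * (1.0 - elapsed_fraction) + current
        if estimate >= self.limit:
            self.state[key] = (win, current, previous)
            return False
        self.state[key] = (win, current + 1, previous)
        return True


limiter = SlidingWindowCounter(limit=5, window_seconds=60)
print(sum(limiter.allow("ip:203.0.113.7", now=59.0) for _ in range(5)))  # 5
print(limiter.allow("ip:203.0.113.7", now=60.0))  # False: the burst at 0:59 still counts in full
print(sum(limiter.allow("ip:203.0.113.7", now=90.0) for _ in range(5)))  # 3
```

Unlike the fixed window, the second burst at 1:00 is refused, and capacity returns gradually as the earlier burst slides out of the interval.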

Token bucket

A bucket refills at a steady rate up to a maximum. Each request spends a token; an empty bucket means rejection. This allows short legitimate bursts while enforcing a sustained average — usually the best fit for auth endpoints.
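
A minimal token-bucket sketch, assuming an in-memory dictionary with one bucket per rate-limit key. `capacity` is the burst allowance and `refill_per_second` the sustained rate; both names, and the example values, are illustrative rather than recommended settings.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    def __init__(self, capacity: float, refill_per_second: float, now: float) -> None:
        self.capacity = capacity
        self.rate = refill_per_second
        self.tokens = capacity  # start full so a short legitimate burst is allowed
        self.last = now

    def take(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the time elapsed since the last request, never beyond capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # empty bucket: reject


class TokenBucketLimiter:
    """Hypothetical per-key token-bucket limiter (one bucket per key)."""

    def __init__(self, capacity: float, refill_per_second: float) -> None:
        self.capacity = capacity
        self.rate = refill_per_second
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = self.buckets[key] = TokenBucket(self.capacity, self.rate, now)
        return bucket.take(now)


# Burst of 5, then one new attempt every 4 seconds on average.
limiter = TokenBucketLimiter(capacity=5, refill_per_second=0.25)
print([limiter.allow("acct:alice@example.com", now=0.0) for _ in range(6)])
# [True, True, True, True, True, False]
print(limiter.allow("acct:alice@example.com", now=4.0))  # True: one token refilled
print(limiter.allow("acct:alice@example.com", now=4.0))  # False
```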

For most applications, a token-bucket or sliding-window limiter backed by a fast shared store (Redis or equivalent) is the right default. The store must be shared across all your servers, or attackers simply spread requests across instances to dodge per-node counters.

What to key the limit on

Choosing the right key is more important than the algorithm:

  • Per IP — catches a single source hammering you. But beware: many legitimate users share an IP (corporate NAT, mobile carriers), and attackers rotate IPs cheaply. IP alone is necessary but not sufficient.
  • Per account / username — catches brute-forcing of a specific account regardless of source IP. Essential for login.
  • Per IP + account combination — fine-grained and useful for distinguishing targeted attacks.
  • Global / endpoint-wide — a circuit breaker for the whole endpoint under a distributed attack.

A layered approach — combining per-account, per-IP, and global limits — is far more robust than any single key.
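
A sketch of the layered check, assuming each layer is a limiter object with an `allow(key)` method (any of the algorithms described earlier would fit). The key formats, the case-folding of the account identifier, and the `CountingLimiter` stand-in are all hypothetical. Case-folding is there so `Alice@example.com` and `alice@example.com` cannot split one account's budget across two counters.

```python
from typing import Dict, Optional, Protocol


class Limiter(Protocol):
    def allow(self, key: str) -> bool: ...


class CountingLimiter:
    """Stand-in limiter for this sketch: a fixed cap per key with no time window."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts: Dict[str, int] = {}

    def allow(self, key: str) -> bool:
        used = self.counts.get(key, 0)
        if used >= self.limit:
            return False
        self.counts[key] = used + 1
        return True


def layered_allow(layers: Dict[str, Limiter], account: str, ip: str) -> Optional[str]:
    """Return None if every layer allows the attempt, else the name of the first layer that refuses.

    Layers are checked in order, so earlier layers are charged even when a later one refuses.
    """
    account_id = account.strip().lower()  # one counter per account, however it is typed
    keys = {
        "global": "login:global",
        "ip": f"login:ip:{ip}",
        "account": f"login:acct:{account_id}",
        "ip_account": f"login:ip_acct:{ip}:{account_id}",
    }
    for name, limiter in layers.items():
        if not limiter.allow(keys[name]):
            return name
    return None


layers: Dict[str, Limiter] = {
    "global": CountingLimiter(1000),
    "ip": CountingLimiter(100),
    "account": CountingLimiter(5),
    "ip_account": CountingLimiter(5),
}
# Credential stuffing from rotating IPs: the per-IP layer never fires, the per-account one does.
results = [layered_allow(layers, "Alice@example.com", f"198.51.100.{i}") for i in range(6)]
print(results)  # [None, None, None, None, None, 'account']
```

The returned layer name is useful for logging, but it should not change what the client sees.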

Lockout strategies

When a threshold is crossed, you have options, from gentle to strict:

  • Throttling / exponential backoff — each failed attempt increases the required wait. Smooth and self-correcting; usually the best user experience.
  • CAPTCHA challenge — after a few failures, require proof of humanity. Stops bots while letting real users continue.
  • Temporary lockout — block further attempts for a fixed period. Effective, but be careful: a naive per-account lockout becomes a denial-of-service tool — an attacker can lock a victim out of their own account on purpose.

Prefer throttling and CAPTCHA over hard account lockouts. If you do lock accounts, lock the attempt source rather than the target account where possible, and always give the legitimate user a recovery path.
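
The sketch below shows throttling with exponential backoff followed by a CAPTCHA step, keyed on the attempt source (here a hypothetical IP-plus-account key) rather than on the account alone, so an attacker's failures do not lock the real user out from their own network. The thresholds of 5 and 10 mirror the example defaults listed near the end of this article; `BASE_DELAY`, `MAX_DELAY`, and the class and method names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

BACKOFF_AFTER = 5       # failures before throttling starts
CAPTCHA_AFTER = 10      # failures before a CAPTCHA is required
BASE_DELAY = 1.0        # seconds; assumed starting delay
MAX_DELAY = 15 * 60.0   # seconds; assumed cap on any single wait


@dataclass
class SourceState:
    failures: int = 0
    next_allowed_at: float = 0.0


class LoginThrottle:
    """Hypothetical per-source throttle: exponential backoff, then CAPTCHA, never a hard account lock."""

    def __init__(self) -> None:
        # source key -> state; a real store would expire idle entries.
        self.sources: Dict[str, SourceState] = {}

    def check(self, source: str, now: float) -> Tuple[str, float]:
        """Return ("allow", 0.0), ("wait", seconds_remaining) or ("captcha", 0.0)."""
        state = self.sources.get(source)
        if state is None:
            return "allow", 0.0
        if now < state.next_allowed_at:
            return "wait", state.next_allowed_at - now
        if state.failures >= CAPTCHA_AFTER:
            return "captcha", 0.0
        return "allow", 0.0

    def record_failure(self, source: str, now: float) -> None:
        state = self.sources.setdefault(source, SourceState())
        state.failures += 1
        if state.failures >= BACKOFF_AFTER:
            # 1s, 2s, 4s, ... after the 5th, 6th, 7th failure; exponent capped to stay finite.
            exponent = min(state.failures - BACKOFF_AFTER, 20)
            state.next_allowed_at = now + min(MAX_DELAY, BASE_DELAY * 2 ** exponent)

    def record_success(self, source: str) -> None:
        self.sources.pop(source, None)


throttle = LoginThrottle()
source = "ip_acct:203.0.113.7:alice@example.com"
for _ in range(5):
    throttle.record_failure(source, now=0.0)
print(throttle.check(source, now=0.5))  # ('wait', 0.5)
```

The `wait` value maps directly onto a `Retry-After` header, and a successful login clears the source's history.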

Don't forget the secondary endpoints

The single most common mistake: rate-limiting the login form and nothing else. Apply limits consistently to every auth-adjacent endpoint:

  • Sign-in
  • OTP verification (critical — this is the most brute-forceable)
  • OTP / magic-link request (to prevent email/SMS flooding and cost abuse)
  • Password reset request and submission
  • Token refresh
  • Sign-up (to limit automated account creation)

An unprotected OTP-verify endpoint quietly undoes all the work you did protecting the login form.
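
Because a six-digit code has only a million values, the attempt cap per code is what actually protects it. A minimal in-memory sketch, assuming a hypothetical `OtpVerifier` that holds one live code per account: each wrong guess counts against the code, and the code is deleted after a correct guess or after `MAX_ATTEMPTS_PER_CODE` misses. Expiry, and storing a hash of the code instead of the code itself, are left out for brevity.

```python
import hmac
import secrets
from dataclasses import dataclass
from typing import Dict

MAX_ATTEMPTS_PER_CODE = 5  # "a handful"; assumed value


@dataclass
class IssuedCode:
    code: str
    attempts: int = 0


class OtpVerifier:
    """Hypothetical verifier holding at most one live code per account."""

    def __init__(self) -> None:
        self.live: Dict[str, IssuedCode] = {}

    def issue(self, account: str) -> str:
        code = f"{secrets.randbelow(1_000_000):06d}"
        self.live[account] = IssuedCode(code)  # replaces any previous code
        return code

    def verify(self, account: str, submitted: str) -> bool:
        issued = self.live.get(account)
        if issued is None:
            return False  # no live code: same answer as a wrong guess
        issued.attempts += 1
        ok = hmac.compare_digest(issued.code.encode(), submitted.encode())
        if ok or issued.attempts >= MAX_ATTEMPTS_PER_CODE:
            del self.live[account]  # single use, and burned after too many misses
        return ok


verifier = OtpVerifier()
code = verifier.issue("alice@example.com")
wrong = "000000" if code != "000000" else "111111"
print([verifier.verify("alice@example.com", wrong) for _ in range(5)])
# [False, False, False, False, False]
print(verifier.verify("alice@example.com", code))  # False: the code was burned
```

With a cap of 5, a blind guesser succeeds against any one code with probability 5 in 1,000,000; limits on the code-request endpoint then bound how many fresh codes they can obtain.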

Make responses safe, too

Rate limiting interacts with information disclosure. Keep responses uniform so the limiter itself doesn't leak which accounts exist or which codes were "closer." Return a clear, generic error and the standard 429 Too Many Requests status with a Retry-After header so well-behaved clients can back off gracefully.

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
```
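
A framework-neutral sketch of building that response, assuming the limiter reports how many seconds remain until the next attempt is allowed. The JSON body fields and message text are hypothetical. The point is that the body is identical whatever the reason for the refusal, and that `Retry-After` is sent as whole seconds, the delay-seconds form HTTP defines for that header.

```python
import json
import math
from typing import Dict, Tuple

# Hypothetical body: identical for every refusal, so it reveals nothing about the account or code.
GENERIC_BODY = json.dumps(
    {"error": "rate_limited", "message": "Too many attempts. Please try again later."}
).encode("utf-8")


def too_many_requests(seconds_until_allowed: float) -> Tuple[int, Dict[str, str], bytes]:
    """Build (status, headers, body) for a rate-limited auth request."""
    # Delay-seconds must be a whole number: round up, and (a choice in this sketch) never send 0.
    retry_after = max(1, math.ceil(seconds_until_allowed))
    headers = {"Retry-After": str(retry_after), "Content-Type": "application/json"}
    return 429, headers, GENERIC_BODY


status, headers, body = too_many_requests(59.2)
print(status, headers["Retry-After"])  # 429 60
```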

Observability

Rate limiting is also a detection surface. Spikes in 429s, a flood of failed logins across many accounts, or unusual geographic patterns are early warnings of an attack in progress. Log limiter decisions, alert on anomalies, and feed the data back into tuning your thresholds.
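
As one example of turning limiter data into a signal, the sketch below logs each decision as a structured JSON line and flags an IP whose failed logins span many distinct accounts within a short window, a typical credential-stuffing shape. The logger name, field names, and thresholds are assumptions to adapt to your own logging and alerting pipeline.

```python
import json
import logging
import time
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Optional, Tuple

log = logging.getLogger("auth.ratelimit")  # hypothetical logger name


def log_decision(endpoint: str, layer: str, key: str, allowed: bool,
                 now: Optional[float] = None) -> None:
    """Emit one structured record per limiter decision so 429 spikes can be graphed and alerted on."""
    record = {
        "event": "ratelimit_decision",
        "endpoint": endpoint,
        "layer": layer,
        "key": key,
        "allowed": allowed,
        "ts": time.time() if now is None else now,
    }
    log.info(json.dumps(record))


class StuffingDetector:
    """Flags an IP whose failed logins hit many distinct accounts within a sliding window."""

    def __init__(self, distinct_accounts: int = 20, window_seconds: float = 300.0) -> None:
        self.threshold = distinct_accounts
        self.window = window_seconds
        self.failures: DefaultDict[str, Deque[Tuple[float, str]]] = defaultdict(deque)

    def record_failure(self, ip: str, account: str, now: float) -> bool:
        events = self.failures[ip]
        events.append((now, account))
        # Drop failures that have aged out of the window.
        while events and events[0][0] <= now - self.window:
            events.popleft()
        return len({acct for _, acct in events}) >= self.threshold
```

Repeated failures against one account do not trip this detector; that pattern is the per-account limiter's job.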

A sensible default configuration

  • Login: per-account limit (e.g. 5 failures → exponential backoff, CAPTCHA after 10), plus a per-IP limit, plus a global circuit breaker.
  • OTP verify: strict — a handful of attempts per code, then the code is burned and a new one required.
  • OTP/magic-link/reset requests: a few per account and per IP per hour to prevent flooding.
  • Refresh: modest per-token limits, with rotated-token-reuse detection layered on top.
  • Shared store across all instances; 429 with Retry-After; uniform error messages; full logging.
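
The same defaults can be written down as a single policy table that every auth endpoint reads from, which also makes it easy to check that no secondary endpoint was left out. This is a hypothetical shape, not a format any particular library expects. The 5 and 10 login thresholds come from the list above; every other number is a placeholder to tune against your own traffic.

```python
from typing import Any, Dict, List

HOUR = 3600


def per_hour(n: int) -> Dict[str, int]:
    return {"max": n, "per_seconds": HOUR}


# Placeholder values except where noted; tune against real traffic.
AUTH_RATE_LIMITS: Dict[str, Dict[str, Any]] = {
    "sign_in": {
        "per_account": {"backoff_after_failures": 5, "captcha_after_failures": 10},  # from the article
        "per_ip": per_hour(100),
        "global": {"max": 5000, "per_seconds": 60},  # circuit breaker
    },
    "otp_verify": {"max_attempts_per_code": 5, "burn_code_at_limit": True},
    "otp_request": {"per_account": per_hour(3), "per_ip": per_hour(10)},
    "magic_link_request": {"per_account": per_hour(3), "per_ip": per_hour(10)},
    "password_reset_request": {"per_account": per_hour(3), "per_ip": per_hour(10)},
    "password_reset_submit": {"per_account": per_hour(5), "per_ip": per_hour(20)},
    "token_refresh": {"per_token": {"max": 10, "per_seconds": 60}, "reuse_detection": True},
    "sign_up": {"per_ip": per_hour(10)},
}

SHARED_SETTINGS = {"store": "shared (e.g. Redis)", "status": 429, "send_retry_after": True,
                   "uniform_errors": True, "log_decisions": True}

REQUIRED_ENDPOINTS = {"sign_in", "otp_verify", "otp_request", "magic_link_request",
                      "password_reset_request", "password_reset_submit", "token_refresh", "sign_up"}


def missing_policies(config: Dict[str, Dict[str, Any]]) -> List[str]:
    """Endpoints with no policy at all: the 'only the login form is limited' mistake."""
    return sorted(REQUIRED_ENDPOINTS - set(config))


print(missing_policies(AUTH_RATE_LIMITS))  # []
```

Running a check like `missing_policies` in CI turns "don't forget the secondary endpoints" into something a build can enforce.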

Rate limiting isn't a feature you bolt on after a breach — it's a structural property of a healthy auth system. The goal is asymmetry: make each guess so cheap for you to reject and so expensive for an attacker to attempt that brute force simply stops being viable.

Written by

Emilian Gheonea

Senior Blockchain & Full-Stack Software Engineer. I build EmbedAuth — an embeddable authentication platform for SaaS — and write about the auth problems most teams hit too late.