A complete guide to HTTP caching

29 August 2025 at 15:53

Caching is the invisible backbone of the web. It’s what makes sites feel fast, reliable, and affordable to run. Done well, it slashes latency, reduces server load, and allows even fragile infrastructure to withstand sudden spikes in demand. Done poorly – or ignored entirely – it leaves websites slow, fragile, and expensive.

At its core, caching is about reducing unnecessary work. Every time a browser, CDN, or proxy has to ask your server for a resource that hasn’t changed, you’ve wasted time and bandwidth. Every time your server has to rebuild or re-serve identical content, you’ve added load and cost. Under heavy traffic – whether that’s Black Friday, a viral news story, or a DDoS attack – those mistakes compound until the whole stack buckles.

And yet, despite being so fundamental, caching is one of the most misunderstood aspects of web performance. Many developers:

  • Confuse no-cache with “don’t cache,” when it actually means “store, but revalidate”.
  • Reach for no-store as a “safe” default, unintentionally disabling caching entirely.
  • Misunderstand how Expires interacts with Cache-Control: max-age.
  • Fail to distinguish between public and private, leading to security or performance issues.
  • Ignore advanced directives like s-maxage or stale-while-revalidate.
  • Don’t realise that CDNs, browsers, proxies, and application caches all layer their own rules on top.

The result? Countless sites ship with fragile, inconsistent, or outright broken caching policies. They leave money on the table in infrastructure costs, frustrate users with sluggish performance, and collapse under load that better-configured systems would sail through.

This guide exists to fix that. Over the next chapters, we’ll unpack the ecosystem of HTTP caching in detail:

  • How headers like Cache-Control, Expires, ETag, and Age actually work, alone and together.
  • How browsers, CDNs, and app-level caches interpret and enforce them.
  • The common pitfalls and misconceptions that can trip up even experienced developers.
  • Practical recipes for static assets, HTML documents, APIs, and more.
  • Modern browser behaviours, like BFCache, speculation rules, and signed exchanges.
  • CDN realities, with a deep dive into Cloudflare’s defaults, quirks, and advanced features.
  • How to debug and verify caching in the real world.

By the end, you’ll not only understand the nuanced interplay of HTTP caching headers – you’ll know how to design and deploy a caching strategy that makes your sites faster, cheaper, and more reliable.

The business case for caching

Caching matters because it directly impacts four fundamental outcomes of how a site performs and scales:

Speed

Caching eliminates unnecessary network trips. A memory-cache hit in the browser is effectively instant, compared to the 100–300ms you’d otherwise wait just to complete a handshake and see the first byte. Multiply that by dozens of assets and you get smoother page loads, better Core Web Vitals scores, and happier users.

Resilience

When demand surges, cache hits multiply capacity. If 80% of traffic is absorbed by a CDN edge, your servers only need to handle the other 20%. That’s the difference between sailing through Black Friday and collapsing under a viral traffic spike.

Cost

Every cache hit is one less expensive origin request. CDN bandwidth is cheap; uncached origin hits consume CPU, database queries, and outbound traffic that you pay for. A 5–10% improvement in cache hit ratio can translate directly into thousands of dollars saved at scale. And that’s not even counting when requests are cached in users’ browsers, and don’t even hit the CDN!

SEO

Caching improves both speed and efficiency for search engines. Bots are less aggressive when they see effective caching headers, conserving crawl budget for fresher and deeper content. Faster pages also feed directly into Google’s performance signals.

Real-world Scenarios

  • A news site avoids a meltdown during a breaking story because 95% of requests are served from the CDN cache.
  • An API under sustained load continues to respond consistently thanks to stale-if-error and validator-based revalidation.
  • An e‑commerce platform handles Black Friday traffic smoothly because static assets and category pages are long-lived at the edge.

Side note on the philosophy of caching

It’s worth acknowledging that there’s a quiet anti-culture around caching. Some developers see it as a hack – a band-aid slapped over slow systems, masking deeper flaws in design or architecture. In an ideal world, every request would be cheap, every response instant, and caching wouldn’t even be needed. And there’s merit in that vision: designing systems to be inherently fast avoids the complexity and fragility that caching introduces.

In practice, most of us don’t live in that world. Real systems face unpredictable spikes, long geographic distances, and sudden swings in demand. Even the best-architected applications benefit from caching as an amplifier. The key is balance: caching should never excuse poor underlying performance, but it should always be part of how you scale and stay resilient when traffic surges.

Mental model: who caches what?

Before diving into the fine-grained details of headers and directives, it helps to understand the landscape of who is actually caching your content. Caching isn’t a single thing that happens in one place — it’s an ecosystem of layers, each with its own rules, scope, and quirks.

Browsers

Every browser maintains both a memory cache and a disk cache. The memory cache is extremely fast but short-lived – it only lasts while a page is open – and is designed to avoid redundant network fetches during a single session. It isn’t governed by HTTP caching headers: even resources marked no-store may be reused from memory if they’re requested again within the same page. The disk cache, by contrast, persists across tabs and sessions, can hold much larger resources, and does respect HTTP caching headers (though browsers may still apply their own heuristics when metadata is missing).

Proxies

Between the browser and the wider internet, requests often pass through proxies – especially in corporate environments or ISP-managed networks. These proxies can act as shared caches, storing responses to reduce bandwidth costs or to enforce organisational policies. Unlike CDNs, you usually don’t configure them yourself, and their behaviour may be opaque.

For example, a corporate proxy might cache software downloads to avoid repeated gigabyte transfers across the same office connection. An ISP might cache popular news images to improve load times for customers. The problem is that these proxies don’t always respect HTTP caching headers perfectly, and they may apply their own heuristics or overrides. That can lead to inconsistencies, like a user behind a proxy seeing a stale or stripped-down response long after it should have expired.

While less visible than browser or CDN caches, proxies are still an important part of the ecosystem. They remind us that caching isn’t always under the site owner’s direct control – and that intermediaries in the network can influence freshness, reuse, and even correctness.

Side note on transparent ISP proxies

In the early 2000s, many ISPs deployed “transparent” proxies that cached popular resources without users or site owners even knowing. They still crop up in some regions today. These proxies sit silently between the browser and the origin, caching opportunistically to save bandwidth. The downside is that they sometimes ignore cache headers entirely, serving outdated or inconsistent content. If you’ve ever seen a site behave differently at home vs on mobile data, a transparent proxy might have been the reason.

Shared caches

Between users and origin servers sit a host of shared caches – CDNs like Cloudflare or Akamai, ISP-level proxies, corporate gateways, or reverse proxies. These shared layers can dramatically reduce origin load, but they come with their own logic and sometimes override or reinterpret origin instructions.

Reverse proxies

Technologies like Varnish or NGINX can act as local accelerators in front of your application servers. They intercept and cache responses close to the origin, smoothing traffic spikes and offloading heavy lifting from your app or database.

Application and database caches

Inside your stack, systems like Redis or Memcached store fragments of rendered pages, precomputed query results, or sessions. They aren’t governed by HTTP headers – you design the keys and TTLs yourself – but they are crucial parts of the caching ecosystem.

Cache keys and variants

Every cache needs a way to decide whether two requests are “the same thing” or not. That decision is made using a cache key – essentially, the unique identifier for a stored response.

By default, a cache key is based on the scheme, host, path, and query string of the requested resource. But in practice, browsers add more dimensions. Most implement double-keyed caching, where the top-level browsing context (the site you’re on) is also part of the key. That’s why your browser can’t reuse a Google Font downloaded while visiting one site when another, unrelated site requests the same font file – each gets its own cache entry, even though the URL is identical.

Modern browsers are moving towards triple-keyed caching, which adds subframe context into the key as well. This means a resource requested inside an embedded iframe may have its own cache entry, separate from the same resource requested by the top-level page or by another iframe. This design improves privacy (by limiting cross-site tracking via shared cache entries), but it also reduces opportunities for cache reuse.

On top of that, HTTP adds another layer of complexity: the Vary header. This tells caches that certain request headers should also be part of the cache key.

Examples:

  • Vary: Accept-Encoding → store one copy compressed with gzip, another with brotli.
  • Vary: Accept-Language → store separate versions for en-US vs de-DE.
  • Vary: Cookie → every unique cookie value creates a separate cache entry (often catastrophic).
  • Vary: * → means “you can’t safely reuse this for anyone else,” which effectively kills cacheability.

This is powerful, and sometimes essential. If your server switches image formats based on Accept headers, or serves AVIF to browsers that support it, you must use Vary: Accept to avoid sending incompatible responses to clients that can’t handle them. At the same time, Vary is easy to misuse. Carelessly adding Vary: User-Agent, Vary: Cookie, or Vary: * can explode your cache into thousands of near-duplicate entries. The key is to vary only on headers that genuinely change the response – nothing more.
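
To make that concrete, here’s what a negotiated image response might look like (values illustrative):

GET /hero HTTP/1.1
Accept: image/avif,image/webp,image/*

HTTP/1.1 200 OK
Content-Type: image/avif
Cache-Control: public, max-age=86400
Vary: Accept

A cache that stores this response will only reuse it for future requests whose Accept header also admits AVIF; everyone else gets their own variant.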

That’s where normalisation comes in. Smart CDNs and proxies can simplify cache keys, collapsing away differences that don’t matter. For example:

  • Ignoring analytics query parameters (e.g., ?utm_source=...).
  • Treating all iPhones as the same “mobile” variant, instead of keying on every device string.

The balance is to vary only on things that truly change the response. Anything else is wasted fragmentation and lower hit ratios.

Side note on No-Vary-Search

A new experimental header, No-Vary-Search, lets servers tell caches to ignore certain query parameters when deciding cache keys. For example, you could treat ?utm_source= or ?fbclid= as irrelevant and avoid fragmenting your cache into thousands of variants. At the moment, support is limited – Chrome only uses it with speculation rules – but if adopted more widely, it could offer a standards-based way to normalise cache keys without relying on CDN configuration.
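
As a sketch of the syntax (a structured-fields list; exact behaviour depends on the implementation):

No-Vary-Search: params=("utm_source" "fbclid")

This tells a conforming cache that two URLs differing only in those parameters may share a single cache entry.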

Freshness vs validation

Knowing who is caching your content and how they decide whether two requests are the same only answers part of the question. The other part is when a stored response can be reused.

Every cache, whether it’s a browser or a CDN, has to decide:

  • Is this copy still fresh enough to serve as-is?
  • Or has it gone stale, and do I need to check with the origin?

That’s the core trade-off in caching: freshness (serve immediately, fast but risky if outdated) versus validation (double-check with the origin, slower but guaranteed correct).

All the headers we’ll explore next – Cache-Control, Expires, ETag, and Last-Modified – exist to guide this decision-making process.

Core HTTP caching headers

Now that we know who caches content and how they make basic decisions, it’s time to look at the raw materials: the headers that control caching. These are the levers you pull to influence every layer of the system – browsers, CDNs, proxies, and beyond.

At a high level, there are three categories:

  • Freshness controls: tell caches how long a response can be served without revalidation.
  • Validators: provide a way to check cheaply if something has changed.
  • Metadata: describe how the response should be stored, keyed, or observed.

Let’s break them down.

The Date header

Every response should carry a Date header. It’s the server’s timestamp for when the response was generated, and it’s the baseline for all freshness and age calculations. If Date is missing or skewed, caches will make their own assumptions.

The Cache-Control (response) header

This is the most important header – the control panel for how content should be cached. It carries multiple directives, split into two broad groups:

Freshness directives:

  • max-age: how long (in seconds) the response is fresh.
  • s-maxage: like max-age, but applies only to shared caches (e.g. CDNs). Overrides max-age there.
  • immutable: signals that the resource will never change (ideal for versioned static assets).
  • stale-while-revalidate: allows serving a stale response while fetching a fresh one in the background.
  • stale-if-error: allows serving stale content if the origin is down or errors.

Storage/use directives:

  • public: response may be stored by any cache, including shared ones.
  • private: response may be cached only by the browser, not shared caches.
  • no-cache: store, but revalidate before serving.
  • no-store: do not store at all.
  • must-revalidate: once stale, the response must be revalidated before use.
  • proxy-revalidate: same, but targeted at shared caches.

The Cache-Control (request) header

Browsers and clients can also send caching directives. These don’t change the server’s headers, but they influence how caches along the way behave.

  • no-cache: forces revalidation (but allows use of stored entries).
  • no-store: bypasses caching entirely.
  • only-if-cached: instructs caches to return a stored response if one is available; otherwise fail (typically with a 504) rather than contacting the origin. Useful offline.
  • max-age, min-fresh, max-stale: fine-tune tolerance for staleness.
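
For example, a client willing to accept slightly stale content rather than wait for a slow origin might send (illustrative):

GET /api/items HTTP/1.1
Cache-Control: max-stale=120

This tells caches along the way that a response up to two minutes past its freshness lifetime is still acceptable.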

The Expires header

An older way of defining freshness, based on providing an absolute date/timestamp.

  • Example: Expires: Fri, 29 Aug 2025 12:00:00 GMT.
  • Ignored if Cache-Control: max-age is present.
  • Vulnerable to clock skew between servers and clients.
  • Still widely seen, often for backwards compatibility.

The Pragma header

The Pragma header dates back to HTTP/1.0 and was used on requests to prevent caching before Cache-Control existed, by asking intermediaries to revalidate content before reuse. Modern browsers and CDNs now rely on Cache-Control, but some intermediaries and older systems still respect Pragma. In theory, it could carry arbitrary name/value pairs; in practice, only one ever mattered: Pragma: no-cache.

For maximum compatibility – especially when dealing with mixed or legacy infrastructure – it’s harmless to include both.
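
For example:

Cache-Control: no-cache
Pragma: no-cache

Modern caches act on the first line; ancient HTTP/1.0 intermediaries fall back to the second.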

The Age header

Age tells you how old the response is (in seconds) when delivered. It’s supposed to be set by shared caches, but not every intermediary implements it consistently. Browsers never set it. Treat it as a helpful signal, not an absolute truth.

Side note on Age

You’ll only ever see Age headers from shared caches like CDNs or proxies. Why? Because browsers don’t expose their internal cache state to the network – they just serve responses directly to the user. Shared caches, on the other hand, need to communicate freshness downstream (to other proxies, or to browsers), so they add Age. That’s why you’ll often see Age: 0 on a fresh CDN hit, but never on a pure browser cache hit.

Validator headers: ETag and Last-Modified

When freshness runs out, caches use validators to avoid re-downloading the whole resource.

  • ETag: a unique identifier (opaque string) for a specific version of a resource.
    • Strong ETags ("abc123") mean byte-for-byte identical.
    • Weak ETags (W/"abc123") mean semantically the same, though bytes may differ (e.g. re-gzipped).
  • Last-Modified: timestamp of when the resource last changed.
    • Less precise, but still useful.
    • Supports heuristic freshness when max-age/Expires are missing.
  • Conditional requests:
    • If-None-Match (with ETag) → server replies 304 Not Modified if unchanged.
    • If-Modified-Since (with Last-Modified) → same, but based on date.
    • Both save bandwidth and reduce load, because only headers are exchanged.
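
A typical revalidation round-trip looks like this (values illustrative):

GET /styles.css HTTP/1.1
If-None-Match: "abc123"

HTTP/1.1 304 Not Modified
ETag: "abc123"
Cache-Control: max-age=300

The empty 304 body is the whole point: the cache keeps serving its stored copy, and only a handful of header bytes cross the wire.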

Side note on strong vs weak ETags

An ETag is an identifier for a specific version of a resource. A strong ETag ("abc123") means byte-for-byte identical – if even a single bit changes (like whitespace), the ETag must change. A weak ETag (W/"abc123") means “semantically the same” – the content may differ in trivial ways (e.g. compressed differently, reordered attributes) but is still valid to reuse.

Strong ETags give more precision, but can cause cache misses if your infrastructure (say, different servers behind a load balancer) generates slightly different outputs. Weak ETags are more forgiving, but less strict. Both work with conditional requests – the choice is about balancing precision vs practicality.

Side note on ETags vs Cache-Control headers

Cache-Control directives are processed before the ETag. If the cache determines that a resource is stale, it uses the ETag (or Last-Modified) to revalidate with the origin. Think of it this way:

While fresh: the cache serves the copy immediately, no validation.
When stale: the cache sends If-None-Match: "etag-value".

If the origin replies 304 Not Modified, the cache can keep using the stored copy without re-downloading the whole thing. Without Cache-Control, the ETag may be used for heuristic freshness or unconditional revalidation – but that usually means more frequent trips back to the origin. The two are designed to work together: Cache-Control sets the lifetime, ETags handle the check-ups.

The Vary header

The Vary header tells caches which request headers should be factored into the cache key. It’s what allows a single URL to have multiple valid cached variants. For example, if a server responds with Vary: Accept-Encoding, the cache will store one copy compressed with gzip and another compressed with brotli. Each encoding is treated as a distinct object, and the right one is chosen depending on the next request.

This flexibility is powerful, but also easy to misuse. Setting Vary: * is effectively the same as saying “this response can never be reused safely for anyone else”, which makes it uncacheable in shared caches. Similarly, Vary: Cookie is notorious for destroying hit rates, because every unique cookie value creates a separate cache entry.

The best approach is to keep Vary minimal and intentional. Only vary on headers that truly change the response in a meaningful way. Anything else just fragments your cache, lowers efficiency, and adds unnecessary complexity.

Observability helpers

Modern caches don’t just make decisions silently – they often add their own debugging headers to help you understand what happened. The most important of these is Cache-Status, a new standard that reports whether a response was a HIT or a MISS, how long it sat in cache, and sometimes even why it was revalidated. Many CDNs and proxies also use the older X-Cache header for the same purpose, typically showing a simple HIT or MISS flag. Cloudflare goes a step further with its cf-cache-status header, which distinguishes between HIT, MISS, EXPIRED, BYPASS and DYNAMIC (and other values).

These headers are invaluable when tuning or debugging, because they reveal the cache’s own decision-making rather than just echoing your origin’s intent. A response might look cacheable on paper, but if you see a steady stream of MISS or DYNAMIC, it probably means that the intermediary isn’t following your headers the way you expect.

Freshness & age calculations

Once you understand who caches content and which headers control their behaviour, the next step is to see how those pieces come together in practice. Every cache – whether it’s a browser, a CDN, or a reverse proxy – follows the same logic:

  1. Work out how long the response should be considered fresh.
  2. Work out how old the response currently is.
  3. Compare the two, and decide whether to serve, revalidate, or fetch anew.

This is the hidden math that drives every “cache hit” or “cache miss” you’ll ever see.

Freshness lifetime

The freshness lifetime tells a cache how long it can serve a response without re-checking with the origin. To work that out for a given request, caches look for the following HTTP response headers in a strict order of precedence:

  1. Cache-Control: max-age (or s-maxage) → overrides everything else.
  2. Expires → an absolute date, used only if max-age is absent.
  3. Heuristic freshness → if neither of those directives is present, caches guess.

Example 1: max-age

Date: Fri, 29 Aug 2025 12:00:00 GMT
Cache-Control: max-age=300

Here, the server explicitly tells caches, “This response is good for 300 seconds after the Date.” That means the response can be considered fresh until 12:05:00 GMT. After that, it becomes stale unless revalidated.

Example 2: Expires

Date: Fri, 29 Aug 2025 12:00:00 GMT
Expires: Fri, 29 Aug 2025 12:10:00 GMT

There’s no max-age, but Expires provides an absolute cutoff. Caches compare the Date (12:00:00) with the Expires time (12:10:00). That’s a 10-minute freshness window: the response is fresh until 12:10:00, then stale.

Example 3: Heuristic

Date: Fri, 29 Aug 2025 12:00:00 GMT
Last-Modified: Thu, 28 Aug 2025 12:00:00 GMT

With no max-age or Expires, caches fall back to heuristics. Browsers have varying approaches; Chrome uses 10% of the time since the last modification. Here, the resource was last modified 24 hours ago, so the response is considered fresh for 2.4 hours (until about 14:24:00 GMT), after which revalidation kicks in.

Current age

The current age is the cache’s estimate of how old the response is right now. The spec gives a formula, but we can break it into steps:

  • Apparent age = now – Date (if positive).
  • Corrected age = max(Apparent age, Age header).
  • Resident time = how long it’s been sitting in the cache.
  • Current age = Corrected age + Resident time.
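
In code, those steps look something like this – a minimal TypeScript sketch with illustrative names (the spec also accounts for network delay, omitted here for clarity):

function currentAgeSeconds(
  dateHeader: Date,   // value of the Date response header
  ageHeader: number,  // value of the Age header, or 0 if absent
  receivedAt: Date,   // when this cache received the response
  now: Date           // the moment of the freshness check
): number {
  // Apparent age: how old the response looked when it arrived.
  const apparentAge = Math.max(0, (receivedAt.getTime() - dateHeader.getTime()) / 1000);
  // Corrected age: trust whichever signal claims the response is older.
  const correctedAge = Math.max(apparentAge, ageHeader);
  // Resident time: how long it has sat in this cache since arrival.
  const residentTime = (now.getTime() - receivedAt.getTime()) / 1000;
  return correctedAge + residentTime;
}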

Example 4: Simple case

Date: Fri, 29 Aug 2025 12:00:00 GMT
Cache-Control: max-age=60

The response was generated at 12:00:00 and reached the cache at 12:00:05, so it already appeared to be 5 seconds old when it arrived. With no Age header present, the cache then held onto it for another 15 seconds, making the total current age 20 seconds. Since the response had a max-age of 60 seconds, it was still considered fresh.

Example 5: With Age header

Date: Fri, 29 Aug 2025 12:00:00 GMT
Age: 30
Cache-Control: max-age=60

The origin sends a response stamped with Date: 12:00:00 and also includes Age: 30, meaning some upstream cache already held it for 30 seconds. When a downstream cache receives it at 12:00:40, it looks 40 seconds old. The cache takes the higher of the two (40 vs 30) and then adds the 20 seconds it sits locally until 12:01:00. That makes the total current age 60 seconds – exactly matching the max-age=60 limit. At that point, the response is no longer fresh and must be revalidated.

Decision tree

Once a cache knows both numbers:

  • If current age < freshness lifetime → Serve immediately (fresh hit).
  • If current age ≥ freshness lifetime
    • If stale-while-revalidate → Serve stale now, revalidate it in the background.
    • If stale-if-error and origin is failing → Serve stale.
    • Else → Revalidate with origin (conditional GET/HEAD).
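
The same tree, as a minimal TypeScript sketch (directive values assumed pre-parsed into seconds, with 0 meaning “absent”):

type Decision = 'serve-fresh' | 'serve-stale-and-revalidate' | 'serve-stale-on-error' | 'revalidate';

function decide(
  currentAge: number,
  freshnessLifetime: number,
  staleWhileRevalidate: number,
  staleIfError: number,
  originFailing: boolean
): Decision {
  if (currentAge < freshnessLifetime) return 'serve-fresh';
  const staleBy = currentAge - freshnessLifetime;
  if (staleBy <= staleWhileRevalidate) return 'serve-stale-and-revalidate';
  if (originFailing && staleBy <= staleIfError) return 'serve-stale-on-error';
  return 'revalidate';
}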

Example 6: stale-while-revalidate

Cache-Control: max-age=60, stale-while-revalidate=30

A response has Cache-Control: max-age=60, stale-while-revalidate=30. At 12:01:10, the cache’s copy is 70 seconds old – 10 seconds beyond its freshness window. Normally, that would require a revalidation before serving, but stale-while-revalidate allows the cache to serve the stale copy instantly as long as it revalidates in the background. Because the copy is only 10 seconds into its 30-second stale allowance, the cache can safely serve it while updating in parallel.

Example 7: stale-if-error

Cache-Control: max-age=60, stale-if-error=600

Another response has Cache-Control: max-age=60, stale-if-error=600. At 12:02:00, the copy is 120 seconds old – well past its 60-second freshness lifetime. The cache tries to fetch a fresh version, but the origin returns a 500 error. Thanks to stale-if-error, the cache is allowed to fall back to its stale copy for up to 600 seconds while the origin remains unavailable, ensuring the user still gets a response.

Why this matters

Understanding the math explains a lot of “weird” behaviour:

  • A resource expiring “too soon” may be down to a short max-age or a non-zero Age header.
  • A response that looks stale but is served anyway may be covered by stale-while-revalidate or stale-if-error.
  • A 304 Not Modified doesn’t mean caching failed – it means the cache correctly revalidated and saved bandwidth.

Caches aren’t mysterious black boxes. They’re just running these calculations thousands of times per second, across millions of resources. Once you know the math, the behaviour becomes predictable – and controllable. But in practice, developers often trip over subtle defaults and misleading directive names. Let’s tackle those misconceptions head-on.

Common misconceptions & gotchas

Even experienced developers misconfigure caching all the time. The directives are subtle, the defaults are quirky, and the interactions are easy to misunderstand. Here are some of the most common traps.

no-cache ≠ “don’t cache”

The name is misleading. no-cache actually means “store this, but revalidate before reusing it.” Browsers and CDNs will happily keep a copy, but they’ll always check back with the origin before serving it. If you truly don’t want anything stored, you need no-store.

no-store means nothing is kept

no-store is the nuclear option. It instructs every cache – browser, proxy, CDN – not to keep a copy at all. Every request goes straight to the origin. This is correct for highly sensitive data (e.g. banking), but overkill for most use cases. Many sites use it reflexively, throwing away huge performance gains.

max-age=0 vs must-revalidate

They seem similar, but aren’t the same. max-age=0 means “this response is immediately stale”. Without must-revalidate, caches are technically allowed to reuse it briefly under some conditions (e.g. if the origin is temporarily unavailable). Adding must-revalidate removes that leeway, forcing caches to always check with the origin once freshness has expired.

s-maxage vs max-age

max-age applies everywhere – browsers and shared caches alike. s-maxage only applies to shared caches like CDNs or proxies, and it overrides max-age there. This lets you set a short freshness window for browsers (say, max-age=60) but a longer one at the CDN (s-maxage=600). Many developers don’t realise s-maxage even exists.

immutable misuse

immutable tells browsers “this resource will never change, don’t bother revalidating it”. That’s fantastic for fingerprinted assets (like app.9f2d1.js) that are versioned by filename. But it’s dangerous for HTML or any resource that might change under the same URL. Use it on the wrong thing, and you’ll lock users into stale content for months.

Redirect and error caching

Caches can and do store redirects and even error responses. A 301 is cacheable by default (often permanently). Even a 404 or 500 may be cached briefly, depending on headers and CDN settings. Developers are often surprised when “temporary” outages linger because an error response was cached.

Clock skew and heuristic surprises

Caches compare Date, Expires, and Age headers to decide freshness. If clocks aren’t perfectly in sync, or if no explicit headers are present, caches fall back to heuristics. That can lead to surprising expiry behaviour. Explicit freshness directives are always safer.

Cache fragmentation: devices & geography

Caching is simple when one URL maps to one response. It gets tricky when responses vary by device or region.

  • Device splits: Sites often serve different HTML or JS for desktop vs mobile. If keyed on User-Agent, every browser/version combination becomes a separate cache entry; the result is that cache hit rates collapse. Safer options include normalising User-Agents at the CDN, or using Client Hints (Sec-CH-UA, DPR) with controlled Vary headers.
  • Geo splits: Serving different content by region (e.g. India vs Germany) often uses Accept-Language or GeoIP rules. But every language combination (en, en-US, en-GB) creates a new cache key. Unless you normalise by region/ruleset, your cache fragments into thousands of variants.

The trade-off is clear: more personalisation usually means less caching efficiency. Once the traps are clear, we can move from theory to practice. Here are the caching “recipes” you’ll use for different content types.

Patterns & recipes

Now that we’ve covered the mechanics and the common pitfalls, let’s look at how to put caching into practice. These are the patterns you’ll reach for again and again, adapted for different kinds of content.

Static assets (JS, CSS, fonts)

Goal: Serve instantly, never revalidate, safe to cache for a very long time.

Typical headers:

Cache-Control: public, max-age=31536000, immutable

Why:

  • Fingerprinted filenames (app.9f2d1.js) guarantee uniqueness, so old versions can stay cached forever.
  • Long max-age means they never expire in practice.
  • immutable stops browsers from wasting time revalidating.

HTML documents

The right TTL depends on how often your HTML changes and how quickly changes must appear. Use one of these profiles, and pair long edge TTLs with event-driven purge on publish/update.

Profile A: High-change (news/homepages):

Cache-Control: public, max-age=60, s-maxage=300, stale-while-revalidate=60, stale-if-error=600
ETag: "abc123"

Rationale: keep browsers very fresh (1m), let the CDN cushion load for 5m, serve briefly stale while revalidating for snappy UX, and survive origin wobbles.

Profile B: Low-change (blogs/docs):

Cache-Control: public, max-age=300, s-maxage=86400, stale-while-revalidate=300, stale-if-error=3600
ETag: "abc123"

Rationale: browsers can reuse for a few minutes; CDN can hold for a day to slash origin traffic. On publish/edit, purge the page (and related pages) so changes appear instantly.

Logged-in / personalised pages:

Cache-Control: private, no-cache
ETag: "abc123"

Rationale: allow browser storage but force revalidation every time; never share at the CDN.

Side note on why long HTML TTLs are safe with event-driven purge

You can run very long CDN cache expiration times (hours, even days) for HTML as long as you actively bust the cache on important events: publish, update, unpublish. Use CDN features like Cache Tags / surrogate keys to purge collections (e.g., “post-123”, “author-jono”), and trigger purges from your CMS. This gives you the best of both worlds: instant updates when it matters, rock-solid performance the rest of the time.

If updates must appear within seconds with no manual purge → keep short CDN TTLs (≤5m) + stale-while-revalidate.

If updates are event-driven (publish/edit) → use long CDN TTLs (hours/days) + automatic purge by tag.

If content is personalised → don’t share (use private, no-cache + validators).

APIs

Goal: Balance freshness with performance and resilience.

Typical headers:

Cache-Control: public, s-maxage=30, stale-while-revalidate=30, stale-if-error=300
ETag: "def456"

Why:

  • Shared caches (CDNs) can serve results for 30s, reducing load.
  • stale-while-revalidate keeps latency low even as responses are refreshed.
  • stale-if-error ensures reliability if the backend fails.
  • Clients can revalidate cheaply with ETags.

Side note on why APIs use short s-maxage + stale-while-revalidate

APIs often serve data that changes frequently, but not every single second. A short s-maxage (e.g. 30s) lets shared caches like CDNs soak up most requests, while still ensuring data stays reasonably fresh.

Adding stale-while-revalidate smooths over the edges: even if the cache has to fetch a new copy, it can serve the slightly stale one instantly while revalidating in the background. That keeps latency low for users.

The combination gives you a sweet spot: low origin load, fast responses, and data that’s “fresh enough” for most real-world use cases.

Authenticated dashboards & user-specific pages

Goal: Prevent shared caching, but allow browser reuse.

Typical headers:

Cache-Control: private, no-cache
ETag: "ghi789"

Why:

  • private ensures only the end-user’s browser caches the response.
  • no-cache allows reuse, but forces validation first.
  • ETags prevent full downloads on every request.

Side note on the omission of max-age

For user-specific content, you can’t risk serving stale data. That’s why the recipe uses private, no-cache but leaves out max-age.

  • no-cache means the browser may keep a local copy, but must revalidate it with the origin before reusing it.
  • If you added max-age, you’d be telling the browser it’s safe to serve without checking – which could expose users to out-of-date account info or shopping carts.
  • Pairing no-cache with an ETag gives you the best of both worlds: safety (always validated) and efficiency (cheap 304 Not Modified responses instead of re-downloading everything).

Side note on security

When handling or presenting sensitive data, you may wish to use private, no-store instead, in order to prevent the browser from storing a locally available cached version. This reduces the likelihood of leaks on devices used by multiple users, for example.

Images & media

Goal: Cache efficiently across devices, while serving the right variant.

Typical headers:

Cache-Control: public, max-age=86400
Vary: Accept-Encoding, DPR, Width

Why:

  • A one-day freshness window balances speed with flexibility – images can change, but not as often as HTML.
  • Vary allows different versions to be cached for different devices or display densities.
  • CDNs can normalise query parameters (e.g. ignore utm_*) and collapse variants intelligently to avoid fragmentation.

Side note on client hints

Modern browsers send Client Hints like DPR (device pixel ratio) and Width (intended display width) when requesting images. If your server or CDN supports responsive images, it can generate and return different variants — e.g. a high-res version for a retina phone, a smaller one for a low-res laptop.

By including Vary: DPR, Width, you’re telling caches: “Store separate copies depending on these hints.” That ensures the right variant is reused for future requests with the same device characteristics.

The catch? Every new DPR or Width value creates a new cache key. If you don’t normalise (e.g. bucket widths into sensible breakpoints), your cache can fragment into hundreds of variants. CDNs often provide built-in rules to manage this.
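
Opting in looks something like this (illustrative – the server first advertises which hints it wants):

Accept-CH: DPR, Width
Vary: DPR, Width

Supporting browsers then send headers like DPR: 2 and Width: 828 on subsequent requests, and the cache stores one variant per combination – which is exactly why normalisation matters.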

Beyond headers: browser behaviours

HTTP headers set the rules, but browsers have their own layers of optimisation that can look like “caching” – or interfere with it. These don’t follow the same rules as Cache-Control or ETag, and they often confuse developers when debugging.

Back/forward cache (BFCache)

  • What it is: A full-page snapshot (DOM, JS state, scroll position) kept in memory when a user navigates away.
  • Why it matters: Going “back” or “forward” feels instant because the browser restores the page without even hitting HTTP caches.
  • Gotchas: Many pages aren’t BFCache-eligible. The most common blockers are unload handlers, long-lived connections, or the use of certain browser APIs. Another subtle but important one is Cache-Control: no-store on the document itself – this tells the browser not to keep any copy around, which extends to BFCache. Chrome has recently carved out a small set of exceptions where no-store pages can still enter BFCache in safe cases, but for the most part, if you want BFCache eligibility, you should avoid no-store on documents.
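
A common quick win is replacing unload handlers with pagehide, which doesn’t block BFCache. A minimal sketch (flushAnalytics is a hypothetical stand-in for your own cleanup):

// Avoid window.addEventListener('unload', ...) – it blocks BFCache in most browsers.
window.addEventListener('pagehide', (event) => {
  // event.persisted is true when the page is being stored in the BFCache.
  flushAnalytics();
});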

Side note on BFCache vs HTTP Cache

BFCache is like pausing a tab and resuming it – the entire page state is frozen and restored. HTTP caching only stores network resources. A page might fail BFCache but still be quite fast thanks to HTTP cache hits (or vice versa).

Hard refresh vs soft reload

  • Soft reload (e.g. pressing the reload button): Browser will use cached responses if they’re still fresh. If stale, it revalidates.
  • Hard refresh (e.g. Ctrl+Shift+R, right-clicking the reload button with DevTools open and choosing a fuller reload, or ticking the “Disable cache” checkbox): Browser bypasses the cache, re-fetching all resources from the origin.
  • Gotcha: Users may think “refresh” always fetches new content – but unless it’s a hard refresh, caches still apply.

Speculation rules & link relations

Browsers provide developers with tools that let them (pre)load resources before the user requests them. These don’t change how caching works, but they can change what ends up in the cache ahead of time.

  • Prefetch: The browser may fetch resources speculatively and place them in cache, but only for a short window. If they’re not used quickly, they’ll be evicted.
  • Preload: Resources are fetched early and inserted into cache so they’re ready by the time the parser needs them.
  • Prerender: The entire page and its subresources are loaded and cached in advance. When a user navigates, it all comes straight from cache rather than the network.
  • Speculation rules API: Eviction, freshness, and validation usually follow the normal caching rules – but prerendering makes some exceptions. For example, Chrome may prerender a page even if it’s marked with Cache-Control: no-store or no-cache. In those cases, the prerendered copy lives in a temporary store that isn’t part of the standard HTTP cache and is discarded once the prerender session ends (though this behaviour may vary by browser).
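
For reference, a minimal speculation rules block looks like this (URL is illustrative):

<script type="speculationrules">
{
  "prerender": [
    { "source": "list", "urls": ["/next-article"] }
  ]
}
</script>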

The key takeaway: speculation rules are about cache timing, but not cache policy. They front-load work so the cache is already warm, but freshness and expiry are still governed by your headers.

Signed exchanges (SXG)

Signed exchanges don’t change cache mechanics either, but they do change who can serve cached content while keeping origin authenticity intact.

  • An SXG is a package of a response, plus a cryptographic signature from the origin.
  • Intermediaries (like Google Search) can store and serve that package from their own caches.
  • When the browser receives it, it can trust the content as if it came from your domain, while still applying your headers for freshness and validation.

The catch: SXGs have their own signature expiry in addition to your normal caching headers. Even if your Cache-Control allows reuse, the SXG may be discarded once its signature is out of date.

SXGs also support varying by cookie, which means they can package and serve different signed variants depending on cookie values. This enables personalised experiences to be cached and distributed via SXG, but it fragments the cache heavily – every cookie combination creates a new variant.

Key takeaway: SXG adds another clock (signature lifetime) and, if you use cookie variation, another source of cache fragmentation. Your headers still govern freshness, but these extra layers can shorten reuse windows and multiply cache entries.

CDNs in practice: Cloudflare

So far, we’ve looked at how browsers handle caching and the directives that control freshness and validation. But for most modern websites, the first and most important cache your traffic will hit isn’t the browser — it’s the CDN.

Cloudflare is one of the most widely used CDNs, fronting millions of sites. It’s a great example of how shared caches don’t just passively obey your headers. They add defaults, overrides, and proprietary features that can completely change how caching works in practice. Understanding these quirks is essential if you want your origin headers and your CDN behaviour to align.

Defaults and HTML Caching

By default, Cloudflare doesn’t cache HTML at all. Static assets like CSS, JavaScript, and images are stored happily at the edge, but documents are always passed through to the origin unless you explicitly enable “Cache Everything.” That default catches many site owners out: they assume Cloudflare is shielding their servers, when in reality their most expensive requests – the HTML pages themselves – are still hitting the backend every time.

The temptation, then, is to flip the switch and enable “Cache Everything.” But this blunt tool applies indiscriminately, even to pages that vary by cookie or authentication state. In that scenario, Cloudflare can end up serving cached private dashboards or logged-in user data to the wrong people.

The safer pattern is more nuanced: bypass the cache when a session cookie is present, but cache aggressively when the user is anonymous. This approach ensures that public pages reap the benefits of edge caching, while private content is always fetched fresh from the origin.
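
In Cloudflare, that pattern is typically expressed as a Cache Rule. As a sketch (the cookie name is an assumption – it depends on your platform; WordPress, for instance, uses wordpress_logged_in_*):

If http.cookie contains "sessionid" → Bypass cache
Otherwise → Eligible for cache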

Side note on Cloudflare’s APO addon

Cloudflare’s Automatic Platform Optimization (APO) addon integrates with WordPress websites, and rewrites caching behaviour so HTML can be cached safely while respecting logged-in cookies. It’s a good example of CDNs layering platform-specific heuristics on top of standard HTTP logic.

Edge vs browser lifetimes

Your origin headers – things like Cache-Control and Expires – define how long a browser should hold onto a resource. But CDNs like Cloudflare add another layer of control with their own settings, such as “Edge Cache TTL” and s-maxage. These apply only to what Cloudflare stores at its edge servers, and they can override whatever the origin says without changing how the browser behaves.

That separation is both powerful and confusing. From the browser’s perspective, you might see max-age=60 and assume the content is cached for just a minute. Meanwhile, Cloudflare could continue serving the same cached copy for ten minutes, because its edge cache TTL is set to 600 seconds. The result is a split reality: browsers refresh often, but Cloudflare still shields the origin from repeated requests.

Cache keys and fragmentation

Cloudflare uses the full URL as its cache key. That means every distinct query parameter – whether it’s a tracking token like ?utm_source=… or something trivial like ?v=123 – creates a separate cache entry. Left unchecked, this behaviour quickly fragments your cache into hundreds of near-identical variants, each one consuming space while reducing the hit rate.

It’s important to note that canonical URLs don’t help here. Cloudflare doesn’t care what your HTML declares as the “true” version of a page; it caches by the literal request URL it receives. To avoid fragmentation, you need to explicitly normalise or ignore unnecessary parameters in Cloudflare’s configuration, ensuring that trivial differences don’t splinter your cache.

Side note on normalising cache keys

Cloudflare lets you define which query parameters to ignore, or how to collapse variants. Stripping out analytics parameters, for example, can dramatically improve cache hit ratios.

Device and geography splits

Cloudflare also allows you to customise cache keys by including request headers, such as User-Agent or geo-based values. In theory, this enables fine-grained caching — one version of a page for mobile devices, another for desktop, or distinct versions for visitors in different countries.

But in practice, unless you normalise these inputs aggressively, it can explode into massive fragmentation. Caching by raw User-Agent means every browser and version string generates its own entry, instead of collapsing them into a simple “mobile vs desktop” split. The same problem arises with geographic rules: caching by full Accept-Language headers, for example, can create thousands of variants when only a handful of languages are truly necessary.

Done carefully, device and geography splits let you serve tailored content from cache. Done carelessly, they destroy your hit rate and multiply origin load.

Cache tags

Cloudflare also supports tagging cached objects with labels – for example, tagging every URL associated with a blog post with blog-post-123. These tags allow you to purge or revalidate whole groups of resources at once, rather than expiring them one by one.

For CMS-driven sites, this is a powerful tool: when an article is updated, the site can trigger a purge for its tag and instantly invalidate every related URL. But over-tagging – attaching too many labels to too many resources – is common, and can undermine efficiency and make purge operations slower or less predictable.
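
As a sketch of what a CMS hook might do (TypeScript; ZONE_ID and API_TOKEN are placeholders, and purge-by-tag is only available on some Cloudflare plans):

// Hypothetical post-update hook: purge every cached URL tagged with this post.
async function purgePostCache(postId: number): Promise<void> {
  await fetch(`https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/purge_cache`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ tags: [`post-${postId}`] }),
  });
}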

Other caching layers in the stack

So far, we’ve focused on browser caches, HTTP directives, and CDNs like Cloudflare. But many sites add even more layers between the user and the origin. Reverse proxies, application caches, and database caches all play a role in what a “cached” response actually means.

These layers don’t always speak HTTP – Redis doesn’t care about Cache-Control, and Varnish can happily override your origin headers. But they still shape the user experience, infrastructure load, and the headaches of cache invalidation. To understand caching in the real world, you need to see how these pieces stack and interact.

Application & database caches

Inside the application tier, technologies like Redis and Memcached are often used to keep session data, fragments of rendered pages, or precomputed query results. An ecommerce site, for example, might cache its “Top 10 Products” list in Redis for sixty seconds, saving hundreds of database queries every time a page loads. This is fantastically efficient – until it isn’t.

One common failure mode is when the database updates, but the Redis key isn’t cleared at the right moment. In that case, the HTTP layer happily serves “fresh” pages that are already out of date, because they’re pulling from stale Redis data underneath.

The inverse problem happens just as often. Imagine the app has correctly refreshed Redis with a new product price, but the CDN or reverse proxy still has an HTML page cached with the old price. The origin told the outer cache that the page was valid for five minutes, so until the TTL runs out (or someone manually purges it), users continue seeing stale HTML – even though Redis already has the update.

In other words: sometimes HTTP looks fresh while Redis is stale, and sometimes Redis is fresh while HTTP caches are stale. Both failure modes stem from the same root issue – multiple caching layers, each with its own logic, falling out of sync.

Reverse proxy caches

One layer closer to the edge, reverse proxies like Varnish or NGINX often sit in front of the application servers, caching whole responses. In principle, they respect HTTP headers, but in practice, they’re usually configured to enforce their own rules. A Varnish configuration might, for example, force a five-minute lifetime on all HTML pages, regardless of what the origin headers say. That’s excellent for resilience during a traffic spike, but dangerous if the content is time-sensitive. Developers frequently run into this mismatch: they open DevTools, inspect the origin’s headers, and assume they know what’s happening – not realising that Varnish is rewriting the rules one hop earlier.

Service Workers

Service Workers add another cache layer inside the browser, sitting between the network and the page. Unlike the built-in HTTP cache, which just follows headers, the Service Worker Cache API is programmable. That means developers can intercept requests and decide – in JavaScript – whether to serve from cache, fetch from the network, or do something else entirely.

This is powerful: a Service Worker can precache assets during install, create custom caching strategies (stale-while-revalidate, network-first, cache-first), or even rewrite responses before handing them back to the page. It’s the foundation of Progressive Web Apps (PWAs) and offline support.
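
For example, here’s a minimal stale-while-revalidate handler (a sketch; the cache name and matching logic are assumptions that would vary per app):

self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.open('swr-v1').then(async (cache) => {
      const cached = await cache.match(event.request);
      // Kick off a network fetch and refresh the cache in the background.
      const network = fetch(event.request).then((response) => {
        cache.put(event.request, response.clone());
        return response;
      });
      // Serve the cached copy instantly if we have one; otherwise wait for the network.
      return cached || network;
    })
  );
});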

But it comes with pitfalls. Because Service Workers can ignore origin headers and invent their own logic, they can drift out of sync with the HTTP caching layer. For example, you might set Cache-Control: max-age=60 on an API, but a Service Worker coded to “cache forever” will happily serve stale results long after they should have expired. Debugging gets trickier too: responses can look cacheable in DevTools but actually be served from a Service Worker’s script.

The key takeaway: Service Workers don’t replace HTTP caching – they stack on top of it. They give developers fine-grained control, but they also add another layer where things can go wrong if caching strategies conflict.

Layer interactions

The real complexity comes when all these layers interact. A single request might pass through the browser cache, then Cloudflare, then Varnish, and finally Redis. Each layer has its own rules about freshness and invalidation, and they don’t always line up neatly. You might purge the CDN and think you’ve fixed an issue, but the reverse proxy continues to serve its stale copy. Or you might flush Redis and repopulate the data, only to discover the CDN is still serving the “old” version it cached earlier. These kinds of mismatches are the root cause of many mysterious “cache bugs” that show up in production.

Debugging & verification

With so many caching layers in play – browsers, CDNs, reverse proxies, application stores – the hardest part of working with caching is often figuring out which cache served a response and why. Debugging caching isn’t about staring at a single header; it’s about tracing requests through the stack and verifying how each layer is behaving.

Inspecting headers

The first step is to look closely at the headers. Standard fields like Cache-Control, Age, ETag, Last-Modified, and Expires tell you what the origin intended. But they don’t tell you what the caches actually did. For that, you need the debugging signals added along the way:

  • Age shows how long a response has been sitting in a shared cache. If it’s 0, the response likely came from the origin. If it’s 300, you know a cache has been serving the same object for five minutes.
  • X-Cache (used by many proxies) or cf-cache-status (Cloudflare) show whether a cache hit or miss occurred.
  • Cache-Status is the emerging standard, adopted by CDNs like Fastly, which reports not just HIT/MISS but also why a decision was made.

Together, these headers form the breadcrumb trail that tells you where the response has been.
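
For example, a healthy edge hit might look like this (illustrative):

HTTP/1.1 200 OK
Cache-Control: public, max-age=60, s-maxage=600
Age: 287
cf-cache-status: HIT
ETag: "abc123"

Here, the CDN has been serving the same object for 287 seconds – comfortably inside its 600-second shared-cache window – without troubling the origin.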

Using browser DevTools

The Network panel in Chrome or Firefox’s DevTools is essential for seeing cache behaviour from the user’s side. It shows whether a resource came from disk cache, memory cache, or over the network.

  • Memory cache hits are near-instant but short-lived, surviving only within the current tab/session.
  • Disk cache hits persist across sessions but may be evicted.
  • 304 Not Modified responses reveal that the browser revalidated the cached copy with the origin.

It’s also worth testing with different reload types. A normal reload (Ctrl+R) may use cached entries, while a hard reload (Ctrl+Shift+R) bypasses them entirely. Knowing which type of reload you’re performing avoids false assumptions about what the cache is doing.

CDN logs and headers

If you’re using a CDN, its logs and headers are often the most reliable source of truth. Cloudflare’s cf-cache-status, Akamai’s X-Cache, and Fastly’s Cache-Status headers all reveal edge decisions. Most providers also expose logs or dashboards where you can see hit/miss ratios and TTL behaviour at scale.

For example, if you see cf-cache-status: MISS or BYPASS on every request, it usually means Cloudflare isn’t storing your HTML at all – either because it’s following defaults (no HTML caching), or because a cookie is bypassing cache. Debugging at the edge often comes down to correlating what your origin sent, what the CDN says it did, and what the browser eventually received.

Reverse proxies and custom headers

Reverse proxies like Varnish or NGINX can be more opaque. Many deployments add custom headers like X-Cache: HIT or X-Cache: MISS to reveal proxy behaviour. If those aren’t available, logs are your fallback: Varnish’s varnishlog and NGINX’s access logs can both show whether a request was served from cache or passed through.

The tricky part is remembering that reverse proxies may override headers silently. If you see Cache-Control: no-cache from origin but a five-minute TTL in Varnish, the headers in DevTools won’t tell you the full story. You need the proxy’s own debugging signals to verify.

Following the request path

When in doubt, step through the request chain:

  1. Browser → Check DevTools: was it memory, disk, or network?
  2. CDN → Inspect cf-cache-status, Cache-Status, or X-Cache.
  3. Proxy → Look for custom headers or logs to confirm whether the request hit local cache.
  4. Application → See if Redis/Memcached served the data.
  5. Database → If all else fails, confirm the query ran.

Walking layer by layer helps isolate where the stale copy lives. It’s rarely the case that “the cache is broken.” More often, one cache is misaligned while the others are behaving perfectly.

Common debugging mistakes

There are a few traps developers fall into repeatedly:

  • Only looking at browser headers: These tell you what the origin intended, not what the CDN actually did.
  • Assuming 304 Not Modified means no caching: In fact, it means the cache did store the response and successfully revalidated it.
  • Forgetting about cookies: A stray cookie can make a CDN bypass cache entirely.
  • Testing with hard reloads: A hard reload bypasses the cache, so it doesn’t reflect normal user experience. The same is true if you enable the “Disable cache” tickbox in DevTools – that setting forces every request to skip caching entirely while DevTools is open. Both are useful for troubleshooting, but they give you an artificial view of performance that real users will never see.
  • Ignoring multi-layer conflicts: Purging the CDN but forgetting to clear Varnish, or clearing Redis but leaving a stale copy at the edge.

Good debugging is less about clever tricks and more about being systematic: check each layer, verify its decision, and compare against what you expect from the headers.

Caching in the AI-mediated web

Up to now, we’ve treated caching as a conversation between websites, browsers, and CDNs. But increasingly, the consumers of your site aren’t human users at all – they’re search engine crawlers, LLM training pipelines, and agentic assistants. These systems rely heavily on caching, and your headers can shape not just performance, but how your brand and content are represented in machine-mediated contexts.

Crawl & scrape efficiency

Search engines and scrapers rely on HTTP caching to avoid re-downloading the entire web every day. Misconfigured caching can make crawlers hammer your origin unnecessarily, or worse, cause them to give up on deeper pages if revalidation is too costly. Well-tuned headers keep crawl efficient and ensure that fresh updates are discovered quickly.

Training data freshness

LLMs and recommendation systems ingest web content at scale. If your resources are always marked no-store or no-cache, they may get re-fetched inconsistently, leading to patchy or outdated snapshots of your site in training corpora. Conversely, stable cache policies help ensure that what makes it into these models is consistent and representative.

Agentic consumption

In an AI-mediated web, agents may act on behalf of users – shopping bots, research assistants, travel planners. For these agents, speed and reliability are first-class signals. A site with poor caching may look slower or less consistent than its competitors, biasing agents away from recommending it. In this sense, caching isn’t just about performance for humans – it’s about competitiveness in machine-driven decision-making.

Fragmentation risks

If caches serve inconsistent or fragmented variants – split by query strings, cookies, or geography – that noise propagates into machine understanding. A crawler or model might see dozens of subtly different versions of the same page. The result isn’t just poor cache efficiency; it’s a fractured representation of your brand in training data and agent outputs.

Wrapping up: caching as strategy

Caching is often treated as a technical detail, an afterthought, or a hack that papers over performance problems. But the truth is more profound: caching is infrastructure. It’s the nervous system that keeps the web responsive under load, that shields brittle origins, and that shapes how both humans and machines experience your brand.

When it’s configured badly, caching makes sites slower, more fragile, and more expensive. It fragments user experience, confuses crawlers, and poisons the well for AI systems that are already struggling to understand the web. When it’s configured well, it’s invisible – things just feel fast, resilient, and trustworthy.

That’s why caching can’t just be left to chance or to defaults. It needs to be a deliberate strategy, as fundamental to digital performance as security or accessibility. A strategy that spans layers – browser, CDN, proxy, application, database. A strategy that understands not just how to shave milliseconds for a single user, but how to present a coherent, consistent version of your site to millions of users, crawlers, and agents simultaneously.

The web isn’t getting simpler. It’s getting faster, more distributed, more automated, and more machine-mediated. In that world, caching isn’t a relic of the old performance playbook. It’s the foundation of how your site will scale, how it will be perceived, and how it will compete.

Caching is not an optimisation. It’s a strategy.

The post A complete guide to HTTP caching appeared first on Jono Alderson.

You’re loading fonts wrong (and it’s crippling your performance)

21 August 2025 at 21:48

Fonts are one of the most visible, most powerful parts of the web. They carry our brands, shape our identities, and define how every word feels. They’re the connective tissue between design, content, and experience.

And yet: almost everyone gets them wrong.

It’s a strange paradox. Fonts are everywhere. Every website uses them. But very few people – designers, developers, even performance specialists – actually know how they work, or how to load them efficiently. The result is a web full of bloated font files, broken loading strategies, poor accessibility, and a huge amount of wasted bandwidth.

Fonts aren’t decoration. They’re infrastructure. They sit on the critical rendering path, they affect performance metrics like LCP and CLS, they carry licensing and privacy baggage, and they directly influence whether users can read, engage, or trust what’s on the page. If you don’t treat them with the same care and discipline you apply to code, you’re hurting your users and your business.

This article is a deep dive into that problem. We’ll look at how we got here – from the history of web-safe fonts and the rise of Google Fonts, through the myths and bad habits that still dominate today. We’ll get into the mechanics of how fonts actually work in browsers, and why the defaults and “best practices” you’ll find online are often anything but.

We’ll explore performance fundamentals, loading strategies, modern CSS techniques, and the global realities of serving text in multiple scripts and languages. We’ll also dig into the legal and ethical side of font usage, and what the future of web typography might look like.

By the end, you’ll know why your current font setup is probably wrong – and how to fix it.

Because if there’s one thing you take away, it’s this: fonts are not free, fonts are not simple, and fonts are not optional. They deserve the same rigour you apply to performance, accessibility, and SEO.

A brief history of webfonts

To understand why so many people still get fonts wrong, you need a bit of history. The way we think about web typography today is still shaped by the compromises, hacks, and half-truths of the last twenty years.

The “web-safe” era

In the early days, there was no such thing as custom web typography. You picked from a handful of “web-safe” system fonts (Arial, Times New Roman, Verdana, Georgia) and hoped they looked the same on your users’ machines. If you wanted anything else, you sliced it into images.

Hacks before @font-face: sIFR and Cufón

Designers wanted brand typography, but browsers weren’t ready. Enter the hacks:

  • sIFR (Scalable Inman Flash Replacement): text rendered in Flash, swapped in at runtime over the real HTML. It worked, sort of, but was heavy, brittle, and inaccessible.
  • Cufón: a JavaScript trick that converted fonts into vector graphics and injected them into pages. No Flash required, but still slow and inaccessible.

These were desperate attempts to break out of the web-safe ecosystem, but they cemented the idea that custom typography was always going to be fragile, heavy, and hacky.

The arrival of @font-face

Then came @font-face. In theory, it let you serve any typeface you wanted, embedded straight into your CSS. In practice, it was a mess:

  • Different browsers required different, often proprietary formats: EOT (Embedded OpenType) for Internet Explorer, SVG fonts for early iOS Safari, raw TTF/OTF elsewhere.
  • Developers built “bulletproof” @font-face stacks – verbose CSS rules pointing to four different file formats just to cover every browser.
  • Licensing was a nightmare: many foundries banned web embedding or charged per-domain/pageview royalties.
  • Piracy was rampant, with ripped desktop fonts dumped online as “webfonts”.

Commercial services: Typekit and friends

Recognising the mess, commercial services stepped in. Typekit (launched 2009, now Adobe Fonts, and just as awful) offered subscription-based, legally licensed, properly formatted webfonts with a simple embed script. Other foundries built their own hosted services.

Typekit solved licensing and compatibility headaches for many teams, but it also entrenched the idea that fonts should load via third-party JavaScript snippets – a pattern that persists on millions of sites today.

Compatibility hacks and workarounds

Even with @font-face and services like Typekit, the webfont era was littered with workarounds:

  • Hosting multiple formats of the same font, bloating payloads.
  • Shipping fonts with whole Unicode ranges bundled “just in case”.
  • Battling FOUT (Flash of Unstyled Text) vs FOIT (Flash of Invisible Text), often with ugly JavaScript “fixes”.
  • Leaning on icon fonts to cover missing glyphs and UI symbols.

A whole generation of developers learned fonts as fragile, bloated, and temperamental – lessons that still echo in today’s bad practices.

Google Fonts and the “free font” boom

In 2010, Google Fonts arrived. Suddenly, there was a free, easy CDN with a growing library of open-licensed fonts. Developers embraced it, designers tolerated it, and performance people grumbled but went along with it.

It solved a lot of problems (including licensing, formats, hosting, and CSS wrangling) but it also created new ones. Everyone defaulted to it, even when they shouldn’t. Fonts started loading from a third-party CDN on every pageview, often slowly, and sometimes even illegally (as European courts would later decide).

An aside: Licensing realities

Licensing is the quiet trap in many font strategies. Not every “webfont license” lets you do what this article recommends. Some foundries:

  • Prohibit subsetting or conversion to WOFF2.
  • Charge based on pageviews or monthly active users.
  • Restrict embedding to specific domains.

That’s why Google Fonts felt liberating: no lawyers. But commercial fonts often come with terms that make aggressive optimisation legally risky – subsetting a font you’re only licensed to serve verbatim, for example, could technically put you out of compliance. If you’re paying for a brand font, read the contract – or negotiate it – before you start slicing and optimising.

The myths that stuck

From these eras came a set of myths and bad habits that are still alive today:

  • That custom fonts are “free” and easy.
  • That it’s fine to ship a single, monolithic font file for every user, in every language.
  • That Google Fonts (or Typekit) is always the best option.
  • That typography is a design flourish, not a performance or accessibility concern.

Those assumptions made sense in 2005 or even 2010. They don’t today. But they still shape how most websites load fonts – which is why the state of web typography is such a mess.

How fonts work (the basics)

Before we start tearing down bad practices, we need a shared baseline. Fonts are deceptively simple – “just some CSS” – but under the hood, they’re a surprisingly complex part of the rendering pipeline. Understanding that pipeline explains why fonts so often go wrong.

Formats: from TTF to WOFF2

At heart, a font is a container of glyphs (shapes), tables (instructions, metrics, metadata), and sometimes extras (ligatures, alternate forms, emoji palettes). They come in one of the following formats:

  • TTF/OTF (TrueType/OpenType): desktop-oriented formats, heavy and not optimised for web transfer.
  • EOT: Internet Explorer’s proprietary format, thankfully extinct.
  • SVG fonts: an early hack for iOS Safari, nearly extinct.
  • WOFF (Web Open Font Format): a wrapper that compressed TTF/OTF for the web.
  • WOFF2: the modern default – smaller, faster, built on Brotli compression.

If you’re serving anything but WOFF2 today, you’re doing it wrong. For almost every project, WOFF2 is all you need. Unless you have a specific business case (like IE11 on a locked-down enterprise intranet), serving older formats just makes every visitor pay a performance tax. If you absolutely must support a legacy browser, add WOFF as a conditional fallback – but don’t ship it to everyone.
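
Format negotiation happens inside the src list: browsers only fetch the first format they support, so a legacy fallback costs modern visitors nothing. A minimal sketch (paths are placeholders):

@font-face {
  font-family: "Brand";
  src: url("/fonts/brand.woff2") format("woff2"),
       url("/fonts/brand.woff") format("woff"); /* only fetched by browsers without WOFF2 support */
  font-display: swap;
}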

The rendering pipeline

When a browser “loads a font,” it isn’t just a straight line from CSS to pixels. Multiple stages (and your CSS choices) dictate how text behaves.

  1. Registration: As the browser parses CSS, each @font-face rule is registered in a font set – essentially a catalogue of families, weights, styles, stretches, and unicode-ranges. At this stage, no files are downloaded.
  2. Style resolution: The cascade runs. Each element ends up with a computed font-family, font-weight, font-style, and font-stretch. The browser compares that against the registered font set to see what could be used.
  3. Font matching: The font-matching algorithm looks for the closest available face. If the requested weight or style doesn’t exist, the browser may synthesise it (fake bold/italic) or fall back to a generic serif/sans/monospace.
  4. Glyph coverage: Fonts are only queued for download if the text actually requires glyphs from them. If a unicode-range excludes the characters on the page, the font may never load at all.
  5. Request: Once needed, the font request is queued. If @font-face rules are buried in a late-loading stylesheet, this can happen surprisingly late in the render cycle. Preload or inline to avoid the lag.
  6. Display phase: While waiting for the font to arrive, the browser decides how to handle text – this is where font-display matters:
    • No explicit setting (old default): Historically inconsistent. Safari often hid text entirely until the font arrived (FOIT), while Chrome showed fallback text immediately (FOUT). This inconsistency fuelled years of bad hacks.
    • font-display: swap; Renders fallback text immediately, then swaps to the webfont when ready (FOUT).
    • font-display: block; Hides text for up to ~3s (FOIT), then shows fallback if still not ready.
    • font-display: fallback; Very short block (~100ms), then fallback shows. Font swaps later if it arrives.
    • font-display: optional; Shows fallback immediately and may never swap if conditions are poor.
  7. Decoding and shaping: Once downloaded, the font is decompressed, parsed, and shaped (OpenType features applied, ligatures resolved, contextual forms chosen). Only then can glyphs be rasterised and painted. On low-end devices, this shaping step can add noticeable delay.

All of this happens under the hood before a single glyph hits the screen. Developers can’t change how a shaping engine works – but they can influence what happens afterwards. The next piece of the puzzle is metrics: how tall, wide, and spaced your text appears, and how to stop those dimensions from shifting when fonts swap in.

Metrics

Fonts don’t just define glyph shapes. They also define metrics:

  • Ascent, descent, line gap – how tall lines are, where baselines sit.
  • x‑height – how big lowercase letters appear.
  • Kerning and ligatures – how characters fit together.

If your fallback system font has different metrics, the page will render one way during FOUT, then “jump” when the custom font loads. That’s not just ugly – it’s measurable layout shift, and it can tank your Core Web Vitals.

We’ll explore how to tinker with these values later in the post.

Synthesised styles

When the browser can’t find the exact weight or style you’ve asked for, it doesn’t just give up. It fakes it:

  • Fake bolding: If you request font-weight: 600 but only have a 400 (regular), most browsers will thicken the strokes algorithmically. The result often looks clumsy, inconsistent, and can ruin brand typography.
  • Fake italics: If you request font-style: italic without having a true italic face, the browser simply slants the regular glyphs. It’s a cheap trick, and typographically awful.

That “helpfulness” can make your typography look sloppy, and it can throw off spacing/metrics in subtle ways. The fix:

  • Only declare weights/styles you actually provide.
  • Use font-synthesis: none; to prevent browsers from faking bold/italic.
  • Subset/serve the actual weights you need – and stop at the ones you’ll really use.
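
As a sketch, that means declaring only the faces you really serve, and switching synthesis off (file names are placeholders):

@font-face {
  font-family: "Brand";
  src: url("/fonts/brand-regular.woff2") format("woff2");
  font-weight: 400; /* the only weight we actually ship */
  font-display: swap;
}

body {
  font-family: "Brand", sans-serif;
  font-synthesis: none; /* no faux bold, no faux italic */
}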

One more layer: once fetched, fonts aren’t just “ready.” Browsers must decode, shape, and rasterise them. That means parsing OpenType tables, applying shaping rules (HarfBuzz, CoreText, DirectWrite), and rasterising glyphs to pixels. On low-end devices, this step can take measurable milliseconds. Font choice isn’t just about bytes on the wire – it’s also about CPU cycles at paint time.

Glyph coverage

Finally, fonts aren’t universal. A Latin font may not contain accented characters, Cyrillic glyphs, Arabic ligatures, or emoji. When a glyph is missing, the browser silently switches to a fallback font to cover that code point. The result can be inconsistent rendering, mismatched sizing, or even boxes and question marks.

This is why subsetting matters, why fallback stacks matter, and why understanding coverage is essential.


So: fonts aren’t just “download this file and it works.” They’re complex, heavy, and integral to how the browser paints text. Which is exactly why treating them as decoration – instead of as infrastructure – is such a bad idea.

Performance & strategy fundamentals

If the history explains why fonts are messy, the performance reality explains why they matter. Fonts aren’t just a design choice – they’re part of your critical rendering path, and they can make or break your Core Web Vitals.

File size

Most websites are serving far too much font data. A single “complete” font family can easily be 400–800 KB per style. Add bold, italic, and a few weights, and suddenly you’re shipping multiple megabytes of font data before your content is even legible. That’s more than many sites spend on JavaScript.

And the kicker? Most of those glyphs and weights are never used.

Layout shift

Fonts don’t just block rendering; they actively cause reflows when they arrive.

  • If your fallback font has different metrics (x‑height, ascent, descent, line-gap), your content will jump when the webfont loads.
  • That’s measurable Cumulative Layout Shift (CLS), and it directly impacts Core Web Vitals.

The good news: modern CSS gives us the tools to fix all of this.

Modern CSS descriptors (and what they actually do)

font-display – controls what happens while the font is loading.

  • swap: show fallback immediately, swap to webfont when ready (FOUT). Good default.
  • fallback: tiny block (~100ms), then fallback; swap later. Safer on poor networks.
  • optional: show fallback, may never swap. Great for decorative fonts.
  • block: hide text for a while (≈3s). Looks “clean” on fast, awful on slow. Avoid.

👉 This is your first-paint policy. Choose carefully.


Metrics override descriptors – make fallback and webfont metrics match.

These live inside @font-face. They tell the browser: “scale and align this webfont so it behaves like the fallback you showed first.” That way, when the swap happens, nothing jumps.

  • size-adjust: scales the webfont so its perceived x‑height matches the fallback.
  • ascent-override / descent-override: align baselines and descender space.
  • line-gap-override: controls extra line spacing to keep paragraphs steady.

Example:

@font-face {
  font-family: 'Brand';
  src: url('/fonts/brand.woff2') format('woff2');
  font-display: swap;                 /* first paint policy */
  size-adjust: 102%;                  /* match fallback x-height */
  ascent-override: 92%;               /* align baseline */
  descent-override: 8%;               /* balance descenders */
  line-gap-override: normal;          /* stabilise line height */
}

In practice, you can use tools like Font Style Matcher to calculate the right values. These help you match fallback and custom font metrics precisely and eliminate CLS.


unicode-range – serve only the glyphs a page actually needs.

Declare separate @font-face blocks for each subset (Latin, Latin-Extended, Cyrillic, etc.). The browser only requests the ones it needs.

Example:

@font-face {
  font-family: "Brand";
  src: url("/fonts/brand-latin.woff2") format("woff2");
  unicode-range: U+0000-00FF, U+0131, U+0152-0153;
  font-display: swap;
}

👉 Saves hundreds of kilobytes by not shipping glyphs for scripts you’ll never use.


font-size-adjust – property for elements (not @font-face).

Scales fallback fonts so their x‑height ratio matches the intended font. Prevents fallback text from looking too small or too tall.

Example:

html { font-size-adjust: 0.5; } /* ratio matched to your brand font */

From descriptors to strategy

These CSS descriptors are your scalpel: precise tools for cutting out CLS and wasted payload. But solving font performance isn’t just about fine-tuning metrics; it’s about making the right high-level choices in how you ship, scope, and prioritise fonts in the first place.

Language coverage and subsetting

A huge but often overlooked opportunity is language coverage and subsetting.

Most sites only need Latin or Latin Extended, yet many ship fonts containing Cyrillic, Greek, Arabic, or full CJK sets they’ll never use. That’s hundreds of kilobytes – sometimes megabytes – wasted on every visitor.

Smarter strategy:

  • Subset fonts with tools like fonttools, Glyphhanger, or Subfont.
  • Use unicode-range to declare subsets per script.
  • Build locale-specific bundles (e.g. fonts-en.css, fonts-ar.css) for internationalised sites.

That way, browsers will only download subsets if they’re needed – so a Cyrillic user gets Cyrillic, a Latin user gets Latin, and nobody pays for both.
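
As a rough sketch of that workflow with fonttools’ pyftsubset (file names and ranges are illustrative – and check your licence first):

# Build a Latin subset matching the unicode-range example above
pyftsubset brand.ttf \
  --unicodes="U+0000-00FF,U+0131,U+0152-0153" \
  --layout-features="*" \
  --flavor=woff2 \
  --output-file=brand-latin.woff2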

When not to subset

⚠️ For sites with genuine multilingual needs, especially across non-Latin scripts, stripping glyphs can do more harm than good. Arabic, Hebrew, Thai, and Indic scripts rely on shaping and positioning tables (GSUB/GPOS). We’ll explore this later.

⚠️ And if your site has a lot of user-generated content, be conservative. Users will surprise you with stray Greek, Cyrillic, or emoji. In those cases, lean on broader coverage or robust system fallbacks rather than slicing too aggressively.

Lazy-loading non-critical fonts

Not every font has to be part of the critical rendering path. Headline display fonts, decorative typefaces, and icon sets (e.g., for social media icons that only appear in the footer) often aren’t essential to that first paint. These can be deferred or staged in later, once the core content is visible.

Two reliable approaches:

  • Use the Font Loading API (document.fonts.load) to request and apply them after the page is stable.
  • Or set font-display: optional, which tells the browser the fallback is fine – and if the custom font arrives late (or never), the page still works.

This keeps the focus on performance where it matters most: content-first, aesthetics second.
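
For the first approach, a minimal sketch using the Font Loading API (assuming a decorative family registered as "Display" via @font-face):

// Once the page has settled, pull in the non-critical font
window.addEventListener('load', () => {
  document.fonts.load('1em "Display"').then(() => {
    // Only opt in once the font is genuinely available
    document.documentElement.classList.add('display-font-loaded');
  });
});

Pair that with CSS that only applies "Display" when the class is present, and the font stays entirely off the critical path.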

Fonts as progressive enhancement

At the end of the day, fonts should be treated as progressive enhancement. Your site should load quickly, render legibly, and remain usable even if a custom font never arrives. A well-chosen system fallback ensures content-first delivery, while the webfont (if, or when, it loads) adds polish and brand identity.

Typography matters, but it should never get in the way of reading, speed, or stability.

Variable fonts: promise vs reality

If subsetting and smart loading are the practical fixes, variable fonts are the seductive promise. One font file, infinite possibilities. The idea is compelling: instead of shipping a dozen separate files for regular, bold, italic, condensed, and wide, you just ship one variable font that can flex along those axes.

And in theory, that means less to download, finer design control, and a more responsive, fluid typographic system.

The promise

  • Consolidation: collapse dozens of static files into a single resource.
  • Precision: use exact weights (512, 537…) instead of stepping through 400/500/600.
  • Responsiveness: unlock width and optical size axes that adjust seamlessly across breakpoints.
  • Consistency: fewer moving parts, cleaner CSS, and potentially smaller payloads.

The reality

Variable fonts are brilliant – but not a magic bullet.

  • File size creep: if you only need two weights, a variable file may actually be larger than two well-subset static fonts.
  • Browser support quirks: weight interpolation is universal, but some axes (optical sizing, italic, grade) are patchy across browsers.
  • Double-loading traps: many teams ship a variable font and static files “just in case,” which cancels out the benefits.
  • Licensing headaches: some foundries sell or license variable fonts separately, or prohibit modifications like subsetting.
  • Ergonomics quirks: core axes like weight, width, slant, and optical size map to standard CSS properties and are now universally supported. But custom axes (like grade) still require font-variation-settings and may not have CSS shorthands. So don’t assume every axis is ergonomic across browsers.

Performance strategy

Treat variable fonts like any other asset: audit, measure, and subset.

  • Pick your axes carefully: do you really need width, optical size, or italics?
  • Subset by script just as you would with static fonts; don’t ship the whole world.
  • Benchmark payloads: check whether one variable file actually saves over two or three statics.
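
One practical way to benchmark is to cut the variable font down to the ranges you actually use and compare the numbers. A sketch using fonttools’ instancer (Brand-VF.ttf is a hypothetical file):

# Pin the width axis, limit weight to the range you really use
fonttools varLib.instancer Brand-VF.ttf wght=300:700 wdth=100 -o Brand-VF-slim.ttf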

Design strategy

When used deliberately, variable fonts unlock design latitude you simply can’t get otherwise.

  • Responsive typography: scale weight or width subtly as the viewport changes.
  • Optical sizing: automatically adjust letterforms for legibility at small vs large sizes.
  • Brand expression: interpolate between styles for more personality than a static set.

But use restraint. Animating font-variation-settings may look slick in demos, but it often janks in practice.

Example: using variable font axes in CSS

/* Load a variable font */
@font-face {
  font-family: "Acme Variable";
  src: url("/fonts/acme-variable.woff2") format("woff2-variations");
  font-weight: 100 900;        /* declares supported weight range */
  font-stretch: 75% 125%;      /* declares supported width range */
  font-style: oblique 0deg 10deg;  /* declares an upright-to-slanted range (slnt axis) */
  font-display: swap;
}

/* Use weight and width as normal */
h1 {
  font-family: "Acme Variable", system-ui, sans-serif;
  font-weight: 700;    /* resolves within the declared 100–900 range */
  font-stretch: 110%;  /* slightly wider */
}

/* Non-standard axes via font-variation-settings */
.hero-text {
  font-family: "Acme Variable", system-ui, sans-serif;
  font-variation-settings: "wght" 500, "wdth" 120, "slnt" -5;
}

/* Responsive typography: step the weight up at larger viewports.
   (font-weight only accepts plain numbers, so you can't mix
   viewport units into it directly.) */
h2 {
  font-family: "Acme Variable", system-ui, sans-serif;
  font-weight: 450;
}
@media (min-width: 60em) {
  h2 { font-weight: 650; }
}

👉 This shows both the “semantic” way (with font-weight, font-stretch, font-style) and the raw font-variation-settings way for full control.

Best practice

  • Start with a needs audit: what weights, styles, and scripts do you actually use?
  • If variable fonts win on size and coverage, great – deploy them with subsetting and unicode-range.
  • If two or three statics are leaner and simpler, stick with them.

Variable fonts are a tool, not a default. The key is to be deliberate: weigh the trade-offs, and implement them with the same discipline you’d apply to any other part of your performance budget.

System stacks and CDNs

Not every project needs custom fonts. In fact, one of the most powerful performance wins is simply not loading them at all.

System font stacks

System fonts – the ones already bundled with the OS – are free, instant, and familiar. The trick is in choosing a system stack that feels cohesive across platforms. A typical modern stack looks like this:

body {
  font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto,
               Helvetica, Arial, sans-serif, "Apple Color Emoji",
               "Segoe UI Emoji", "Segoe UI Symbol";
}

This cascades through macOS, iOS, Windows, Android, Linux, and falls back cleanly to web-safe sans-serifs. For body text, navigation, and utilitarian UI elements, system stacks are hard to beat.

They’re also excellent fallbacks: even if you do load custom fonts, designing around the system stack first guarantees legibility and resilience.

It’s also worth noting that system fonts almost always handle emoji better – lighter weight, more coverage, and more consistent rendering than trying to ship emoji glyphs in a webfont. We’ll explore emoji in more detail later.

CDNs and third-party hosting

For years, Google Fonts was the default solution: paste a <link> into your <head> and you were done. But today that’s a bad trade-off.

  • Privacy: loading fonts from Google Fonts leaks visitor data to Google. Regulators (especially in Europe) have judged this a GDPR violation.
  • Performance: third-party CDNs add latency, DNS lookups, and potential blocking. In most cases, self-hosting is faster and more reliable.
  • Caching myths: the old argument that “Google Fonts are already cached” simply isn’t true anymore. Modern browsers partition caches per site for privacy. A font fetched on site A won’t be reused on site B. In practice, each site (and the user) pays the cost independently.

Best practice is simple: self-host your fonts. Download them from the foundry (or even from Google Fonts), serve them from your own domain, and control headers, preloading, and caching yourself.

But even if you self-host and optimise your fonts, what users see first isn’t your brand font – it’s the fallback. That’s where the real user experience lives.

Fallbacks and matching

Loading fonts isn’t just about the primary choice – it’s also about how gracefully the design holds up before and if the custom font arrives. That’s where fallbacks matter.

Designing with fallbacks in mind

A fallback font isn’t just an emergency plan – it’s a baseline your visitors might actually see, even if only for a few milliseconds. That makes it worth designing for. A good fallback:

  • Matches the x‑height and letter width of your primary font closely enough that layout shifts are minimal.
  • Feels stylistically compatible: if your brand font is a geometric sans, pick a system sans, not Times New Roman.
  • Includes emoji, symbols, and ligatures that your primary font may lack.

Tuning fallbacks with modern CSS

We covered font-size-adjust and font-optical-sizing in Section 4, but the key is their application here: you can actually tune your fallback stack to minimise visible shifts.

For example, if your fallback font has a smaller x‑height, you can bump its size slightly using font-size-adjust so that text aligns more closely with your custom font when it swaps in.

body {
  font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
  font-size-adjust: 0.52; /* Match x-height ratio of custom font */
}

This avoids the infamous “jump” when the real font finishes loading.

Matching custom and fallback fonts

The end goal isn’t perfection, it’s stability. You won’t get Helvetica Neue to perfectly mirror Segoe UI, but you can:

  • Choose fallbacks with similar proportions.
  • Adjust size/line-height to reduce reflow.
  • Use variable font axes (when available) to more closely approximate your fallback’s look at initial render.

The better your fallback, the less anyone notices when your custom font finally kicks in.

And remember: rendering engines differ. Windows ClearType, macOS CoreText, and Linux FreeType all anti-alias and hint fonts differently. Chasing pixel-perfect consistency across platforms is a lost cause; stability and legibility matter more than identical rendering.

Preloading and loading strategies

Even with the right fonts, subsets, and fallbacks, the delivery strategy can make or break the user experience. A beautiful font served late is still a broken experience.

The alphabet soup of loading outcomes

Most developers have at least heard of FOIT and FOUT, but rarely think about how deliberate choices (or lack thereof) cause them.

  • FOIT (Flash of Invisible Text): text is hidden until the custom font loads. Looks sleek when it works fast, looks catastrophic on slow networks.
  • FOUT (Flash of Unstyled Text): fallback text renders first, then switches when the custom font arrives. Stable, but potentially jarring.
  • FOFT (Flash of Faux Text): a messy hybrid where a browser synthesises weight/italic, then swaps to the real cut. Distracting and ugly.

The browser’s defaults – and your CSS – determine which outcome users see.

font-display

The font-display descriptor is the blunt instrument for influencing this:

  • swap: show fallback immediately, swap when ready (the safe modern default).
  • block: hide text (FOIT) for up to 3s, then fallback. Dangerous.
  • fallback: like swap, but gives the real font less time to load.
  • optional: load only if the font is already fast/cached. Good for non-critical assets.

Most sites should default to swap. Don’t leave it undefined.

Preloading fonts

Preload is the sharp tool. Adding:

<link rel="preload" as="font" type="font/woff2" crossorigin
      href="/fonts/brand-regular.woff2">

…tells the browser to fetch the font immediately, rather than waiting until it encounters the @font-face rule in CSS. This is especially valuable if you inline your @font-face declarations in the <head> (as you should) – otherwise fonts often load after layout and render have already begun.

⚠️ Be selective when preloading: if you’ve split fonts into subsets with unicode-range, only preload the subset you know is needed for initial content. Preloading every subset defeats the purpose by forcing them all to download, even if not used.

Preload gotchas (worth your time)

  • Match everything: your preload URL must exactly match the @font-face src (path and querystring), the response must include the right CORS header (Access-Control-Allow-Origin), and your <link> must carry crossorigin. If any of those disagree, the preload won’t be reused by the actual font load.
  • Use the right as/type: as="font" and a correct MIME hint (type="font/woff2") influence prioritization and help browsers coalesce requests. Wrong/missing values can cause the preload to be ignored.
  • Don’t preload everything: if you’ve split by unicode-range (e.g., Latin, Cyrillic), preload only the subset you’ll actually paint above the fold. Preloading every subset forces downloads and defeats subsetting.
  • “Preload used late” warnings: browsers will warn if a preloaded resource isn’t used shortly after navigation. That’s usually a smell (wrong URL, late-discovered @font-face, or you preloaded a non-critical face).
  • Service Worker synergy: if you run a SW, pre‑cache WOFF2 at install. First‑hit uses preload; subsequent hits come from SW in ~0ms.

Inline vs buried @font-face

This is an easy win that almost nobody takes. If your @font-face lives in an external CSS file, the browser won’t even discover the font until that file is downloaded, parsed, and executed. Inline it in the <head> and preload the asset, and you’ve cut an entire round trip out of the waterfall.

But – there are caveats.

  • If you already ship a single, render-blocking stylesheet early: inlining doesn’t buy you much. The browser was going to see those @font-face rules quickly anyway, and it still won’t request the font until the text that needs it is on screen. The browser will also wait until that render-blocking CSS is executed in case it overrides the font. In that setup, preload is what really makes the difference.
  • If your CSS arrives late or piecemeal – critical CSS inline, async or route-level styles, CSS-in-JS, @import, SPA hydration – then inlining can be genuinely useful. It ensures fonts are discovered immediately, not halfway through page render. In those cases, it’s an under-used safeguard.

So: inlining plus preload can be a neat win, especially on modern, fragmented architectures. But if it makes a dramatic difference to your site, that’s also a signal that your CSS delivery strategy might need fixing.

Early Hints (HTTP 103)

Even preloading has limits – the browser still has to parse enough of the HTML to see your <link rel="preload"> (or wait for the HTTP response carrying the equivalent Link header). If your server or network is slow, that can take quite some time.

With Early Hints (HTTP status 103), the server can tell the browser immediately which critical assets to start fetching – before the main HTML response is delivered.

That means your fonts can be on the wire during the first round trip, rather than waiting for HTML parsing.

HTTP/1.1 103 Early Hints
Link: </fonts/brand-regular.woff2>; rel=preload; as=font; type="font/woff2"; crossorigin

Things to bear in mind:

  • Coalesce with HTML Link preloads: it’s fine to hint the same font in 103 and again in the final 200 via a Link header/HTML tag (as modern browsers dedupe). Don’t rely on intermediaries though; some proxies still drop 103s. Keep the HTML/200 fallback preload.
  • Manage CORS in the hint: include crossorigin in the 103 Link, so that the early request is eligible for reuse by the @font-face.
  • Be choosy: only hint critical above‑the‑fold faces/weights. Over‑hinting competes with HTML/CSS and can slow TTFB in practice.

Support is growing across servers, CDNs, and browsers. If you’re already preloading fonts, adding Early Hints is a straightforward way to shave another few hundred milliseconds off time-to-text.

⚠️ Don’t go wild: only hint fonts you know are needed above the fold. Over-hinting can waste bandwidth and compete with more critical assets.

Don’t use @import

One of the worst mistakes is loading fonts (or CSS that declares them) via @import. Every @import is another round trip: the browser fetches the parent CSS, parses it, then discovers it needs another CSS file, then discovers the @font-face… and only then requests the font.

That means your text can’t render until the slowest possible path has played out.

Best practice is simple: never use @import for fonts. Always declare @font-face in a stylesheet the browser sees as early as possible, ideally inlined in the <head> with a preload.
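
To make the contrast concrete, a sketch of both patterns (paths are placeholders):

/* Bad: the font can't be discovered until this extra stylesheet
   has been fetched and parsed */
@import url("/css/fonts.css");

/* Better: declared directly in early (ideally inlined) CSS */
@font-face {
  font-family: "Brand";
  src: url("/fonts/brand.woff2") format("woff2");
  font-display: swap;
}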

Strategic trade-offs

  • Critical fonts (body, navigation): preload + font-display: swap.
  • Secondary fonts (headlines, accents): preload only if they’re above the fold, otherwise lazy-load.
  • Decorative fonts: consider optional or defer entirely.

Loading strategy isn’t about dogma (“always swap” vs “always block”) – it’s about choosing the least worst compromise for your audience. The difference between text that renders instantly and text that lags behind is the difference between a user staying or bouncing.

The font loading API

For fine-grained control, the Font Loading API gives you promises to detect when fonts are ready and orchestrate swaps. In practice, it’s rarely necessary unless you’re building a highly dynamic or JS-heavy site – but it’s useful to know it exists.

File formats: WOFF2, WOFF, TTF, and the legacy baggage

The font world is littered with old formats, half-truths, and cargo-cult practices. A lot of sites are still serving fonts like it’s 2012 – shipping multiple redundant formats and bloating their payloads.

WOFF2: the modern default

If you take only one thing away from this section: serve WOFF2, and almost nothing else.

  • It’s the most efficient web format, compressing smaller than WOFF or TTF.
  • It’s universally supported in all modern browsers.
  • It can contain full OpenType tables, variations, and modern features.

For the vast majority of projects, WOFF2 is all you need. Unless you have an explicit business case for IE11 or very old Android builds, there’s no reason to ship anything else. Legacy compatibility can’t justify making every visitor pay a performance tax.

One caveat: some CDNs try to Brotli-compress WOFF2 files again, even though they’re already Brotli-encoded. That wastes CPU cycles for no gain. Make sure your pipeline serves WOFF2 as-is.

WOFF: the fallback you probably don’t need

WOFF was designed as a web-optimised wrapper around TTF/OTF. Today it’s only relevant if you absolutely must support a very old browser (think IE11 in corporate intranets). In public web contexts, it’s dead weight.

TTF/OTF: desktop-first relics

TrueType (TTF) and OpenType (OTF) fonts are great for design tools and local installs, but shipping them directly to browsers is wasteful. They’re larger, slower to parse, and in some cases reveal more metadata than you want to serve publicly.

If your build pipeline still spits out .ttf for the web, it’s time to modernise.

SVG fonts: just… no

Once upon a time, SVG-in-font was a hack to get colour glyphs (like emoji) into the browser. That era is gone. Modern emoji and colour fonts use COLR/CPAL or CBDT/CBLC tables inside OpenType/WOFF2. If you see SVG fonts in your stack, delete them with fire.

Base64 embedding

Every so often, someone still tries to inline fonts as base64 blobs in CSS. Don’t. It bloats CSS files, breaks caching, and blocks parallelisation. Fonts are heavy assets that deserve their own requests and their own cache headers.

Do you need multiple formats?

No. Not unless your business case genuinely includes “must support IE11 and Android 4.x, and absolutely cannot live with fallback system fonts”. For everyone else:

  • WOFF2 only
  • Self-hosted
  • Preloaded and cached properly

That’s it. And once you serve WOFF2, serve it well: give font files a long cache lifetime (months or a year) and use versioned file names for cache busting when fonts change. Fonts rarely update, so they should almost always come from cache on repeat visits. NB: see my post about caching for more tips.
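
In header form, that policy might look something like this for a versioned file such as a hypothetical /fonts/brand.v3.woff2:

Cache-Control: public, max-age=31536000, immutable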

Legacy formats are ballast. If you’re still serving them, you’re making every visitor pay the price for browsers that nobody uses anymore.

Icon fonts: Font Awesome and the great mistake

Once upon a time, icon fonts felt clever. Pack a bunch of glyphs into a font file, assign them to letters, and voilà – scalable, CSS-stylable icons. Font Awesome, Ionicons, Bootstrap’s Glyphicon set… they were everywhere.

But it was always a hack. And in 2025, it’s indefensible.

The fundamental problems with icon fonts

  • Accessibility: Screen readers announce “private use” characters as gibberish, because there’s no semantic meaning.
  • Fragility: If the font fails to load, users see meaningless squares or fallback letters.
  • Styling hacks: Matching line-height, alignment, and sizing was always fragile.
  • Performance: You end up shipping an entire font file (often hundreds of unused icons) just to use a handful.

Better alternatives

  • Inline SVGs: semantic, flexible, styleable with CSS.
  • SVG sprites: cacheable, easy to swap or reference by ID.
  • Icon components (React/Vue/etc): imported on demand, tree-shakeable.
  • CSS mask-image / -webkit-mask-image: a neat option when you want a vector shape as a pure CSS-driven mask (e.g. colourising icons dynamically).
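
For the mask technique, a small sketch (assuming a single-colour /icons/home.svg):

.icon-home {
  display: inline-block;
  width: 1em;
  height: 1em;
  background-color: currentColor; /* icon inherits the text colour */
  -webkit-mask-image: url("/icons/home.svg");
  mask-image: url("/icons/home.svg");
  mask-repeat: no-repeat;
  mask-size: contain;
}

Because the mask only controls which pixels show, the icon recolours with plain CSS – hover states, dark mode, and theming come for free.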

“But I already use Font Awesome…”

If you’re stuck with an icon font, there are two urgent things you should do:

  1. Subset it so you’re not shipping 700 icons to render 7.
  2. Plan your migration – usually to SVG. Most modern icon sets (including Font Awesome itself) now offer SVG-based alternatives.

The lingering myth

People cling to icon fonts because they “just work everywhere.” That used to be true. But today, SVG has universal support, better semantics, and better tooling.

Icon fonts are like using tables for layout – a clever hack in their day, but a mistake we shouldn’t still be repeating.

Beyond Latin: Non-Latin scripts, RTL languages, and emoji

If icon fonts were a hack born from a lack of glyph coverage, global typography is the opposite problem: too many glyphs, too many scripts, and too much complexity.

It’s easy to optimise fonts if you’re only thinking in English. But the web isn’t just Latin letters, and many of the “best practices” break down once you step into other scripts.

Non-Latin scripts

Arabic, Devanagari, Thai, and many others are far more complex than Latin. They rely on shaping engines, ligatures, and contextual forms. Subsetting recklessly can break whole words, turning live text into nonsense.

  • Don’t subset blindly. Many scripts need entire blocks intact to render correctly.
  • Test across OSes. Some scripts have wildly different default fallback behaviour depending on platform.
  • Expect heavier fonts. A full-featured CJK font can easily be 5–10MB before optimisation. In those cases, variable fonts or progressive loading are even more critical.

RTL languages

Right-to-left scripts like Arabic and Hebrew aren’t just flipped text. They come with:

  • Different punctuation and digit shaping.
  • Directional controls (bidi) that interact with your markup and CSS.
  • Font metrics that can differ significantly from Latin-based fallbacks.

Your fallback stack needs to understand RTL – not just render mirrored Latin glyphs. Always test with real RTL content.

Emoji

Emoji are a special case. Nobody should be shipping emoji glyphs in a webfont. They’re heavy, inconsistent, and outdated as soon as the Unicode consortium adds new ones.

Best practice is simple:

  • Use the system’s native emoji font (Apple Color Emoji, Segoe UI Emoji, Noto Color Emoji, etc).
  • Include them in your system stack, usually after your primary fonts: font-family: "YourBrandFont", -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Noto Color Emoji", "Apple Color Emoji", sans-serif;
  • Accept that emoji will look different on different platforms. That’s the web.

Designing for global text

If your brand works internationally, test with:

  • Mixed scripts (English + Arabic, or Chinese + emoji).
  • Platform differences (Android vs iOS vs Windows).
  • Fallback handling when your chosen font doesn’t cover the script.

Global typography isn’t just about coverage – it’s about resilience. Your font strategy should assume diversity, not break under it.

The future of webfonts: evolving standards and modern risks

We’ve covered the history and the present. But the font story isn’t finished – new CSS specs, new browser behaviours, and new app architectures are all shaping what “best practice” will look like over the next few years.

Upcoming CSS and font tech

There’s a steady stream of new descriptors and properties landing across specs and browsers:

  • font-palette: lets you switch or customise colour palettes inside COLR/CPAL colour fonts.
  • font-synthesis: controls whether the browser is allowed to fake bold/italic styles (finally giving you a “no thanks” switch).
  • size-adjust (expanding on font-size-adjust): more granular tuning of fallback alignment.
  • Incremental font transfer (IFT): a still-emerging approach where browsers can fetch only the glyphs a page needs, progressively, instead of downloading the full file.

None of these are mainstream defaults yet, but they point towards a more controlled, nuanced future.

Risks in JS-heavy websites and SPAs

New standards are exciting, but real-world implementation often collides with how sites are actually built today. And the reality is: the modern web is dominated by JavaScript-heavy, framework-driven applications. That changes the font-loading landscape.

Modern JavaScript frameworks (React, Vue, Angular, Next, etc.) introduce new challenges for font loading:

  • Fonts triggered late: if your routes/components lazy-load CSS, fonts may only be requested after hydration, leading to jank.
  • Critical CSS extraction gone wrong: automated tooling sometimes misses @font-face rules, breaking preload chains.
  • Client-side routing: navigating between views might trigger new font loads that weren’t preloaded up front.
  • Font Loading API misuse: some SPAs try to orchestrate font loading manually and end up delaying it unnecessarily.

Best practice here is simple: treat fonts as application-critical assets, not just “another stylesheet.” Preload and inline your declarations early, and test your routes for late font requests.

The trend towards variable and colour fonts

Variable fonts are becoming the expectation rather than the exception, and colour/emoji fonts are mainstreaming. That means:

  • Your font strategy needs to handle richer files and more axes of variation.
  • Subsetting and loading strategies matter even more as file sizes grow.
  • Expect to see more sites using a single highly flexible variable font, instead of juggling multiple static weights.

The cultural shift

For years, fonts were treated as decorative – a flourish bolted on at the end of a build. The future demands the opposite: treating typography as infrastructure. Performance budgets, accessibility standards, and internationalisation all hinge on doing fonts properly.

The webfont ecosystem is still maturing. If the last decade was about getting fonts to work at all, the next decade will be about making them efficient, predictable, and global.

But optimism and theory don’t mean much without proof. Fonts need measurement, not just faith – which is where tooling comes in.

Tooling and auditing

The nice thing about tinkering with fonts is that your decisions and performance are measurable. If you want to know whether your setup is efficient (and beautiful – or, at least, on brand), everything is testable. You can use:

  • DevTools: Simulate “Slow 3G, empty cache” in Chrome/Edge to see whether text is invisible, unstyled, or jumping. Watch the waterfall to confirm when fonts start downloading.
  • WebPageTest / Lighthouse: Both expose font request timing, blocking resources, and CLS caused by late swaps.
  • Glyphhanger / Subfont: CLI tools that analyse which glyphs your site actually uses, and generate subsets automatically.
  • Fonttools (pyftsubset): The Swiss Army knife for professional font subsetting and editing.
  • CI checks: Set budgets (e.g. no more than 3 weights, no font over 200 KB compressed).
  • Transfonter: for generating optimised font files and CSS.
  • Font Style Matcher: For configuring fallback font metrics to match your custom font.

The golden rule: if you’ve never tested your fonts under a cold-cache, slow-network condition, you don’t know how your site actually behaves.

A manifesto for doing fonts properly

Webfonts are not decoration. They shape usability, performance, accessibility, and even legality. Yet most of the web still treats them as an afterthought – bolting them on late, bloating them with legacy baggage, and breaking the user experience for something as basic as text.

It doesn’t have to be this way. Handling fonts properly is straightforward once you treat them with the same seriousness you treat JavaScript, caching, or analytics.

The principles:

  1. System-first: Start with a robust system stack. Custom fonts are progressive enhancement, not a crutch.
  2. Subset aggressively (but intelligently): Ship only what users need, and test in the languages you support.
  3. Preload and inline: Don’t bury critical @font-face rules or delay requests.
  4. WOFF2 only (in 99% of cases): Drop the ballast of legacy formats.
  5. SVG for icons: Leave icon fonts in the past where they belong.
  6. Variable fonts when they add value: One flexible file beats a family of static weights.
  7. Design your fallbacks: Tune metrics so your system stack doesn’t break the layout.
  8. Respect global scripts: Optimise differently for Arabic, CJK, RTL, and emoji.
  9. Test like it matters: Different devices, different networks, different locales.

This isn’t about chasing purity or obscure micro-optimisations. It’s about building a web that renders fast, looks good, and works everywhere.

Fonts are content. Fonts are brand. Fonts are user experience. If you’re still treating them as “just another asset,” you’re loading them wrong.

The post You’re loading fonts wrong (and it’s crippling your performance) appeared first on Jono Alderson.
