Introduction: When Performance Becomes a Bottleneck
Imagine a documentation platform serving 800 MDX pages to developers worldwide. Every time a user requests a page, the headless CMS returns a single JSON response containing the entire content tree—navigation, footer, related articles, global callouts, and the page body itself. The total payload? Twelve megabytes. On a standard 4G connection, Time to First Byte (TTFB) alone exceeds four seconds, and Largest Contentful Paint (LCP) averages 6.8 seconds. For a SaaS developer tools company, where documentation is a primary conversion channel, these numbers represent a critical competitive disadvantage.
This article examines a real-world optimization effort that reduced LCP from 6.8 seconds to 1.1 seconds across all 800 pages without modifying the underlying CMS API. The solution leveraged Incremental Static Regeneration (ISR) with a 300-second revalidation window, a strategic payload splitting architecture, React Server Components for zero client-side JavaScript, Vercel Edge caching, and a webhook-based on-demand revalidation system. The result was not merely faster page loads—it was an 89% reduction in bandwidth costs, a 97.4% edge cache hit rate, and author publish-to-live latency under eight seconds.
The case study serves as a comprehensive reference for teams facing similar headless CMS performance challenges, offering both architectural patterns and practical implementation details grounded in measurable Core Web Vitals improvements.
Background: Core Web Vitals and the Headless CMS Landscape
Core Web Vitals Explained
Google’s Core Web Vitals are a set of performance metrics that measure real-world user experience. Introduced in 2020 and integrated into Google’s search ranking algorithm, these metrics have become foundational to web performance strategy. The three current metrics are Largest Contentful Paint (LCP), which measures loading performance; Interaction to Next Paint (INP), which measures interactivity and replaced First Input Delay (FID) in March 2024; and Cumulative Layout Shift (CLS), which measures visual stability. Each metric has a specific threshold: LCP should occur within 2.5 seconds, INP should remain under 200 milliseconds, and CLS should stay below 0.1.
LCP is particularly critical for content-heavy sites like documentation platforms. It measures the render time of the largest image, text block, or video visible within the viewport. For pages dominated by text content—such as MDX-rendered documentation—the LCP element is typically the primary heading or a large text block. When a massive CMS payload delays the server response, LCP suffers directly because the browser cannot begin rendering content until the HTML (or the data required to generate it) arrives.
LCP Optimization Fundamentals
Google’s guidance on LCP optimization identifies four primary categories: server response time, render-blocking resources, resource load times, and client-side rendering. For server-rendered applications, the most impactful factor is often server response time—the time between the browser’s request and the first byte of the HTML response. According to web.dev documentation, server response time should be kept under 800 milliseconds for a good LCP score. In the case study examined here, TTFB exceeded 4,000 milliseconds on standard connections, placing it well outside the acceptable range.
LCP optimization for content-heavy sites involves minimizing the critical rendering path: reducing the amount of data the server must fetch and process before it can begin streaming a response, preloading critical resources, and eliminating render-blocking JavaScript. Each additional kilobyte of data the server must process at request time adds to TTFB, which directly inflates LCP.
The Headless CMS Payload Problem
Headless CMS architectures decouple content management from front-end presentation, enabling teams to deliver content across multiple channels from a single source. However, this flexibility introduces a common anti-pattern: the monolithic payload. Many headless CMS implementations return the entire content tree for a given page in a single API response. While this simplifies client-side data fetching, it creates severe performance problems at scale.
In the case study platform, the CMS returned approximately 12 MB of JSON for every page request. This payload included the navigation hierarchy, footer content, related articles, global callouts, SEO metadata, and the page body. For 800 pages, this meant 800 potential unique 12 MB responses. The sheer volume of data meant that even with a well-optimized front-end, the server could not begin rendering until all 12 MB had been fetched and parsed. The result was TTFB exceeding 4 seconds and LCP averaging 6.8 seconds—well above Google’s 2.5-second threshold.
The Solution: Architecture and Implementation
ISR Strategy: Moving CMS Fetches to the Background
Incremental Static Regeneration (ISR) is a Next.js feature that combines the performance benefits of static generation with the content freshness of server-side rendering. When a page is configured with a revalidation interval—in this case, revalidate: 300 (5 minutes)—Next.js serves the previously generated static page from cache and triggers a background regeneration when the interval expires. If the background regeneration fails, the stale page continues to be served, ensuring that users never encounter an error due to a failed fetch.
In this architecture, ISR fundamentally changed the timing of CMS data fetching. Instead of fetching the 12 MB payload at request time—which blocked the entire rendering pipeline—ISR moved the fetch to a background process that runs asynchronously after the cached page has been served. The first request after deployment triggers the initial static generation, which is then cached. Subsequent requests receive the pre-rendered HTML almost instantaneously, while the background regeneration ensures content stays fresh within a 5-minute window.
The revalidation interval of 300 seconds was chosen deliberately. Documentation content for a developer tools platform changes infrequently—typically when features are updated or bugs are documented. A 5-minute window balances freshness (content is never stale for more than 5 minutes) with performance (most requests hit the edge cache). For content that requires more immediate updates, the architecture employs on-demand revalidation via webhooks, discussed in detail later in this article.
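The serve-stale-then-regenerate timing described above can be modeled as a small cache policy. The sketch below is a simplified simulation of ISR's behavior, not the Next.js implementation; the `IsrCache` name, the injected clock, and the `render` callback (standing in for the CMS fetch plus render) are all illustrative.

```typescript
type Entry = { html: string; generatedAt: number };

// Simplified model of ISR's stale-while-revalidate semantics. Next.js
// implements this internally; this sketch only illustrates the timing.
class IsrCache {
  private entries = new Map<string, Entry>();

  constructor(
    private revalidateSeconds: number,
    private render: (path: string) => string, // stands in for CMS fetch + render
    private now: () => number = () => Date.now(),
  ) {}

  get(path: string): { html: string; stale: boolean } {
    const entry = this.entries.get(path);
    if (!entry) {
      // First request after deploy: generate synchronously, then cache.
      const fresh = { html: this.render(path), generatedAt: this.now() };
      this.entries.set(path, fresh);
      return { html: fresh.html, stale: false };
    }
    const ageSeconds = (this.now() - entry.generatedAt) / 1000;
    if (ageSeconds > this.revalidateSeconds) {
      // Past the window: serve the cached copy immediately and regenerate
      // "in the background" so the next request receives the fresh version.
      this.entries.set(path, { html: this.render(path), generatedAt: this.now() });
      return { html: entry.html, stale: true };
    }
    return { html: entry.html, stale: false };
  }
}
```

The key property to notice is that a stale request still returns instantly with the previously generated HTML; only the request after regeneration completes sees the new content, which is exactly why users never wait on the 12 MB CMS fetch.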
Payload Splitting Architecture
The most impactful architectural change was splitting the monolithic 12 MB CMS response into two distinct fetch paths: global content and per-page content. Global content—navigation, footer, site-wide callouts, and shared metadata—was fetched once at build time and embedded directly into the Next.js layout. Since this content is identical across all 800 pages, there was no reason to fetch it repeatedly for each route. By moving global content to the build step, it became part of the static layout and was served from the edge cache without any additional CMS round-trips.
Per-page content—the MDX document body, page-specific metadata, and related articles—was fetched independently per route. After the split, individual page payloads ranged from 180 KB to 400 KB, representing a 97% reduction from the original 12 MB monolith. This reduction meant that ISR background regeneration could complete in seconds rather than minutes, and the resulting static HTML was significantly smaller and faster to serve from the edge cache.
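In code, the split amounts to two fetch paths with different lifetimes. The sketch below is illustrative only: the `makeContentClient`, `getGlobalContent`, and `getPageContent` names and the query strings are assumptions, not the platform's actual API. The point it demonstrates is that global content is resolved at most once, while per-page content is fetched independently per slug.

```typescript
// Hypothetical CMS client: fetchFromCms stands in for the real HTTP call.
type CmsFetcher = (query: string) => Promise<unknown>;

function makeContentClient(fetchFromCms: CmsFetcher) {
  let globalPromise: Promise<unknown> | undefined;

  return {
    // Global content is memoized: the CMS is hit at most once, mirroring
    // the "fetch at build time, embed in the layout" strategy.
    getGlobalContent() {
      globalPromise ??= fetchFromCms("global{navigation,footer,callouts}");
      return globalPromise;
    },
    // Per-page content is fetched independently per slug; under ISR this
    // runs during background regeneration, not on the request path.
    getPageContent(slug: string) {
      return fetchFromCms(`page(slug:"${slug}"){mdx,metadata,related}`);
    },
  };
}
```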
Table 1: Architecture Comparison — Before and After Payload Splitting

| Concern | Before (12 MB Monolith) | After (Split Payload) |
| --- | --- | --- |
| Global nav / footer | Fetched every request with page content | Fetched once at build time, embedded in the layout |
| Related articles sidebar | Included in 12 MB page response | Fetched per route as part of the page payload |
| Global callouts / banners | Serialized in every page payload | Part of the static layout, served from edge cache |
| Page-specific MDX | Nested in monolith JSON tree | Fetched independently per route via ISR |
| Total payload per page | ~12 MB | 180–400 KB |
| TTFB on 4G connection | > 4 seconds | Served from edge cache on 97.4% of requests |
The splitting strategy followed a clear separation of concerns. The layout component in Next.js handles global content, which remains stable and is regenerated only during full site builds. The per-route page components handle dynamic content, which is regenerated by ISR at the configured interval. This separation not only improved performance but also simplified the content model—authors could update documentation pages without triggering a full site rebuild.
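Under the Next.js App Router, this separation maps onto the file structure roughly as follows (a sketch; the actual route and file names are assumptions):

```
app/
  layout.tsx        // global nav, footer, callouts — fetched at build time
  docs/
    [slug]/
      page.tsx      // per-page MDX + related articles — ISR, revalidate: 300
```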
React Server Components: Zero Client JavaScript for Content
React Server Components (RSC), introduced in React 18 and fully supported in Next.js 15, allow components to render entirely on the server. Unlike traditional server-side rendering, RSCs produce a serialized component tree that is sent to the client without the associated JavaScript bundle. For content-heavy pages—documentation articles, blog posts, product pages—this means the entire content rendering pipeline executes on the server, and the client receives only the final HTML output.
In this case study, all MDX rendering was handled by React Server Components. The MDX parser, the remark/rehype plugin pipeline, and the component tree for documentation elements (code blocks, tables, callouts, headings) all executed on the server. The client received zero JavaScript for content rendering—no MDX parser, no component hydration, no runtime content processing. This had a direct and measurable impact on both INP and CLS: without client-side JavaScript to parse and execute, the browser could paint the content immediately upon receiving the HTML, eliminating render-blocking delays.
The choice of RSCs also eliminated an entire class of performance bugs. Traditional client-side MDX rendering requires the browser to download the MDX parser, parse the markdown/MDX content, transform it through the plugin pipeline, and then render the resulting React component tree. Each of these steps introduces potential delays and layout shifts. By moving this entire pipeline to the server, the client receives pre-rendered HTML that the browser can paint immediately, without any JavaScript execution.
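The essential property of this pipeline is that parsing and rendering happen in an ordinary server-side function whose output is final HTML. The toy renderer below illustrates that shape for a tiny markdown subset; it is emphatically not a real MDX pipeline (which would run remark/rehype and render a React tree), just a demonstration that source goes in and paint-ready HTML comes out with zero client JavaScript involved.

```typescript
// Toy server-side renderer for a tiny markdown subset. A real pipeline
// uses MDX + remark/rehype, but the shape is the same: source in,
// final HTML out, no JavaScript shipped for content rendering.
function renderDocToHtml(source: string): string {
  return source
    .split("\n")
    .map((line) => {
      if (line.startsWith("## ")) return `<h2>${escapeHtml(line.slice(3))}</h2>`;
      if (line.startsWith("# ")) return `<h1>${escapeHtml(line.slice(2))}</h1>`;
      if (line.trim() === "") return "";
      return `<p>${escapeHtml(line)}</p>`;
    })
    .filter(Boolean)
    .join("\n");
}

function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}
```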
Vercel Edge Caching: Achieving 97.4% Cache Hit Rate
The Vercel Edge Network distributes content across a global network of edge servers, caching static assets and pre-rendered pages close to users. When ISR generates a static page, that page is automatically propagated to the edge network, where it is served from the nearest node to the requesting user. This eliminates the latency of origin server round-trips for cached content.
In this architecture, the combination of ISR-generated static pages and the Vercel Edge Network produced a 97.4% cache hit rate. This means that 97.4% of all page requests were served directly from the edge cache without contacting the origin server or the CMS. The remaining 2.6% of requests—primarily the first request after a deployment or a cache purge—triggered ISR regeneration and were then cached for subsequent requests.
The bandwidth cost implications were significant. Before the optimization, every page request required a 12 MB transfer from the CMS and a full server-side render, consuming substantial bandwidth on the Vercel platform. After the optimization, most requests were served as static HTML from the edge cache, with payloads typically under 100 KB of compressed HTML. This 89% reduction in bandwidth costs translated directly to lower infrastructure spending on the Vercel platform.
Webhook-Based On-Demand Revalidation
While ISR with a 300-second revalidation window ensures content freshness within 5 minutes, some content updates require near-immediate publication—hotfixes, security advisories, or time-sensitive announcements. To address this, the architecture implemented an on-demand revalidation system using Next.js revalidatePath and a CMS webhook.
The workflow operates as follows. When an author publishes or updates a page in the CMS, the CMS sends a webhook payload to a dedicated Next.js API route at /api/revalidate. This route receives the slug of the updated page, validates the webhook signature, and calls revalidatePath with the corresponding route. Next.js immediately purges the cached version of that page from the edge network and triggers a fresh static generation. The entire cycle—from author clicking “Publish” to the updated page being live on the edge—completes in under 8 seconds.
This on-demand revalidation coexists seamlessly with time-based ISR. The time-based revalidation handles routine content freshness at a predictable interval, while the webhook handles exceptional cases requiring immediate updates. The two mechanisms are complementary: time-based ISR ensures no content ever becomes severely stale (even if the webhook fails), and on-demand revalidation enables instant updates when they matter most.
The webhook implementation also included security measures to prevent unauthorized cache purging. The API route validates a shared secret included in the webhook payload, rejects requests without a valid signature, and rate-limits revalidation calls to prevent abuse. These safeguards ensure that the revalidation system remains both fast and secure.
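The signature check at the heart of those safeguards can be sketched as a pure function. This is an illustrative sketch, not the platform's actual code: the payload shape, the `/docs/` path prefix, and the choice of HMAC-SHA256 are assumptions about a typical setup. In a real Next.js route handler, a successful check would gate the revalidatePath call; factoring it out as a pure function keeps the logic testable in isolation.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of the check a /api/revalidate route might perform before
// purging the cache: recompute the HMAC over the raw body and compare
// it to the signature the CMS sent, using a timing-safe comparison.
function isValidWebhook(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so reject early; this also
  // rejects malformed hex, which Buffer.from silently truncates.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}

// Given a valid payload, derive the route to purge (shape is hypothetical).
function pathToRevalidate(payload: { slug: string }): string {
  return `/docs/${payload.slug}`;
}
```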
Performance Metrics Analysis
LCP: From 6.8 Seconds to 1.1 Seconds
The most dramatic improvement was in Largest Contentful Paint, which dropped from 6.8 seconds to 1.1 seconds—an 83.8% improvement. This improvement is attributable to three factors working in concert. First, ISR eliminated the request-time CMS fetch that was responsible for the 4-second TTFB. Second, the split payload meant that even when background regeneration was required, the server only needed to fetch 180–400 KB instead of 12 MB, reducing regeneration time from minutes to seconds. Third, React Server Components ensured that the LCP element (the primary documentation heading) was pre-rendered in the HTML and could be painted immediately by the browser without waiting for JavaScript execution.
The 1.1-second LCP places the platform firmly within Google’s “Good” threshold (under 2.5 seconds), with real-world field data from Chrome UX Report confirming that the 75th percentile LCP consistently remains below 1.5 seconds across all geographic regions. This level of performance is particularly notable given that the platform serves 800 pages of MDX documentation—a content type traditionally associated with heavier rendering costs.
INP: From 480 ms to 68 ms
Interaction to Next Paint dropped from 480 milliseconds to 68 milliseconds—an 85.8% improvement. The primary driver of this improvement was the elimination of client-side MDX rendering JavaScript. Before the optimization, the client-side JavaScript bundle included the MDX parser, remark/rehype plugins, and component hydration code. This JavaScript competed with user interactions for main-thread time, particularly on mobile devices and lower-powered hardware. After migrating to React Server Components, the content rendering JavaScript was eliminated entirely, freeing the main thread for user interactions.
The 68-millisecond INP is well below Google’s 200-millisecond threshold and places the platform in the “Good” category. Real-world testing confirmed that even complex documentation interactions—code block copy buttons, expandable sections, table-of-contents navigation—respond within a single frame (under 16 milliseconds), contributing to a perceived performance improvement that exceeds what the raw metric suggests.
CLS: From 0.31 to 0.02
Cumulative Layout Shift improved from 0.31 to 0.02—a 93.5% reduction. The high CLS score before optimization was primarily caused by client-side hydration: when the browser initially rendered the server-generated HTML, and then the client-side JavaScript hydrated the components, layout shifts occurred as React reconciled the server and client trees. This was particularly pronounced for documentation elements with variable heights—code blocks, tables, and embedded media—where the client-side rendering sometimes produced slightly different dimensions than the server-side rendering.
React Server Components eliminated hydration for content components entirely. Since the content is rendered entirely on the server and the client receives only the final HTML, there is no hydration step to cause layout shifts. The remaining 0.02 CLS is attributable to minor shifts from lazy-loaded images and third-party scripts, which are being addressed in a follow-up optimization phase.
Table 2: Core Web Vitals Before and After Optimization

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Largest Contentful Paint (LCP) | 6.8 s | 1.1 s | 83.8% |
| Interaction to Next Paint (INP) | 480 ms | 68 ms | 85.8% |
| Cumulative Layout Shift (CLS) | 0.31 | 0.02 | 93.5% |
| Bandwidth Cost | Baseline | –89% | 89% reduction |
| Publish-to-Live Latency | Minutes | < 8 seconds | — |
Limitations and Considerations
ISR Stale Content Windows
While ISR with a 300-second revalidation window provides an excellent balance of freshness and performance, it does introduce a potential stale content window. In the 5-minute interval between regenerations, users may see content that is up to 5 minutes old. For most documentation use cases, this is acceptable—documentation changes are rarely time-critical. However, for platforms where content freshness is paramount (news sites, real-time dashboards, live event pages), ISR alone may not be sufficient. The webhook-based on-demand revalidation system addresses this concern for high-priority updates, but it requires the CMS to support webhook notifications, which not all headless CMS platforms offer.
Platform Lock-In to Vercel
The architecture described relies on Vercel-specific features: the Edge Network, automatic ISR cache propagation, and revalidatePath for on-demand revalidation. While Next.js is open-source and can be self-hosted, the edge caching performance benefits described in this case study are specific to the Vercel platform. Teams deploying to other platforms (AWS, Cloudflare, Netlify) would need to implement equivalent caching and revalidation strategies, which may require additional infrastructure and configuration.
React Server Components Maturity
React Server Components are a relatively new paradigm, and their ecosystem is still maturing. While Next.js 15 provides excellent RSC support, certain third-party libraries may not yet be optimized for server-side rendering. Components that rely on browser APIs (window, document), client-side state management (React Context with reducers), or complex animation libraries may require explicit client-side boundaries (“use client” directives), which can reintroduce JavaScript payloads. Teams adopting RSCs should audit their component libraries for compatibility and plan for potential migration work.
Payload Splitting Requires CMS API Design
The payload splitting strategy described in this case study requires the CMS API to support granular data fetching—the ability to request specific content types or depth levels without retrieving the entire content tree. Not all headless CMS platforms offer this level of API flexibility. Some platforms may require custom API endpoints, GraphQL query optimization, or middleware layers to achieve the necessary payload separation. Teams evaluating this architecture should assess their CMS’s API capabilities before implementation and factor any required API modifications into their migration timeline.
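For a CMS that exposes GraphQL, the split translates into two queries of very different weight. The queries below are hypothetical; field and type names depend entirely on the CMS's schema. The point is that neither query pulls the full content tree.

```typescript
// Hypothetical granular queries against a GraphQL-capable CMS.
// Global content: small, fetched once at build time.
const GLOBAL_QUERY = `
  query Global {
    navigation { items { title href } }
    footer { columns { title links { title href } } }
    callouts { id message }
  }
`;

// Per-page content: only the document body, metadata, and related links.
function pageQuery(slug: string): string {
  return `
    query Page {
      page(slug: "${slug}") {
        mdx
        metadata { title description }
        related { slug title }
      }
    }
  `;
}
```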
Conclusion and Implications
This case study demonstrates that dramatic Core Web Vitals improvements are achievable without modifying the underlying CMS API. By strategically applying ISR, payload splitting, React Server Components, and edge caching, the platform achieved an 83.8% improvement in LCP, an 85.8% improvement in INP, and a 93.5% reduction in CLS. The 89% reduction in bandwidth costs further underscores that performance optimization and cost optimization are not opposing goals—in this case, they are the same goal achieved through the same architectural decisions.
The implications extend beyond this specific platform. Any large-scale content site using a headless CMS is potentially susceptible to the monolithic payload anti-pattern identified here. The optimization patterns described—ISR for background content refresh, payload splitting for reduced data transfer, RSC for zero-client-JS rendering, and webhook revalidation for instant updates—are applicable to a wide range of headless CMS implementations and content management architectures.
For teams evaluating similar optimizations, the key takeaway is that performance should be an architectural concern, not an afterthought. The 12 MB monolithic payload was not a bug in the CMS—it was a consequence of an architecture that prioritized developer convenience over user experience. By restructuring the data flow to respect the boundaries between global and page-specific content, between build-time and request-time fetching, and between server-side and client-side rendering, the platform achieved performance levels that rival hand-coded static sites while retaining the content management flexibility of a headless CMS.
As the web continues to evolve—with Core Web Vitals firmly established as ranking signals, React Server Components maturing as a rendering paradigm, and edge computing becoming ubiquitous—the architectural patterns demonstrated in this case study will become table stakes for any content-heavy web application. The teams that invest in these patterns today will be well-positioned to deliver fast, reliable, and cost-effective web experiences at scale.
References
Google. “Web Vitals — Essential Metrics for a Healthy Site.” web.dev / Google Developers, 2024. https://web.dev/vitals/