This guide has been kindly supported by our friends at LogRocket, a service that combines frontend performance monitoring, session replay, and product analytics to help you build better customer experiences. LogRocket tracks key metrics, incl. DOM complete, time to first byte, first input delay, client CPU and memory usage. Get a free trial of LogRocket today.
Back in the day, performance was often a mere afterthought. Often deferred till the very end of the project, it would boil down to minification, concatenation, asset optimization and potentially a few fine adjustments on the server’s config file. Looking back now, things seem to have changed quite significantly.
Performance isn’t just a technical concern: it affects everything from accessibility to usability to search engine optimization, and when baking it into the workflow, design decisions have to be informed by their performance implications. Performance has to be measured, monitored and refined continually, and the growing complexity of the web poses new challenges that make it hard to keep track of metrics, because data will vary significantly depending on the device, browser, protocol, network type and latency (CDNs, ISPs, caches, proxies, firewalls, load balancers and servers all play a role in performance).
So, if we created an overview of all the things we have to keep in mind when improving performance — from the very start of the project until the final release of the website — what would that look like? Below you’ll find a (hopefully unbiased and objective) front-end performance checklist for 2021 — an updated overview of the issues you might need to consider to ensure that your response times are fast, user interaction is smooth and your sites don’t drain user’s bandwidth.
Micro-optimizations are great for keeping performance on track, but it’s critical to have clearly defined targets in mind — measurable goals that would influence any decisions made throughout the process. There are a couple of different models, and the ones discussed below are quite opinionated — just make sure to set your own priorities early on.
Establish a performance culture. In many organizations, front-end developers know exactly what common underlying problems are and what strategies should be used to fix them. However, as long as there is no established endorsement of the performance culture, each decision will turn into a battlefield of departments, breaking up the organization into silos. You need a business stakeholder buy-in, and to get it, you need to establish a case study, or a proof of concept using the Performance API on how speed benefits metrics and Key Performance Indicators (KPIs) they care about.
For example, to make performance more tangible, you could expose the revenue performance impact by showing the correlation between the conversion rate and time to application load, as well as rendering performance. Or the search bot crawling rate (PDF, pages 27–50).
Without a strong alignment between dev/design and business/marketing teams, performance isn’t going to sustain long-term. Study common complaints coming into customer service and sales team, study analytics for high bounce rates and conversion drops. Explore how improving performance can help relieve some of these common problems. Adjust the argument depending on the group of stakeholders you are speaking to.
Run performance experiments and measure outcomes — both on mobile and on desktop (for example, with Google Analytics). It will help you build up a company-tailored case study with real data. Furthermore, using data from case studies and experiments published on WPO Stats will help increase sensitivity for business about why performance matters, and what impact it has on user experience and business metrics. Stating that performance matters alone isn’t enough though — you also need to establish some measurable and trackable goals and observe them over time.
While having these conversations in organizations, it’s important to keep in mind that just like UX is a spectrum of experiences, web performance is a distribution. As Karolina Szczur noted, “expecting a single number to be able to provide a rating to aspire to is a flawed assumption.” Hence performance goals need to be granular, trackable and tangible.
Goal: Be at least 20% faster than your fastest competitor. According to psychological research, if you want users to feel that your website is faster than your competitor’s website, you need to be at least 20% faster. Study your main competitors, collect metrics on how they perform on mobile and desktop and set thresholds that would help you outpace them. To get accurate results and goals though, make sure to first get a thorough picture of your users’ experience by studying your analytics. You can then mimic the 90th percentile’s experience for testing.
To get a good first impression of how your competitors perform, you can use Chrome UX Report (CrUX, a ready-made RUM data set, video introduction by Ilya Grigorik and detailed guide by Rick Viscomi), or Treo, a RUM monitoring tool that is powered by Chrome UX Report. The data is gathered from the Chrome browser users, so the reports will be Chrome-specific, but they will give you a fairly thorough distribution of performance across a wide range of your visitors. Note that new CrUX datasets are released on the second Tuesday of each month.
Note: If you use Page Speed Insights or Page Speed Insights API (no, it isn’t deprecated!), you can get CrUX performance data for specific pages instead of just the aggregates. This data can be much more useful for setting performance targets for assets like “landing page” or “product listing”. And if you are using CI to test the budgets, you need to make sure your tested environment matches CrUX if you used CrUX for setting the target (thanks Patrick Meenan!).
If you need some help to show the reasoning behind prioritization of speed, or you’d like to visualize conversion rate decay or increase in bounce rate with slower performance, or perhaps you’d need to advocate for a RUM solution in your organization, Sergey Chernyshev has built a UX Speed Calculator, an open-source tool that helps you simulate data and visualize it to drive your point across.
Sometimes you might want to go a bit deeper, combining the data coming from CrUX with any other data you already have to work out quickly where the slowdowns, blindspots, and inefficiencies lie — for your competitors, or for your project. In his work, Harry Roberts has been using a Site-Speed Topography Spreadsheet which he uses to break down performance by key page types, and track how different key metrics are across them. You can download the spreadsheet as Google Sheets, Excel, OpenOffice document or CSV.
Collect data, set up a spreadsheet, shave off 20%, and set up your goals (performance budgets) this way. Now you have something measurable to test against. If you’re keeping the budget in mind and trying to ship down just the minimal payload to get a quick time-to-interactive, then you’re on a reasonable path.
Need resources to get started?
Addy Osmani has written a very detailed write-up on how to start performance budgeting, how to quantify the impact of new features and where to start when you are over budget.
In many companies, performance budgets shouldn’t be aspirational, but rather pragmatic, serving as a holding sign to avoid slipping past a certain point. In that case, you could pick your worst data point in the past two weeks as a threshold, and take it from there. Performance Budgets, Pragmatically shows you a strategy to achieve that.
Also, make both performance budget and current performance visible by setting up dashboards with graphs reporting build sizes. There are many tools allowing you to achieve that: SiteSpeed.io dashboard (open source), SpeedCurve and Calibre are just a few of them, and you can find more tools on perf.rocks.
Performance awareness shouldn’t come from performance budgets alone though. Just like Pinterest, you could create a custom eslint rule that disallows importing from files and directories that are known to be dependency-heavy and would bloat the bundle. Set up a listing of “safe” packages that can be shared across the entire team.
Also, think about critical customer tasks that are most beneficial to your business. Study, discuss and define acceptable time thresholds for critical actions and establish “UX ready” user timing marks that the entire organization has approved. In many cases, user journeys will touch on the work of many different departments, so alignment in terms of acceptable timings will help support or prevent performance discussions down the road. Make sure that additional costs of added resources and features are visible and understood.
Align performance efforts with other tech initiatives, ranging from new features of the product being built to refactoring to reaching to new global audiences. So every time a conversation about further development happens, performance is a part of that conversation as well. It’s much easier to reach performance goals when the code base is fresh or is just being refactored.
Once you’ve established a strong performance culture in your organization, aim for being 20% faster than your former self to keep priorities in tact as time passes by (thanks, Guy Podjarny!). But account for the different types and usage behaviors of your customers (which Tobias Baldauf called cadence and cohorts), along with bot traffic and seasonality effects.
Planning, planning, planning. It might be tempting to get into some quick “low-hanging-fruits”-optimizations early on — and it might be a good strategy for quick wins — but it will be very hard to keep performance a priority without planning and setting realistic, company-tailored performance goals.
Choose the right metrics. Not all metrics are equally important. Study what metrics matter most to your application: usually, it will be defined by how fast you can start to render most important pixels of your interface and how quickly you can provide input responsiveness for these rendered pixels. This knowledge will give you the best optimization target for ongoing efforts. In the end, it’s not the load events or server response times that define the experience, but the perception of how snappy the interface feels.
What does it mean? Rather than focusing on full page loading time (via onLoad and DOMContentLoaded timings, for example), prioritize page loading as perceived by your customers. That means focusing on a slightly different set of metrics. In fact, choosing the right metric is a process without obvious winners.
Based on Tim Kadlec’s research and Marcos Iglesias’ notes in his talk, traditional metrics could be grouped into a few sets. Usually, we’ll need all of them to get a complete picture of performance, and in your particular case some of them will be more important than others.
Quantity-based metrics measure the number of requests, weight and a performance score. Good for raising alarms and monitoring changes over time, not so good for understanding user experience.
Milestone metrics use states in the lifetime of the loading process, e.g. Time To First Byte and Time To Interactive. Good for describing the user experience and monitoring, not so good for knowing what happens between the milestones.
Rendering metrics provide an estimate of how fast content renders (e.g. Start Render time, Speed Index). Good for measuring and tweaking rendering performance, but not so good for measuring when important content appears and can be interacted with.
Custom metrics measure a particular, custom event for the user, e.g. Twitter’s Time To First Tweet and Pinterest’s PinnerWaitTime. Good for describing the user experience precisely, not so good for scaling the metrics and comparing with with competitors.
To complete the picture, we’d usually look out for useful metrics among all of these groups. Usually, the most specific and relevant ones are:
Time to Interactive(TTI) The point at which layout has stabilized, key webfonts are visible, and the main thread is available enough to handle user input — basically the time mark when a user can interact with the UI. The key metrics for understanding how much wait a user has to experience to use the site without a lag. Boris Schapira has written a detailed post on how to measure TTI reliably.
Total Blocking Time (TBT) A metric that helps quantify the severity of how non-interactive a page is prior to it becoming reliably interactive (that is, the main thread has been free of any tasks running over 50ms (long tasks) for at least 5s). The metric measures the total amount of time between the first paint and Time to Interactive (TTI) where the main thread was blocked for long enough to prevent input responsiveness. No wonder, then, that a low TBT is a good indicator for good performance. (thanks, Artem, Phil)
Cumulative Layout Shift (CLS) The metric highlights how often users experience unexpected layout shifts (reflows) when accessing the site. It examines unstable elements and their impact on the overall experience. The lower the score, the better.
Speed Index Measures how quickly the page contents are visually populated; the lower the score, the better. The Speed Index score is computed based on the speed of visual progress, but it’s merely a computed value. It’s also sensitive to the viewport size, so you need to define a range of testing configurations that match your target audience. Note that it is becoming less important with LCP becoming a more relevant metric (thanks, Boris, Artem!).
CPU time spent A metric that shows how often and how long the main thread is blocked, working on painting, rendering, scripting and loading. High CPU time is a clear indicator of a janky experience, i.e. when the user experiences a noticeable lag between their action and a response. With WebPageTest, you can select “Capture Dev Tools Timeline” on the “Chrome” tab to expose the breakdown of the main thread as it runs on any device using WebPageTest.
FrustrationIndex While many metrics featured above explain when a particular event happens, Tim Vereecke’s FrustrationIndex looks at the gaps between metrics instead of looking at them individually. It looks at the key milestones perceived by the end-user, such as Title is visible, First content is visible, Visually ready and Page looks ready and calculates a score indicating the level of frustration while loading a page. The bigger the gap, the bigger the chance a user gets frustrated. Potentially a good KPI for user experience. Tim has published a detailed post about FrustrationIndex and how it works.
Ad Weight Impact If your site depends on the revenue generated by advertising, it’s useful to track the weight of ad related code. Paddy Ganti’s script constructs two URLs (one normal and one blocking the ads), prompts the generation of a video comparison via WebPageTest and reports a delta.
Deviation metrics As noted by Wikipedia engineers, data of how much variance exists in your results could inform you how reliable your instruments are, and how much attention you should pay to deviations and outlers. Large variance is an indicator of adjustments needed in the setup. It also helps understand if certain pages are more difficult to measure reliably, e.g. due to third-party scripts causing significant variation. It might also be a good idea to track browser version to understand bumps in performance when a new browser version is rolled out.
Steve Souders has a detailed explanation of many of these metrics. It’s important to notice that while Time-To-Interactive is measured by running automated audits in the so-called lab environment, First Input Delay represents the actual user experience, with actual users experiencing a noticeable lag. In general, it’s probably a good idea to always measure and track both of them.
Note: both FID and TTI do not account for scrolling behavior; scrolling can happen independently since it’s off-main-thread, so for many content consumption sites these metrics might be much less important (thanks, Patrick!).
Measure and optimize the Core Web Vitals. For a long time, performance metrics were quite technical, focusing on the engineering view of how fast servers are at responding, and how quick browsers are at loading. The metrics have changed over the years — attempting to find a way to capture the actual user experience, rather than server timings. In May 2020, Google has announced Core Web Vitals, a set of new user-focused performance metrics, each representing a distinct facet of the user experience.
For each of them, Google recommends a range of acceptable speed goals. At least 75% of all page views should exceed the Good range to pass this assessment. These metrics quickly gained traction, and with Core Web Vitals becoming ranking signals for Google Search in May 2021 (Page Experience ranking algorithm update), many companies have turned their attention to their performance scores.
Let’s break down each of the Core Web Vitals, one by one, along with useful techniques and tooling to optimize your experiences with these metrics in mind. (It’s worth noting that you will end up with better Core Web Vitals scores by following a general advice in this article.)
For a good experience, LCP should occur within 2.5s of when the page first starts loading. That means that we need to render the first visible portion of the page as early as possible. That will require tailored critical CSS for each template, orchestrating the <head>-order and prefetching critical assets (we’ll cover them later).
The main reason for a low LCP score is usually images. To deliver an LCP in <2.5s on Fast 3G — hosted on a well-optimized server, all static without client-side rendering and with an image coming from a dedicated image CDN — means that the maximum theoretical image size is only around 144KB. That’s why responsive images matter, as well as preloading critical images early (with preload).
Quick tip: to discover what is considered LCP on a page, in DevTools you can hover over the LCP badge under “Timings” in the Performance Panel (thanks, Tim Kadlec!).
First Input Delay (FID) < 100ms. Measures the responsiveness of the UI, i.e. how long the browser was busy with other tasks before it could react to a discrete user input event like a tap, or a click. It’s designed to capture delays that result from the main thread being busy, especially during page load.
Quick tip: in general, a reliable strategy to get a better FID score is to minimize the work on the main thread by breaking larger bundles into smaller ones and serving what the user needs when they need it, so user interactions won’t be delayed. We’ll cover more on that in detail below.
Cumulative Layout Shift (CLS) < 0.1. Measures visual stability of the UI to ensure smooth and natural interactions, i.e. the sum total of all individual layout shift scores for every unexpected layout shift that occurs during the lifespan of the page. An individual layout shift occurs any time an element which was already visible changes its position on the page. It’s scored based on the size of the content and distance it moved.
It’s worth noting that Core Web Vitals are supposed to evolve over time, with a predictable annual cycle. For the first year update, we might be expecting First Contentful Paint to be promoted to Core Web Vitals, a reduced FID threshold and better support for single-page applications. We might also see the responding to user inputs after load gaining more weight, along with security, privacy and accessibility (!) considerations.
Related to Core Web Vitals, there are plenty of useful resources and articles that are worth looking into:
Web Vitals Leaderboard allows you to compare your scores against competition on mobile, tablet, desktop, and on 3G and 4G.
Core SERP Vitals, a Chrome extension that shows the Core Web Vitals from CrUX in the Google Search Results.
However, as Katie Sylor-Miller explains, some of the main problems with Core Web Vitals are the lack of cross-browser support, we don’t really measure the full lifecycle of a user’s experience, plus it’s difficult to correlate changes in FID and CLS with business outcomes.
As we should be expecting Core Web Vitals to evolve, it seems only reasonable to always combine Web Vitals with your custom-tailored metrics to get a better understanding of where you stand in terms of performance.
Gather data on a device representative of your audience. To gather accurate data, we need to thoroughly choose devices to test on. In most companies, that means looking into analytics and creating user profiles based on most common device types. Yet often, analytics alone doesn’t provide a complete picture. A significant portion of the target audience might be abandoning the site (and not returning back) just because their experience is too slow, and their devices are unlikely to show up as the most popular devices in analytics for that reason. So, additionally conducting research on common devices in your target group might be a good idea.
A representative device, then, is an Android device that is at least 24 months old, costing $200 or less, running on slow 3G, 400ms RTT and 400kbps transfer, just to be slightly more pessimistic. This might be very different for your company, of course, but that’s a close enough approximation of a majority of customers out there. In fact, it might be a good idea to look into current Amazon Best Sellers for your target market. (Thanks to Tim Kadlec, Henri Helvetica and Alex Russel for the pointers!).
What test devices to choose then? The ones that fit well with the profile outlined above. It’s a good option to choose a slightly older Moto G4/G5 Plus, a mid-range Samsung device (Galaxy A50, S8), a good middle-of-the-road device like a Nexus 5X, Xiaomi Mi A3 or Xiaomi Redmi Note 7 and a slow device like Alcatel 1X or Cubot X19, perhaps in an open device lab. For testing on slower thermal-throttled devices, you could also get a Nexus 4, which costs just around $100.
Also, check the chipsets used in each device and do not over-represent one chipset: a few generations of Snapdragon and Apple as well as low-end Rockchip, Mediatek would be enough (thanks, Patrick!).
If you don’t have a device at hand, emulate mobile experience on desktop by testing on a throttled 3G network (e.g. 300ms RTT, 1.6 Mbps down, 0.8 Mbps up) with a throttled CPU (5× slowdown). Eventually switch over to regular 3G, slow 4G (e.g. 170ms RTT, 9 Mbps down, 9Mbps up), and Wi-Fi. To make the performance impact more visible, you could even introduce 2G Tuesdays or set up a throttled 3G/4G network in your office for faster testing.
Introducing the slowest day of the week. Facebook has introduced 2G Tuesdays to increase visibility and sensitivity of slow connections. (Image source)
Luckily, there are many great options that help you automate the collection of data and measure how your website performs over time according to these metrics. Keep in mind that a good performance picture covers a set of performance metrics, lab data and field data:
Synthetic testing tools collect lab data in a reproducible environment with predefined device and network settings (e.g. Lighthouse, Calibre, WebPageTest) and
Real User Monitoring (RUM) tools evaluate user interactions continuously and collect field data (e.g. SpeedCurve, New Relic — the tools provide synthetic testing, too).
The former is particularly useful during development as it will help you identify, isolate and fix performance issues while working on the product. The latter is useful for long-term maintenance as it will help you understand your performance bottlenecks as they are happening live — when users actually access the site.
Set up “clean” and “customer” profiles for testing. While running tests in passive monitoring tools, it’s a common strategy to turn off anti-virus and background CPU tasks, remove background bandwidth transfers and test with a clean user profile without browser extensions to avoid skewed results (in Firefox, and in Chrome).
However, it’s also a good idea to study which browser extensions your customers use frequently, and test with dedicated “customer” profiles as well. In fact, some extensions might have a profound performance impact (2020 Chrome Extension Performance Report) on your application, and if your users use them a lot, you might want to account for it up front. Hence, “clean” profile results alone are overly optimistic and can be crushed in real-life scenarios.
Share the performance goals with your colleagues. Make sure that performance goals are familiar to every member of your team to avoid misunderstandings down the line. Every decision — be it design, marketing or anything in-between — has performance implications, and distributing responsibility and ownership across the entire team would streamline performance-focused decisions later on. Map design decisions against performance budget and the priorities defined early on.
Setting Realistic Goals
100-millisecond response time, 60 fps. For an interaction to feel smooth, the interface has 100ms to respond to user’s input. Any longer than that, and the user perceives the app as laggy. The RAIL, a user-centered performance model gives you healthy targets: To allow for <100 milliseconds response, the page must yield control back to main thread at latest after every <50 milliseconds. Estimated Input Latency tells us if we are hitting that threshold, and ideally, it should be below 50ms. For high-pressure points like animation, it’s best to do nothing else where you can and the absolute minimum where you can’t.
Also, each frame of animation should be completed in less than 16 milliseconds, thereby achieving 60 frames per second (1 second ÷ 60 = 16.6 milliseconds) — preferably under 10 milliseconds. Because the browser needs time to paint the new frame to the screen, your code should finish executing before hitting the 16.6 milliseconds mark. We’re starting to have conversations about 120fps (e.g. iPad Pro’s screens run at 120Hz) and Surma has covered some rendering performance solutions for 120fps, but that’s probably not a target we’re looking at just yet.
FID < 100ms, LCP < 2.5s, TTI < 5s on 3G, Critical file size budget < 170KB (gzipped). Although it might be very difficult to achieve, a good ultimate goal would be Time to Interactive under 5s, and for repeat visits, aim for under 2s (achievable only with a service worker). Aim for Largest Contentful Paint of under 2.5s and minimize Total Blocking Time and Cumulative Layout Shift. An acceptable First Input Delay is under 100ms–70ms. As mentioned above, we’re considering the baseline being a $200 Android phone (e.g. Moto G4) on a slow 3G network, emulated at 400ms RTT and 400kbps transfer speed.
We have two major constraints that effectively shape a reasonable target for speedy delivery of the content on the web. On the one hand, we have network delivery constraints due to TCP Slow Start. The first 14KB of the HTML — 10 TCP packets, each 1460 bytes, making around 14.25 KB, albeit not to be taken literally — is the most critical payload chunk, and the only part of the budget that can be delivered in the first roundtrip (which is all you get in 1 sec at 400ms RTT due to mobile wake-up times).
If you want to target growing markets such as South East Asia, Africa or India, you’ll have to look into a very different set of constraints. Addy Osmani covers major feature phone constraints, such as few low cost, high-quality devices, unavailability of high-quality networks and expensive mobile data — along with PRPL-30 budget and development guidelines for these environments.
In fact, Google’s Alex Russel recommends to aim for 130–170KB gzipped as a reasonable upper boundary. In real-world scenarios, most products aren’t even close: a median bundle size today is around 452KB, which is up 53.6% compared to early 2015. On a middle-class mobile device, that accounts for 12–20 seconds for Time-To-Interactive.
We could also go beyond the bundle size budget though. For example, we could set performance budgets based on the activities of the browser’s main thread, i.e. paint time before start render, or track down front-end CPU hogs. Tools such as Calibre, SpeedCurve and Bundlesize can help you keep your budgets in check, and can be integrated into your build process.
Finally, a performance budget probably shouldn’t be a fixed value. Depending on the network connection, performance budgets should adapt, but payload on slower connection is much more “expensive”, regardless of how they’re used.
Note: It might sound strange to set such rigid budgets in times of wide-spread HTTP/2, upcoming 5G and HTTP/3, rapidly evolving mobile phones and flourishing SPAs. However, they do sound reasonable when we deal with the unpredictable nature of the network and hardware, including everything from congested networks to slowly developing infrastructure, to data caps, proxy browsers, save-data mode and sneaky roaming charges.
Choose and set up your build tools. Don’t pay too much attention to what’s supposedly coolthese days. Stick to your environment for building, be it Grunt, Gulp, Webpack, Parcel, or a combination of tools. As long as you are getting results you need and you have no issues maintaining your build process, you’re doing just fine.
Among the build tools, Rollup keeps gaining traction, so does Snowpack, but Webpack seems to be the most established one, with literally hundreds of plugins available to optimize the size of your builds. Watch out for the Webpack Roadmap 2021.
One of the most notable strategies that appeared recently is Granular chunking with Webpack in Next.js and Gatsby to minimize duplicate code. By default, modules that aren’t shared in every entry point can be requested for routes that do not use it. This ends up becoming an overhead as more code is downloaded than necessary. With granular chunking in Next.js, we can use a server-side build manifest file to determine which outputted chunks are used by different entry points.
With SplitChunksPlugin, multiple split chunks are created depending on a number of conditions to prevent fetching duplicated code across multiple routes. This improves page load time and caching during navigations. Shipped in Next.js 9.2 and in Gatsby v2.20.7.
Getting started with Webpack can be tough though. So if you want to dive into Webpack, there are some great resources out there:
Use progressive enhancement as a default. Still, after all these years, keeping progressive enhancement as the guiding principle of your front-end architecture and deployment is a safe bet. Design and build the core experience first, and then enhance the experience with advanced features for capable browsers, creating resilient experiences. If your website runs fast on a slow machine with a poor screen in a poor browser on a sub-optimal network, then it will only run faster on a fast machine with a good browser on a decent network.
In fact, with adaptive module serving, we seem to be taking progressive enhancement to another level, serving “lite” core experiences to low-end devices, and enhancing with more sophisticated features for high-end devices. Progressive enhancement isn’t likely to fade away any time soon.
In general, modern frameworks aren’t prioritizing less powerful devices, so the experiences on a phone and on desktop will often be dramatically different in terms of performances. According to research, sites with React or Angular spend more time on the CPU than others (which of course isn’t necessarily to say that React is more expensive on the CPU than Vue.js).
According to Tim, one thing is obvious: “if you’re using a framework to build your site, you’re making a trade-off in terms of initial performance — even in the best of scenarios.”
Once a framework is chosen, we’ll be staying with it for at least a few years, so if we need to use one, we need make sure our choice is informed and well considered — and that goes especially for key performance metrics that we care about.
Of course it is possible to make SPAs fast, but they aren’t fast out of the box, so we need to account for the time and effort required to make and keep them fast. It’s probably going to be easier by choosing a lightweight baseline performance cost early on.
So how do we choose a framework? It’s a good idea to consider at least the total cost on size + initial execution times before choosing an option; lightweight options such as Preact, Inferno, Vue, Svelte, Alpine or Polymer can get the job done just fine. The size of your baseline will define the constraints for your application’s code.
As noted by Seb Markbåge, a good way to measure start-up costs for frameworks is to first render a view, then delete it and then render again as it can tell you how the framework scales. The first render tends to warm up a bunch of lazily compiled code, which a larger tree can benefit from when it scales. The second render is basically an emulation of how code reuse on a page affects the performance characteristics as the page grows in complexity.
You can also rely on data collected on the web over a longer period of time. For example, Perf Track tracks framework performance at scale, showing origin-aggregated Core Web Vitals scores for websites built in Angular, React, Vue, Polymer, Preact, Ember, Svelte and AMP. You can even specify and compare websites built with Gatsby, Next.js or Create React App, as well as websites built with Nuxt.js (Vue) or Sapper (Svelte).
To avoid it, always break up the execution of functions into separate, asynchronous tasks, and where possible use requestIdleCallback. Consider lazy loading parts of the UI using WebPack’s dynamic import() support, avoiding the load, parse, and compile cost until users really need them (thanks Addy!).
Once we reached Interactive, we can then — either on demand or as time allows — boot non-essential parts of the app. Unfortunately, as Paul Lewis noticed, frameworks typically have no simple concept of priority that can be surfaced to developers, and hence progressive booting isn’t easy to implement with most libraries and frameworks.
Still, we are getting there. These days there are a couple of choices we can explore, and Houssein Djirdeh and Jason Miller provide an excellent overview of these options in their talk on Rendering on the Web and Jason’s and Addy’s write-up on Modern Front-End Architectures. The overview below is based on their stellar work.
Full Server-Side Rendering (SSR) In classic SSR, such as WordPress, all requests are handled entirely on the server. The requested content is returned as a finished HTML page and browsers can render it right away. Hence, SSR-apps can’t really make use of the DOM APIs, for example. The gap between First Contentful Paint and Time to Interactive is usually small, and the page can be rendered right away as HTML is being streamed to the browser.
This avoids additional round-trips for data fetching and templating on the client, since it’s handled before the browser gets a response. However, we end up with longer server think time and consequently Time To First Byte and we don’t make use of responsive and rich features of modern applications.
Thus, we can display a landing page quickly and then prefetch a SPA-framework for subsequent pages. Netflix has adopted this approach decreasing loading and Time-to-Interactive by 50%.
Server-Side Rendering With (Re)Hydration (Universal Rendering, SSR + CSR) We can try to use the best of both worlds — the SSR and the CSR approches. With hydration in the mix, the HTML page returned from the server also contains a script that loads a fully-fledged client-side application. Ideally, that achieve a fast First Contentful Paint (like SSR) and then continue rendering with (re)hydration. Unfortunately, that’s rarely the case. More often, the page does look ready but it can’t respond to user’s input, producing rage clicks and abandonments.
With Vue.js, we can use the vue-server-renderer to render a Vue instance into HTML using renderToString. In Angular, we can use @nguniversal to turn client requests into fully server-rendered HTML pages. A fully server-rendered experience can also be achieved out of the box with Next.js (React) or Nuxt.js (Vue).
The approach has its downsides. As a result, we do gain full flexibility of client-side apps while providing faster server-side rendering, but we also end up with a longer gap between First Contentful Paint and Time To Interactive and increased First Input Delay. Rehydration is very expensive, and usually this strategy alone will not be good enough as it heavily delays Time To Interactive.
Streaming Server-Side Rendering With Progressive Hydration (SSR + CSR) To minimize the gap between Time To Interactive and First Contentful Paint, we render multiple requests at once and send down content in chunks as they get generated. So we don’t have to wait for the full string of HTML before sending content to the browser, and hence improve Time To First Byte.
On the client-side, rather than booting the entire application at once, we boot up components progressively. Sections of the applications are first broken down into standalone scripts with code splitting, and then hydrated gradually (in order of our priorities). In fact, we can hydrate critical components first, while the rest could be hydrated later. The role of client-side and server-side rendering can then be defined differently per component. We can then also defer hydration of some components until they come into view, or are needed for user interaction, or when the browser is idle.
Trisomorphic Rendering With service workers in place, we can use streaming server rendering for initial/non-JS navigations, and then have the service worker taking on rendering of HTML for navigations after it has been installed. In that case, service worker prerenders content and enables SPA-style navigations for rendering new views in the same session. Works well when you can share the same templating and routing code between the server, client page, and service worker.
With Next.js, we can use static HTML export by prerendering an app to static HTML. In Gatsby, an open source static site generator that uses React, uses renderToStaticMarkup method instead of renderToString method during builds, with main JS chunk being preloaded and future routes are prefetched, without DOM attributes that aren’t needed for simple static pages.
The result is a better Time To First Byte and First Contentful Paint, and we reduce the gap between Time To Interactive and First Contentful Paint. We can’t use the approach if the content is expected to change much. Plus, all URLs have to be known ahead of time to generate all the pages. So some components might be rendered using prerendering, but if we need something dynamic, we have to rely on the app to fetch the content.
Full Client-Side Rendering (CSR) All logic, rendering and booting are done on the client. The result is usually a huge gap between Time To Interactive and First Contentful Paint. As a result, applications often feel sluggish as the entire app has to be booted on the client to render anything.
So, client-side or server-side? In general, it’s a good idea to limit the use of fully client-side frameworks to pages that absolutely require them. For advanced applications, it’s not a good idea to rely on server-side rendering alone either. Both server-rendering and client-rendering are a disaster if done poorly.
Whether you are leaning towards CSR or SSR, make sure that you are rendering important pixels as soon as possible and minimize the gap between that rendering and Time To Interactive. Consider prerendering if your pages don’t change much, and defer the booting of frameworks if you can. Stream HTML in chunks with server-side rendering, and implement progressive hydration for client-side rendering — and hydrate on visibility, interaction or during idle time to get the best of both worlds.
How much can we serve statically? Whether you’re working on a large application or a small site, it’s worth considering what content could be served statically from a CDN (i.e. JAM Stack), rather than being generated dynamically on the fly. Even if you have thousands of products and hundreds of filters with plenty of personalization options, you might still want to serve your critical landing pages statically, and decouple these pages from the framework of your choice.
There are plenty of static-site generators and the pages they generate are often very fast. The more content we can pre-build ahead of time instead of generating page views on a server or client at request time, the better performance we will achieve.
With JAMStack used on large sites these days, a new performance consideration appeared: the build time. In fact, building out even thousands of pages with every new deploy can take minutes, so it’s promising to see incremental builds in Gatsby which improve build times by 60 times, with an integration into popular CMS solutions like WordPress, Contentful, Drupal, Netlify CMS and others.
Also, Next.js announced ahead-of-time and incremental static generation, which allows us to add new static pages at runtime and update existing pages after they’ve been already built, by re-rendering them in the background as traffic comes in.
Consider using PRPL pattern and app shell architecture. Different frameworks will have different effects on performance and will require different strategies of optimization, so you have to clearly understand all of the nuts and bolts of the framework you’ll be relying on. When building a web app, look into the PRPL pattern and application shell architecture. The idea is quite straightforward: Push the minimal code needed to get interactive for the initial route to render quickly, then use service worker for caching and pre-caching resources and then lazy-load routes that you need, asynchronously.
PRPL stands for Pushing critical resource, Rendering initial route, Pre-caching remaining routes and Lazy-loading remaining routes on demand.
Have you optimized the performance of your APIs? APIs are communication channels for an application to expose data to internal and third-party applications via endpoints. When designing and building an API, we need a reasonable protocol to enable the communication between the server and third-party requests. Representational State Transfer (REST) is a well-established, logical choice: it defines a set of constraints that developers follow to make content accessible in a performant, reliable and scalable fashion. Web services that conform to the REST constraints, are called RESTful web services.
As with good ol’ HTTP requests, when data is retrieved from an API, any delay in server response will propagate to the end user, hence delaying rendering. When a resource wants to retrieve some data from an API, it will need to request the data from the corresponding endpoint. A component that renders data from several resources, such as an article with comments and author photos in each comment, may need several roundtrips to the server to fetch all the data before it can be rendered. Furthermore, the amount of data returned through REST is often more than what is needed to render that component.
If many resources require data from an API, the API might become a performance bottleneck. GraphQL provides a performant solution to these issues. Per se, GraphQL is a query language for your API, and a server-side runtime for executing queries by using a type system you define for your data. Unlike REST, GraphQL can retrieve all data in a single request, and the response will be exactly what is required, without over or under-fetching data as it typically happens with REST.
Will you be using AMP or Instant Articles? Depending on the priorities and strategy of your organization, you might want to consider using Google’s AMP or Facebook’s Instant Articles or Apple’s Apple News. You can achieve good performance without them, but AMP does provide a solid performance framework with a free content delivery network (CDN), while Instant Articles will boost your visibility and performance on Facebook.
The seemingly obvious benefit of these technologies for users is guaranteed performance, so at times they might even prefer AMP-/Apple News/Instant Pages-links over “regular” and potentially bloated pages. For content-heavy websites that are dealing with a lot of third-party content, these options could potentially help speed up render times dramatically.
Unless they don’t. According to Tim Kadlec, for example, “AMP documents tend to be faster than their counterparts, but they don’t necessarily mean a page is performant. AMP is not what makes the biggest difference from a performance perspective.”
A benefit for the website owner is obvious: discoverability of these formats on their respective platforms and increased visibility in search engines. You could build progressive web AMPs, too, by reusing AMPs as a data source for your PWA. Downside? Obviously, a presence in a walled garden places developers in a position to produce and maintain a separate version of their content, and in case of Instant Articles and Apple News without actual URLs(thanks Addy, Jeremy!).
Choose your CDN wisely. As mentioned above, depending on how much dynamic data you have, you might be able to “outsource” some part of the content to a static site generator, pushing it to a CDN and serving a static version from it, thus avoiding requests to the server. In fact, some of those generators are actually website compilers with many automated optimizations provided out of the box. As compilers add optimizations over time, the compiled output gets smaller and faster over time.
Notice that CDNs can serve (and offload) dynamic content as well. So, restricting your CDN to static assets is not necessary. Double-check whether your CDN performs compression and conversion (e.g. image optimization and resizing at the edge), whether they provide support for servers workers, A/B testing, as well as edge-side includes, which assemble static and dynamic parts of pages at the CDN’s edge (i.e. the server closest to the user), and other tasks. Also, check if your CDN supports HTTP over QUIC (HTTP/3).
Katie Hempenius has written a fantastic guide to CDNs that provides insights on how to choose a good CDN, how to finetune it and all the little things to keep in mind when evaluating one. In general, it’s a good idea to cache content as aggressively as possible and enable CDN performance features like Brotli, TLS 1.3, HTTP/2, and HTTP/3.
The catch is that because compressing all assets with Brotli at a high compression level is expensive, many hosting providers can’t use it at fule scall just because of the huge cost overhead it produces. In fact, at the highest level of compression, Brotli is so slow that any potential gains in file size could be nullified by the amount of time it takes for the server to begin sending the response as it waits to dynamically compress the asset. (But if you have time during the build time with static compression, of course, higher compression settings are preferred.)
On top of that, with Elena Kirilenko’s research, we can achieve fast and efficient Brotli recompression using previous compression artifacts. According to Elena, “once we have an asset compressed via Brotli, and we’re trying to compress dynamic content on-the-fly, where the content resembles content available to us ahead of time, we can achieve significant improvements in compression times.”
The strategy?Pre-compress static assets with Brotli+Gzip at the highest level and compress (dynamic) HTML on the fly with Brotli at level 4–6. Make sure that the server handles content negotiation for Brotli or Gzip properly.
Do we use adaptive media loading and client hints? It’s coming from the land of old news, but it’s always a good reminder to use responsive images with srcset, sizes and the <picture> element. Especially for sites with a heavy media footprint, we can take it a step further with adaptive media loading (in this example React + Next.js), serving light experience to slow networks and low-memory devices and full experience to fast network and high-memory devices. In the context of React, we can achieve it with client hints on the server and react-adaptive-hooks on the client.
The future of responsive images might change dramatically with the wider adoption of client hints. Client hints are HTTP request header fields, e.g. DPR, Viewport-Width, Width, Save-Data, Accept (to specify image format preferences) and others. They are supposed to inform the server about the specifics of user’s browser, screen, connection etc.
As a result, the server can decide how to fill in the layout with appropriately sized images, and serve only these images in desired formats. With client hints, we move the resource selection from HTML markup and into the request-response negotiation between the client and server.
As Ilya Grigorik noted a while back, client hints complete the picture — they aren’t an alternative to responsive images. “The <picture> element provides the necessary art-direction control in the HTML markup. Client hints provide annotations on resulting image requests that enable resource selection automation. Service Worker provides full request and response management capabilities on the client.”
A service worker could, for example, append new client hints headers values to the request, rewrite the URL and point the image request to a CDN, adapt response based on connectivity and user preferences, etc. It holds true not only for image assets but for pretty much all other requests as well.
However, if you supply both the normal responsive images markup and the <meta> tag for Client Hints, then a supporting browser will evaluate the responsive images markup and request the appropriate image source using the Client Hints HTTP headers.
Basically we can conditionally serve low-resolution background images with a 1x descriptor, and higher-resolution images with 2x descriptor, and even a print-quality image with 600dpi descriptor. Beware though: browsers do not provide any special information on background images to assistive technology, so ideally these photos would be merely decoration.
Do we use WebP? Image compression is often considered a quick win, yet it’s still heavily underutilized in practice. Of course images do not block rendering, but they contribute heavily to poor LCP scores, and very often they are just too heavy and too large for the device they are being consumed on.
So at the very least, we could explore using the WebP format for our images. In fact, the WebP saga has been nearing the end last year with Apple adding support for WebP in Safari 14. So after many years of discussions and debates, as of today, WebP is supported in all modern browsers. So we can serve WebP images with the <picture> element and a JPEG fallback if needed (see Andreas Bovens’ code snippet) or by using content negotiation (using Accept headers).
WebP isn’t without its downsides though. While WebP image file sizes compared to equivalent Guetzli and Zopfli, the format doesn’t support progressive rendering like JPEG, which is why users might see the finished image faster with a good ol’ JPEG although WebP images might be getting faster through the network. With JPEG, we can serve a “decent” user experience with the half or even quarter of the data and load the rest later, rather than have a half-empty image as it is in the case of WebP.
Your decision will depend on what you are after: with WebP, you’ll reduce the payload, and with JPEG you’ll improve perceived performance. You can learn more about WebP in WebP Rewind talk by Google’s Pascal Massimino.
Do we use AVIF? You might have heard the big news: AVIF has landed. It’s a new image format derived from the keyframes of AV1 video. It’s an open, royalty-free format that supports lossy and lossless compression, animation, lossy alpha channel and can handle sharp lines and solid colors (which was an issue with JPEG), while providing better results at both.
In fact, compared to WebP and JPEG, AVIF performs significantly better, yielding median file size savings for up to 50% at the same DSSIM ((dis)similarity between two or more images using an algorithm approximating human vision). In fact, in his thorough post on optimizing image loading, Malte Ubl notes that AVIF “very consistently outperforms JPEG in a very significant way. This is different from WebP which doesn’t always produce smaller images than JPEG and may actually be a net-loss due to lack of support for progressive loading.”
Ironically, AVIF can perform even better than large SVGs although of course it shouldn’t be seen as a replacement to SVGs. It is also one of the first image formats to support HDR color support; offering higher brightness, color bit depth, and color gamuts. The only downside is that currently AVIF doesn’t support progressive image decoding (yet?) and, similarly to Brotli, a high compression rate encoding is currently quite slow, although decoding is fast.
What’s the best way to serve images these days then? For illustrations and vector images, (compressed) SVG is undoubtedly the best choice. For photos, we use content negotiation methods with the picture element. If AVIF is supported, we send an AVIF image; if it’s not the case, we fall back to WebP first, and if WebP isn’t supported either, we switch to JPEG or PNG as fallback (applying @media conditions if needed):
A great reference for everything AVIF: Jake Archibald’s comprehensive post on AVIF has landed.
So is the future AVIF then? Jon Sneyers disagrees: AVIF performs 60% worse than JPEG XL, another free and open format developed by Google and Cloudinary. In fact, JPEG XL seems to be performingwaybetter across the board. However, JPEG XL is still only in the final stages of standardization, and does not yet work in any browser. (Not to mix up with Microsoft’s JPEG-XR coming from good ol’ Internet Explorer 9 times).
Are JPEG/PNG/SVGs properly optimized? When you’re working on a landing page on which it’s critical that a hero image loads blazingly fast, make sure that JPEGs are progressive and compressed with mozJPEG (which improves the start rendering time by manipulating scan levels) or Guetzli, Google’s open-source encoder focusing on perceptual performance, and utilizing learnings from Zopfli and WebP. The only downside: slow processing times (a minute of CPU per megapixel).
For PNG, we can use Pingo, and for SVG, we can use SVGO or SVGOMG. And if you need to quickly preview and copy or download all the SVG assets from a website, svg-grabber can do that for you, too.
Every single image optimization article would state it, but keeping vector assets clean and tight is always worth mentioning. Make sure to clean up unused assets, remove unnecessary metadata and reduce the number of path points in artwork (and thus SVG code). (Thanks, Jeremy!)
There are also useful online tools available though:
Use Squoosh to compress, resize and manipulate images at the optimal compression levels (lossy or lossless),
Use Guetzli.it to compress and optimize JPEG images with Guetzli, which works well for images with sharp edges and solid colors (but might be quite a bit slower)).
To optimize storage interally, you could use Dropbox’s new Lepton format for losslessly compressing JPEGs by an average of 22%.
Use BlurHash if you’d like to show a placeholder image early. BlurHash takes an image, and gives you a short string (only 20-30 characters!) that represents the placeholder for this image. The string is short enough that it can easily be added as a field in a JSON object.
Sometimes optimizing images alone won’t do the trick. To improve the time needed to start the rendering of a critical image, lazy-load less important images and defer any scripts to load after critical images have already rendered. The most bulletproof way is hybrid lazy-loading, when we utilize native lazy-loading and lazyload, a library that detects any visibility changes triggered through user interaction (with IntersectionObserver which we’ll explore later). Additionally:
Consider preloading critical images, so a browser doesn’t discover them too late. For background images, if you want to be even more aggressive than that, you can add the image as a regular image with <img src>, and then hide it off the screen.
Review image download inconsistencies to prevent unexpected downloads for foreground and background images. Watch out for images that are loaded by default, but might never be displayed — e.g. in carousels, accordions and image galleries.
With a service worker, it’s too late as you can’t control what’s on the wire, but it does work with Edge workers. So you can use them on top of static JPEGs saved progressively for a particular landing page.
Not good enough? Well, you can also improve perceived performance for images with the multiplebackgroundimagestechnique. Keep in mind that playing with contrast and blurring out unnecessary details (or removing colors) can reduce file size as well. Ah, you need to enlarge a small photo without losing quality? Consider using Letsenhance.io.
These optimizations so far cover just the basics. Addy Osmani has published a very detailed guide on Essential Image Optimization that goes very deep into details of image compression and color management. For example, you could blur out unnecessary parts of the image (by applying a Gaussian blur filter to them) to reduce the file size, and eventually you might even start removing colors or turn the picture into black and white to reduce the size even further. For background images, exporting photos from Photoshop with 0 to 10% quality can be absolutely acceptable as well.
On Smashing Magazine, we use the postfix -opt for image names — for example, brotli-compression-opt.png; whenever an image contains that postfix, everybody on the team knows that the image has already been optimized.
Ah, and don’t use JPEG-XR on the web — “the processing of decoding JPEG-XRs software-side on the CPU nullifies and even outweighs the potentially positive impact of byte size savings, especially in the context of SPAs” (not to mix up with Cloudinary/Google’s JPEG XL though).
Are videos properly optimized? We covered images so far, but we’ve avoided a conversation about good ol’ GIFs. Despite our love for GIFs, it’s really the time to abandon them for good (at least in our websites and apps). Instead of loading heavy animated GIFs which impact both rendering performance and bandwidth, it’s a good idea to switch either to animated WebP (with GIF being a fallback) or replace them with looping HTML5 videos altogether.
Unlike with images, browsers do not preload <video> content, but HTML5 videos tend to be much lighter and smaller than GIFs. Not an option? Well, at least we can add lossy compression to GIFs with Lossy GIF, gifsicle or giflossy.
Tests by Colin Bendell show that inline videos within img tags in Safari Technology Preview display at least 20× faster and decode 7× faster than the GIF equivalent, in addition to being a fraction in file size. However, it’s not supported in other browsers.
In the land of good news, video formats have been advancing massively over the years. For a long time, we had hoped that WebM would become the format to rule them all, and WebP (which is basically one still image inside of the WebM video container) will become a replacement for dated image formats. Indeed, Safari is now supporting WebP, but despite WebP and WebM gainingsupport these days, the breakthrough didn’t really happen.
<!-- By Houssein Djirdeh. https://web.dev/replace-gifs-with-videos/ -->
<!-- A common scenartio: MP4 with a WEBM fallback. -->
<video autoplay loop muted playsinline>
<source src="my-animation.webm" type="video/webm">
<source src="my-animation.mp4" type="video/mp4">
But perhaps we could revisit it altogether. In 2018, the Alliance of Open Media has released a new promising video format called AV1. AV1 has compression similar to the H.265 codec (the evolution of H.264) but unlike the latter, AV1 is free. The H.265 license pricing pushed browser vendors to adopt a comparably performant AV1 instead: AV1 (just like H.265) compresses twice as good as WebM.
In fact, Apple currently uses HEIF format and HEVC (H.265), and all the photos and videos on the latest iOS are saved in these formats, not JPEG. While HEIF and HEVC (H.265) aren’t properly exposed to the web (yet?), AV1 is — and it’s gaining browser support. So adding the AV1 source in your <video> tag is reasonable, as all browser vendors seem to be on board.
Need to start rendering videos faster but video files are still too large? For example, whenever hou have a large background video on a landing page? A common technique to use is to show the very first frame as a still image first, or display a heavily optimized, short looping segment that could be interpreted as a part of the video, and then, whenever the video is buffered enough, start playing the actual video. Doug Sillars has a written a detailed guide to background video performance that could be helpful in that case. (Thanks, Guy Podjarny!).
The research shows that video stream quality impacts viewer behavior. In fact, viewers start to abandon the video if the startup delay exceeds about 2 seconds. Beyond that point, a 1-second increase in delay results in roughly a 5.8% increase in abandonment rate. So it’s not surprising that the median video start time is 12.8s, with 40% of videos having at least 1 stall, and 20% at least 2s of stalled video playback. In fact, video stalls are unavoidable on 3G as videos play back faster than the network can supply content.
Is web font delivery optimized? The first question that’s worth asking is if we can get away with using UI system fonts in the first place — we just need to make sure to double check that they appear correctly on various platforms. If it’s not the case, chances are high that the web fonts we are serving include glyphs and extra features and weights that aren’t being used. We can ask our type foundry to subset web fonts or if we are using open-source fonts, subset them on our own with Glyphhanger or Fontsquirrel. We can even automate our entire workflow with Peter Müller’s subfont, a command line tool that statically analyses your page in order to generate the most optimal web font subsets, and then inject them into our pages.
Probably the better options to consider today are Critical FOFT with preload and “The Compromise” method. Both of them use a two-stage render for delivering web fonts in steps — first a small supersubset required to render the page fast and accurately with the web font, and then load the rest of the family async. The difference is that “The Compromise” technique loads polyfill asynchronously only if font load events are not supported, so you don’t need to load the polyfill by default. Need a quick win? Zach Leatherman has a quick 23-min tutorial and case study to get your fonts in order.
It’s a good idea to be selective and choose files that matter most, e.g. the ones that are critical for rendering or that would help you avoiding visible and disruptive text reflows. In general, Zach advises to preload one or two fonts of each family — it also makes sense to delay some font loading if they are less critical.
It has become quite common to use local() value (which refers to a local font by name) when defining a font-family in the @font-face rule:
/ Warning! Not a good idea! /
font-family: Open Sans;
src: local('Open Sans Regular'),
url('opensans.woff2') format ('woff2'),
The idea is reasonable: some popular open-source fonts such as Open Sans are coming pre-installed with some drivers or apps, so if the font is available locally, the browser doesn’t need to download the web font and can display the local font immediately. As Bram Stein noted, “though a local font matches the name of a web font, it most likely isn’t the same font. Many web fonts differ from their “desktop” version. The text might be rendered differently, some characters may fall back to other fonts, OpenType features can be missing entirely, or the line height may be different.”
Also, as typefaces evolve over time, the locally installed version might be very different from the web font, with characters looking very different. So, according to Bram, it’s better to never mix locally installed fonts and web fonts in @font-face rules. Google Fonts has followed suit by disabling local() on the CSS results for all users, other than Android requests for Roboto.
Nobody likes waiting for the content to be displayed. With the font-display CSS descriptor, we can control the font loading behavior and enable content to be readable immediately (with font-display: optional) or almost immediately (with a timeout of 3s, as long as the font gets successfully downloaded — with font-display: swap). (Well, it’s a bit more complicated than that.)
/* Zach Leatherman: https://noti.st/zachleat/KNaZEg/the-five-whys-of-web-font-loading-performance#sWkN4u4 */
// Remove existing @font-face blocks
// Create two
let font = new FontFace("Noto Serif", /* ... */);
let fontBold = new FontFace("Noto Serif, /* ... */);
// Load two fonts
let fonts = await Promise.all([
// Group repaints and render both fonts at the same time!
fonts.forEach(font => documents.fonts.add(font));
To initiate a very early fetch of the fonts with Font Loading API in use, Adrian Bece suggests to add a non-breaking space nbsp; at the top of the body, and hide it visually with aria-visibility: hidden and a .hidden class:
<!-- ... Website content ... -->
<div aria-visibility="hidden" class="hidden" style="font-family: '[web-font-name]'">
<!-- There is a non-breaking space here -->
This goes along with CSS that has different font families declared for different states of loading, with the change triggered by Font Loading API once the fonts have successfully loaded:
If you ever wondered why despite all your optimizations, Lighthouse still suggests to eliminate render-blocking resources (fonts), in the same article Adrian Bece provides a few techniques to make Lighthouse happy, along with a Gatsby Omni Font Loader, a performant asynchronous font loading and Flash Of Unstyled Text (FOUT) handling plugin for Gatsby.
<!-- By Harry Roberts.
- 1. Preemptively warm up the fonts’ origin.
- 2. Initiate a high-priority, asynchronous fetch for the CSS file. Works in
- most modern browsers.
- 3. Initiate a low-priority, asynchronous fetch that gets applied to the page
- 4. In the unlikely event that a visitor has intentionally disabled
- although this is a render-blocking request, it can still make use of the
- preconnect which makes it marginally faster than the default.
<!--  -->
<!--  -->
<!--  -->
media="print" onload="this.media='all'" />
<!--  -->
A quick word of caution though. If you use font-display: optional, it might be suboptimal to also use preload as it will trigger that web font request early (causing network congestion if you have other critical path resources that need to be fetched). Use preconnect for faster cross-origin font requests, but be cautious with preload as preloading fonts from a different origin wlll incur network contention. All of these techniques are covered in Zach’s Web font loading recipes.
Metrics? To measure the web font loading performance, consider the All Text Visible metric (the moment when all fonts have loaded and all content is displayed in web fonts), Time to Real Italics as well as Web Font Reflow Count after first render. Obviously, the lower both metrics are, the better the performance is.
What about variable fonts, you might ask? It’s important to notice that variablefonts might require a significant performance consideration. They give us a much broader design space for typographic choices, but it comes at the cost of a single serial request opposed to a number of individual file requests.
Now, what would make a bulletproof web font loading strategy then? Subset fonts and prepare them for the 2-stage-render, declare them with a font-display descriptor, use Font Loading API to group repaints and store fonts in a persistent service worker’s cache. On the first visit, inject the preloading of scripts just before the blocking external scripts. You could fall back to Bram Stein’s Font Face Observer if necessary. And if you’re interested in measuring the performance of font loading, Andreas Marschke explores performance tracking with Font API and UserTiming API.
Finally, don’t forget to include unicode-range to break down a large font into smaller language-specific fonts, and use Monica Dinculescu’s font-style-matcher to minimize a jarring shift in layout, due to sizing discrepancies between the fallback and the web fonts.
Does the future look bright? With progressive font enrichment, eventually we might be able to “download only the required part of the font on any given page, and for subsequent requests for that font to dynamically ‘patch’ the original download with additional sets of glyphs as required on successive page views”, as Jason Pamental explains it. Incremental Transfer Demo is already available, and it’s work in progress.
Set up a spreadsheet. Define the basic core experience for legacy browsers (i.e. fully accessible core content), the enhanced experience for capable browsers (i.e. an enriched, full experience) and the extras (assets that aren’t absolutely required and can be lazy-loaded, such as web fonts, unnecessary styles, carousel scripts, video players, social media widgets, large images). Years ago, we published an article on “Improving Smashing Magazine’s Performance,” which describes this approach in detail.
When optimizing for performance we need to reflect our priorities. Load the core experience immediately, then enhancements, and then the extras.
As a result, we help reduce blocking of the main thread by reducing the amount of scripts the browser needs to process. Jeremy Wagner has published a comprehensive article on differential serving and how to set it up in your build pipeline, from setting up Babel, to what tweaks you’ll need to make in Webpack, as well as the benefits of doing all this work.
Note: It’s worth stating that feature detection alone isn’t enough to make an informed decision about the payload to ship to that browser. On its own, we can’t deduce device capability from browser version. For example, cheap Android phones in developing countries mostly run Chrome and will cut the mustard despite their limited memory and CPU capabilities.
Are you using tree-shaking, scope hoisting and code-splitting? Tree-shaking is a way to clean up your build process by only including code that is actually used in production and eliminate unused imports in Webpack. With Webpack and Rollup, we also have scope hoisting that allows both tools to detect where import chaining can be flattened and converted into one inlined function without compromising the code. With Webpack, we can also use JSON Tree Shaking as well.
So, what’s the best way to code-split aggressively, but not too aggressively? According to Phil Walton, “in addition to code-splitting via dynamic imports, [we could] also use code-splitting at the package level, where each imported node modules get put into a chunk based on its package’s name.” Phil provides a tutorial on how to build it as well.
Can we improve Webpack’s output? As Webpack is often considered to be mysterious, there are plenty of Webpack plugins that may come in handy to further reduce Webpack’s output. Below are some of the more obscure ones that might need a bit more attention.
One of the interesting ones comes from Ivan Akulov’s thread. Imagine that you have a function that you call once, store its result in a variable, and then don’t use that variable. Tree-shaking will remove the variable, but not the function, because it might be used otherwise. However, if the function isn’t used anywhere, you might want to remove it. To do so, prepend the function call with /#PURE/ which is supported by Uglify and Terser — done!
Here are some of the other tools that Ivan recommends:
How to get started? Here are a few resources that are worth looking into:
If you’d like to learn more about WebAssembly:
Identify and rewrite legacy code with incremental decoupling. Long-living projects have a tendency to gather dust and dated code. Revisit your dependencies and assess how much time would be required to refactor or rewrite legacy code that has been causing trouble lately. Of course, it’s always a big undertaking, but once you know the impact of the legacy code, you could start with incremental decoupling.
First, set up metrics that tracks if the ratio of legacy code calls is staying constant or going down, not up. Publicly discourage the team from using the library and make sure that your CI alerts developers if it’s used in pull requests. polyfills could help transition from legacy code to rewritten codebase that uses standard browser features.
Furthermore, purgecss, UnCSS and Helium can help you remove unused styles from CSS. And if you aren’t certain if a suspicious piece of code is used somewhere, you can follow Harry Roberts’ advice: create a 1×1px transparent GIF for a particular class and drop it into a dead/ directory, e.g. /assets/img/dead/comments.gif.
After that, you set that specific image as a background on the corresponding selector in your CSS, sit back and wait for a few months if the file is going to appear in your logs. If there are no entries, nobody had that legacy component rendered on their screen: you can probably go ahead and delete it all.
Add bundle auditing into your regular workflow as well. There might be some lightweight alternatives to heavy libraries you’ve added years ago, e.g. Moment.js (now discontinued) could be replaced with:
Have we optimized the strategy for React/SPA? Struggling with performance in your single-page-application application? Jeremy Wagner has explored the impact of client-side framework performance on a variety of devices, highlighting some of the implications and guidelines we might want to be aware of when using one.
Hence, every interactive element is receiving a probability score for engagement, and based on that score, a client-side script decides to prefetch a resource ahead of time. You can integrate the technique to your Next.js application, Angular and React, and there is a Webpack plugin which automates the setup process as well.
Obviously, you might be prompting the browser to consume unneeded data and prefetch undesirable pages, so it’s a good idea to be quite conservative in the number of prefetched requests. A good use case would be prefetching validation scripts required in the checkout, or speculative prefetch when a critical call-to-action comes into the viewport.
If you want to look into the science of predictive prefetching in full detail, Divya Tagtachian has a great talk on The Art of Predictive Prefetch, covering all the options from start to finish.
Script streaming allows async or defer scripts to be parsed on a separate background thread once downloading begins, hence in some cases improving page loading times by up to 10%. Practically, use <script defer> in the <head>, so that the browsers can discover the resource early and then parse it on the background thread.
Caveat: Opera Mini doesn’t support script deferment, so if you are developing for India or Africa,deferwill be ignored, resulting in blocking rendering until the script has been evaluated (thanks Jeremy!).
You could also hook into V8’s code caching as well, by splitting out libraries from code using them, or the other way around, merge libraries and their uses into a single script, group small files together and avoid inline scripts. Or perhaps even use v8-compile-cache.
If you have any scroll or touch* listeners, pass passive: true to addEventListener. This tells the browser you’re not planning to call event.preventDefault() inside, so it can optimize the way it handles these events. (via Ivan Akulov)
For security reasons, to avoid fingerprinting, browsers have been implementing partitioned caching that was introduced in Safari back in 2013, and in Chrome last year. So if two sites point to the exact same third-party resource URL, the code is downloaded once per domain, and the cache is “sandboxed” to that domain due to privacy implications (thanks, David Calhoun!). Hence, using a public CDN will not automatically lead to better performance.
Constrain the impact of third-party scripts. With all performance optimizations in place, often we can’t control third-party scripts coming from business requirements. Third-party-scripts metrics aren’t influenced by end-user experience, so too often one single script ends up calling a long tail of obnoxious third-party scripts, hence ruining a dedicated performance effort. To contain and mitigate performance penalties that these scripts bring along, it’s not enough to just defer their loading and execution and warm up connections via resource hints, i.e. dns-prefetch or preconnect.
Furthermore, these third-parties often invite fourth-party scripts to join in, ending up with a huge performance bottleneck, sometimes going as far as to the eigth-party scripts on a page. So regularly auditing your dependencies and tag managers can bring along costly surprises.
Another problem, as Yoav Weiss explained in his talk on third-party scripts, is that in many cases these scripts download resources that are dynamic. The resources change between page loads, so we don’t necessarily know which hosts the resources will be downloaded from and what resources they would be.
According to Andy, there are two ways tags impact site-speed — they compete for network bandwidth and processing time on visitors’ devices, and depending on how they’re implemented, they can delay HTML parsing as well. So the first step is to identify the impact that third-parties have, by testing the site with and without scripts using WebPageTest. With Simon Hearne’s Request Map, we can also visualize third-parties on a page along with details on their size, type and what triggered their load.
If it’s not viable, we can at least lazy load third-party resources with facades, i.e. a static element which looks similar to the actual embedded third-party, but is not functional and therefore much less taxing on the page load. The trick, then, is to load the actual embed only on interaction.
One of the reasons why tag managers are usually large in size is because of the many simultaneous experiments that are running at the same time, along with many user segments, page URLs, sites etc., so according to Andy, reducing them can reduce both the download size and the time it takes to execute the script in the browser.
And then there are anti-flicker snippets. Third-parties such as Google Optimize, Visual Web Optimizer (VWO) and others are unanimous in using them. These snippets are usually injected along with running A/B tests: to avoid flickering between the different test scenarios, they hide the body of the document with opacity: 0, then adds a function that gets called after a few seconds to bring the opacity back. This often results in massive delays in rendering due to massive client-side execution costs.
Watch out: some third-party widgets hide themselves from auditing tools, so they might be more difficult to spot and measure. To stress-test third parties, examine bottom-up summaries in Performance profile page in DevTools, test what happens if a request is blocked or it has timed out — for the latter, you can use WebPageTest’s Blackhole server blackhole.webpagetest.org that you can point specific domains to in your hosts file.
What options do we have then? Consider using service workers by racing the resource download with a timeout and if the resource hasn’t responded within a certain timeout, return an empty response to tell the browser to carry on with parsing of the page. You can also log or block third-party requests that aren’t successful or don’t fulfill certain criteria. If you can, load the 3rd-party-script from your own server rather than from the vendor’s server and lazy load them.
Another option is to establish a Content Security Policy (CSP) to restrict the impact of third-party scripts, e.g. disallowing the download of audio or video. The best option is to embed scripts via <iframe> so that the scripts are running in the context of the iframe and hence don’t have access to the DOM of the page, and can’t run arbitrary code on your domain. Iframes can be further constrained using the sandbox attribute, so you can disable any functionality that iframe may do, e.g. prevent scripts from running, prevent alerts, form submission, plugins, access to the top navigation, and so on.
You could also keep third-parties in check via in-browser performance linting with feature policies, a relatively new feature that lets you opt-in or out of certain browser features on your site. (As a sidenote, it could also be used to avoid oversized and unoptimized images, unsized media, sync scripts and others). Currently supported in Blink-based browsers.
Consider using an Intersection Observer; that would enable ads to be iframed while still dispatching events or getting the information that they need from the DOM (e.g. ad visibility). Watch out for new policies such as Feature policy, resource size limits and CPU/Bandwidth priority to limit harmful web features and scripts that would slow down the browser, e.g. synchronous scripts, synchronous XHR requests, document.write and outdated implementations.
Finally, when choosing a third-party service, consider checking Patrick Hulce’s ThirdPartyWeb.Today, a service that groups all third-party scripts by category (analytics, social, advertising, hosting, tag manager etc.) and visualizes how long the entity’s scripts take to execute (on average). Obviously, largest entities have the worst performance impact to the pages they’re on. Just by skimming the page, you’ll get an idea of the performance footprint you should be expecting.
Set HTTP cache headers properly. Caching seems to be such an obvious thing to do, yet it might be quite tough to get right. We need to double-check that expires, max-age, cache-control, and other HTTP cache headers have been set properly. Without proper HTTP cache headers, browsers will set them automatically at 10% of elapsed time since last-modified, ending up with potential under- and over-caching.
Use Cache-control: immutable to avoid revalidation of long explicit cache lifetimes when users hit the reload button. For the reload case, immutable saves HTTP requests and improves the load time of the dynamic HTML as they no longer compete with the multitude of 304 responses.
Do you remember the good ol’ stale-while-revalidate? When we specify the caching time with the Cache-Control response header (e.g. Cache-Control: max-age=604800), after max-age expires, the browser will re-fetch the requested content, causing the page to load slower. This slowdown can be avoided with stale-while-revalidate; it basically defines an extra window of time during which a cache can use a stale asset as long as it revalidates it async in the background. Thus, it “hides” latency (both in the network and on the server) from clients.
In June–July 2019, Chrome and Firefox launched support of stale-while-revalidate in HTTP Cache-Control header, so as a result, it should improve subsequent page load latencies as stale assets are no longer in the critical path. Result: zero RTT for repeat views.
Note: We often assume that cached assets are retrieved instantly, but research shows that retrieving an object from cache can take hundreds of milliseconds. In fact, according to Simon Hearne, “sometimes network might be faster than cache, and retrieving assets from cache can be costly with a large number of cached assets (not file size) and the user’s devices. For example: Chrome OS average cache retrieval doubles from ~50ms with 5 cached resources up to ~100ms with 25 resources”.
Besides, we often assume that bundle size isn’t a huge issue and users will download it once and then use the cached version. At the same time, with CI/CD we push code to production multiple times a day, cache gets invalidated every time, so being strategic about caching matters.
When it comes to caching, there are plenty of resources that are worth reading:
It’s worth noting that there are a few misconceptions about async and defer. Most importantly, async doesn’t mean that the code will run whenever the script is ready; it means that it will run whenever the scripts is ready and all preceding sync work is done. In Harry Roberts’ words, “If you put an async scrip after sync scripts, your async script is only as fast as your slowest sync script.”
If you’d like to dive into more details, Milica Mihajlija has written a very detailed guide on Building the DOM faster, going into the details of speculative parsing, async and defer.
<!-- Lazy loading for images, iframes, scripts.
Probably for images outside of the viewport. -->
<img loading="lazy" ... />
<iframe loading="lazy" ... />
<!-- Prompt an early download of an asset.
For critical images, e.g. hero images. -->
<img loading="eager" ... />
<iframe loading="eager" ... />
That threshold depends on a few things, from the type of image resource being fetched to effective connection type. But experiments conducted using Chrome on Android suggest that on 4G, 97.5% of below-the-fold images that are lazy-loaded were fully loaded within 10ms of becoming visible, so it should be safe.
We can also use importance attribute (high or low) on a <script>, <img>, or <link> element (Blink only). In fact, it’s a great way to deprioritize images in carousels, as well as re-prioritize scripts. However, sometimes we might need a bit more granular control.
When the browser assigns "High" priority to an image,
but we don’t actually want that.
<img src="less-important-image.svg" importance="low" ... />
We want to initiate an early fetch for a resource,
but also deprioritize it.
<link rel="preload" importance="low" href="/script.js" as="script" />
The most performant way to do slightly more sophisticated lazy loading is by using the Intersection Observer API that provides a way to asynchronously observe changes in the intersection of a target element with an ancestor element or with a top-level document’s viewport. Basically, you need to create a new IntersectionObserver object, which receives a callback function and a set of options. Then we add a target to observe.
The callback function executes when the target becomes visible or invisible, so when it intercepts the viewport, you can start taking some actions before the element becomes visible. In fact, we have a granular control over when the observer’s callback should be invoked, with rootMargin (margin around the root) and threshold (a single number or an array of numbers which indicate at what percentage of the target’s visibility we are aiming).
A quick word of caution though: it’s worth noting that lazy loading should be an exception rather than the rule. It’s probably not reasonable to lazy-load anything that you actually want people to see quickly, e.g. product page images, hero images or a script required for the main navigation to become interactive.
Opinions differ if these techniques improve user experience or not, but it definitely improves time to first meaningful paint. We can even automate it by using SQIP that creates a low quality version of an image as an SVG placeholder, or Gradient Image Placeholders with CSS linear gradients.
These placeholders could be embedded within HTML as they naturally compress well with text compression methods. In his article, Dean Hume has described how this technique can be implemented using Intersection Observer.
Fallback? If the browser doesn’t support intersection observer, we can still lazy load a polyfill or load the images immediately. And there is even a library for it.
Want to go fancier? You could trace your images and use primitive shapes and edges to create a lightweight SVG placeholder, load it first, and then transition from the placeholder vector image to the (loaded) bitmap image.
Do you defer rendering with content-visibility? For complex layout with plentiful of content blocks, images and videos, decoding data and rendering pixels might be a quite expensive operation — especially on low-end devices. With content-visibility: auto, we can prompt the browser to skip the layout of the children while the container is outside of the viewport.
For example, you might skip the rendering of the footer and late sections on the initial load:
/ 1000px is an estimated height for sections that are not rendered yet. /
Note that content-visibility: auto; behaves like overflow: hidden;, but you can fix it by applying padding-left and padding-right instead of the default margin-left: auto;, margin-right: auto; and a declared width. The padding basically allows elements to overflow the content-box and enter the padding-box without leaving the box model as a whole and getting cut off.
Also, keep in mind that you might introduce some CLS when new content eventually gets rendered, so it’s a good idea to use contain-intrinsic-sizewith a placeholder properly sized (thanks, Una!).
And if you need to get a bit more granular, with CSS Containment, you can manually skip layout, style and paint work for descendants of a DOM node if you need only size, alignment or computed styles on other elements — or the element is currently off-canvas.
Do you defer decoding with decoding="async"? Sometimes content appears offscreen, yet we want to ensure that it’s available when customers need it — ideally, not blocking anything in the critical path, but decoding and rendering asynchronously. We can use decoding="async" to give the browser a permission to decode the image off the main thread, avoiding user impact of the CPU-time used to decode the image (via Malte Ubl):
<img decoding="async" … />
Alternatively, for offscreen images, we can display a placeholder first, and when the image is within the viewport, using IntersectionObserver, trigger a network call for the image to be downloaded in background. Also, we can defer render until decode with img.decode() or download the image if the Image Decode API isn’t available.
Do you generate and serve critical CSS? To ensure that browsers start rendering your page as quickly as possible, it’s become a common practice to collect all of the CSS required to start rendering the first visible portion of the page (known as “critical CSS” or “above-the-fold CSS”) and include it inline in the <head> of the page, thus reducing roundtrips. Due to the limited size of packages exchanged during the slow start phase, your budget for critical CSS is around 14KB. (This specific limitation doesn’t apply with TCP BBR in place although it’s still important to prioritize critical resources and load them as early as possible).
If you go beyond that, the browser will need additional roundtrips to fetch more styles. CriticalCSS and Critical enable you to output critical CSS for every template you’re using. In our experience though, no automatic system was ever better than manual collection of critical CSS for every template, and indeed that’s the approach we’ve moved back to recently.
When collecting all the critical CSS for each template, it’s common to explore the “above-the-fold” area alone. However, for complex layouts, it might be a good idea to include the groundwork of the layout as well to avoid massive recalculation and repainting costs, hurting your Core Web Vitals score as a result.
What if a user gets a URL that’s linking directly to the middle of the page but the CSS hasn’t been downloaded yet? In that case, it has become common to hide non-critical content, e.g. with opacity: 0; in inlined CSS and opacity: 1 in full CSS file, and display it when CSS is available. It has a major downside though, as users on slow connections might never be able to read the content of the page. That’s why it’s better to always keep the content visible, even although it might not be styled properly.
Putting critical CSS (and other important assets) in a separate file on the root domain has benefits, sometimes even more than inlining, due to caching. Chrome speculatively opens a second HTTP connection to the root domain when requesting the page, which removes the need for a TCP connection to fetch this CSS. That means that you could create a set of critical-CSS-files (e.g. critical-homepage.css, critical-product-page.css etc.) and serve them from your root, without having to inline them. (thanks, Philip!)
A word of caution: with HTTP/2, critical CSS could be stored in a separate CSS file and delivered via a server push without bloating the HTML. The catch is that server pushing was troublesome with many gotchas and race conditions across browsers. It was never supported consistently and had some caching issues (see slide 114 onwards of Hooman Beheshti’s presentation).
Experiment with regrouping your CSS rules. We’ve got used to critical CSS, but there are a few optimizations that could go beyond that. Harry Roberts conducted a remarkable research with quite surprising results. For example, it might be a good idea to split the main CSS file out into its individual media queries. That way, the browser will retrieve critical CSS with high priority, and everything else with low priority — completely off the critical path.
Do you stream responses? Often forgotten and neglected, streams provide an interface for reading or writing asynchronous chunks of data, only a subset of which might be available in memory at any given time. Basically, they allow the page that made the original request to start working with the response as soon as the first chunk of data is available, and use parsers that are optimized for streaming to progressively display the content.
Browser support? Still getting there with partial support in Chrome, Firefox, Safari and Edge supporting the API and Service Workers being supported in all modern browsers. And if you feel adventurous again, you can check an experimental implementation of streaming requests, which allows you to start sending the request while still generating the body. Available in Chrome 85.
Consider making your components connection-aware. Data can be expensive and with growing payload, we need to respect users who choose to opt into data savings while accessing our sites or apps. The Save-Data client hint request header allows us to customize the application and the payload to cost- and performance-constrained users.
Who is using save-data, you might be wondering? 18% of global Android Chrome users have Lite Mode enabled (with Save-Data on), and the number is likely to be higher. According to Simon Hearne’s research, the opt-in rate is highest on cheaper devices, but there are plenty of outliers. For example: users in Canada have an opt-in rate of over 34% (compared to ~7% in the US) and users on the latest Samsung flagship have an opt-in rate of almost 18% globally.
With the Save-Data mode on, Chrome Mobile will provide an optimized experience, i.e. a proxied web experience with deferred scripts, enforced font-display: swap and enforced lazy loading. It’s just more sensible to build the experience on your own rather than relying on the browser to make these optimizations.
In this context, Max Stoiber speaks of connection-aware components and Addy Osmani speaks of adaptive module serving. For example, with React, we could write a component that renders differently for different connection types. As Max suggested, a <Media /> component in a news article might output:
Offline: a placeholder with alt text,
2G / save-data mode: a low-resolution image,
3G on non-Retina screen: a mid-resolution image,
3G on Retina screens: high-res Retina image,
4G: an HD video.
Dean Hume provides a practical implementation of a similar logic using a service worker. For a video, we could display a video poster by default, and then display the “Play” icon as well as the video player shell, meta-data of the video etc. on better connections. As a fallback for non-supporting browsers, we could listen to canplaythrough event and use Promise.race() to timeout the source loading if the canplaythrough event doesn’t fire within 2 seconds.
If you want to dive in a bit deeper, here are a couple of resources to get started:
Consider making your components device memory-aware. Network connection gives us only one perspective at the context of the user though. Going further, you could also dynamically adjust resources based on available device memory, with the Device Memory API. navigator.deviceMemory returns how much RAM the device has in gigabytes, rounded down to the nearest power of two. The API also features a Client Hints Header, Device-Memory, that reports the same value.
Warm up the connection to speed up delivery. Use resource hints to save time on dns-prefetch (which performs a DNS lookup in the background), preconnect (which asks the browser to start the connection handshake (DNS, TCP, TLS) in the background), prefetch (which asks the browser to request a resource) and preload (which prefetches resources without executing them, among other things). Well supported in modern browsers, with support coming to Firefox soon.
Remember prerender? The resource hint used to prompt browser to build out the entire page in the background for next navigation. The implementations issues were quite problematic, ranging from a huge memory footprint and bandwidth usage to multiple registered analytics hits and ad impressions.
NoState Prefetch only uses ~45MiB of memory and subresources that are fetched will be fetched with an IDLE Net Priority. Since Chrome 69, NoState Prefetch adds the header Purpose: Prefetch to all requests in order to make them distinguishable from normal browsing.
Also, watch out for prerendering alternatives and portals, a new effort toward privacy-conscious prerendering, which will provide the inset preview of the content for seamless navigations.
Using resource hints is probably the easiest way to boost performance, and it works well indeed. When to use what? As Addy Osmani has explained, it’s reasonable to preload resources that we know are very likely to be used on the current page and for future navigations across multiple navigation boundaries, e.g. Webpack bundles needed for pages the user hasn’t visited yet.
Addy’s article on “Loading Priorities in Chrome” shows how exactly Chrome interprets resource hints, so once you’ve decided which assets are critical for rendering, you can assign high priority to them. To see how your requests are prioritized, you can enable a “priority” column in the Chrome DevTools network request table (as well as Safari).
Most of the time these days, we’ll be using at leastpreconnect and dns-prefetch, and we’ll be cautious with using prefetch, preload and prerender. Note that even with preconnect and dns-prefetch, the browser has a limit on the number of hosts it will look up/connect to in parallel, so it’s a safe bet to order them based on priority (thanks Philip Tellis!).
<!-- Loading two rendering-critical fonts, but not all their weights. -->
crossorigin="anonymous" is required due to CORS.
Without it, preloaded fonts will be ignored.
<link rel="preload" as="font"
media="only screen and (min-width: 48rem)" />
<link rel="preload" as="font"
media="only screen and (min-width: 48rem)" />
/ Adding a preload hint to the head /
var preload = document.createElement("link");
link.href = "myscript.js";
link.rel = "preload";
link.as = "script";
/ Injecting a script when we want it to execute /
var script = document.createElement("script");
script.src = "myscript.js";
A few gotchas to keep in mind: preload is good for moving the start download time of an asset closer to the initial request, but preloaded assets land in the memory cache which is tied to the page making the request. preload plays well with the HTTP cache: a network request is never sent if the item is already there in the HTTP cache.
A preload tag can initiate a preload only after the browser has received the HTML from the server and the lookahead parser has found the preload tag. Preloading via the HTTP header could be a bit faster since we don’t to wait for the browser to parse the HTML to start the request (it’s debated though).
Early Hints will help even further, enabling preload to kick in even before the response headers for the HTML are sent (on the roadmap in Chromium, Firefox). Plus, Priority Hints will help us indicate loading priorities for scripts.
Use service workers for caching and network fallbacks. No performance optimization over a network can be faster than a locally stored cache on a user’s machine (there are exceptions though). If your website is running over HTTPS, we can cache static assets in a service worker cache and store offline fallbacks (or even offline pages) and retrieve them from the user’s machine, rather than going to the network.
As suggested by Phil Walton, with service workers, we can send smaller HTML payloads by programmatically generating our responses. A service worker can request just the bare minimum of data it needs from the server (e.g. an HTML content partial, a Markdown file, JSON data, etc.), and then it can programmatically transform that data into a full HTML document. So once a user visits a site and the service worker is installed, the user will never request a full HTML page again. The performance impact can be quite impressive.
Browser support? Service workers are widely supported and the fallback is the network anyway. Does it help boost performance? Oh yes, it does. And it’s getting better, e.g. with Background Fetch allowing background uploads/downloads via a service worker as well.
There is plenty of great resources to get started with service workers:
Service Worker Mindset, which helps you understand how service workers work behind the scenes and things to understand when building one.
Chris Ferdinandi provides a great series of articles on service workers, explaining how to create offline applications and covering a variety of scenarios, from saving recently viewed pages offline to setting an expiration date for items in a service worker cache.
Service Worker: An Introduction with practical tips on how to use service worker for rich offline experiences, periodic background syncs and push notifications.
It’s always worth referring to good ol’ Jake Archibald’s Offline Cookbook with a number of recipes on how to bake your own service worker.
Workbox is a set of service worker libraries built specifically for building progressive web apps.
Are you running servers workers on the CDN/Edge, e.g. for A/B testing? At this point, we are quite used to running service workers on the client, but with CDNs implementing them on the server, we could use them to tweak performance on the edge as well.
Optimize rendering performance. Whenever the application is sluggish, it’s noticeable right away. So we need to make sure that there is no lag when scrolling the page or when an element is animated, and that you’re consistently hitting 60 frames per second. If that’s not possible, then at least making the frames per second consistent is preferable to a mixed range of 60 to 15. Use CSS’ will-change to inform the browser of which elements and properties will change.
Whenever you are experiencing, debug unnecessary repaints in DevTools:
Have you optimized for perceived performance? While the sequence of how components appear on the page, and the strategy of how we serve assets to the browser matter, we shouldn’t underestimate the role of perceived performance, too. The concept deals with psychological aspects of waiting, basically keeping customers busy or engaged while something else is happening. That’s where perception management, preemptive start, early completion and tolerance management come into play.
What does it all mean? While loading assets, we can try to always be one step ahead of the customer, so the experience feels swift while there is quite a lot happening in the background. To keep the customer engaged, we can test skeleton screens (implementation demo) instead of loading indicators, add transitions/animations and basically cheat the UX when there is nothing more to optimize.
In their case study on The Art of UI Skeletons, Kumar McMillan shares some ideas and techniques on how to simulate dynamic lists, text, and the final screen, as well as how to consider skeleton-thinking with React.
Do you prevent layout shifts and repaints? In the realm of perceived performance probably one of the more disruptive experiences is layout shifting, or reflows, caused by rescaled images and videos, web fonts, injected ads or late-discovered scripts that populate components with actual content. As a result, a customer might start reading an article just to be interrupted by a layout jump above the reading area. The experience is often abrupt and quite disorienting: and that’s probably a case of loading priorities that need to be reconsidered.
The community has developed a couple of techniques and workarounds to avoid reflows. In general, it’s a good idea to avoid inserting new content above existing content, unless it happens in response to a user interaction. Always set widthand height attributes on images, so modern browsers allocate the box and reserve the space by default (Firefox, Chrome).
For both images or videos, we can use an SVG placeholder to reserve the display box in which the media will appear in. That means that the area will be reserved properly when you need to maintain its aspect ratio as well. We can also use placeholders, or fallback images for ads and dynamic content, as well as pre-allocate layout slots.
As mentioned above, always group web font repaints and transition from all fallback fonts to all web fonts at once — just make sure that that switch isn’t too abrupt, by adjusting line-height and spacing between the fonts with font-style-matcher.
If you deal with headers that become fixed or sticky positioned to the top of the page on scroll, reserve space for the header when it becomes pineed, e.g. with a placeholder element or margin-top on the content. An exception should be cookie consent banners that shouldn’t have impact on CLS, but sometimes they do: it depends on the implementation. There are a few interesting strategies and takeaways in this Twitter thread.
For a tab component that might include various amount of texts, you can prevent layout shifts with CSS grid stacks. By placing the content of each tab in the same grid area, and hiding one of them at a time, we can ensure that the container always takes the height of the larger element, so no layout shifts will occur.
Ah, and of course, infinite scrolling and “Load more” can cause layout shifts as well if there is content below the list (e.g. footer). To improve CLS, reserve enough space for content that would be loaded in before the user scrolls to that part of the page, remove the footer or any DOM elements at the bottom of the page that may be pushed down by content loading in. Also, prefetch data and images for below-the-fold content so that by the time a user scrolls that far, it’s already there. You can use list virtualization libraries like react-window to optimize long lists as well (thanks, Addy Osmani!).
To ensure that the impact of reflows is contained, measure the layout stability with the Layout Instability API. With it, you can calculate the Cumulative Layout Shift (CLS) score and include it as a requirement in your tests, so whenever a regression appears, you can track it and fix it.
To calculate the layout shift score, the browser looks at the viewport size and the movement of unstable elements in the viewport between two rendered frames. Ideally, the score would be close to 0. There is a great guide by Milica Mihajlija and Philip Walton on what CLS is and how to measure it. It’s a good starting point to measure and maintain perceived performance and avoid disruption, especially for business-critical tasks.
Quick tip: to discover what caused a layout shift in DevTools, you can explore layout shifts under “Experience” in the Performance Panel.
Is OCSP stapling enabled? By enabling OCSP stapling on your server, you can speed up your TLS handshakes. The Online Certificate Status Protocol (OCSP) was created as an alternative to the Certificate Revocation List (CRL) protocol. Both protocols are used to check whether an SSL certificate has been revoked. However, the OCSP protocol does not require the browser to spend time downloading and then searching a list for certificate information, hence reducing the time required for a handshake.
Have you reduced the impact of SSL certificate revocation? In his article on “The Performance Cost of EV Certificates”, Simon Hearne provides a great overview of common certificates, and the impact a choice of a certificate may have on the overall performance.
As Simon writes, in the world of HTTPS, there are a few types of certificate validation levels used to secure traffic:
Domain Validation (DV) validates that the certificate requestor owns the domain,
Organisation Validation (OV) validates that an organisation owns the domain,
Extended Validation (EV) validates that an organisation owns the domain, with rigorous validation.
It’s important to note that all of these certificates are the same in terms of technology; they only differ in information and properties provided in those certificates.
EV certificates are expensive and time-consuming as they require a human to reviewing a certificate and ensuring its validity. DV certificates, on the other hand, are often provided for free — e.g. by Let’s Encrypt — an open, automated certificate authority that’s well integrated into many hosting providers and CDNs. In fact, at the time of writing, it powers over 225 million websites (PDF), although it makes for only 2.69% of the pages (opened in Firefox).
So what’s the problem then? The issue is that EV certificates do not fully support OCSP stapling mentioned above. While stapling allows the server to check with the Certificate Authority if the certificate has been revoked and then add (“staple”) this information to the certificate, without stapling the client has to do all the work, resulting in unnecessary requests during the TLS negotiation. On poor connections, this might end up with noticeable performance costs (1000ms+).
EV certificates aren’t a great choice for web performance, and they can cause a much bigger impact on performance than DV certificates do. For optimal web performance, always serve an OCSP stapled DV certificate. They are also much cheaper than EV certificates and less hassle to acquire. Well, at least until CRLite is available.
Have you adopted IPv6 yet? Because we’re running out of space with IPv4 and major mobile networks are adopting IPv6 rapidly (the US has almost reached a 50% IPv6 adoption threshold), it’s a good idea to update your DNS to IPv6 to stay bulletproof for the future. Just make sure that dual-stack support is provided across the network — it allows IPv6 and IPv4 to run simultaneously alongside each other. After all, IPv6 is not backwards-compatible. Also, studies show that IPv6 made those websites 10 to 15% faster due to neighbor discovery (NDP) and route optimization.
If you’re still running on HTTP, the most time-consuming task will be to migrate to HTTPS first, and then adjust your build process to cater for HTTP/2 multiplexing and parallelization. Bringing HTTP/2 to Gov.uk is a fantastic case study on doing just that, finding a way through CORS, SRI and WPT along the way. For the rest of this article, we assume that you’re either switching to or have already switched to HTTP/2.
Properly deploy HTTP/2. Again, serving assets over HTTP/2 can benefit from a partial overhaul of how you’ve been serving assets so far. You’ll need to find a fine balance between packaging modules and loading many small modules in parallel. At the end of the day, still the best request is no request, however, the goal is to find a fine balance between quick first delivery of assets and caching.
On the other hand, packaging still matters. By using many small scripts, overall compression will suffer and the cost of retrieving objects from the cache will increase. The compression of a large package will benefit from dictionary reuse, whereas small separate packages will not. There’s standard work to address that, but it’s far out for now. Secondly, browsers have not yet been optimized for such workflows. For example, Chrome will trigger inter-process communications (IPCs) linear to the number of resources, so including hundreds of resources will have browser runtime costs.
What to do? Well, if you’re running over HTTP/2, sending around 6–10 packages seems like a decent compromise (and isn’t too bad for legacy browsers). Experiment and measure to find the right balance for your website.
Do we send all assets over a single HTTP/2 connection? One of the main advantages of HTTP/2 is that it allows us to send assets down the wire over a single connection. However, sometimes we might have done something wrong — e.g. have a CORS issue, or misconfigured the crossorigin attribute, so the browser would be forced to open a new connection.
To check whether all requests use a single HTTP/2 connection, or something’s misconfigured, enable the “Connection ID” column in DevTools → Network. E.g., here, all requests share the same connection (286) — except manifest.json, which opens a separate one (451).
Do your servers and CDNs support HTTP/2? Different servers and CDNs (still) support HTTP/2 differently. Use CDN Comparison to check your options, or quickly look up how your servers are performing and which features you can expect to be supported.
While on it, double check if your kernel supports TCP BBR and enable it if possible. It’s currently used on Google Cloud Platform, Amazon Cloudfront, Linux (e.g. Ubuntu).
Is HPACK compression in use? If you’re using HTTP/2, double-check that your servers implement HPACK compression for HTTP response headers to reduce unnecessary overhead. Some HTTP/2 servers may not fully support the specification, with HPACK being an example. H2spec is a great (if very technically detailed) tool to check that. HPACK’s compression algorithm is quite impressive, and it works.
Robin Marx has explained HTTP/3 very well, and the following explanation is based on his explanation. In its core, HTTP/3 is very similar to HTTP/2 in terms of features, but under the hood it works very differently. HTTP/3 provides a number of improvements: faster handshakes, better encryption, more reliable independent streams, better encryption and flow control. A noteable difference is that HTTP/3 uses QUIC as the transport layer, with QUIC packets encapsulated on top of UDP diagrams, rather than TCP.
QUIC fully integrates TLS 1.3 into the protocol, while in TCP it’s layered on top. In the typical TCP stack, we have a few round-trip times of overhead because TCP and TLS need to do their own separate handshakes, but with QUIC both of them can be combined and completed in just a single round trip. Since TLS 1.3 allows us to set up encryption keys for a consequent connection, from the second connection onward, we can already send and receive application layer data in the first round trip, which is called “0-RTT”.
Also, header compression algorithm of HTTP/2 was entirely rewritten, along with its prioritization system. Plus, QUIC support connection migration from Wi-Fi to cellular network via connection IDs in the header of each QUIC packet. Most of the implementations are done in the user space, not kernel space (as it’s done with TCP), so we should be expecting the protocol to be evolving in the future.
Would it all make a big difference? Probably yes, especially having an impact on loading times on mobile, but also on how we serve assets to end users. While in HTTP/2, multiple requests share a connection, in HTTP/3 requests also share a connection but stream independently, so a dropped packet no longer impacts all requests, just the one stream.
The bottom line: if you have an option to use HTTP/3 on the server and on your CDN, it’s probably a very good idea to do so. The main benefit will come from fetching multiple objects simultaneously, especially on high-latency connections. We don’t know for sure yet as there isn’t much research done in that space, but first results are very promising.
If you want to dive more into the specifics and advantages of the protocol, here are some good starting points to check:
HTTP/3 Explained, a collaborative effort to document the HTTP/3 and the QUIC protocols. Available in various languages, also as PDF.
An Academic’s Guide to QUIC with Robin Marx introduces basic concepts of the QUIC and HTTP/3 protocols, explains how HTTP/3 handles head-of-line blocking and connection migration, and how HTTP/3 is designed to be evergreen (thanks, Simon!).
Take a look at the recently released AutoWebPerf, a modular tool that enables automatic gathering of performance data from multiple sources. For example, we could set a daily test on your critical pages to capture the field data from CrUX API and lab data from a Lighthouse report from PageSpeed Insights.
Have you tested in proxy browsers and legacy browsers? Testing in Chrome and Firefox is not enough. Look into how your website works in proxy browsers and legacy browsers. UC Browser and Opera Mini, for instance, have a significant market share in Asia (up to 35% in Asia). Measure average Internet speed in your countries of interest to avoid big surprises down the road. Test with network throttling, and emulate a high-DPI device. BrowserStack is fantastic for testing on remote real devices, and complement it with at least a few real devices in your office as well — it’s worth it.
Have you tested the performance of your 404 pages? Normally we don’t think twice when it comes to 404 pages. After all, when a client requests a page that doesn’t exist on the server, the server is going to respond with a 404 status code and the associated 404 page. There isn’t that much to it, isn’t it?
Make sure to examine and optimize the caching strategy for your 404 pages. Our goal is to serve HTML to the browser only when it expects an HTML response, and return a small error payload for all other responses. According to Matt, “if we place a CDN in front of our origin, we have the chance to cache the 404 page response on the CDN. That’s useful because without it, hitting a 404 page could be used as a DoS attack vector, by forcing the origin server to respond to every 404 request rather than letting the CDN respond with a cached version.”
Not only can 404 errors hurt your performance, but they can also cost in traffic, so it’s a good idea to include a 404 error page in your Lighthouse testing suite, and track its score over time.
Have you tested the performance of your GDPR consent prompts? In times of GDPR and CCPA, it has become common to rely on third-parties to provide options for EU customers to opt in or opt out from tracking. However, like with any other third-party script, their performance can have a quite devastating impact on the entire performance effort.
Of course, the actual consent is likely to change the impact of scripts on the overall performance, so, as Boris Schapira noted, we might want to study a few different web performance profiles:
Consent was entirely refused,
Consent was partially refused,
Consent was entirely given.
User hasn’t acted on consent prompt (or the prompt was blocked by a content blocker),
In general, it’s worth looking into the pop-up performance as you will need to determine the horizontal or vertical offset of the mouse event and correctly position the popup relatively to the anchor. Noam Rosenthal shares Wikimedia team’s learnings in the article Web performance case study: Wikipedia page previews (also available as video and minutes).
Do you keep a performance diagnostics CSS? While we can include all kinds of checks to ensure that non-performant code gets deployed, often it’s useful to get a quick idea of some of the low-hanging fruits that could be solved easily. For that, we could use Tim Kadlec’s brilliant Performance Diagnostics CSS (inspired by Harry Roberts’ snippetwhich highlights lazy-loaded images, unsized images, legacy format images and synchronous scripts.
E.g. you might want to ensure that no images above the fold are lazy loaded. You can customize the snippet for you needs, e.g. to highlight web fonts that aren’t used, or detect icon fonts. A great little tool to ensure that mistakes are visible during debugging, or just to audit the current project very quickly.
Have you tested the impact on accessibility? When the browser starts to load a page, it builds a DOM, and if there is an assistive technology like a screen reader running, it also creates an accessibility tree. The screen reader then has to query the accessibility tree to retrieve the information and make it available to the user — sometimes by default, and sometimes on demand. And sometimes it takes time.
When talking about fast Time to Interactive, usually we mean an indicator of how soon a user can interact with the page by clicking or tapping on links and buttons. The context is slightly different with screen readers. In that case, fast Time to Interactive means how much time passes by until the screen reader can announce navigation on a given page and a screen reader user can actually hit keyboard to interact.
Léonie Watson has given an eye-opening talk on accessibility performance and specifically the impact slow loading has on screen reader announcement delays. Screen readers are used to fast-paced announcements and quick navigation, and therefore might potentially be even less patient than sighted users.
Is continuous monitoring set up? Having a private instance of WebPagetest is always beneficial for quick and unlimited tests. However, a continuous monitoring tool — like Sitespeed, Calibre and SpeedCurve — with automatic alerts will give you a more detailed picture of your performance. Set your own user-timing marks to measure and monitor business-specific metrics. Also, consider adding automated performance regression alerts to monitor changes over time.
Look into using RUM-solutions to monitor changes in performance over time. For automated unit-test-alike load testing tools, you can use k6 with its scripting API. Also, look into SpeedTracker, Lighthouse and Calibre.
This list is quite comprehensive, and completing all of the optimizations might take quite a while. So, if you had just 1 hour to get significant improvements, what would you do? Let’s boil it all down to 17 low-hanging fruits. Obviously, before you start and once you finish, measure results, including Largest Contentful Paint and Time To Interactive on a 3G and cable connection.
Measure the real world experience and set appropriate goals. Aim to be at least 20% faster than your fastest competitor. Stay within Largest Contentful Paint < 2.5s, a First Input Delay < 100ms, Time to Interactive < 5s on slow 3G, for repeat visits, TTI < 2s. Optimize at least for First Contentful Paint and Time To Interactive.
Prepare critical CSS for your main templates, and inline them in the <head> of each template. For CSS/JS, operate within a critical file size budget of max. 170KB gzipped (0.7MB decompressed).
Trim, optimize, defer and lazy-load scripts. Invest in the config of your bundler to remove redundancies and check lightweight alternatives.
Always self-host your static assets and always prefer to self-host third-party assets. Limit the impact of third-party scripts. Use facades, load widgets on interaction and beware of anti-flicker snippets.
Be selective when choosing a framework. For single-page-applications, identify critical pages and serve them statically, or at least prerender them, and use progressive hydration on component-level and import modules on interaction.
Client-side rendering alone isn’t a good choice for performance. Prerender if your pages don’t change much, and defer the booting of frameworks if you can. If possible, use streaming server-side rendering.
Experiment with regrouping your CSS rules and test in-body CSS.
Add resource hints to speed up delivery with faster dns-lookup, preconnect, prefetch, preload and prerender.
Subset web fonts and load them asynchronously, and utilize font-display in CSS for fast first rendering.
Check that HTTP cache headers and security headers are set properly.
Enable Brotli compression on the server. (If that’s not possible, at least make sure that Gzip compression is enabled.)
Enable TCP BBR congestion as long as your server is running on the Linux kernel version 4.9+.
Enable OCSP stapling and IPv6 if possible. Always serve an OCSP stapled DV certificate.
Enable HPACK compression for HTTP/2 and move to HTTP/3 if it’s available.
Download The Checklist (PDF, Apple Pages)
With this checklist in mind, you should be prepared for any kind of front-end performance project. Feel free to download the print-ready PDF of the checklist as well as an editable Apple Pages document to customize the checklist for your needs:
Some of the optimizations might be beyond the scope of your work or budget or might just be overkill given the legacy code you have to deal with. That’s fine! Use this checklist as a general (and hopefully comprehensive) guide, and create your own list of issues that apply to your context. But most importantly, test and measure your own projects to identify issues before optimizing. Happy performance results in 2021, everyone!
A huge thanks to Guy Podjarny, Yoav Weiss, Addy Osmani, Artem Denysov, Denys Mishunov, Ilya Pukhalski, Jeremy Wagner, Colin Bendell, Mark Zeman, Patrick Meenan, Leonardo Losoviz, Andy Davies, Rachel Andrew, Anselm Hannemann, Barry Pollard, Patrick Hamann, Gideon Pyzer, Andy Davies, Maria Prosvernina, Tim Kadlec, Rey Bango, Matthias Ott, Peter Bowyer, Phil Walton, Mariana Peralta, Pepijn Senders, Mark Nottingham, Jean Pierre Vincent, Philipp Tellis, Ryan Townsend, Ingrid Bergman, Mohamed Hussain S. H., Jacob Groß, Tim Swalling, Bob Visser, Kev Adamson, Adir Amsalem, Aleksey Kulikov and Rodney Rehm for reviewing this article, as well as our fantastic community which has shared techniques and lessons learned from its work in performance optimization for everybody to use. You are truly smashing!
About us and this blog
We are a digital marketing company with a focus on helping our customers achieve great results across several key areas.