Dexecure performance engineering blog

Chrome User Experience Report Explained with Google BigQuery

Ratul Saha, Nov 27, 2017
8 minutes read

Google recently announced the Chrome User Experience Report (CrUX), consisting of data from the usage of 10,000 websites in Google Chrome. In this post, we explain the report further, along with its importance to businesses and the potential pitfalls when analyzing the data. We are also announcing a series of blog posts in which we will analyze the data to answer exciting questions regarding performance, user experience and the status of the web. Onwards!

Quantifying User Experience and Performance

HTTP Archive, a trove of data on website performance, has led to some of the most exciting discussions and surveys. However, all of these data points constitute what is known as synthetic measurement, since the numbers are obtained from pages loaded by a browser in a datacenter and not by actual users. That changes with the Chrome User Experience Report that Google launched at the recent Chrome Dev Summit. This Real User Monitoring (RUM) data from Google Chrome users includes key user experience metrics for top websites.

What it means for developers: The report focuses on quantifying the user experience from real Google Chrome users with data such as the Time to First Paint (TFP) or DOM content loaded. Developers can use these numbers to figure out good performance budgets for their web application (you are loading faster than your competitor, aren’t you?).

What it means for businesses: Analyzed thoroughly, this report will lead to new well-defined Key Performance Indicators (KPIs), based on the industry the business is in. For example, an ecommerce business would be able to look at the data for other ecommerce websites like Amazon and see how its own RUM data compares across different connection types and devices.

Using BigQuery to Make Sense of it All

The Chrome User Experience Report is preloaded into Google BigQuery as a public dataset. This means that anyone can play with the data, and start for free using the trial. We will use BigQuery Web UI and its Python Client library to explain and analyze the data. The Web UI is pretty easy to get started with. All queries we used can be found here.

The data is loaded into a BigQuery table, and you can also create your own dataset by saving data as tables and views. Views are essentially virtual tables defined by a query. We will be using standard SQL for all queries. It can be enabled by clicking ‘view options’ under the query box (in the Web UI), or by adding #standardSQL at the start of the query. Query definitions and examples can be found here, and we will explain most of the queries we use as well.

What Makes the Data Interesting

The detailed definitions of the data fields in the Chrome User Experience Report can be found here. In a nutshell, the data includes key performance metrics for a website (interchangeably, an origin):
- Time to First Paint (TFP)
- Time to First Contentful Paint (TFCP)
- DOM content loaded
- onload

The data is defined across multiple dimensions such as effective connection type (from offline, 2G to 4G) and device type (mobile, tablet or desktop).

Data aggregation

To mask details of website visitor data and to aggregate what one can assume is a massive set of data points from users across the world, the results are presented as aggregated metrics across dimensions. This means that you cannot find out the exact load time of a single website loaded with a particular connection (say, 4G) on a particular device type (say, mobile). You can only find out what the density of load times is in a given interval. For example, you may be able to conclude that 0.46% of users loading the page in this environment have a load time between 5000 and 6000 ms.

Imagine Google Chrome sent (with user consent, of course) records of 10,000 instances when a given website was opened. The device type, the effective connection type and other performance-related data such as the TFP were also recorded for each instance. Instead of making each data point public, the report aggregates them. For example, if 46 of those 10,000 page loads came from a 4G connection on a phone and had onload times between 5000 and 6000 ms, the report will say onload.histogram.bin.start = 5000, onload.histogram.bin.end = 6000, onload.histogram.bin.density = 0.0046 (indicating 0.46%) for that origin with effective_connection_type.name = '4G' and form_factor.name = 'phone'. Note that, because of the way the data is aggregated, even the total number of data points cannot be deduced (since only the density is known). This means that the data has to be analysed on an aggregated basis and cannot be analysed on a per-user level.
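The aggregation step can be illustrated with a minimal Python sketch. The sample is hypothetical (10,000 made-up page loads for one origin, 46 of which fall in the 5000 to 6000 ms bin), and the fixed 1000 ms bin width is an assumption for illustration, not necessarily the report's actual binning.

```python
from collections import Counter

BIN_WIDTH_MS = 1000  # assumed bin width, for illustration only


def to_histogram(onload_times_ms, bin_width=BIN_WIDTH_MS):
    """Aggregate raw load times into (start, end, density) bins."""
    total = len(onload_times_ms)
    # Map every load time to the start of the bin it falls into.
    counts = Counter((t // bin_width) * bin_width for t in onload_times_ms)
    return [
        {"start": start, "end": start + bin_width, "density": count / total}
        for start, count in sorted(counts.items())
    ]


# Hypothetical sample: 10,000 page loads, 46 of them between 5000 and 6000 ms.
samples = [5500] * 46 + [2500] * 9954
histogram = to_histogram(samples)
slow_bin = next(b for b in histogram if b["start"] == 5000)
print(slow_bin)  # {'start': 5000, 'end': 6000, 'density': 0.0046}
```

Once only the densities are published, the sample size of 10,000 can no longer be recovered from the histogram, which is the property described above.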

So, if you add up the onload densities across all dimensions for a single origin, you will get 1 (or a value very close to 1 due to approximations). Let’s do that with a query!

Query used:

Replace the origin in the query with any website from the data (the list of all websites is available here, courtesy of Rick Viscomi).
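As a rough sketch of how such a density check might be run from the Python client library, consider the snippet below. It is an illustration and not necessarily the exact query used for this post. The table name (the October 2017 release of the public dataset) and the helper name total_onload_density are assumptions, and running it requires the google-cloud-bigquery package and valid credentials.

```python
# Assumed table: the October 2017 release of the public CrUX dataset.
TABLE = "chrome-ux-report.chrome_ux_report.201710"

# Sum the onload densities over every bin and every dimension of one origin.
DENSITY_SUM_SQL = f"""
SELECT SUM(bin.density) AS total_density
FROM `{TABLE}`,
  UNNEST(onload.histogram.bin) AS bin
WHERE origin = @origin
"""


def total_onload_density(origin):
    """Run the query for one origin (hypothetical helper, needs credentials)."""
    # Imported lazily so the sketch can be loaded without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        use_legacy_sql=False,  # standard SQL, so no #standardSQL prefix is needed
        query_parameters=[
            bigquery.ScalarQueryParameter("origin", "STRING", origin)
        ],
    )
    rows = client.query(DENSITY_SUM_SQL, job_config=job_config).result()
    return next(iter(rows)).total_density
```

Passing the origin as a query parameter rather than pasting it into the SQL string keeps the query text fixed. For an origin that is present in the dataset, the returned value should be 1 or very close to it.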

Sidenote: We tried to keep the queries and the overall analysis simple and easily reproducible. Many of the queries could probably be further consolidated. Calling all BigQuery ninjas to fork the gists!

Complex data, handle with care!

The report has data for ‘only’ 10,000 websites (compared to hundreds of thousands in HTTP Archive). However, there are subtle pitfalls you have to avoid when analyzing the data.

Cross-origin analysis

As the developer documentation also mentions, any cross-origin analysis has to be handled very carefully. For example, the density values of an origin will not add up to 1 within one particular slice of a dimension (say, a single effective connection type), because they only sum to 1 across all dimensions combined. So when comparing user experience across dimensions, ensure that you have normalized the data wherever necessary. An example is present in our next blog post.
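As a minimal sketch of what such normalization might look like, the Python snippet below rescales the densities of one effective connection type so that they sum to 1 within that slice. The bins are made-up values, not figures from the report, and the function name normalize is our own.

```python
def normalize(bins):
    """Rescale densities so that they sum to 1 within the given slice."""
    subtotal = sum(b["density"] for b in bins)
    if subtotal == 0:
        raise ValueError("no data in this slice")
    return [{**b, "density": b["density"] / subtotal} for b in bins]


# Hypothetical onload bins for one origin, restricted to 3G connections.
# Their densities sum to 0.2, i.e. 3G accounts for 20% of this origin's data.
bins_3g = [
    {"start": 0, "end": 1000, "density": 0.05},
    {"start": 1000, "end": 2000, "density": 0.10},
    {"start": 2000, "end": 3000, "density": 0.05},
]
normalized_3g = normalize(bins_3g)
print([round(b["density"], 2) for b in normalized_3g])  # [0.25, 0.5, 0.25]
```

After normalization, the values can be read as "share of this origin's 3G page loads" and compared with the equivalent slice of another origin.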

Origin-specific Data

Understandably, the origin-specific data is not always present across all dimensions. For example, not all websites have data for every effective connection type. Let us see that with a query!

Query used (for 4G connections):

Running it for all connection types gives the following result:

Effective connection type    Number of origins
4G                           10000
3G                           8172
slow-2G                      489
2G                           186
offline                      12

As you can see, not all origins have data across all connection types and you need to be extra careful when analyzing origins across effective connection types.
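One way to guard against this, sketched below in Python under our own assumptions, is to check which connection types every origin in a comparison actually has data for before comparing them. The origins and rows are hypothetical, and the function names are ours.

```python
from collections import defaultdict

ALL_TYPES = ["4G", "3G", "2G", "slow-2G", "offline"]


def coverage(rows):
    """Map each origin to the set of connection types it has data for."""
    seen = defaultdict(set)
    for origin, connection_type in rows:
        seen[origin].add(connection_type)
    return dict(seen)


def common_types(rows, origins):
    """Connection types for which every listed origin has data."""
    seen = coverage(rows)
    return [
        t for t in ALL_TYPES
        if all(t in seen.get(o, set()) for o in origins)
    ]


# Hypothetical (origin, effective connection type) pairs.
rows = [
    ("https://a.example", "4G"),
    ("https://a.example", "3G"),
    ("https://b.example", "4G"),
]
print(common_types(rows, ["https://a.example", "https://b.example"]))  # ['4G']
```

Restricting a comparison to the connection types returned here avoids silently comparing an origin that has 3G data with one that has none.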

Diving into the Important Questions.. Wheee!

Is it happening? - the dreaded question.

It is well-established that load time is not the ultimate metric for measuring performance. For example, it does not capture an important question that a user might ask: “is it happening?” The commonly cited time limit before a user starts to question responsiveness is 1 second. For a webpage, the metric that best embodies the first experience of responsiveness is TFCP: how long it took between the user navigating to a new page and some content appearing on the screen. Thus, we will focus on the expectation “TFCP within 1 second”. It is definitely an ambitious goal to achieve across real-world connections.

Site Experience Benchmark (SEB) - the probability of the TFCP being less than 1 second for a website.

Due to the way that the data is structured, it is more effective to ask for the probability that the TFCP is within a certain time limit than to look for the exact TFCP. We introduce a new metric, the Site Experience Benchmark (SEB), defined as the probability of the TFCP being less than 1 second for a website. Understandably, SEB will always fall between 0 and 1, inclusive. The higher the SEB, the faster the website starts to load.

SEB can be calculated from the report using the formula SEB = (total density of TFCP under 1 second) / (total density of TFCP for all data points). The numerator, in particular, is the sum of the TFCP histogram densities for which the histogram bin end is less than or equal to 1000.

One can ask for the SEB under different conditions, such as: what is the SEB for a given origin on a 3G connection? Or what is the SEB for that origin on phones? Or even a combined metric: what is the SEB for that origin on 3G connections from phones? We can, for example, compute the SEB for 3G connections as (total density of TFCP under 1 second on 3G) / (total density of TFCP on 3G).
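The SEB formula, including the per-condition variants, can be sketched in a few lines of Python. The rows below are hypothetical TFCP bins for a single origin, flattened to one record per bin, and the key names (connection, device) are our own shorthand rather than the report's field names.

```python
def seb(rows, limit_ms=1000, connection=None, device=None):
    """Probability that TFCP is within limit_ms, optionally for one slice."""
    selected = [
        r for r in rows
        if (connection is None or r["connection"] == connection)
        and (device is None or r["device"] == device)
    ]
    total = sum(r["density"] for r in selected)
    if total == 0:
        return None  # no data for this slice
    # Numerator: densities of the bins that end at or before the limit.
    fast = sum(r["density"] for r in selected if r["end"] <= limit_ms)
    return fast / total


# Hypothetical TFCP bins for one origin (densities sum to 1 overall).
rows = [
    {"connection": "4G", "device": "desktop", "start": 0, "end": 1000, "density": 0.40},
    {"connection": "4G", "device": "desktop", "start": 1000, "end": 2000, "density": 0.20},
    {"connection": "3G", "device": "phone", "start": 0, "end": 1000, "density": 0.10},
    {"connection": "3G", "device": "phone", "start": 1000, "end": 2000, "density": 0.30},
]
print(round(seb(rows), 2))                   # 0.5
print(round(seb(rows, connection="3G"), 2))  # 0.25
```

Dividing by the total density of the selected slice is what keeps the 3G figure comparable with the overall one: without it, the 3G value would be 0.10 rather than 0.25.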

The key usefulness of this report is to answer some important questions on user experience from the perspective of real-world usage. In the coming weeks, we will analyze the report for a number of different questions in separate blog posts. We will update the following list as we go along, so do not forget to subscribe to our blog!

(Inian Parameshwaran and Shouvik Sardar contributed to this article. Thanks to Rick Viscomi for reviewing the draft of this article.)
Stay updated, because #perfmatters!
