BigQuery & Basics

GA4 Users Explained

Let’s dive into users in GA4 we’re going to answer:

  • The basics about users
    • Why are there so many different user metrics?
    • How does GA4 identify a user?
  • How do users work in BigQuery
    • How can I identify users in BigQuery
    • Example queries

As with all our posts we’ve got the same 3 goals:

  1. Understand the fundamentals of how it all works.
  2. Apply that to using GA4.
  3. Get comfortable using GA4 BigQuery Export knowing the principles.

Let’s dive in.

When getting stuck into GA4, you’ll notice that Google uses the term "User" fairly liberally which doesn’t make it the easiest thing to get your head round.

Before we can dive into anything else we need to get our definitions correct.

Let’s get our names straight, because GA4 makes this confusing

First up:

GA4 has two different metrics for just measuring “Users”:

  • Active Users
  • Total Users

We’ll always use one of these to be as specific as possible.

GA4 also has multiple ways of identifying users

  • Pseudo Users / Non-logged In Users
  • User ID / Logged in Users

We’re going to call those:

  • Non logged users: Clients
  • Logged in Users: User IDs

So it’s super obvious which one we mean we’ll use both e.g. we might talk about Active Users (Clients) Or Total User (User IDs).

What are the different user metrics?

What are “New Users”?

New users are users who have visited the site for the first time. i.e. we haven’t tracked them.

What are “Returning Users“?

Returning users are simply users who have visited the site previously.

Google Analytics keeps track of is someone has visited before with a cookie.

What are “Active Users”?

This is the number of different users who visited your site/ app and had some level interaction with the site.

To be an active user, you either need to:

  1. Be a “New user” (i.e be visiting the site for the first time) or
  2. Be a “Returning user” with an “engaged session” which means that they had at least a single session in which at least one of these criteria are met:
    • The session lasted for more than 10 seconds
    • The session featured a conversion event
    • Included at least two pageviews (or two screenviews on an app).
  3. Google Analytics has collected a engagement_time_msec event

Should a new user count as active? Up for discussion.

What are “Total Users”?

This is every user who visited your site, whether they were engaged or not!

It's important to note that total users can’t be calculated by adding new and returning users.

An example - A single user who visits the site twice would be counted once as a new user and then again as a returning user.

How does GA4 identify a user?

GA4 has two different ways you can identify users.

  • Pseudo user/client (This is the default)
  • User ID

What is a pseudo user/non logged in users/client ID?

By default GA4 shows pseudo users (although it will just call them users).

These aren't actual users, but are unique device-browser pairings which are used to act as a proxy for users.

So, if someone visits your site from their phone and later from their laptop, that counts as two different "Users," even though it’s the same person behind both devices.

Even if they visited from the same phone, but used Chrome for one visit and Firefox for another it would be two different visits.

pseudo_user_example

What are these called in BigQuery?

  • In BigQuery these pseudo users are named user_pseudo_id .
  • In GA4 everything is a “User” for both pseudo users and user ID.
  • In Piped Out, we refer to these as Client IDs
    • We’ve maintained this name from Universal Analytics.

Here’s an example.

default_users_in_interface

What is User ID (user_id )?

User ID offers a more accurate way to track individual users.

An ID is assigned to each user, usually through a login, and follows their activity across numerous sessions, devices, and browsers.

So if a logged in user accessed the site on their laptop in the morning and then on their phone in the afternoon, this is one single user_id.

single_user

How does GA4 show this?

In BigQuery it’s literally called user_id .

In the GA4 interface app, no metrics will change name, it’ll just be using your new User ID metric behind the scenes.

How do you get this?

You have to set this up and then tell GA4 you’re doing it. We’ll write something up on this in the future.

If you set it up, GA4 will then use user_id when available.

Is this important for you?

If you have a lot of users who login then yes, otherwise no.

Why does user get it’s own dimensions?

A super important question for marketing is:

  • Where did this user come from?

Unlike sessions which have one answer, a user has multiple answers.

how_do_we_decide

  • Should it be the very first one?
  • Maybe the afternoon session?
  • Maybe the morning one?

For users GA4 shows you the traffic source which brought them to the site first.

first_user_channel_organic_search

This is why in the interface you see:

  • First User Medium
  • First User Default Channel Grouping

Crucially this isn’t not just today, but over all time. Literally the first time they ever arrived.

first_user_dimensions_all_time

Why does GA4 do this?

It does this (we suspect) because they were thinking about apps rather than websites.

With an app the first ever source is someone installing your app, so it is really important.

With the web it’s not quite the same?

Trying to do better

If you’re rolling your own GA4 user tables (or building a product like us) we think there’s probably a better compromise for users.

And that’s to use the first traffic user that brought a user to the site on a single day.

It allows us to see how users arrive, whilst still acknowledging that particularly on the web, the first ever source just isn’t as important as it is on app.

How does User Level data look in the GA4 BigQuery export?

The BigQuery export has several columns which are specific to users - which are often used with session based parameter in order to derive insights about their traffic acquisition.

The primary columns we care about are:

  • user_id - The “real” user ID which is based on a unique identifier of a logged in user.
  • user_pseudo_id - The “default” user which is actually a unique device browser pairing.
  • user_properties** which are parameterised pieces of information about the user. This is where a lot of the specific information is about the user.

How can I pull the common User Metrics from the GA4 BigQuery export?

Customise the queries.

Set-up variables to customise our BigQuery queries for your setup.

Calculating Total Users (Client IDs)

This is the total number of unique Client IDs (i.e the number of unique device - browser pairings).

We’re just counting user_pseudo_id.

select
    -- This is the distinct count of Client ID
    count(distinct user_pseudo_id) as total_users
from
    -- This table will need to be updated
    `@project.analytics_@analytics_property_id.@event_type*`
where
    -- The date range being selected will need to be updated
    _table_suffix between '@start_date' and '@end_date'

Calculating News Users (Client IDs)

This is the total number of new Client IDs (i.e the number of new unique device - browser pairings).

New users can be identified in BigQuery by either:

  1. Looking if the users logged the first_visit or first_open event (though this isn’t consistently).
  2. Looking at the session_number parameter. This identifies the number of sessions that a user has had leading up to the session. If this number is 1, then we know they are a new user.

This is calculated by counting the distinct number of Client IDs where the ga_session_number = 1.

-- First CTE selects the event name, user_pseudo_id and unnests some event_params 
 with
    unnested_events as (
        select
            event_name,
            user_pseudo_id,
            (
                select params.key
                from unnest(event_params) as params
                where params.key = 'ga_session_number'
            ) as param_key,
            (
                select params.value.int_value
                from unnest(event_params) as params
                where params.key = 'ga_session_number'
            ) as ga_session_number_value
            from
                -- This table will need to be updated
                `@project.analytics_@analytics_property_id.@event_type*`
            where
                -- The date range being selected will need to be updated
                _table_suffix between '@start_date' and '@end_date'
    )
    -- Counts the distinct number of users which match the critera
    select count(distinct(user_pseudo_id)) as new_users
    from unnested_events
    where
        -- Sets the conditions which are that the
        -- this is the first session for a user.
        param_key = 'ga_session_number' and ga_session_number_value = 1
;

Calculating Active Users (Client IDs)

This is the total number of unique Client IDs (i.e the number of unique device - browser pairings) where the user was active.

A user is active if:

  • They’re a new user
  • Have an engaged session OR
  • Google Analytics has collected an engagement_time_msec.

That means we’re calculating:

  • New users - To detect new users, we are looking for ga_session_number = 1 (i.e. a first session) instead of looking for first_visit/ first_open as this event is sometimes missing in a session.
  • Engaged sessions - We are looking for an session_engaged = 1
  • Engagement Time - We are looking for an engagement_time_msec greater than 0.
-- This extracts the Client ID (user_pseudo_id) and unnests some event_params
with
    unnest_data as (
        select
            user_pseudo_id,
            event_name,
            (
                select value.string_value
                from unnest(event_params)
                where key = 'session_engaged'
            ) as session_engaged,
        
            (
                select params.key
                from unnest(event_params) as params
                where params.key = 'ga_session_number'
            ) as session_number_key,

            (
                select params.value.int_value
                from unnest(event_params) as params
                where params.key = 'ga_session_number'
            ) as session_number_value,

            ( 
                select value.int_value
                from unnest(event_params)
                where key = 'engagement_time_msec'
            ) as engagement_time_msec,

            from
                -- This table will need to be updated
                `@project.analytics_@analytics_property_id.@event_type*`
            where
                -- The date range being selected will need to be updated
                _table_suffix between '@start_date' and '@end_date'
    )

    -- Count distinct users based on engagement time, session engagement
    -- or whether there was a new user session.
    select
        count(
            distinct case
                when
                    (session_engaged = '1')
                    or (session_number_key = 'ga_session_number'
                        or session_number_value = 1
                    ) or (engagement_time_msec > 0)
                then user_pseudo_id
            end
        ) as active_users
        from unnest_data
;

Calculating Returning Users (Client IDs)

This is the total number of returning Client IDs (i.e the number of new unique device - browser pairings).

This is calculated by counting the distinct number of Client IDs where the ga_session_number is greater than 1.

-- First CTE selects the event name, user_pseudo_id and unnests some event_params 
with
    unnested_events as (
        select
            event_name,
            user_pseudo_id,
            (
                select params.key
                from unnest(event_params) as params
                where params.key = 'ga_session_number'
            ) as param_key,
            (
                select params.value.int_value
                from unnest(event_params) as params
                where params.key = 'ga_session_number'
            ) as ga_session_number_value
            from
                -- This table will need to be updated
                `@project.analytics_@analytics_property_id.@event_type*`
            where
                -- The date range being selected will need to be updated
                _table_suffix between '@start_date' and '@end_date'
    )
    -- Counts the distinct number of users which match the critera
    select count(distinct(user_pseudo_id)) as new_users
    from unnested_events
    where
        -- Sets the conditions which are that the
        -- this is the first session for a user.
        param_key = 'ga_session_number' and ga_session_number_value > 1
    ;

User ID

The following queries are the exact same but looking at User IDs (i.e logged in users) instead of pseudo_user_id .

Calculating Total Users (User IDs)

select
    -- This is the distinct count of User ID
    count(distinct user_id) as total_users
from
    -- This table will need to be updated
    `@project.analytics_@analytics_property_id.@event_type*`
where
    -- The date range being selected will need to be updated
    _table_suffix between '@start_date' and '@end_date'

Calculating Active Users (User IDs)

-- This extracts the User ID and unnests some event_params
with
    unnest_data as (
        select
            user_id,
            event_name,
            (
                select value.string_value
                from unnest(event_params)
                where key = 'session_engaged'
            ) as session_engaged,
        
            (
                select params.key
                from unnest(event_params) as params
                where params.key = 'ga_session_number'
            ) as session_number_key,

            (
                select params.value.int_value
                from unnest(event_params) as params
                where params.key = 'ga_session_number'
            ) as session_number_value,

            ( 
                select value.int_value
                from unnest(event_params)
                where key = 'engagement_time_msec'
            ) as engagement_time_msec,

            from
                -- This table will need to be updated
                `@project.analytics_@analytics_property_id.@event_type*`
            where
                -- The date range being selected will need to be updated
                _table_suffix between '@start_date' and '@end_date'
    )

    -- Count distinct users based on engagement time, session engagement
    -- or whether there was a new user session.
    select
        count(
            distinct case
                when
                    (session_engaged = '1')
                    or (session_number_key = 'ga_session_number'
                        or session_number_value = 1
                    ) or (engagement_time_msec > 0)
                then user_id
            end
        ) as active_users
        from unnest_data
    ;

Calculating News Users (User IDs)

-- First CTE selects the event name, user_id and unnests some event_params 
 with
    unnested_events as (
        select
            event_name,
            user_id,
            (
                select params.key
                from unnest(event_params) as params
                where params.key = 'ga_session_number'
            ) as param_key,
            (
                select params.value.int_value
                from unnest(event_params) as params
                where params.key = 'ga_session_number'
            ) as ga_session_number_value
            from
                -- This table will need to be updated
                `@project.analytics_@analytics_property_id.@event_type*`
            where
                -- The date range being selected will need to be updated
                _table_suffix between '@start_date' and '@end_date'
    )
    -- Counts the distinct number of users which match the critera
    select count(distinct(user_id)) as new_users
    from unnested_events
    where
        -- Sets the conditions which are that the
        -- this is the first session for a user.
        param_key = 'ga_session_number' and ga_session_number_value = 1
    ;

Calculating Returning Users (User IDs)

-- First CTE selects the event name, user_id and unnests some event_params 
 with
    unnested_events as (
        -- First CTE selects the event name, client ID and 
        -- unnests event_params
        select
            event_name,
            user_id,
            (
                select params.key
                from unnest(event_params) as params
                where params.key = 'ga_session_number'
            ) as param_key,
            (
                select params.value.int_value
                from unnest(event_params) as params
                where params.key = 'ga_session_number'
            ) as ga_session_number_value
            from
                -- This table will need to be updated
                `@project.analytics_@analytics_property_id.@event_type*`
            where
                -- The date range being selected will need to be updated
                _table_suffix between '@start_date' and '@end_date'
    )
    -- Counts the distinct number of users which match the critera
    select count(distinct(user_id)) as new_users
    from unnested_events
    where
        -- Sets the conditions which are that the
        -- this is the first session for a user.
        param_key = 'ga_session_number' and ga_session_number_value > 1
    ;