BigQuery & Basics

GA4 Sessions Explained

Lets dive into the humble session. We’re going to cover:

  • Basics & Interface
    • What is a session?
    • How can we make sessions more useful?
    • What are engaged sessions & how can we customise them?
  • How do sessions work in BigQuery?
    • How does it work?
    • Why don’t BigQuery sessions match GA4 UI sessions?
    • Example GA4 BQ Queries

Our goal here is to give you a solid understanding of how sessions work on GA4.

Once you've got that, we'll show you how to start using it in BigQuery. If you already know the basics skip ahead to:

Let’s dive in.

What is a session on GA4?

When someone visits a site, the actions they take during that visit are grouped into a “session”.

This session includes everything they do, such as viewing different pages, making a purchase or clicking on specific links.

A single session could be landing on a blog page, scrolling down for a few seconds, moving on and never returning.

simple session

Another single session could be visiting numerous pages, adding a million items into their cart and then spending £4000 on Oasis tickets.

advanced session

How does GA4 decide where a session starts and ends?

A session starts when you visit a website and then lasts (by default) for up to 30 minutes.

If you visit after 30 minutes it’s a new session.

So if we visit a 3 times, here’s how it breaks into sessions:

  • 00:00 - Visit a page - Session 1
  • 10:00 - Visit a page - Session 1
  • 40:01 - Visit a page - Session 2

Our 3rd visit is after 30 minutes (by one second) so it’s a new session.

Technically speaking, the session would start when either:

  • A user opens your app in the foreground
  • A user views a page or screen

And the session timeout period can be adjusted in GA4.

All session aren’t equal

However eventful or uneventful your visit is to the site, it will still only count as a single session.

So there are many metrics in GA4 to help you tell the difference.

Measuring which sessions are valuable

Why not just use conversions?

You can! And should.

But there are many times where the path from visit to conversion is long and convoluted.

Our goal here is to measure a funnel. Not many people convert. Lots of people visit.

Having steps in the middle like engaged sessions, help us find out what is doing well without needing to get full conversions.

What are engaged sessions?

An engaged session is a default metric provided by GA4 to let you to measure the “higher quality” sessions.

It makes use of the better event tracking that GA4 has to give you a better metric.

Any session which does any one of these 3 things is considered “engaged”:

  1. The session lasts for more than 10 seconds.
  2. The session features a conversion event.
  3. Includes at least two pageviews (or two screenviews on an app).

engaged session

They wanted this to replace bounce rate.

Bounce Rate was a metric from Universal Analytics that shows the percentage of sessions where a user leaves the site without a single event firing.

People used this to measure sessions where you visited one page.

But it had problems:

  • Bounces weren’t always bad: The problem was sometimes a bounce is a good experience. If I visit a tutorial, spend 10 minutes reading it and making notes and leave. That’s a bounce.
  • If you added other metrics bounce rate could change wildly: Remember our definition was “sessions where there was only 1 event” that means if you fired anything else e.g. scroll rate, any scrolling would immediately remove the bounce. This meant it was easy to break and a lot of people made metrics like “adjusted bounce rate”.

The goal with engaged sessions is to make something more accurate and less robust to breaking.

It's the Looker Studio Connector and our Pipeline product

They don't show it off in the interface, but you can still find it in the Looker Studio Connector and our Pipeline product.

But once you’ve gotten the basics down we’d recommend moving onto engaged sessions and customising it!

How can we customise engaged sessions?

Each business and website is different. A valuable session for a new user, might look different to a returning one.

If someone is top of the funnel and looking for content, a valuable visit from them probably isn’t the same as someone looking to buy then and there.

A valuable content visit might be:

  • Signing to the newsletter
  • Watching a video
  • Revisiting the same content over the last 7 days

A valuable visit to a product page might be:

  • Engaged with reviews
  • Using the FAQ accordion
  • Looking at the different colour options, for example.

In order to make this kind of custom metric, you’ll need to use the raw GA4 BigQuery export.

In our sessionisation example you'll see more of how this could work.

You can also take a look at our Pipeline documentation to see how we logically construct these in our product.

How do you count/identify sessions in BigQuery?

To get the most out of GA4 you’re going to want to set-up the BigQuery export.

But it’s quite complex so let’s talk about how to use it.

In the GA4 export every single row of data is an event - whether it be a page_view, purchase, or view_item_list.

So how do sessions work? There are two ways we can identify sessions in BigQuery.

The session_start event

The first is the session_start event itself which signifies the start of the session. GA4 automatically fires this at the start of a session.

session start

We can session_starts to get the number of sessions.

And we can look at the information that event gets sent with to pull out other dimensions like:

  • page_location - Sessions Landing Page
  • page_title - Sessions Landing Page Title

The ga_session _id

The second thing which shows a session is the ga_session_id .

This is the number GA4 automatically assigns to a session and is fired with all events in the session.

The example below shows a view_item event with the ga_session_id.

ga_session_id example event

This can technically repeat across different users so to truly identify a session we need to combine it with user_pseudo_id.

We could concatenate user_pseudo_id and ga_session_id and then count the unique values.

We’d suggest using the second

Typically we’d recommend this as sometimes (very occasionally) we’ve found session_start has been missing.

Sidenote: If you consent mode is on and user does not accept the tracking then ga_session_id and user_pseudo_id will not be returned. So if you're seeing a lot of missing data it's typicall this.

Aligning on traffic source

This is the other largest challenge that you run into with sessions, which is getting traffic source correct.

This is long enough that we made our own article.

Sessionisation: How do we group by the ga_session_id to make a session?

That's how we count sessions, but how can we build a session complete with all the dimensions and metrics we want?

Broadly we will: get all the events with the same ga_session_id and then run some SQL across all of them to get the dimensions and metrics we want.

We want our output to look like this:

countrydevicesession_id
francedesktopcnc2390ni43ijhbd7892
usamobilemnkmopd32mp23o2lkdmnl

You know with bells and whistles and a lot more columns.

Here's a diagram of how we do it:

session start

There are a couple core problems we need to handle in our query:

  1. Extracting the information for the session. The session ID lives in the event_params array.
  2. What if dimensions change across sessions?

Then there are some other problems we can also start to worry about:

  1. GA4 traffic attribution bugs.
  2. Last non direct click attribution.
  3. Use less data so everything is cheaper (and slightly faster).

We'll mostly talk about 1 & 2 for the moment.

If you want a full example query with all the dimensions to run and pull apart check out our starter query in our resource library.

But otherwise let's jump into the SQL.

Step 1: Unnest the event_params array

We'll start by grabbing out the pieces we specifically need from the event_params array.

You could get traffic source from here, we will for this example, but there are a lot of different ways to get it.

We have a whole article on how traffic source works in BigQuery.

with
    unnest_values as (
        select
            -- Get user ID to create unique session key
            user_pseudo_id,

            -- Get geo location
            geo.country as geo_country,

            -- Unnest values: ga session ID and source.
            (select value.string_value from unnest(event_params) where key = 'source') as source,
            (select value.int_value from unnest(event_params) where key = 'ga_session_id') as session_id
        from `project_id.dataset_id.table_id`
    )

Step 2: Group by the session ID

Once we've extracted the values we want, we've got a list of all the events alongside the extracted values we want.

Now it's time for the window function.

We use a window function rather than a group by because we want access to FIRST_VALUE.

The problem is that dimensions have a lot of edge cases where they can change across a session e.g.:

  • If you're using a phone in a taxi and drive from one town to another your city will change.
  • The page location will change with each page they visit.
  • Traffic source typically isn't set for the later events in a session after you've navigated from the landing page.

If we run a group by here, we're going to end up with our single session appearing in two different rows.

This means we'll end up double counting it, as for example we could end up with:

citydevicesession_id
framlinghamdesktopcnc2390ni43ijhbd7892
yoxforddesktopcnc2390ni43ijhbd7892

It's one session but split across two places.

Instead with a window function we can convert that table into:

citydevicesession_id
framlinghamdesktopcnc2390ni43ijhbd7892
framlinghamdesktopcnc2390ni43ijhbd7892

At which point we can now group by our values and correct the number of overall sessions.

In SQL it looks like this:

window
    session_window as (
        partition by ga_session_id
        order by event_timestamp asc
        rows between unbounded preceding and unbounded following
    )

Which says:

  • Partition by the ga_session_id - i.e. build a window and select everything with the same ga_session_id
  • Order by the event_timestamp - i.e. order the rows in that window by the event_timestamp
  • Get all the rows which match this criteria - i.e. if anything matches this session id then include it, we dont care about the order

We then use that window with a first value function to get the first value that appears for that session.

We use ignore nulls, because sometimes GA4 will send a null value for source even if it appears in future events in the session.

All together that looks like this:

with
    unnest_values as (
        select
            -- Get user ID to create unique session key
            user_pseudo_id,

            -- Get geo location
            geo.country as geo_country,

            -- Unnest values: ga session ID and source.
            (select value.string_value from unnest(event_params) where key = 'source') as source,
            (select value.int_value from unnest(event_params) where key = 'ga_session_id') as session_id
        from `project_id.dataset_id.table_id`
    ),
    -- We need to build a unique key for each session first and technically session id is only unique
    -- when combined with user_pseudo_id.
    create_a_session_key as (
        select
            concat(user_pseudo_id, session_id) as session_key,
            user_pseudo_id,
            session_id,
            geo_country,
            source
        from unnest_values
    ),
    -- Here is where our window function begins.
    generate_window as(
        select
            user_pseudo_id,
            session_id,
            first_value(geo_country ignore nulls) over session_window as geo_country,
            first_value(source ignore nulls) over session_window as source
        from create_a_session_key
        window
            session_window as (
                partition by session_key
                order by event_timestamp asc
                rows between unbounded preceding and unbounded following
            )
    ),
    group_after_window as (
        select
            user_pseudo_id,
            geo_country,
            source,
            session_id
        from generate_window
        group by 1, 2, 3, 4
    )
select * from group_after_window;

Additional problems

We've solved the core of it, but there's a couple more big pieces to consider before rolling into production for a large site.

3. GA4 traffic attribution bugs.

In our traffic sources article we cover the ga4 bigquery attribution bug.

4. Last non direct click attribution.

GA4 doesn't use session level attribution which is what we've created here.

It uses last non direct click attribution. (Again see the traffic sources article for more detail.)

In order to get that working, you'll need to adjust the above query. The broad steps are:

  1. Run a 90 day lookback window. Filter out all the direct sessions per user.
  2. When you find a direct session look up against this table to find the last non direct click and use this instead.

We'll have a blog post about this in the future which goes into more detail.

5. Incremental builds

A sessions table can get really large. You don't want to be building this entire table from scratch everytime.

You're going to want to use some sort of pipelining here with incremental builds. It will massively speed up and crucially reduce your BigQuery costs.

This is one of the crucial difference between something like our Pipeline product and a single BigQuery query.

And going through all the steps needed to do that is notably more than a single blog post.

You can roll this yourself with something like DBT or Dataform, or please take a look at our Pipeline product if you're looking for something that's ready to go!

Example Queries for GA4 BigQuery

If you’re just looking for copy paste queries you can use those below.

If you’d like direct access to this data without having to handle all the complex SQL we’ve got a product coming soon and please get on our email list if you’re interested!

Customise the queries.

Set-up variables to customise our BigQuery queries for your setup.

Calculating Sessions

This query pulls the number of sessions in the selected time period.

WITH unnest_data AS (
    SELECT
        user_pseudo_id,
        (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
    FROM
    -- This table will need to be updated
        `@project.analytics_@analytics_property_id.@event_type*`
    WHERE
    -- The date range being selected will need to be updated
        _table_suffix BETWEEN '@start_date' AND '@end_date'
)

-- Count distinct user-session combinations
SELECT
    COUNT(DISTINCT CONCAT(user_pseudo_id, session_id)) AS sessions
FROM
    unnest_data;

Calculating Engaged Sessions

This query generates the number of sessions throughout a defined time period.

WITH unnest_data AS (
    SELECT
        user_pseudo_id,
        (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') AS session_engaged,
        (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
    FROM
    -- This table will need to be updated
        `@project.analytics_@analytics_property_id.@event_type*`
    WHERE
    -- The date range being selected will need to be updated
         _table_suffix BETWEEN '@start_date' AND '@end_date'
)

-- Count distinct engaged user-session combinations
SELECT
    COUNT(DISTINCT CASE WHEN session_engaged = '1' THEN CONCAT(user_pseudo_id, session_id) END) AS engaged_sessions
FROM
    unnest_data;

How do I see Sessions on the GA4 interface?

If you navigate to Acquisition > Traffic Acquisition you can see the Sessions and Engaged Sessions broken down by Primary Channel Group.

Where are sessions in the GA4 interface