GA4 Sessions Explained
Lets dive into the humble session. We’re going to cover:
- Basics & Interface
- What is a session?
- How can we make sessions more useful?
- What are engaged sessions & how can we customise them?
- How do sessions work in BigQuery?
- How does it work?
- Why don’t BigQuery sessions match GA4 UI sessions?
- Example GA4 BQ Queries
Our goal here is to give you a solid understanding of how sessions work on GA4.
Once you've got that, we'll show you how to start using it in BigQuery. If you already know the basics skip ahead to:
Let’s dive in.
What is a session on GA4?
When someone visits a site, the actions they take during that visit are grouped into a “session”.
This session includes everything they do, such as viewing different pages, making a purchase or clicking on specific links.
A single session could be landing on a blog page, scrolling down for a few seconds, moving on and never returning.
Another single session could be visiting numerous pages, adding a million items into their cart and then spending £4000 on Oasis tickets.
How does GA4 decide where a session starts and ends?
A session starts when you visit a website and then lasts (by default) for up to 30 minutes.
If you visit after 30 minutes it’s a new session.
So if we visit a 3 times, here’s how it breaks into sessions:
- 00:00 - Visit a page - Session 1
- 10:00 - Visit a page - Session 1
- 40:01 - Visit a page - Session 2
Our 3rd visit is after 30 minutes (by one second) so it’s a new session.
Technically speaking, the session would start when either:
- A user opens your app in the foreground
- A user views a page or screen
And the session timeout period can be adjusted in GA4.
All session aren’t equal
However eventful or uneventful your visit is to the site, it will still only count as a single session.
So there are many metrics in GA4 to help you tell the difference.
Measuring which sessions are valuable
Why not just use conversions?
You can! And should.
But there are many times where the path from visit to conversion is long and convoluted.
Our goal here is to measure a funnel. Not many people convert. Lots of people visit.
Having steps in the middle like engaged sessions, help us find out what is doing well without needing to get full conversions.
What are engaged sessions?
An engaged session is a default metric provided by GA4 to let you to measure the “higher quality” sessions.
It makes use of the better event tracking that GA4 has to give you a better metric.
Any session which does any one of these 3 things is considered “engaged”:
- The session lasts for more than 10 seconds.
- The session features a conversion event.
- Includes at least two pageviews (or two screenviews on an app).
They wanted this to replace bounce rate.
Bounce Rate was a metric from Universal Analytics that shows the percentage of sessions where a user leaves the site without a single event firing.
People used this to measure sessions where you visited one page.
But it had problems:
- Bounces weren’t always bad: The problem was sometimes a bounce is a good experience. If I visit a tutorial, spend 10 minutes reading it and making notes and leave. That’s a bounce.
- If you added other metrics bounce rate could change wildly: Remember our definition was “sessions where there was only 1 event” that means if you fired anything else e.g. scroll rate, any scrolling would immediately remove the bounce. This meant it was easy to break and a lot of people made metrics like “adjusted bounce rate”.
The goal with engaged sessions is to make something more accurate and less robust to breaking.
It's the Looker Studio Connector and our Pipeline product
They don't show it off in the interface, but you can still find it in the Looker Studio Connector and our Pipeline product.
But once you’ve gotten the basics down we’d recommend moving onto engaged sessions and customising it!
How can we customise engaged sessions?
Each business and website is different. A valuable session for a new user, might look different to a returning one.
If someone is top of the funnel and looking for content, a valuable visit from them probably isn’t the same as someone looking to buy then and there.
A valuable content visit might be:
- Signing to the newsletter
- Watching a video
- Revisiting the same content over the last 7 days
A valuable visit to a product page might be:
- Engaged with reviews
- Using the FAQ accordion
- Looking at the different colour options, for example.
In order to make this kind of custom metric, you’ll need to use the raw GA4 BigQuery export.
In our sessionisation example you'll see more of how this could work.
You can also take a look at our Pipeline documentation to see how we logically construct these in our product.
How do you count/identify sessions in BigQuery?
To get the most out of GA4 you’re going to want to set-up the BigQuery export.
But it’s quite complex so let’s talk about how to use it.
In the GA4 export every single row of data is an event - whether it be a page_view
, purchase
, or view_item_list
.
So how do sessions work? There are two ways we can identify sessions in BigQuery.
The session_start event
The first is the session_start
event itself which signifies the start of the session. GA4 automatically fires this at the start of a session.
We can session_starts to get the number of sessions.
And we can look at the information that event gets sent with to pull out other dimensions like:
page_location
- Sessions Landing Pagepage_title
- Sessions Landing Page Title
The ga_session _id
The second thing which shows a session is the ga_session_id
.
This is the number GA4 automatically assigns to a session and is fired with all events in the session.
The example below shows a view_item
event with the ga_session_id
.
This can technically repeat across different users so to truly identify a session we need to combine it with user_pseudo_id
.
We could concatenate user_pseudo_id
and ga_session_id
and then count the unique values.
We’d suggest using the second
Typically we’d recommend this as sometimes (very occasionally) we’ve found session_start
has been missing.
Sidenote: If you consent mode is on and user does not accept the tracking then ga_session_id
and user_pseudo_id
will not be returned. So if you're seeing a lot of missing data it's typicall this.
Aligning on traffic source
This is the other largest challenge that you run into with sessions, which is getting traffic source correct.
This is long enough that we made our own article.
Building GA4 BigQuery yourself?
Need a example GA4 BigQuery query to get started? We've got a session table query with all the default dimensions and metrics you can paste into Looker Studio.
Get up and running in 15 minutes
Our Pipeline product lets you skip all the data engineering and SQL. GA4 BigQuery data with custom dimensions, user properties, multi touch attribution, lower BQ costs and more out of the box.
Sessionisation: How do we group by the ga_session_id to make a session?
That's how we count sessions, but how can we build a session complete with all the dimensions and metrics we want?
Broadly we will: get all the events with the same ga_session_id and then run some SQL across all of them to get the dimensions and metrics we want.
We want our output to look like this:
country | device | session_id |
---|---|---|
france | desktop | cnc2390ni43ijhbd7892 |
usa | mobile | mnkmopd32mp23o2lkdmnl |
You know with bells and whistles and a lot more columns.
Here's a diagram of how we do it:
There are a couple core problems we need to handle in our query:
- Extracting the information for the session. The session ID lives in the event_params array.
- What if dimensions change across sessions?
Then there are some other problems we can also start to worry about:
- GA4 traffic attribution bugs.
- Last non direct click attribution.
- Use less data so everything is cheaper (and slightly faster).
We'll mostly talk about 1 & 2 for the moment.
If you want a full example query with all the dimensions to run and pull apart check out our starter query in our resource library.
But otherwise let's jump into the SQL.
Step 1: Unnest the event_params array
We'll start by grabbing out the pieces we specifically need from the event_params array.
You could get traffic source from here, we will for this example, but there are a lot of different ways to get it.
We have a whole article on how traffic source works in BigQuery.
with
unnest_values as (
select
-- Get user ID to create unique session key
user_pseudo_id,
-- Get geo location
geo.country as geo_country,
-- Unnest values: ga session ID and source.
(select value.string_value from unnest(event_params) where key = 'source') as source,
(select value.int_value from unnest(event_params) where key = 'ga_session_id') as session_id
from `project_id.dataset_id.table_id`
)
Step 2: Group by the session ID
Once we've extracted the values we want, we've got a list of all the events alongside the extracted values we want.
Now it's time for the window function.
We use a window function rather than a group by because we want access to FIRST_VALUE
.
The problem is that dimensions have a lot of edge cases where they can change across a session e.g.:
- If you're using a phone in a taxi and drive from one town to another your city will change.
- The page location will change with each page they visit.
- Traffic source typically isn't set for the later events in a session after you've navigated from the landing page.
If we run a group by here, we're going to end up with our single session appearing in two different rows.
This means we'll end up double counting it, as for example we could end up with:
city | device | session_id |
---|---|---|
framlingham | desktop | cnc2390ni43ijhbd7892 |
yoxford | desktop | cnc2390ni43ijhbd7892 |
It's one session but split across two places.
Instead with a window function we can convert that table into:
city | device | session_id |
---|---|---|
framlingham | desktop | cnc2390ni43ijhbd7892 |
framlingham | desktop | cnc2390ni43ijhbd7892 |
At which point we can now group by our values and correct the number of overall sessions.
In SQL it looks like this:
window
session_window as (
partition by ga_session_id
order by event_timestamp asc
rows between unbounded preceding and unbounded following
)
Which says:
- Partition by the ga_session_id - i.e. build a window and select everything with the same ga_session_id
- Order by the event_timestamp - i.e. order the rows in that window by the event_timestamp
- Get all the rows which match this criteria - i.e. if anything matches this session id then include it, we dont care about the order
We then use that window with a first value function to get the first value that appears for that session.
We use ignore nulls, because sometimes GA4 will send a null value for source even if it appears in future events in the session.
All together that looks like this:
with
unnest_values as (
select
-- Get user ID to create unique session key
user_pseudo_id,
-- Get geo location
geo.country as geo_country,
-- Unnest values: ga session ID and source.
(select value.string_value from unnest(event_params) where key = 'source') as source,
(select value.int_value from unnest(event_params) where key = 'ga_session_id') as session_id
from `project_id.dataset_id.table_id`
),
-- We need to build a unique key for each session first and technically session id is only unique
-- when combined with user_pseudo_id.
create_a_session_key as (
select
concat(user_pseudo_id, session_id) as session_key,
user_pseudo_id,
session_id,
geo_country,
source
from unnest_values
),
-- Here is where our window function begins.
generate_window as(
select
user_pseudo_id,
session_id,
first_value(geo_country ignore nulls) over session_window as geo_country,
first_value(source ignore nulls) over session_window as source
from create_a_session_key
window
session_window as (
partition by session_key
order by event_timestamp asc
rows between unbounded preceding and unbounded following
)
),
group_after_window as (
select
user_pseudo_id,
geo_country,
source,
session_id
from generate_window
group by 1, 2, 3, 4
)
select * from group_after_window;
Additional problems
We've solved the core of it, but there's a couple more big pieces to consider before rolling into production for a large site.
3. GA4 traffic attribution bugs.
In our traffic sources article we cover the ga4 bigquery attribution bug.
4. Last non direct click attribution.
GA4 doesn't use session level attribution which is what we've created here.
It uses last non direct click attribution. (Again see the traffic sources article for more detail.)
In order to get that working, you'll need to adjust the above query. The broad steps are:
- Run a 90 day lookback window. Filter out all the direct sessions per user.
- When you find a direct session look up against this table to find the last non direct click and use this instead.
We'll have a blog post about this in the future which goes into more detail.
5. Incremental builds
A sessions table can get really large. You don't want to be building this entire table from scratch everytime.
You're going to want to use some sort of pipelining here with incremental builds. It will massively speed up and crucially reduce your BigQuery costs.
This is one of the crucial difference between something like our Pipeline product and a single BigQuery query.
And going through all the steps needed to do that is notably more than a single blog post.
You can roll this yourself with something like DBT or Dataform, or please take a look at our Pipeline product if you're looking for something that's ready to go!
Example Queries for GA4 BigQuery
If you’re just looking for copy paste queries you can use those below.
If you’d like direct access to this data without having to handle all the complex SQL we’ve got a product coming soon and please get on our email list if you’re interested!
Customise the queries.
Set-up variables to customise our BigQuery queries for your setup.
Calculating Sessions
This query pulls the number of sessions in the selected time period.
WITH unnest_data AS (
SELECT
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
FROM
-- This table will need to be updated
`@project.analytics_@analytics_property_id.@event_type*`
WHERE
-- The date range being selected will need to be updated
_table_suffix BETWEEN '@start_date' AND '@end_date'
)
-- Count distinct user-session combinations
SELECT
COUNT(DISTINCT CONCAT(user_pseudo_id, session_id)) AS sessions
FROM
unnest_data;
Calculating Engaged Sessions
This query generates the number of sessions throughout a defined time period.
WITH unnest_data AS (
SELECT
user_pseudo_id,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') AS session_engaged,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
FROM
-- This table will need to be updated
`@project.analytics_@analytics_property_id.@event_type*`
WHERE
-- The date range being selected will need to be updated
_table_suffix BETWEEN '@start_date' AND '@end_date'
)
-- Count distinct engaged user-session combinations
SELECT
COUNT(DISTINCT CASE WHEN session_engaged = '1' THEN CONCAT(user_pseudo_id, session_id) END) AS engaged_sessions
FROM
unnest_data;
How do I see Sessions on the GA4 interface?
If you navigate to Acquisition > Traffic Acquisition you can see the Sessions and Engaged Sessions broken down by Primary Channel Group.