GA4 Scoping Explained
Do you understand scope? If you don’t, there’s a good chance you’re looking at the wrong numbers.
Take this example:
Have you put session channel grouping (or any other session dimension) into a table with sessions and users?
GA4 will let you, but you have now broken the numbers.
If you want to see correct numbers you need to understand scoping.
Let’s dive in.
What is scoping in GA4?
Scope in GA4 is the level of grouping we apply to our analytics e.g.
- Session
- User
- Hit
It’s often easier to understand by thinking about the goal.
When you open up your analytics you might look for different things:
- I want to know how my users are behaving
- I want to know how each individual session is going
- I care about each individual page-view
The choice you’re making is scope.
GA4 has 4 main levels of scoping
These each represent different levels of aggregation.
- User (user_id): Data tied to a specific logged in user.
- Client (user_pseudo_id): Data tied to a specific browser-device pair.
- Session: Data tied to a session or journey.
- Event: Data tied to a specific action (e.g., clicking a button).
If you work in eCommerce, there’s also Item Level Scoping.
- Item: Data tied to specific items, like products in an e-commerce transaction.
To report on the right numbers, you need to pick the right scope. (If you’d like more information on user vs client, we have a whole post on how users work in GA4.)
What is the fundamental rule when choosing scopes?
Remember those broken numbers? Here’s the rule.
Pick your dimensions at one scope, then you must always use a metric that is either at that scope or a lower level in the hierarchy.
That’s a bit of a mouthful so let’s have some examples.
If you are using a User Scoped Dimension (user_id), you can use metrics from any scope as User Level scoping is the highest scoping level.
If you are working with a Client Scoped Dimension (user_pseudo_id), you must work with metrics that are at the same scoping level or less.
This means you should not use User (user_id) scoped metrics (such as User ID count). You can, though, use Client (user_pseudo_id) Level, Session Level & Event Level scoped metrics.
If you are working with a Session Scoped Dimension (let’s say Session medium), you can only work with Session Scoped or Event Scoped metrics.
This means that you can use Session or Event level metrics, but you couldn’t work with Client (user_pseudo_id) or User (user_id) Scoped metrics.
If you are working with an Event Scoped Dimension, you can only work with Event Scoped metrics. Session, Client and User scoped metrics are all off the table.
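The rule can be written down as a tiny check. This is only a sketch: the scope names and the `is_valid_combination` helper are made up for illustration, and Item scope is left out because the hierarchy above does not say where it sits.

```python
# Scopes ordered from lowest to highest, following the hierarchy described above.
# Item scope is omitted because the text does not place it in the hierarchy.
SCOPE_ORDER = ["event", "session", "client", "user"]


def is_valid_combination(dimension_scope: str, metric_scope: str) -> bool:
    """Return True when the metric sits at or below the dimension's scope."""
    return SCOPE_ORDER.index(metric_scope) <= SCOPE_ORDER.index(dimension_scope)


# Session medium (session scope) with Sessions (session scope): fine.
print(is_valid_combination("session", "session"))  # True
# Session medium (session scope) with Users (client scope): risks double counting.
print(is_valid_combination("session", "client"))  # False
```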
Why does using the wrong scope break your numbers?
In the example below, we can see how a single user interacts with the site over the course of a day.
- User 1
- Session 1 - organic visit
- Session 2 - paid visit
In this example, we can see there has been a single user, with 2 sessions.
With no dimensions, we have no issues.
That was nice and easy. Let’s make a problem.
We’ll add a session level dimension to our table with a user metric.
Now we’ve got double counting.
We double count our user, because that user has 2 sessions, 1 via organic and 1 via cpc and we have a session level dimension.
There is technically 1 user in each of those buckets, but it’s the same one.
The error we’ve made is mixing a session level dimension with a client/user metric.
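To make the double count concrete, here is a small sketch that recreates the one-user, two-session example in an in-memory SQLite database. The table and column names are invented for the illustration and are not the GA4 export schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (user_id TEXT, session_id INTEGER, session_medium TEXT)"
)
# One user, two sessions: one organic, one paid (cpc).
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [("user_1", 1, "organic"), ("user_1", 2, "cpc")],
)

# No dimension: the user is counted once.
total_users = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM sessions"
).fetchone()[0]

# Session-level dimension with a user metric: the same user lands in both rows.
users_by_medium = conn.execute(
    "SELECT session_medium, COUNT(DISTINCT user_id) "
    "FROM sessions GROUP BY session_medium ORDER BY session_medium"
).fetchall()
summed_users = sum(count for _, count in users_by_medium)

print(total_users)      # 1
print(users_by_medium)  # [('cpc', 1), ('organic', 1)]
print(summed_users)     # 2
```

Each row is correct on its own, but adding the rows together gives 2 users where there is only 1.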
If we want to look at medium alongside a user based metric (i.e. Users), then we need to use a user based dimension (such as First user medium) rather than Session medium, as that is higher up the scoping hierarchy.
But then we miss out on exactly where each session came from. There is no right answer, just different metrics for different moments.
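As a sketch of the user based alternative, the query below assigns each user the medium of their first session and counts users against that instead. It assumes "user based dimension" means something like a first-session medium; the table, the `session_number` column and the query shape are illustrative, not GA4's own logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (user_id TEXT, session_number INTEGER, session_medium TEXT)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [("user_1", 1, "organic"), ("user_1", 2, "cpc")],
)

# Keep only each user's first session, then count users by that session's medium.
rows = conn.execute(
    """
    SELECT s.session_medium AS first_medium, COUNT(DISTINCT s.user_id) AS users
    FROM sessions AS s
    JOIN (
        SELECT user_id, MIN(session_number) AS first_session
        FROM sessions
        GROUP BY user_id
    ) AS f
      ON s.user_id = f.user_id AND s.session_number = f.first_session
    GROUP BY s.session_medium
    """
).fetchall()

print(rows)  # [('organic', 1)]
```

The user is now counted once, but the paid session no longer appears anywhere in the table, which is the trade-off described above.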
Doesn’t GA4 let you mix together incompatible dimensions & metrics though?
In short, yes.
This is particularly easy to do in explorations, where you can select whatever metrics and dimensions you want.
Take these numbers below. We’re double counting all the users who have had multiple sessions.
There are more guard rails in the standard GA4 reports, such as separating the Session & First User Acquisition tables, but it’s still very possible.
If you used to use UA and you’re wondering about the old User Acquisition report: yep, it was wrong. We were double counting there and no-one noticed.
How should you set up your BigQuery GA4 export to avoid scoping issues?
When you’re working with GA4 BigQuery tables, scope is an excellent principle for setting base tables for people to use.
It’s difficult to prevent scoping mistakes entirely, but you can structure your tables in a way that minimises them.
How would we recommend setting up tables?
- Create separate tables for each scope.
- When possible build one table sourced from the other.
In practice that means we end up with something like this:
- Items Table
- Pageviews Table
- Sessions Table
- Client Table
- User ID table
And you can see in this diagram how both the user tables build atop the sessions table.
This lets us reuse constructs like sessions in our client/user tables, and lowers the chance of us miscalculating something and ending up with different session numbers in the sessions table and the client table.
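As a rough sketch of the first step in that chain, the snippet below builds a session table from event rows in an in-memory SQLite database. It assumes a flattened events table with `user_pseudo_id`, `ga_session_id` and `event_name` columns; in the real export `ga_session_id` lives inside `event_params` and would need unnesting first, so treat the schema here as hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_pseudo_id TEXT, ga_session_id INTEGER, event_name TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("ah12", 1, "page_view"),
        ("ah12", 1, "page_view"),
        ("ah12", 1, "purchase"),
        ("ah12", 2, "page_view"),
        ("gh32", 1, "page_view"),
    ],
)

# Session table: one row per (user_pseudo_id, ga_session_id) pair.
# In SQLite a comparison evaluates to 0 or 1, so SUM counts the matching events.
conn.execute(
    """
    CREATE TABLE sessions AS
    SELECT
        user_pseudo_id,
        ga_session_id,
        SUM(event_name = 'page_view') AS pageviews,
        SUM(event_name = 'purchase') AS purchases
    FROM events
    GROUP BY user_pseudo_id, ga_session_id
    """
)

sessions = conn.execute(
    "SELECT * FROM sessions ORDER BY user_pseudo_id, ga_session_id"
).fetchall()
print(sessions)  # [('ah12', 1, 2, 1), ('ah12', 2, 1, 0), ('gh32', 1, 1, 0)]
```

Because the definition of a session lives in one place, any client or user table built on top of `sessions` inherits it rather than redefining it.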
Let’s show an example of this:
We’d build the Session Table.
We have a row for each session which would look something like this:
Unique Session Key | Unique User ID | Session Start Date | Session Medium | Session Source | Pageviews Count | Purchases Count |
---|---|---|---|---|---|---|
1234 | ah12 | 2023-09-01 | organic | | 4 | 1 |
2343 | gh32 | 2023-09-01 | cpc | | 2 | 0 |
4324 | kj43 | 2023-09-01 | cpc | | 1 | 0 |
Then for client/user ID tables we’d re-aggregate the session data to convert it to client/user.
For example to create a client version of our session table we could:
WITH
creating_partitions AS (
  -- One row per session, tagged with the medium and source of that user's first session.
  -- Ordering by Session_Start_Date assumes it separates a user's sessions;
  -- a session start timestamp would break ties between same-day sessions.
  SELECT
    Unique_User_ID,
    Unique_Session_Key,
    Pageviews_Count,
    Purchases_Count,
    FIRST_VALUE(Session_Medium) OVER (PARTITION BY Unique_User_ID ORDER BY Session_Start_Date ASC) AS First_Session_Medium,
    FIRST_VALUE(Session_Source) OVER (PARTITION BY Unique_User_ID ORDER BY Session_Start_Date ASC) AS First_Session_Source
  FROM
    Session_Table
)
-- Re-aggregate the session rows to one row per user.
SELECT
  Unique_User_ID,
  First_Session_Medium,
  First_Session_Source,
  COUNT(Unique_Session_Key) AS Sessions,
  SUM(Pageviews_Count) AS Pageviews,
  SUM(Purchases_Count) AS Purchases
FROM creating_partitions
GROUP BY
  Unique_User_ID,
  First_Session_Medium,
  First_Session_Source
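One benefit of building the client table from the session table is that the two can be reconciled. The sketch below uses the three example sessions from the table above in an in-memory SQLite database and checks that the totals agree; `Client_Table` is a hypothetical name, and the first-session columns are left out to keep the check short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Session_Table ("
    "Unique_Session_Key INTEGER, Unique_User_ID TEXT, "
    "Pageviews_Count INTEGER, Purchases_Count INTEGER)"
)
conn.executemany(
    "INSERT INTO Session_Table VALUES (?, ?, ?, ?)",
    [(1234, "ah12", 4, 1), (2343, "gh32", 2, 0), (4324, "kj43", 1, 0)],
)

# Client table re-aggregated from the session table (one row per user).
conn.execute(
    """
    CREATE TABLE Client_Table AS
    SELECT
        Unique_User_ID,
        COUNT(Unique_Session_Key) AS Sessions,
        SUM(Pageviews_Count) AS Pageviews,
        SUM(Purchases_Count) AS Purchases
    FROM Session_Table
    GROUP BY Unique_User_ID
    """
)

# Totals from both tables should match if nothing was lost or duplicated.
session_totals = conn.execute(
    "SELECT COUNT(*), SUM(Pageviews_Count), SUM(Purchases_Count) FROM Session_Table"
).fetchone()
client_totals = conn.execute(
    "SELECT SUM(Sessions), SUM(Pageviews), SUM(Purchases) FROM Client_Table"
).fetchone()

print(session_totals)  # (3, 7, 1)
print(client_totals)   # (3, 7, 1)
```

A check like this could run after each table build to catch the two tables drifting apart.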
So, what should I take away from this?
Whenever you combine a metric with a dimension in GA4, you are grouping by that dimension.
To avoid double counting or producing broken numbers:
- Either use a dimension at the same scope as the metric you are using OR
- Use a dimension at a higher scope than the metric you are using.