
Explaining the GA4 BigQuery Export Schema

Do you know what event stream analytics is?

  1. No. Go and read this Introduction to GA4 & BigQuery and come back.
  2. Yes. Great, let's get started and talk about the GA4 BigQuery export schema.

In this post we talk through the common pieces of information that GA4 collects and that you can access in your GA4 BigQuery export.

This is focused primarily on web analytics - we’ll mention app analytics, but it’s not the focus.

If you’d like to jump straight to the list of all the fields, skip ahead to the full GA4 BigQuery export schema linked at the end of this post.

Otherwise read on.

GA4 exports event stream data to BigQuery

That means it’s exporting a list of every single event happening on your website separately.

Most things you do on a website can have their own tracked event, e.g.:

  • Someone opened a page
  • Someone scrolled down the page
  • Someone clicked a video
  • Someone purchased a product
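To see which events your own site is sending, a quick count per event name is a useful first query. This is a minimal sketch that reuses the placeholder table name from later in this post; swap in your own project, dataset and table. The names you get back (for example page_view, scroll or purchase) depend on how your tracking is set up.

```sql
-- Count how many times each event appears in one export table.
-- The table name is a placeholder, not a real table.
select
    event_name,
    count(*) as event_count
from `project-name.dataset-name.events_intraday_xxxxxxxx`
group by event_name
order by event_count desc
```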

GA4 collects a lot of information about these events

For each of these events GA4 collects a lot of pieces of information.

This includes default pieces like:

  • The country of the user who sent the event.

And custom pieces you can configure like:

  • The company name a user typed into a contact-us lead form.

When this data is exported to BigQuery it goes into tables. And the schema of those tables describes all the data you have access to!

Google talks about any piece of data that comes with an event as a property (however it was collected). So that's what we'll do.

We’ll bucket GA4 event properties into groups

It’s useful to group the types of information an event can have.

We’ve bucketed these into 7 groups:

  1. Event properties that every event has
  2. Event properties that help you tie events together
  3. Event properties that tell you about the user who sent the event
  4. Event properties that tell you where an event came from
  5. Event properties that tell you the traffic source of the event
  6. Event properties that tell you information about products
  7. Custom event properties that you set up for your site.

Here are some examples of each before we get into the full schema.

Event properties that every event has

These generic properties include:

  • event_name - Every event needs a name.
  • event_timestamp - Every event arrives at a time.
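In the export, event_timestamp is stored as an integer number of microseconds (UTC), so it usually needs converting before it is readable. A minimal sketch, again using the placeholder table name from this post:

```sql
-- Show each event's name next to a human-readable timestamp.
select
    event_name,
    event_timestamp,                                    -- raw microseconds since the epoch
    timestamp_micros(event_timestamp) as event_time_utc -- converted to a TIMESTAMP
from `project-name.dataset-name.events_intraday_xxxxxxxx`
order by event_timestamp
limit 10
```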

Event properties that help you tie events together

These properties help you tie the events together.

It might be:

  • session_id - These events were all part of the same session.
  • user_pseudo_id - These events all belonged to the same browser + device.
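As a rough illustration of tying events together, the query below groups events into sessions. It assumes the session ID is read from the ga_session_id event parameter (the unnest pattern is explained further down in this post). Because ga_session_id is only unique within a single user, the sketch groups on it together with user_pseudo_id.

```sql
-- Count the events in each session.
-- The table name is a placeholder.
with
    events as (
        select
            user_pseudo_id,
            (
                select value.int_value
                from unnest(event_params)
                where key = 'ga_session_id'
            ) as ga_session_id
        from `project-name.dataset-name.events_intraday_xxxxxxxx`
    )

select
    user_pseudo_id,
    ga_session_id,
    count(*) as events_in_session
from events
group by user_pseudo_id, ga_session_id
order by events_in_session desc
```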

Event properties that tell you about the user who sent the event

These properties tell you about the user behind the event.

User fields include:

  • is_active_user - Is the user currently considered active?
  • user_id - Is the user a logged in user?
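One way to use these two fields is to split your users by whether a user_id was set. This sketch treats a non-null user_id as "logged in", which is an assumption: it only holds if your site sets user_id at login.

```sql
-- Count distinct browsers/devices by logged-in and active status.
-- The table name is a placeholder.
select
    user_id is not null as is_logged_in,
    is_active_user,
    count(distinct user_pseudo_id) as users
from `project-name.dataset-name.events_intraday_xxxxxxxx`
group by is_logged_in, is_active_user
```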

Event properties that tell you where an event came from

This includes:

  • device_category - Which device did the event come from?
  • geo_country - Which country did the event come from?
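In the raw export these two live inside the nested device and geo records, as device.category and geo.country. The flattened names used in this post are just aliases. A minimal sketch of pulling them out:

```sql
-- Count events by device category and country.
-- The table name is a placeholder.
select
    device.category as device_category,
    geo.country as geo_country,
    count(*) as event_count
from `project-name.dataset-name.events_intraday_xxxxxxxx`
group by device_category, geo_country
order by event_count desc
```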

Event properties that tell you the traffic source of the event

This may include:

  • traffic_source_medium - What was the medium that the user used the first time they ever visited your site?
  • medium - What is the medium of the current event?
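To see the difference between the two, you can select them side by side. In this sketch the first-visit medium comes from the traffic_source record, and the event's own medium is assumed to arrive as a string event parameter named medium, which will be null on events that don't carry it.

```sql
-- Compare the user's first-ever medium with the medium on each event.
-- The table name is a placeholder.
select
    traffic_source.medium as first_user_medium,
    (
        select value.string_value
        from unnest(event_params)
        where key = 'medium'
    ) as event_medium,
    count(*) as event_count
from `project-name.dataset-name.events_intraday_xxxxxxxx`
group by first_user_medium, event_medium
order by event_count desc
```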

Event properties that tell you information about products

These are designed specifically for e-commerce products and include fields such as:

  • item_revenue_in_usd - How much did an item get purchased for?
  • price - How much is an item on sale for?
  • shipping_value - What is the shipping value?
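In the raw export the item-level fields sit in the repeated items record, while shipping_value sits in the ecommerce record for the whole event. A rough sketch that returns one row per purchased item, assuming your site sends the standard purchase event:

```sql
-- One row per item on each purchase event.
-- The table name is a placeholder.
select
    item.item_name,
    item.price,
    item.item_revenue_in_usd,
    ecommerce.shipping_value   -- event-level, so it repeats for every item in the order
from `project-name.dataset-name.events_intraday_xxxxxxxx`,
    unnest(items) as item
where event_name = 'purchase'
```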

Custom event properties that you set up just for you

And then of course you can just set your own values.

Event parameters (event_params) are the real magic of the GA4 BigQuery export schema

Each event has a set of parameters that you can customise to send whatever you want.

This is the real core of the system and it’s worth explaining a bit.

All of these event parameters are properties, but not all properties are event parameters.

What are event_params?

The extra clever part of the GA4 data schema is event_params (Event Parameters).

This is what allows it to be so flexible. Each event can carry up to 25 fully customisable parameters.

These parameters let you send whatever you want to GA4 as long as you give it a consistent name.

On the backend, these are stored as nested fields. They live in a single column as a repeated list of key-value pairs (similar to an object or dict). Here's an example with ga_session_id and blog_content_type:

[Image: event_params rows showing the ga_session_id and blog_content_type keys with their values]

Event parameters consist of a few different parts:

  • Key (event_params.key): This is the name of the event parameter (e.g. ga_session_id, page_location, medium).

Then, based on what kind of data it is (text or numbers), you'll see different value fields. The one matching the data type is populated and the others return null:

  • String Value (event_params.value.string_value): Text info such as a URL or page title (e.g. https://www.example.com).
  • Integer Value (event_params.value.int_value): Whole numbers such as the session ID (e.g. 1724112663).
  • Double Value (event_params.value.double_value): Decimal numbers such as revenue figures (e.g. 19.99).
  • Float Value (event_params.value.float_value): Also part of the schema, but Google's schema documentation notes it is currently unused, so expect decimals in double_value instead.
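If you just want to see every parameter and its value without caring which type it was sent as, one option is to flatten event_params and coalesce the value fields into a single string. This is a sketch for exploring the data, not something you'd keep for reporting, since it throws away the original types.

```sql
-- One row per event parameter, with whichever value field is populated.
-- The table name is a placeholder.
select
    event_name,
    param.key,
    coalesce(
        param.value.string_value,
        cast(param.value.int_value as string),
        cast(param.value.double_value as string),
        cast(param.value.float_value as string)
    ) as value_as_string
from `project-name.dataset-name.events_intraday_xxxxxxxx`,
    unnest(event_params) as param
limit 100
```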

In the ga_session_id and blog_content_type example, we have two event properties:

  • Session ID - This is automatically sent with each event and lets us tie events together into sessions.
  • Blog Content Type - This is a custom event parameter. This custom information is much easier to access with the raw BigQuery data than with custom dimensions and metrics.

With a little bit of SQL, we can turn this into:

ga_session_id | blog_content_type | source | device
2cUS7862jdf239A | hero_content | google | desktop
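A query along these lines would produce that kind of flat row. It is a sketch under a few assumptions: blog_content_type is a custom parameter sent as a string, source is read from the event parameter of the same name, and device is taken from device.category. Real ga_session_id values are integers, so they will look different from the illustrative value in the row.

```sql
-- Turn nested event parameters into ordinary columns.
-- The table name is a placeholder; blog_content_type is a custom parameter
-- that only exists if your site sends it.
select
    (
        select value.int_value from unnest(event_params) where key = 'ga_session_id'
    ) as ga_session_id,
    (
        select value.string_value from unnest(event_params) where key = 'blog_content_type'
    ) as blog_content_type,
    (
        select value.string_value from unnest(event_params) where key = 'source'
    ) as source,
    device.category as device
from `project-name.dataset-name.events_intraday_xxxxxxxx`
```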

We’re going to include common event_params in our schema

To explain the different parts of Google’s data schema, we’re going to treat event_params that Google consistently sends as if they’re columns consistently set by GA4.

This is because they’re often crucially important!

You have to do a little SQL wrangling to get them to appear in this way, but you are on a GA4 BigQuery post 🤷

Here’s a basic example of unnesting the GA Session ID alongside event_name

-- This CTE unnests event_params to extract the ga_session_id
-- alongside each event_name.
with
    unnest_data as (
        select
            event_name,
            (
                select value.int_value
                from unnest(event_params)
                where key = 'ga_session_id'
            ) as ga_session_id
        -- Replace this with your own project, dataset and table.
        from `project-name.dataset-name.events_intraday_xxxxxxxx`
    )

select *
from unnest_data

This will produce a result like this:

event_name | ga_session_id
purchase | 1724140538
account_opened | 1724123538
menu_icon_clicked | 1724123538

Export schema for GA4 BigQuery - Grouped by type

Let’s break these down.

Every event has these fields

Event properties that help you tie things together

Sometimes various events will seem very similar and these properties can help tie them together.

Event properties that tell you about the user who submitted the event

Event properties that tell you where an event came from

Event properties that tell you the traffic source

If you haven’t dug deep into traffic acquisition before then check out this blog post

The key thing to understand is the scoping level that the different traffic sources are collected at.

Scope in GA4 is the level of grouping we apply to our analytics, e.g.:

  • Session
  • User
  • Event (a "hit" in older Universal Analytics terminology)

It’s often easier to understand by thinking about the goal.

When you open up your analytics you might look for different things:

  • I want to know how my users are behaving: you’d use User Level scoping dimensions.
  • I want to know how each individual session is going: you’d use Session Level scoping dimensions.
  • I care about each individual pageview: you’d use Event Level scoping dimensions.

The choice you’re making is scope.
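In the raw export, scope shows up as what you choose to count or group by. The sketch below computes one number at each scope from the same events. It assumes sessions are identified by user_pseudo_id plus the ga_session_id event parameter, and that pageviews arrive as the page_view event.

```sql
-- The same events counted at user, session and event scope.
-- The table name is a placeholder.
with
    events as (
        select
            event_name,
            user_pseudo_id,
            (
                select value.int_value
                from unnest(event_params)
                where key = 'ga_session_id'
            ) as ga_session_id
        from `project-name.dataset-name.events_intraday_xxxxxxxx`
    )

select
    -- User scope: distinct browsers/devices.
    count(distinct user_pseudo_id) as users,
    -- Session scope: distinct user + session ID pairs.
    count(
        distinct concat(user_pseudo_id, '-', cast(ga_session_id as string))
    ) as sessions,
    -- Event scope: individual pageview events.
    countif(event_name = 'page_view') as page_views
from events
```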

See the full GA4 BigQuery export schema