BigQuery & Basics

How to set up the GA4 BigQuery export?

So you're looking to start exporting your GA4 data into BigQuery?

Let's dive in. This will take you probably about 15 minutes to set-up.

Step 1: Setting up Google Cloud

In order to allow GA4 data to be pumped in BigQuery, you need to have a Google Cloud account set up.

In the process of setting up this cloud account, you'll set-up a project. This is where everything will live!

Google does a pretty good of walking you through these basic parts.

You can get started on this page.

get_started_with_google_cloud

Once you are set up, you will need to have a self-serve Cloud Billing account.

This allows you to automatically pay for your Google Cloud usage costs.

How much does it this cost you?

You might’ve heard terrible things about cloud bills, but don’t worry BigQuery won’t cause you any issues.

For context, for a site the size of Piped Out, which gets a couple thousand visitors a month costs us approximately $0.03 a month to run. And the first $6.25 every month are free!

Are you running a large property?

If you’re notably larger, then please take a look at the BigQuery costs page. The most important one is the cost to query a TB which is currently $6.25 per TiB.

It’s also worth noting we’ve helped a lot of customers massively reduce their data costs (sometimes 100x, depends a lot on your current maturity) with Pipeline.

If you are finding yourself spending a lot of money on BigQuery please reach out. There is a good chance we can help!

Back to setting up your billing account

Billing accounts can be set-up and updated on this page in Google Cloud

Important: Without billing details on your Google Cloud account, you’ll use the BigQuery Sandbox—a free, limited environment where data automatically expires every 60 days. Expiring data is bad as hopefully goes without saying.

Step 2: Setting up Google Analytics 4 to BigQuery Connection.

For 80% of people we can walk you through how to set this up relatively simply.

If you do run into issues then Google has a page covering all the possible scenarios

Let’s get started.

Go to Admin on GA4

You need to have Editor or above permission at the GA4 property level to set up this connection.

You will then need to clicks BigQuery links.

click_bigquery_links

Then select Link.

bigquery_links_link

You will then need to select a BigQuery Project you manage and pick a location.

Which project do I pick?

You can either select an existing project or create one yourself.

See previous section on setting up Google Cloud.

What do I put for Location?

If you already have a BigQuery account and existing datasets, then put it in the same region as them.

If you don’t, then just pick the region that your business is based in.

bigquery_setup_choose_bigquery_project

Step 3: Selecting either Daily or Steaming Export (or both)

This part does require making a choice.

So it's worth having a separate step to discuss the relative trade offs here.

data_steams_selection

What should I do for data steams and events?

You can configure the data streams and events to exclude specific events if you want.

In most cases, we recommend just picking the main stream and not excluding anything.

There are some edge cases where you might want to do something different.

An example would be that you want to use the daily streaming but don't want to go over the event limit... more on this later.

When setting up GA4 data storage in BigQuery, you have two options:

  • Streaming
  • Daily

You need to select one (or both!) of these when setting up your GA4 BigQuery export.

They both have their relative benefits: The Streaming tables are (almost!) realtime and have no data limits, whereas the daily tables take longer to come through, but are more complete and their is a data limit.

Let's dig into this in more detail...

What's the difference between streaming and daily tables?

Streaming tables:

  • These tables are exported continuously throughout the day.
  • This allows you to get very close to realtime data.
  • You can export as many events as you want.
  • This is described as a “best-effort operation” meaning that the data may contain gaps.
    • Usually this usually only 0.5-2% difference, however there was a bug in the past 6 months. (See below).
  • The table name is in the format: events_intraday_YYYYMMDD
  • They charge you approximately $0.05 for each 600K events.

Important note: If you select both streaming and daily tables, the streaming (intraday table) is removed once the daily events_YYYYMMDD table is finalised.

Daily tables:

  • These produce daily tables.
  • These are created after Analytics collects all of the events for the previous day.
  • This data more complete than the streaming tables with less data gaps.
  • You can't export more than 1M events per day.
  • The table name is in the format: events_YYYYMMDD
  • You get some additional metrics and tables.

Should I pick the streaming or daily tables?

If you don't care about real time and export less than 1 million (the export will show you), then go with daily.

If you're over 1 million events, you should see if you need all the events or if someone has added some pointless ones. If excluding them gets you under 1 million, then go with daily.

If you're over 1 million events and you need all of them, then go with streaming.

What are the tables/metrics missing from the streaming tables?

This the new user and new session traffic source data is excluded.

The good news is you can just add this back in as long as your SQL is good.

Once you are done, hit Save.

You'll then start to see your data start appearing in BigQuery in no time.

Importantly, this data won't be backdated. It will only run from the date you've selected.

streaming_vs_daily_bigquery