How to extract real-time intraday data from Google Analytics 4 and Firebase in BigQuery

and always have an up-to-date data your custom reports

© Mike Shakhomirov November 19, 2021 1 Comments
Image by author @mshakhomirov

Intro

If you are a Firebase or Google Analytics 4 user and you have setup data imports into your BigQuery data warehouse then you might want to create real-time custom reports with your data in intraday schema. The problem is that this integrated dataset is being deleted automatically by Google every day. So if you choose to connect it as a datasource to your report in Google Data Studio you won't find it the day after.

For example, I want to do real-time analytics on intraday tables in our BI solution built with Data Studio. Due to it's being constantly deleted I can't connect it as a dataset to Google Data Studio. Also when it's deleted it stops being a partitioned table and you can't use wildcard suffix for date querying.

Having said that, I would need the following:

  1. intraday tables shouldn't be deleted automatically. I would want to do it manually.
  2. As this Firebase daily data export/integration feature exists. I would like integration data transfer to be delivered the next day within an hour guaranteed.

I'm sure it's very common. Google Analytics 4

I have raised a feature request with Google Engineering to enable manual deletion if needed but at the moment of writing this article it is still in progress.

"intraday tables shouldn't be deleted automatically. We would want to do it manually." It might take a while though. So here is the solution to overcome this.

How to extract data from intraday table

Ideally you would want to run a scheduled script where parameter is an intraday table's date suffix.

However it is not that simple...

Intraday tables are being deleted automatically

So you don't know the exact time to schedule the script. Alternatively you would use a daily export table but again you would have to wait unitl that extract is ready. I have been looking for information about intraday tables deletion time, and I found that it is possible for the intraday table from the past day to still exist for a couple of hours. One of the reason is the process requiring more time for large amounts of data Stackoverflow. So there is no set time for the past table to be deleted, but generally, it can still exist a couple of hours after the new day. The timezone you set in GA also affects this.

Impossible to use parameters for table names in BigQuery scripts.

BigQuery scripting is a powerful tool but at the moment (2021-11-20) Google's BigQuery scripts can't use parameters for table names. According to [Parameterized queries in BigQuery documentation5, query parameters can't be used for SQL object identifiers

Parameters cannot be used as substitutes for identifiers, column names, table names, or other parts of the query.

Step 1. Check if original intraday table exists

You would want to use this script returning true if data exists:

Step 2. Create a BigQuery SQL script (IF exists DO something, i.e. INSERT)

Then you add into you script that would perform an update:

Step 3. Decide what to do with INSERT statement for your new intraday table copy

You could:

  • Replace the whole table with new data
  • Add only new records for selected columns

Copy and Replace intraday events table

This script wil copy whole table:

Update intraday_events_copy table with only columns you need and control partitioning

You would want to use the same SQL script as in Copy and Replace intraday events table but change copyAndReplaceQuery to selectAndInsert query. Let's create this script that would do the following:

  • create a partitioned table.
  • select only new data for partition to update.
  • insert new data.

Solution:

Conclusion

Intraday tables might have minor discrepancies in comparison to the daily session tables. These discrepancies are due to the following:

  • Google Analytics allows for hits that are sent up to four hours late, and these hits may not be available in intraday tables at the time data is requested.
  • Google performs some processing of data after hits are collected and before data is exported to BigQuery. Generally, the affected fields are traffic sources and linked marketing products (AdWords, Campaign Manager, etc.)

In general if those fields don't make any difference in your data stack this is the solution to improve data availability with *Firebase or Google Analytics 4 integrations.

blog

Mike

Mike is a Machine Learning Engineer / Python / Java Script Full stack dev / Middle shelf tequila connoisseur and accomplished napper. Stay tuned, read me on Medium https://medium.com/@mshakhomirov/membership and receive my unique content.

Comments

Blog Image 44

Jen Lopez

November 13, 2021

Very useful.

Leave a Comment