BigQuery: Scale It Up – Part I

In its simplest form, big data is a collection of information from many sources. That is what BigQuery essentially facilitates. The arrival of enterprise-grade data warehouse has allowed access to ‘Big Data’ for everyday digital marketers.

What we will look at:

  • What is it? Why is it?
  • Sneak Peek
  • Alternatives to BigQuery
  • Two major differences in the way that data is collected in BigQuery and in GA
  • Coming up
  • Resources

This article will serve as a prelude to a second article on Google Cloud’s BigQuery (BQ) We will be looking at why we need a solution like Big Query and how it is different from just extracting data from Google Analytics (GA) reports. So where does BigQuery fall in the marketing technology stack of a company? If you are working with analytics tools normally you have default capabilities for data collection and visualization without any visibility with regards to how is the data store or warehoused. For the end user only the first and last steps are visible, what happens in between is not accessible.

Let’s look at the image below to understand how big query further segments the analytics process for better visibility and control.

No alt text provided for this image

What is it? Why is it?

“BigQuery is an enterprise data warehouse that solves this problem by enabling super-fast SQL queries using the processing power of Google’s infrastructure. Simply move your data into BigQuery and let us handle the hard work.”

Source: Google

You can think of BigQuery as a place where you dump all the possible data from all the possible channels and that worry about making sense of it later. BigQuery helps you perfect your data collection infrastructure. BigQuery can be the link between third-party data and marketing analytics data. And can be  further connected to different third party visualization tools. A little visual aid to let it sink in;

No alt text provided for this image

The integration of different analytics tools with Google BigQuery serves three primary purposes:
Querying the data in the most raw form, importing from external data sources and exporting to data visualization tools. Each feature develops from the last and can serve as an extension to what GA or GA 360 can offer with regards to getting more flexibility to use data to one’s own advantage.

Within the GA interface, data is always imported as session-based and for larger amounts it’s mostly aggregated whereas data in BigQuery comes as hit-level data. For all those who use GA custom reports and advanced segmentation, combining dimensions serves as a silent but a very real problem. You can only use hit-level metrics to measure hit-level dimensions, and you can only use session-level metrics to measure session-level dimensions. For e.g., it is considered incorrect to have search keyword as your dimension and measure page value against it. 

The session-based data is fine for simplified views of marketing questions but does bode well when you start a deep dive. For e.g. in GA we can easily count the number of sessions that came from a mobile device. But if we wanted to count the number of video play/pause events by a specific user across multiple sessions, that would be much more difficult to achieve. In order to account for these kinds of discrepancies we need to perform additional calibration while extracting reports in GA. In BigQuery the desired data can be queried and you can combine any combination of fields and tables to perform your marketing analytics

A little sneak peek into the capabilities of BigQuery:

No alt text provided for this image

Impressive isn’t it? Now you want to take a look inside the tool right? Worry not, google provides a sandbox. The BigQuery sandbox gives you free access to the power of BigQuery subject to the sandbox’s limits. The sandbox lets you use the web UI in the Cloud Console without creating a billing account or enabling billing for your project.

Sneak Peek

Here is the link – https://cloud.google.com/bigquery/docs/sandbox, Navigate to cloud console that will be provided in the link.

Let’s dive in and go to BigQuery option in the dashboard;

No alt text provided for this image

Then to access the BigQuery interface go to query editor;

No alt text provided for this image

This is it, this is where you can play with your big data any which way. In sandbox we can try BigQuery in two ways; either with public datasets or with your own data.

Add datasets, and search for public ones, and start running your SQL queries;

No alt text provided for this image

Lots of public datasets to play with;

No alt text provided for this image

For in-depth tutorials on how to use SQL in BigQuery with various practical examples, be sure to check out this course on Udemy: https://www.udemy.com/course/sql-for-data-science-with-google-big-query/

Since BigQuery is hosted on Google’s extremely fast cloud-based server architecture, the queries can traverse billions of rows of data and return results with blazing speeds.

Alternatives to BigQuery

  • Amazon Redshift, 
  • Snowflake, 
  • Microsoft Azure SQL Data Warehouse,
  • Apache Hive

Two major differences in the way that data is collected in BigQuery and in GA

1) Sampling
One of the most noticeable limitations within the GA interface is “sampling,” which will kick in when you try to create a custom report or data requiring a large date range. With BigQuery, sampling is bypassed. We have all of the raw data so we can query it, slice it, splice, merge it, any way we want and still have 100% true data in our results.

2) Users vs Sessions

Querying and combining data across user level metrics across different sessions is not possible to be achieved inside the GA reports. If you use the enhanced e-commerce reports inside GA, you know that those reports are session-based. But in many real-world scenarios, a person might add an item to their cart in one visit and wait to complete the purchase in another visit. BigQuery allows you to see purchasing behavior from users who take more than one session to go from consumer to customer.

Published by : Hammad Ashraf
Read the Previous blog here