How to "unpivot" a table in BigQuery
Recently, I was presented with a problem in BigQuery: How do I make a table's metrics be returned as rows? Normally, metrics are presented as columns and dimensions as rows. But in this case, I needed the metrics to appear as rows too.
In short, I needed to "unpivot" a table like this:
to this:
The solutions lay with three BigQuery features:
But the results need to be returned such that the "Impressions", "Clicks" and "Conversions" are in their own rows.
Each metric is organised into one STRUCT that has one key, "Metric", and one value, "Data". So in my sample data where I have three Metrics:
So for each row of my data, instead of having three separate STRUCTs, I have one ARRAY containing three STRUCTs like this:
[
STRUCT('Impressions' AS Metric, Impressions AS Data),
STRUCT('Clicks' AS Metric, Clicks AS Data),
STRUCT('Conversions' AS Metric, Conversions AS Data)
] AS Metrics_Data
I have also named my ARRAY as "Metrics_Data". This will be used in the next, final step.
The ARRAY arranges my metrics in a "vertical" manner now:
The final step is to get the dimensions returned with every "row" of data. … Well, that's what non-BigQuery practitioners would say. In BigQuery terms, I want to "flatten" the data.
The BigQuery documentation describes how to perform this flattening, mentioned in the instructions for querying STRUCTs in an ARRAY. It involves a CROSS JOIN with BigQuery's own UNNEST operator.
So in my final SELECT statement, I CROSS JOIN my "Campaign_Results_Metrics_Data" temporary table with its "Metrics_Data" column (the ARRAY of STRUCTs):
SELECT
...,
Metric_Data
FROM
Campaign_Results_Metrics_Data
CROSS JOIN
UNNEST(Campaign_Results_Metrics_Data.Metrics_Data) AS Metric_Data
That gets my data into the final form that satisfies my requirement:
This form should be sufficient to be used with programming code (e.g. Python, Google Apps Script) and visualisation programs (e.g. Tableau, Data Studio). The latter should also be able to handle aggregation (e.g. sums, counts, averages, etc.) of the unpivoted metrics without further scripting.
WITH
My_Table_Metrics_Data AS (
SELECT
...,
[
STRUCT('...' AS Metric, ... AS Data),
STRUCT('...' AS Metric, ... AS Data),
] AS Metrics_Data
FROM
`My_Dataset.My_Table`
WHERE
...
)
SELECT
...,
Metric_Data
FROM
My_Table_Metrics_Data
CROSS JOIN
UNNEST(My_Table_Metrics_Data.Metrics_Data) AS Metric_Data
In short, I needed to "unpivot" a table like this:
Dimension1 | Dimension2 | Metric1 | Metric2 | Metric3 |
to this:
Dimension1 | Dimension2 | Metric | Data |
Metric1 | |||
Metric2 | |||
Metric3 | |||
Metric1 | |||
Metric2 | |||
Metric3 | |||
Metric1 | |||
Metric2 | |||
Metric3 |
The solutions lay with three BigQuery features:
- STRUCT type
- ARRAYs
- CROSS JOIN with the UNNEST operator
How to get unpivoted results in BigQuery
Let's say I have data like this in a table called "Campaign_Results":Row | Superhero | Title | Date | Impressions | Clicks | Conversions |
1 | Superman | Man of Steel | 2013-06-14 | 5670096 | 796191 | 82538 |
2 | Batman | Batman v Superman: Dawn of Justice | 2016-03-25 | 9742525 | 876292 | 63812 |
3 | Wonder Woman | Wonder Woman | 2017-06-02 | 4847506 | 420674 | 23428 |
But the results need to be returned such that the "Impressions", "Clicks" and "Conversions" are in their own rows.
Organise metrics into a STRUCT
The BigQuery documentation doesn't really give a good explanation of what a STRUCT type is. But I've found that it is really a key-value object. If you're a programmer, then you would be familiar with such data types (called "object" in JavaScript, "dict" in Python, "hash" in Ruby, etc.)Each metric is organised into one STRUCT that has one key, "Metric", and one value, "Data". So in my sample data where I have three Metrics:
- Impressions
- Clicks
- Conversions
- STRUCT('Impressions' AS Metric, Impressions AS Data)
- STRUCT('Clicks' AS Metric, Clicks AS Data)
- STRUCT('Conversions' AS Metric, Conversions AS Data)
Row | Superhero | Title | Date | f0_.Metric | f0_.Data | f1_.Metric | f1_.Data | f2_.Metric | f2_.Data |
1 | Superman | Man of Steel | 2013-06-14 | Impressions | 5670096 | Clicks | 796191 | Conversions | 82538 |
2 | Batman | Batman v Superman: Dawn of Justice | 2016-03-25 | Impressions | 9742525 | Clicks | 876292 | Conversions | 63812 |
3 | Wonder Woman | Wonder Woman | 2017-06-02 | Impressions | 4847506 | Clicks | 420674 | Conversions | 23428 |
Put all STRUCTs into an ARRAY
My next step is to re-arrange the metrics STRUCTs so that they are not "side-by-side", but "up-and-down". An ARRAY helps me do this. Just as with most regular programming languages, an array is a list of values. Unlike those arrays, BigQuery requires all of the values in an ARRAY to be of the same type. The BigQuery documentation describes more limitations of ARRAYs.So for each row of my data, instead of having three separate STRUCTs, I have one ARRAY containing three STRUCTs like this:
[
STRUCT('Impressions' AS Metric, Impressions AS Data),
STRUCT('Clicks' AS Metric, Clicks AS Data),
STRUCT('Conversions' AS Metric, Conversions AS Data)
] AS Metrics_Data
I have also named my ARRAY as "Metrics_Data". This will be used in the next, final step.
The ARRAY arranges my metrics in a "vertical" manner now:
Row | Superhero | Title | Date | Metrics_Data.Metric | Metrics_Data.Data |
1 | Superman | Man of Steel | 2013-06-14 | Impressions | 5670096 |
Clicks | 796191 | ||||
Conversions | 82538 | ||||
2 | Batman | Batman v Superman: Dawn of Justice | 2016-03-25 | Impressions | 9742525 |
Clicks | 876292 | ||||
Conversions | 63812 | ||||
3 | Wonder Woman | Wonder Woman | 2017-06-02 | Impressions | 4847506 |
Clicks | 420674 | ||||
Conversions | 23428 |
Flatten the ARRAY with a CROSS JOIN and UNNEST
The first two steps organise my data into a "Campaign_Results_Metrics_Data" temporary table that is almost ready to be used.The final step is to get the dimensions returned with every "row" of data. … Well, that's what non-BigQuery practitioners would say. In BigQuery terms, I want to "flatten" the data.
The BigQuery documentation describes how to perform this flattening, mentioned in the instructions for querying STRUCTs in an ARRAY. It involves a CROSS JOIN with BigQuery's own UNNEST operator.
So in my final SELECT statement, I CROSS JOIN my "Campaign_Results_Metrics_Data" temporary table with its "Metrics_Data" column (the ARRAY of STRUCTs):
SELECT
...,
Metric_Data
FROM
Campaign_Results_Metrics_Data
CROSS JOIN
UNNEST(Campaign_Results_Metrics_Data.Metrics_Data) AS Metric_Data
That gets my data into the final form that satisfies my requirement:
Row | Superhero | Title | Date | Metric_Data.Metric | Metric_Data.Data |
1 | Superman | Man of Steel | 2013-06-14 | Impressions | 5670096 |
2 | Superman | Man of Steel | 2013-06-14 | Clicks | 796191 |
3 | Superman | Man of Steel | 2013-06-14 | Conversions | 82538 |
4 | Batman | Batman v Superman: Dawn of Justice | 2016-03-25 | Impressions | 9742525 |
5 | Batman | Batman v Superman: Dawn of Justice | 2016-03-25 | Clicks | 876292 |
6 | Batman | Batman v Superman: Dawn of Justice | 2016-03-25 | Conversions | 63812 |
7 | Wonder Woman | Wonder Woman | 2017-06-02 | Impressions | 4847506 |
8 | Wonder Woman | Wonder Woman | 2017-06-02 | Clicks | 420674 |
9 | Wonder Woman | Wonder Woman | 2017-06-02 | Conversions | 23428 |
This form should be sufficient to be used with programming code (e.g. Python, Google Apps Script) and visualisation programs (e.g. Tableau, Data Studio). The latter should also be able to handle aggregation (e.g. sums, counts, averages, etc.) of the unpivoted metrics without further scripting.
BigQuery Standard SQL Recipe for "Unpivoting" a Table
WITH
My_Table_Metrics_Data AS (
SELECT
...,
[
STRUCT('...' AS Metric, ... AS Data),
STRUCT('...' AS Metric, ... AS Data),
] AS Metrics_Data
FROM
`My_Dataset.My_Table`
WHERE
...
)
SELECT
...,
Metric_Data
FROM
My_Table_Metrics_Data
CROSS JOIN
UNNEST(My_Table_Metrics_Data.Metrics_Data) AS Metric_Data
Super helpful, thanks!
ReplyDeleteAwesomly helpful! That missing closing parenthesis in line 13 into the last example makes me feel it a bit tastier even.
ReplyDeleteThanks
This was extremely helpful. Clearest explanation I have found in 3 days. Thanks a lot!
ReplyDeleteGreat post, how would you do the reverse of this essentially pivoting the end result into the inital one
ReplyDeleteThis is EXACTLY what I needed. Thanks so much!
ReplyDeleteHi, This is helpful. How to handle if I have many columns?
ReplyDeleteUnfortunately, I don't know that.
DeleteYou're great!
ReplyDeleteHi, This is really helpful but how can we do pivot from long to wide table
ReplyDeleteThere are some solutions provided in other websites / blogs already. Search for "bigquery pivot table".
DeleteThanks man! Super helped me!
ReplyDeleteYou are amazing! Thank you a million times over!
ReplyDeleteAwesome content, thank you very much!
ReplyDeleteThis was very useful for me, thank you very much!
ReplyDeleteHi, I am not able to segregate the individual metric and Value in different columns. Could you please help?
ReplyDeletei am getting this value in my Metric_Data column
{
Metric: "Actuals",
Data: 5000
}
Did you make an ARRAY out of your STRUCTs?
DeleteThank's a lot! You solved my problem!
ReplyDeleteExcellent mate! Keep doing this kind of post!
ReplyDeleteVery helpful mate.. Exactly what i was looking for..
ReplyDelete