Posts

Showing posts from 2017

Setting up R Studio Server on Google Compute Engine

For many data scientists, R is a must-have tool for running all kinds of data analyses. Many of them prefer to use R Studio Server, both to take advantage of server-level machine specifications (e.g. more RAM than a desktop computer would provide) and to collaborate with colleagues and peers. The good people at rocker-org can help you get up and running with R Studio Server using a Docker container. But if you want to set up R Studio Server from scratch on Google Compute Engine, then you can follow this guide.

Recipe to run R Studio Server on Google Compute Engine (with screenshots)

1. Set up a Compute Engine VM instance in Google Cloud.
2. Install R Studio Server.
3. Create users and groups.

This recipe assumes that you already have a Google Cloud account. If not, create one with billing enabled. Then create a project, if one hasn't been created automatically.

1. Set up a Compute Engine VM instance in Google Cloud

Actually, before setting up the V...

Querying Google Analytics data as flat tables from BigQuery: The Definitive Guide and Recipe

If you're using Google Analytics 360, the premium version of Google Analytics, then chances are you have set up an automatic export of your Google Analytics data into BigQuery. This exports your hit-level Google Analytics data to BigQuery datasets every day. But Google does not provide much help with figuring out how to query your data! Google Analytics provides a cookbook of sample BigQuery queries, but these are very limited in scope. And they omit the most trivial, yet important, query that you will ever run: get a flat table of results that you can export into a CSV file or a SQL database. Flat tables are essential for further work on the results with Python, R and other data science languages. Another flaw in the cookbook is that it uses BigQuery's older Legacy SQL, while BigQuery is already moving to its Standard SQL. But examples based on Google Analytics data were either difficult to find or based on guesswork that had not been tested...
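
To make the idea of a "flat table" concrete, here is a minimal Python sketch of what flattening means for this kind of data: each session holds a nested list of hits, and the flat result has one row per hit with the session-level fields repeated. It is not the query from the post. The field names (`fullVisitorId`, `visitId`, `trafficSource`, `hits`, `hitNumber`, `page.pagePath`) are assumed to follow the Google Analytics export schema, and the sample session is made up purely for illustration.

```python
import csv
import io

# Column order for the flat table (an assumption chosen for this sketch).
FIELDS = ["fullVisitorId", "visitId", "source", "medium", "hitNumber", "pagePath"]


def flatten_session(session):
    """Yield one flat dict per hit, repeating the session-level fields on each row."""
    traffic = session.get("trafficSource", {})
    for hit in session.get("hits", []):
        yield {
            "fullVisitorId": session.get("fullVisitorId"),
            "visitId": session.get("visitId"),
            "source": traffic.get("source"),
            "medium": traffic.get("medium"),
            "hitNumber": hit.get("hitNumber"),
            "pagePath": hit.get("page", {}).get("pagePath"),
        }


def sessions_to_csv(sessions):
    """Flatten every session and return the result as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for session in sessions:
        writer.writerows(flatten_session(session))
    return buffer.getvalue()


# Made-up sample record: one session containing two hits.
SAMPLE_SESSIONS = [
    {
        "fullVisitorId": "123",
        "visitId": 1500000000,
        "trafficSource": {"source": "google", "medium": "organic"},
        "hits": [
            {"hitNumber": 1, "page": {"pagePath": "/home"}},
            {"hitNumber": 2, "page": {"pagePath": "/pricing"}},
        ],
    }
]

csv_text = sessions_to_csv(SAMPLE_SESSIONS)
print(csv_text)
```

The same one-row-per-hit shape is what makes the result easy to load into a CSV file, a SQL database, or a data frame.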

Do not use blank "campaignId" strings with Google Analytics tags

Recently, I encountered a curious problem with a client's Google Analytics Source/Medium report: the top Source/Medium by sessions was "(not set) / (not set)". This was very, very unusual. What could possibly cause GA to not recognise Source/Medium values correctly? The short answer was that a blank string had been set as the "campaignId" in the GA tags. Since GA couldn't interpret the blank string, it simply saved the Source/Medium as "(not set) / (not set)". Special thanks to Jason Packer (@jhpacker in measure.slack.com) for his help here.
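
One way to guard against this is to drop blank values before they ever reach the tag. The sketch below is a hypothetical helper, not something from the post: `drop_blank_fields` is an illustrative name, the sample values are made up, and the assumption is that the tag's fields are assembled as a dictionary somewhere (for example, server-side) before being handed to GA. Omitting the field entirely, rather than sending an empty string, is the assumed workaround.

```python
def drop_blank_fields(fields):
    """Return a copy of fields without None, empty, or whitespace-only values."""
    cleaned = {}
    for key, value in fields.items():
        if value is None:
            continue
        if isinstance(value, str) and not value.strip():
            continue
        cleaned[key] = value
    return cleaned


# Made-up example: campaignId is blank, so it is left out of the result.
tag_fields = {
    "campaignId": "",
    "campaignSource": "newsletter",
    "campaignMedium": "email",
}

safe_fields = drop_blank_fields(tag_fields)
print(safe_fields)
```

With the blank "campaignId" removed, the remaining campaign fields are passed through unchanged.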