Setting up googleAnalyticsR on Google Cloud Platform's Compute Engine

When it comes to using Google Analytics data in R, you have quite a number of packages to work with. One such package that has seen recent, frequent updates is googleAnalyticsR, written by Mark Edmondson.

googleAnalyticsR works great when you're running it in your own script. But when you need to use it on a remote server, for example, to generate scheduled reports, you may find yourself jumping through hoops and hurdles just to set this up, particularly with authorization.

Here's a guide based on my experience of setting up googleAnalyticsR on a Google Cloud Platform (GCP) Compute Engine instance. It covers all the required setup steps, from software installation to authorization to running a quick test.

How to set up googleAnalyticsR on a Google Cloud Platform Compute Engine instance

Set up a Compute Engine instance

The bare minimum specifications for your Compute Engine instance should be:
  • n1-standard-1, i.e. 1 vCPU with 3.75GB memory
  • 10GB boot disk running the latest version of Ubuntu
  • Allow full access to all Cloud APIs.
You can host it in any zone that you want.
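
If you prefer the command line to the Cloud Console, the specification above could be sketched with the gcloud CLI. This is only a sketch under assumptions: gcloud is installed and authenticated on your computer, and the instance name, zone and Ubuntu image family below are placeholders I picked for illustration. The cloud-platform scope corresponds to allowing full access to all Cloud APIs.

```shell
# Hypothetical placeholders: change these to suit your project.
INSTANCE_NAME="ga-reports"
ZONE="us-central1-a"
IMAGE_FAMILY="ubuntu-2204-lts"   # assumption: use whichever Ubuntu release is current

create_ga_instance() {
  # n1-standard-1 machine, 10GB Ubuntu boot disk, full access to all Cloud APIs
  gcloud compute instances create "$INSTANCE_NAME" \
    --zone="$ZONE" \
    --machine-type=n1-standard-1 \
    --image-family="$IMAGE_FAMILY" \
    --image-project=ubuntu-os-cloud \
    --boot-disk-size=10GB \
    --scopes=cloud-platform
}

# Run this once you have checked the values above:
# create_ga_instance
```

The call is left commented out so that pasting the snippet does not create (and bill you for) an instance by accident.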

Note that the above specification will incur a cost. I strongly discourage you from using an instance that is within GCP's free tier. Such a machine is so underpowered that your R script will run very slowly. And because R holds all of its data in memory, the script could need so much memory that the operating system kills it.

Install R and required software

Installing R by itself is enough to get R up and running. But to use googleAnalyticsR, you'll also need these system libraries:
  • libcurl
  • libxml2
  • OpenSSL (libssl)
So you might as well install everything in one go.

SSH into your Compute Engine instance, then run these commands:

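# refresh the package index
sudo apt update
# install R plus the development libraries that googleAnalyticsR needs
sudo apt -y install libssl-dev libcurl4-openssl-dev libxml2-dev r-base
# bring the already-installed packages up to date, then clean up
sudo apt -y upgrade
sudo apt -y autoclean
sudo apt --purge -y autoremove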


After all of the software has been installed, install the googleAnalyticsR package.

sudo Rscript -e "install.packages('googleAnalyticsR')"

Important! You should install the package as a superuser, for two reasons:
  1. The package can be used by any other user in the same instance.
  2. You avoid creating an R library in your home folder, which can lead to unintended consequences.
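
To confirm that the package can be found by R, a quick check along these lines may help. It is a minimal sketch: check_r_package is a helper name I made up, and it relies only on base R's requireNamespace() and .libPaths().

```shell
# check_r_package is a hypothetical helper, not part of R or googleAnalyticsR.
# It returns non-zero if R cannot find the named package.
check_r_package() {
  Rscript -e 'ok <- requireNamespace(commandArgs(trailingOnly = TRUE)[1], quietly = TRUE); quit(status = if (ok) 0 else 1)' "$1"
}

if command -v Rscript >/dev/null 2>&1; then
  if check_r_package googleAnalyticsR; then
    echo "googleAnalyticsR is installed"
  else
    echo "googleAnalyticsR is missing"
  fi
  # Print the library folders that R searches for packages.
  Rscript -e 'cat(.libPaths(), sep = "\n")'
else
  echo "Rscript not found; install r-base first"
fi
```

If the package was installed with sudo, the check should succeed for any user on the instance.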

Enable the Analytics APIs

Your GCP project needs to be allowed to work with the Google Analytics-related APIs.
  1. Go to your GCP project.
  2. Navigate to API Library.
  3. Search for "analytics".
You should see these two APIs:
  • Google Analytics Reporting API
  • Analytics API
Click each of them and enable them.
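
The same two APIs can probably be enabled from the command line instead. The sketch below assumes that gcloud is installed and authenticated, that the two APIs go by the service names analyticsreporting.googleapis.com and analytics.googleapis.com, and that my-gcp-project stands in for your own project ID.

```shell
# Hypothetical placeholder: replace with your own GCP project ID.
PROJECT_ID="my-gcp-project"

enable_analytics_apis() {
  # Google Analytics Reporting API and Analytics API, in that order
  gcloud services enable \
    analyticsreporting.googleapis.com \
    analytics.googleapis.com \
    --project "$1"
}

# Run this once PROJECT_ID is correct:
# enable_analytics_apis "$PROJECT_ID"
```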

Create a Client ID and Secret

By itself, googleAnalyticsR will be able to run using its default authorization. But that means you're sharing that authorization with everyone else in the world who is using googleAnalyticsR with its default setting, which can then lead to going over the Google Analytics API request quota. That won't work when you need your R script to run continuously on its own.

So you will need to create your own Client ID and Secret to authorize googleAnalyticsR's requests against your own GCP project.
  1. Go to your GCP project.
  2. Navigate to Credentials.
  3. Click the "Create credentials" button, then choose "OAuth client ID".
  4. Choose "Other" application type. Provide a suitable name, e.g. "googleAnalyticsR client ID and secret".
  5. Back in the main Credentials page, look for your new Client ID. Download its JSON.
  6. (optional but recommended) Rename your downloaded JSON file to something familiar, e.g. "googleAnalyticsR-client-id-secret.json".
Remember where you downloaded the JSON file on your computer. You will need to upload it to the Compute Engine instance. If you're using macOS or another Unix-like system, you can copy the file with scp.

For example, to copy it to your home folder in the Compute Engine instance:

scp googleAnalyticsR-client-id-secret.json username@ip-address:~/.

(Replace username and ip-address appropriately.)
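
If plain scp does not work because your SSH keys are not set up on the instance, the gcloud CLI offers its own copy command that manages the keys for you. This is a sketch under assumptions: gcloud is installed and authenticated, and the instance name and zone are placeholders.

```shell
# Hypothetical placeholders: use your own instance name and zone.
INSTANCE_NAME="ga-reports"
ZONE="us-central1-a"
JSON_FILE="googleAnalyticsR-client-id-secret.json"

copy_client_json() {
  # Copies the JSON file to your home folder in the instance
  gcloud compute scp "$JSON_FILE" "$INSTANCE_NAME:~/" --zone="$ZONE"
}

# Run this from the folder that contains the JSON file:
# copy_client_json
```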

Run googleAnalyticsR manually once to authorize

SSH into your Compute Engine instance, then run this command:

R

You should now be in the R environment, where you can run familiar R commands. Run these commands:

library(googleAuthR)
library(googleAnalyticsR)

setwd("~") # set your working directory to your home folder in the instance

gar_set_client("googleAnalyticsR-client-id-secret.json") # must match the uploaded file's name exactly; file names are case-sensitive
gar_auth('.httr-oauth')

# verify that your Google Analytics authorisation is working by getting a list of GA accounts
ga_account_list()


You will see some messages about trying to open a URL. The crucial part is the last line, which shows a long URL beginning with https://.
  1. Select and copy the URL.
  2. Open the URL in your web browser.
  3. Log in to Google if you haven't already done so.
  4. Allow your GCP project to be authorized to work with your Google Analytics data.
At the end, your browser should show you a mostly empty page with a blue header and a seemingly random alphanumeric string.
  1. Copy the alphanumeric string.
  2. Go back to your R session, which has been left waiting for authorization.
  3. Paste the string that you copied and press Enter.
After that, you should see a list of all of the Google Analytics accounts, properties and views that you can access.

Use googleAnalyticsR with your R file

Now that you have authorized googleAnalyticsR in the Compute Engine instance, you can use googleAnalyticsR with your R file, as long as you always run gar_auth() with .httr-oauth, a file that stores the authorization token.

For example, at the start of your R file, you should have these lines:

library(googleAuthR)
library(googleAnalyticsR)

setwd("~") # set your working directory to your home folder in the instance

gar_set_client("googleAnalyticsR-client-id-secret.json") # must match the uploaded file's name exactly; file names are case-sensitive
gar_auth('.httr-oauth')

# the rest of your Google Analytics code goes here


Your R file can now run all of its Google Analytics queries in the Compute Engine instance without needing any further authorization from you.
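
For scheduled reports, one possible approach is a cron job that runs your R file with Rscript. The sketch below only builds and prints a crontab line. The file name ga-report.R, the log file and the daily 06:00 schedule are made-up examples, not part of this setup.

```shell
# Hypothetical file names: point these at your own R file and log.
SCRIPT="$HOME/ga-report.R"
LOG="$HOME/ga-report.log"

# Run the R file every day at 06:00 and append its output to the log.
CRON_LINE="0 6 * * * Rscript $SCRIPT >> $LOG 2>&1"
echo "$CRON_LINE"

# To add the line to your crontab:
# (crontab -l 2>/dev/null; echo "$CRON_LINE") | crontab -
```

Because the R file sets its working directory to your home folder, it should find the client JSON and .httr-oauth there even when cron starts it.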

Recipe for using googleAnalyticsR on a Compute Engine instance

  1. Create an n1-standard-1 Compute Engine instance.
  2. Install R and other required software.
  3. Enable the Analytics APIs.
  4. Create a Client ID and secret.
  5. Authorize googleAnalyticsR by running it manually.
  6. Add googleAnalyticsR to your R file.
