Setting up googleAnalyticsR on Google Cloud Platform's Compute Engine
When it comes to using Google Analytics data in R, you have quite a number of packages to work with. One such package that has seen recent, frequent updates is googleAnalyticsR, written by Mark Edmondson.
googleAnalyticsR works great when you're running it in your own script. But when you need to use it on a remote server, for example, to generate scheduled reports, you may find yourself jumping through hoops and hurdles just to set this up, particularly with authorization.
Here's a guide based on my experience of setting up googleAnalyticsR on a Google Cloud Platform (GCP) Compute Engine instance. It covers all the required setup steps, from software installation to authorization to running a quick test.
How to set up googleAnalyticsR on a Google Cloud Platform Compute Engine instance
Set up a Compute Engine instance
The bare minimum specifications for your Compute Engine instance should be:
- n1-standard-1, i.e. 1 vCPU with 3.75GB memory
- 10GB boot disk running the latest version of Ubuntu
- Allow full access to all Cloud APIs.
Note that the above specification will incur a cost. I strongly discourage you from using an instance that is within GCP's free tier. Such a machine is so underpowered that your R script will either run very slowly or need more memory than is available (R holds all of its data in memory), at which point the operating system could kill your script.
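If you prefer the command line to the Cloud Console, the specification above can be sketched as a gcloud command. This is only a sketch under assumptions: the instance name, zone, and Ubuntu image family below are placeholders I picked for illustration (use whichever Ubuntu release is current), and the cloud-platform scope is my reading of "Allow full access to all Cloud APIs". The function only prints the command so that you can review it first.

```shell
# Hypothetical values; replace them with your own.
INSTANCE_NAME="ga-reports"
ZONE="us-central1-a"

# Print the gcloud command so it can be reviewed before anything is created.
build_create_cmd() {
  echo "gcloud compute instances create $INSTANCE_NAME" \
    "--zone=$ZONE" \
    "--machine-type=n1-standard-1" \
    "--boot-disk-size=10GB" \
    "--image-family=ubuntu-2204-lts" \
    "--image-project=ubuntu-os-cloud" \
    "--scopes=cloud-platform"
}

build_create_cmd
```

Once the printed command looks right for your project, paste it into a terminal where gcloud is already authenticated.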
Install R and required software
You can install R and get up and running. But to use googleAnalyticsR, you'll also need to install:
- libcurl
- libxml
SSH into your Compute Engine instance, then run these commands:
sudo apt update
sudo apt -y install libssl-dev libcurl4-openssl-dev libxml2-dev r-base
sudo apt -y upgrade
sudo apt -y autoclean
sudo apt --purge -y autoremove
After all of the software has been installed, install the googleAnalyticsR package.
sudo Rscript -e "install.packages('googleAnalyticsR')"
Important! You should install the package as a super-user for 2 reasons:
- The package can be used by any other user in the same instance.
- You avoid creating an R library in your home folder, which can lead to unintended consequences.
Enable the Analytics APIs
In your GCP project, you need to enable the Google Analytics-related APIs.
- Go to your GCP project.
- Navigate to API Library.
- Search for "analytics".
- Enable both of these APIs:
  - Google Analytics Reporting API
  - Analytics API
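The same two APIs can probably be enabled from a terminal as well. The sketch below assumes that gcloud is installed and pointed at your project, and that the service names are analyticsreporting.googleapis.com and analytics.googleapis.com. Those names are my assumption, not something stated above, so check them against the API Library page before relying on them.

```shell
# Assumed service names for the Google Analytics Reporting API and the Analytics API.
APIS="analyticsreporting.googleapis.com analytics.googleapis.com"

if command -v gcloud >/dev/null 2>&1; then
  # $APIS is left unquoted on purpose so each service is passed as its own argument.
  gcloud services enable $APIS
else
  echo "gcloud not found; enable these in the API Library instead: $APIS"
fi
```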
Create a Client ID and Secret
By itself, googleAnalyticsR will be able to run using its default authorization. But that means you're sharing that authorization with everyone else in the world who is using googleAnalyticsR with its default setting, which can then lead to going over the Google Analytics API request quota. That won't work when you need your R script to run continuously on its own.
So you will need to create your own Client ID and Secret to authorize googleAnalyticsR's requests against your own GCP project.
- Go to your GCP project.
- Navigate to Credentials.
- Click the "Create credentials" button, then choose "OAuth client ID".
- Choose "Other" application type. Provide a suitable name, e.g. "googleAnalyticsR client ID and secret".
- Back in the main Credentials page, look for your new Client ID. Download its JSON.
- (optional but recommended) Rename your downloaded JSON file to something familiar, e.g. "googleAnalyticsR-client-id-secret.json".
Finally, copy the JSON file to your Compute Engine instance. For example, to copy it to your home folder in the instance:
scp googleAnalyticsR-client-id-secret.json username@ip-address:~/.
(Replace username and ip-address appropriately.)
Run googleAnalyticsR manually once to authorize
SSH into your Compute Engine instance, then run this command:
R
You should now be in the R environment, where you can run familiar R commands. Run these commands:
library(googleAuthR)
library(googleAnalyticsR)
setwd("~") # set your working directory to your home folder in the instance
gar_set_client("googleAnalyticsR-client-id-secret.json") # file names are case-sensitive on Linux; this must match the file you copied
gar_auth('.httr-oauth')
# verify that your Google Analytics authorisation is working by getting a list of GA accounts
ga_account_list()
You will see some messages about trying to open a URL. The crucial part is in the last line, where you can see a http://... URL.
- Select and copy the URL.
- Open the URL in your web browser.
- Log in to Google if you haven't already done so.
- Allow your GCP project to be authorized to work with your Google Analytics data.
- Copy the alphanumeric string that is displayed.
- Go back to your R script that was left hanging waiting for authorization.
- Paste the string that you had copied and press Enter.
Use googleAnalyticsR with your R file
Now that you have authorized googleAnalyticsR in the Compute Engine instance, you can use googleAnalyticsR with your R file, as long as you always run gar_auth() with .httr-oauth, a file that stores the authorization token.
E.g. at the start of your R file, you should have these lines:
library(googleAuthR)
library(googleAnalyticsR)
setwd("~") # set your working directory to your home folder in the instance
gar_set_client("googleAnalyticsR-client-id-secret.json") # file names are case-sensitive on Linux; this must match the file you copied
gar_auth('.httr-oauth')
# the rest of your Google Analytics code goes here
Your R file can now run all of its Google Analytics queries in the Compute Engine instance without needing any further authorization from you.
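For scheduled runs, a small wrapper script can confirm that both authorization files are in place before it calls R. This is a sketch rather than part of the original setup: report.R is a placeholder for your own R file, and check_auth_files is a hypothetical helper. It assumes that the client JSON and .httr-oauth both sit in your home folder, as in the steps above.

```shell
# Hypothetical helper: succeed only if both authorization files exist in the given folder.
check_auth_files() {
  dir="$1"
  for f in googleAnalyticsR-client-id-secret.json .httr-oauth; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  return 0
}

# report.R is a placeholder name for your own R file.
if check_auth_files "$HOME"; then
  Rscript "$HOME/report.R"
else
  echo "Run the manual authorization step first." >&2
fi
```

If you save this as, say, run-report.sh (again a hypothetical name), a crontab entry such as 0 6 * * * /home/username/run-report.sh would run it every day at 06:00.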
Recipe for using googleAnalyticsR on a Compute Engine instance
- Create an n1-standard-1 Compute Engine instance.
- Install R and other required software.
- Enable the Analytics APIs.
- Create a Client ID and secret.
- Authorize googleAnalyticsR by running it manually.
- Add googleAnalyticsR to your R file.