[PYTHON] Export access data for each user of Google Analytics.

goal

Create an instance on Google Cloud Compute (GCE), use a Python program to export Google Analytics (GA) data, import it into MySQL, and export the formatted data. I will aim.

Environment (GCP)

OS: Ubuntu 14.04 LTS Machine Type: g1-small (1 vCPU, 1.7 GB memory) Other settings are the default, but please grant permission to use the API as an exception.

procedure

Advance preparation

Get the IP of the access user with GA.

The access log for each user is acquired by default in the free version. But there is no axis to make the user unique. Therefore, the IP of the access user is acquired as the axis for grouping. The flow is as follows. ① Define a custom dimension in GA. ② Add the code to acquire the IP to the JavaScript of the website. ③ Confirm that you can get it on GA.

■ Detailed explanation ① Define a custom dimension in GA. First, define a custom dimension. Use custom dimensions when you want to send more than the standard information provided by a web page. image

Create a new custom dimension. (Created in the figure) image

Create it with any name. image

Review the code to get the value of the custom dimension you created. If you can paste this code into the header of a web page and set a value for the added custom dimension, you can check it on Google Analytics. image

Specifying trackCommonMethod.getIP () for dimensionValue sets the IP for this custom dimension and sends it to the GA. Add this script to your site. Be sure to add it before ga ('send','pageview') ;. This service seems to have been created by Mr. Kei, who runs the following page. Thank you very much. http://lfll.blog73.fc2.com/blog-entry-258.html

If you are reluctant to use this, it seems that you can do it by writing your own script. In my case, I embedded it in Jimdo's site, so I couldn't write the script freely, so I put it sideways.


<script type="text/javascript" src="//www.analyticsip.net/getIP/public_html/ra/script.php">

//Omission

</script>
  ga('set', 'dimension1', trackCommonMethod.getIP());
  ga('send', 'pageview');

Next, when the GA data is updated, I think that the IP will be displayed by specifying the custom dimension created earlier for the secondary dimension of each report.

image

Export access data from GA.

Now that you are ready, refer to the official page and create a service account. https://developers.google.com/analytics/

Create a Python code like the one below to export the GA data to CSV.

#!/usr/bin/python
# -*- coding: utf-8 -*-
 
import sys
import HelloAnalytics
from apiclient.errors import HttpError
from oauth2client.client import AccessTokenRefreshError
import csv
import codecs
import subprocess
import datetime
import os
 
# information for call API.
scope = ['https://www.googleapis.com/auth/analytics.readonly']
service_account_email = 'Email the service account you created'
key_file_location = 'Placement path of the created key file'
view_id = 'View ID that can be obtained on the GA management screen'
 
# Authenticate and construct service.
#The connection settings here are from other sources, so please correct them.
service = HelloAnalytics.get_service('analytics', 'v3', scope, key_file_location, service_account_email)
args = sys.argv
exec_date = args[1]
   
print ""
print "exec_date:" + exec_date
 
#--------------------------------------------
#Data collection(Main)
#Get daily information about page views
#--------------------------------------------
results = service.data().ga().get(
  ids='ga:' + view_id,
  start_date='7daysAgo',
  end_date='today',
  metrics='ga:pageviews',
  dimensions='ga:dimension1,ga:pagePath',
  max_results=20000
).execute()
rows = results.get('rows')
 
# Write Output File(CSV)
csvFileName = "GoogleAnalyticsData_" + exec_date + ".csv"
fout = codecs.open(csvFileName, 'w', 'utf-8')
filestr = "analysis_date" \
  + "," + "dimension1" \
  + "," + "pagepath" \
  + "," + "pageviews" + "\n"
fout.write(filestr);
 
count = 0;
for row in rows:
  dimension1 = row[0]
  pagepath = row[1];
  pageviews = row[2];
  filestr = exec_date \
    + "," + dimension1 \
    + "," + pagepath \
    + "," + pageviews + "\n" 

  fout.write(filestr)
  count = count+1;
 
print count;

Import the exported data into MySQL and format it.

In my case, I wanted to export from GA, import to MySQL, and format all at once, so I created the following shell. Please change the file path, ID and password as appropriate.

#!bin/bash
today=`date '+%F'`
lastWeek=`date --date '7days ago' +%F`
# Export CSY From GoogleAnalytics
#The previous program
python exportData.py $today
mv GoogleAnalyticsData_$today.csv files/

# Import Data From CSV To MySql
mysql -u analytics -panalytics  -D analytics -e "LOAD DATA LOCAL INFILE '/usr/local/analytics/files/GoogleAnalyticsData_$today.csv' IGNORE INTO TABLE raw_data FIELDS TERMINATED BY ',' ENCLOSED BY '""' IGNORE 1 LINES;"
#We have prepared a SQL file to format the imported data.
mysql -u analytics -panalytics  -D analytics < ImportData.sql

# Create CSV For TensorFlow
#Since MySQL could not output the column name, I made it feel like outputting the data to the template file with only the column name.
mv /var/lib/mysql-files/ExportData.csv files/
if [ -f files/tfData_$today.csv ]; then
  rm files/tfData_$today.csv
fi
touch files/tfData_$today.csv
cat files/columns.csv  >> files/tfData_$today.csv
cat files/ExportData.csv  >> files/tfData_$today.csv

Impressions

Since GA has a lot of information on the Internet, I was able to achieve my goal with almost no difficulty.

Recommended Posts

Export access data for each user of Google Analytics.
Regular export of Google Analytics raw data to BigQuery using cloud functions
Get data from analytics API with Google API Client for python
Export Google Analytics Standard to BigQuery
Limit ssh with iptables for each user
Visualize the export data of Piyo log
Data acquisition from analytics API with Google API Client for python Part 2 Web application
Basics of pandas for beginners ② Understanding data overview
What to use for Python stacks and queues (speed comparison of each data structure)