
QuickTip: Ingesting Google Analytics API with Apache NiFi




Design and test your query in the Google Analytics Query Explorer:

https://ga-dev-tools.appspot.com/query-explorer/



Building this NiFi flow is trivial.



Paste in the request URL, tokens included, that the Query Explorer console generates.
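If you would rather assemble that URL yourself than copy it from the Query Explorer, here is a minimal Python sketch. It assumes the Core Reporting API v3 endpoint and parameter names that appear in the example results further down; `build_ga_url` is a hypothetical helper name, and the `access_token` value is a placeholder you would supply yourself.

```python
from urllib.parse import urlencode

# Endpoint taken from the "selfLink" field in the example results.
BASE_URL = "https://www.googleapis.com/analytics/v3/data/ga"


def build_ga_url(view_id, metrics, start_date, end_date, access_token=None):
    """Assemble a Core Reporting API v3 query URL (illustrative helper)."""
    params = {
        "ids": view_id,
        "metrics": ",".join(metrics),
        "start-date": start_date,
        "end-date": end_date,
    }
    if access_token:
        # Placeholder: use the token the Query Explorer (or your own OAuth flow) gives you.
        params["access_token"] = access_token
    # Keep ':' and ',' unescaped so the URL reads like the one in the results.
    return BASE_URL + "?" + urlencode(params, safe=":,")


url = build_ga_url(
    "ga:33",
    ["ga:users", "ga:percentNewSessions", "ga:sessions"],
    "30daysAgo",
    "yesterday",
)
print(url)
```

Without a token, this prints the same URL that shows up in the `id` and `selfLink` fields of the example results.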




You will need to reference the JRE that NiFi is using and its cacerts file if you don't want to build your own trust store. The default cacerts password for JDK 8 is changeit. No, really.
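If you are not sure where that cacerts file lives, here is a small Python sketch for finding it. It assumes the usual layouts (`jre/lib/security/cacerts` under a JDK 8 home, `lib/security/cacerts` under a JRE 8 or JDK 9+ home) and that `JAVA_HOME` points at the Java install NiFi actually runs on; `find_cacerts` is a hypothetical helper name.

```python
import os
from pathlib import Path


def find_cacerts(java_home):
    """Return the first cacerts file found under java_home, or None."""
    candidates = [
        Path(java_home) / "jre" / "lib" / "security" / "cacerts",  # JDK 8 layout
        Path(java_home) / "lib" / "security" / "cacerts",  # JRE 8 or JDK 9+ layout
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    # Assumption: JAVA_HOME is the same Java that NiFi is using.
    java_home = os.environ.get("JAVA_HOME", "")
    print(find_cacerts(java_home) if java_home else "JAVA_HOME is not set")
```

The path it prints is what you would give NiFi as the trust store location, with changeit as the password unless someone has changed it.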



Here are our results in clean JSON:



Here are some attributes NiFi shows.


Example JSON Results

{
  "kind": "analytics#gaData",
  "id": "https://www.googleapis.com/analytics/v3/data/ga?ids=ga:33&metrics=ga:users,ga:percentNewSessions,ga:sessions&start-date=30daysAgo&end-date=yesterday",
  "query": {
    "start-date": "30daysAgo",
    "end-date": "yesterday",
    "ids": "ga:33",
    "metrics": [
      "ga:users",
      "ga:percentNewSessions",
      "ga:sessions"
    ],
    "start-index": 1,
    "max-results": 1000
  },
  "itemsPerPage": 1000,
  "totalResults": 0,
  "selfLink": "https://www.googleapis.com/analytics/v3/data/ga?ids=ga:33&metrics=ga:users,ga:percentNewSessions,ga:sessions&start-date=30daysAgo&end-date=yesterday",
  "profileInfo": {
    "profileId": "333",
    "accountId": "333",
    "webPropertyId": "UA-333-3",
    "internalWebPropertyId": "33",
    "profileName": "monitorenergy.blogspot.com/",
    "tableId": "ga:33"
  },
  "containsSampledData": false,
  "columnHeaders": [
    {
      "name": "ga:users",
      "columnType": "METRIC",
      "dataType": "INTEGER"
    },
    {
      "name": "ga:percentNewSessions",
      "columnType": "METRIC",
      "dataType": "PERCENT"
    },
    {
      "name": "ga:sessions",
      "columnType": "METRIC",
      "dataType": "INTEGER"
    }
  ],
  "totalsForAllResults": {
    "ga:users": "0",
    "ga:percentNewSessions": "0.0",
    "ga:sessions": "0"
  }
}
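Note that every value in the response arrives as a string, with the real type described in `columnHeaders`. As a rough illustration of what a record processor has to do with this, here is a Python sketch that casts the totals using those headers. The `CASTS` mapping and the function names are my own assumptions, and `typed_rows` assumes that a populated response carries a `rows` array of string lists in the same order as `columnHeaders` (the empty example above has none).

```python
import json

# Assumed mapping from GA dataType values to Python types; anything else stays a string.
CASTS = {"INTEGER": int, "PERCENT": float, "FLOAT": float, "CURRENCY": float, "TIME": float}


def typed_totals(response):
    """Cast totalsForAllResults using the types declared in columnHeaders."""
    casts = {h["name"]: CASTS.get(h["dataType"], str) for h in response["columnHeaders"]}
    return {
        name: casts.get(name, str)(value)
        for name, value in response["totalsForAllResults"].items()
    }


def typed_rows(response):
    """Turn each row (if any) into a dict keyed by column name, with typed values."""
    headers = response["columnHeaders"]
    return [
        {h["name"]: CASTS.get(h["dataType"], str)(cell) for h, cell in zip(headers, row)}
        for row in response.get("rows", [])
    ]


# Trimmed copy of the example results above.
sample = json.loads("""
{
  "columnHeaders": [
    {"name": "ga:users", "columnType": "METRIC", "dataType": "INTEGER"},
    {"name": "ga:percentNewSessions", "columnType": "METRIC", "dataType": "PERCENT"},
    {"name": "ga:sessions", "columnType": "METRIC", "dataType": "INTEGER"}
  ],
  "totalsForAllResults": {
    "ga:users": "0",
    "ga:percentNewSessions": "0.0",
    "ga:sessions": "0"
  }
}
""")

print(typed_totals(sample))  # {'ga:users': 0, 'ga:percentNewSessions': 0.0, 'ga:sessions': 0}
print(typed_rows(sample))    # []
```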

You should see a lot more data, depending on the property Google Analytics is tracking. From here you can use QueryRecord or another record processor to automatically convert, query, or route this data. You can infer a schema or build a permanent one and store it in Cloudera Schema Registry, which I recommend if you run this flow frequently.
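If you go the permanent-schema route, the `columnHeaders` array already tells you most of what the schema needs. Here is a hedged Python sketch that drafts an Avro record schema from it. The type mapping, the `GaRow` record name, and the helper names are assumptions of mine, not something NiFi or Schema Registry produces; the one hard constraint is that Avro names may only contain letters, digits, and underscores, so `ga:users` has to become something like `ga_users`.

```python
import json
import re

# Assumed mapping from GA dataType values to Avro primitive types.
AVRO_TYPES = {
    "INTEGER": "long",
    "PERCENT": "double",
    "FLOAT": "double",
    "CURRENCY": "double",
    "TIME": "double",
}


def avro_name(ga_name):
    """Rewrite a GA column name into a legal Avro field name."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", ga_name)
    # Avro names must start with a letter or underscore.
    return name if re.match(r"[A-Za-z_]", name) else "_" + name


def avro_schema(column_headers, record_name="GaRow"):
    """Draft a record schema with one nullable field per GA column."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [
            {
                "name": avro_name(h["name"]),
                "type": ["null", AVRO_TYPES.get(h["dataType"], "string")],
                "default": None,
            }
            for h in column_headers
        ],
    }


headers = [
    {"name": "ga:users", "columnType": "METRIC", "dataType": "INTEGER"},
    {"name": "ga:percentNewSessions", "columnType": "METRIC", "dataType": "PERCENT"},
    {"name": "ga:sessions", "columnType": "METRIC", "dataType": "INTEGER"},
]
print(json.dumps(avro_schema(headers), indent=2))
```

Because the field names are rewritten, the keys in the incoming JSON would need the same renaming before a schema like this matches them.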

Download a reference NiFi flow here:

https://github.com/tspannhw/flows

References:

https://developers.google.com/analytics/devguides/reporting/core/v4

https://developers.google.com/analytics