
QuickTip: Ingesting Google Analytics API with Apache NiFi



Design your query / test the API here:

https://ga-dev-tools.appspot.com/query-explorer/



Building this NiFi flow is trivial.



Add your URL with tokens from the Query Explorer console.
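
For reference, a URL from the Query Explorer looks roughly like the sketch below; the view ID (ga:XXXXXXXX) and ACCESS_TOKEN are placeholders you would swap for your own values before pasting the URL into the HTTP processor (for example, InvokeHTTP) or testing it with curl.

# Sketch only: Core Reporting API v3 query in the shape the Query Explorer produces.
# ga:XXXXXXXX (your view ID) and ACCESS_TOKEN are placeholders, not working values.
curl "https://www.googleapis.com/analytics/v3/data/ga?ids=ga:XXXXXXXX&metrics=ga:users,ga:percentNewSessions,ga:sessions&start-date=30daysAgo&end-date=yesterday&access_token=ACCESS_TOKEN"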




You will need to reference the JRE that NiFi is using and its cacerts file if you don't want to build your own trust store.   The default password for the JDK 8 cacerts file is changeit.   No, really.
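
To sanity-check that trust store before pointing NiFi's SSL context service at it, something like the following works (the path is an assumption; adjust for your JDK install):

# Locate and list the JDK 8 default trust store; "changeit" is the stock password.
ls $JAVA_HOME/jre/lib/security/cacerts
keytool -list -keystore $JAVA_HOME/jre/lib/security/cacerts -storepass changeit | head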



Here are our results in clean JSON:



Here are some attributes NiFi shows.


Example JSON Results

{
  "kind": "analytics#gaData",
  "id": "https://www.googleapis.com/analytics/v3/data/ga?ids=ga:33&metrics=ga:users,ga:percentNewSessions,ga:sessions&start-date=30daysAgo&end-date=yesterday",
  "query": {
    "start-date": "30daysAgo",
    "end-date": "yesterday",
    "ids": "ga:33",
    "metrics": [
      "ga:users",
      "ga:percentNewSessions",
      "ga:sessions"
    ],
    "start-index": 1,
    "max-results": 1000
  },
  "itemsPerPage": 1000,
  "totalResults": 0,
  "selfLink": "https://www.googleapis.com/analytics/v3/data/ga?ids=ga:33&metrics=ga:users,ga:percentNewSessions,ga:sessions&start-date=30daysAgo&end-date=yesterday",
  "profileInfo": {
    "profileId": "333",
    "accountId": "333",
    "webPropertyId": "UA-333-3",
    "internalWebPropertyId": "33",
    "profileName": "monitorenergy.blogspot.com/",
    "tableId": "ga:33"
  },
  "containsSampledData": false,
  "columnHeaders": [
    {
      "name": "ga:users",
      "columnType": "METRIC",
      "dataType": "INTEGER"
    },
    {
      "name": "ga:percentNewSessions",
      "columnType": "METRIC",
      "dataType": "PERCENT"
    },
    {
      "name": "ga:sessions",
      "columnType": "METRIC",
      "dataType": "INTEGER"
    }
  ],
  "totalsForAllResults": {
    "ga:users": "0",
    "ga:percentNewSessions": "0.0",
    "ga:sessions": "0"
  }
}

You should have a lot more data, depending on what you have Google Analytics pointing to.   From here you can use QueryRecord or another record processor to automatically convert, query or route this data.   You can infer a schema or build a permanent one and store it in Cloudera Schema Registry.   I recommend doing that if this is a frequent process.
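
If you go the permanent route, a minimal sketch of such a schema is below; the record and field names are my own assumptions based on the totalsForAllResults block above, not an official Google schema.

# Hypothetical Avro schema for the GA totals, saved to a file you could register in Schema Registry.
cat > ga-totals.avsc <<'EOF'
{
  "type": "record",
  "name": "GaTotals",
  "namespace": "com.example.ga",
  "fields": [
    {"name": "users", "type": "string"},
    {"name": "percentNewSessions", "type": "string"},
    {"name": "sessions", "type": "string"}
  ]
}
EOF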

Download a reference NiFi flow here:

https://github.com/tspannhw/flows

References:

https://developers.google.com/analytics/devguides/reporting/core/v4

https://developers.google.com/analytics

NiFi Toolkit - CLI - For NiFi 1.10

Along with the updated Apache NiFi server, the NiFi 1.10 release also brings new and improved features to the Command Line Interface.   Let's check them out.

Cool Tools

s2s.sh - send data to Apache NiFi via the CLI.

Formatted as such:
[{"attributes":{"key":"value"},"data":"stuff"}]

Examples

registry import-flow-version
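
For example, print the usage first; this is just a sketch, since the exact arguments (flow identifier, input file and registry client options) vary by toolkit version:

# Show the arguments this command accepts before running it for real.
./cli.sh registry import-flow-version -h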


Get Into Interactive Mode

./cli.sh




Get Parameter Contexts (simple or json format)

 nifi list-param-contexts -u http://localhost:8080 -ot simple



Export Parameter Context

nifi export-param-context -u http://localhost:8080 -verbose --paramContextId 8067d863-016e-1000-f0f7-265210d3e7dc 
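
Redirecting the output lets you keep the exported context as a file, for versioning or for importing into another environment with the matching import command (the context ID is the same example ID as above):

# Save the exported parameter context to a JSON file.
./cli.sh nifi export-param-context -u http://localhost:8080 --paramContextId 8067d863-016e-1000-f0f7-265210d3e7dc > params.json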




Get Services

 nifi get-services -u http://localhost:8080


NiFi Dump

../bin/nifi.sh dump filedump.txt

NiFi home: /Users/tspann/Documents/nifi-1.10.0

Bootstrap Config File: /Users/tspann/Documents/nifi-1.10.0/conf/bootstrap.conf

2019-11-18 17:08:04,921 INFO [main] org.apache.nifi.bootstrap.Command Successfully wrote thread dump to /Users/tspann/Documents/nifi-1.10.0/filedump.txt

NiFi Diagnostics

../bin/nifi.sh diagnostics diag.txt

Java home:
NiFi home: /Users/tspann/Documents/nifi-1.10.0

Bootstrap Config File: /Users/tspann/Documents/nifi-1.10.0/conf/bootstrap.conf

2019-11-18 17:11:09,844 INFO [main] org.apache.nifi.bootstrap.Command Successfully wrote diagnostics information to /Users/tspann/Documents/nifi-1.10.0/diag.txt

2019-11-18 17:11:10,041 INFO [main] org.apache.nifi.bootstrap.Command gopherProxySet = false
2019-11-18 17:11:10,042 INFO [main] org.apache.nifi.bootstrap.Command awt.toolkit = sun.lwawt.macosx.LWCToolkit
2019-11-18 17:11:10,042 INFO [main] org.apache.nifi.bootstrap.Command java.specification.version = 11
2019-11-18 17:11:10,042 INFO [main] org.apache.nifi.bootstrap.Command sun.cpu.isalist =
2019-11-18 17:11:10,042 INFO [main] org.apache.nifi.bootstrap.Command sun.jnu.encoding = UTF-8
2019-11-18 17:11:10,042 INFO [main] org.apache.nifi.bootstrap.Command java.class.path = /Users/tspann/Documents/nifi-1.10.0/./conf:/Users/tspann/Documents/nifi-1.10.0/./lib/jetty-schemas-3.1.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/slf4j-api-1.7.26.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/stanford-english-corenlp-2018-02-27-models.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/jcl-over-slf4j-1.7.26.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/javax.servlet-api-3.1.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/logback-classic-1.2.3.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/nifi-properties-1.10.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/nifi-nar-utils-1.10.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/nifi-api-1.10.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/nifi-framework-api-1.10.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/jul-to-slf4j-1.7.26.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/logback-core-1.2.3.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/log4j-over-slf4j-1.7.26.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/nifi-runtime-1.10.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/java11/javax.annotation-api-1.3.2.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/java11/jaxb-core-2.3.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/java11/javax.activation-api-1.2.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/java11/jaxb-impl-2.3.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/java11/jaxb-api-2.3.0.jar
2019-11-18 17:11:10,042 INFO [main] org.apache.nifi.bootstrap.Command java.vm.vendor = Amazon.com Inc.
2019-11-18 17:11:10,042 INFO [main] org.apache.nifi.bootstrap.Command javax.security.auth.useSubjectCredsOnly = true
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command sun.arch.data.model = 64
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command sun.font.fontmanager = sun.font.CFontManager
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command java.vendor.url = https://aws.amazon.com/corretto/
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command user.timezone = America/New_York
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command org.apache.nifi.bootstrap.config.log.dir = /Users/tspann/Documents/nifi-1.10.0/logs
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command os.name = Mac OS X
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command java.vm.specification.version = 11
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command nifi.properties.file.path = /Users/tspann/Documents/nifi-1.10.0/./conf/nifi.properties
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command sun.java.launcher = SUN_STANDARD
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command user.country = US
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command sun.boot.library.path = /Library/Java/JavaVirtualMachines/amazon-corretto-11.jdk/Contents/Home/lib
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command app = NiFi
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command sun.java.command = org.apache.nifi.NiFi
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command jdk.debug = release
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command org.apache.jasper.compiler.disablejsr199 = true
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command sun.cpu.endian = little
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command user.home = /Users/tspann
2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command user.language = en
2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command java.specification.vendor = Oracle Corporation
2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command java.version.date = 2019-07-16
2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command java.home = /Library/Java/JavaVirtualMachines/amazon-corretto-11.jdk/Contents/Home
2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command file.separator = /
2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command java.vm.compressedOopsMode = Zero based
2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command line.separator =

2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command java.specification.name = Java Platform API Specification
2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command java.vm.specification.vendor = Oracle Corporation
2019-11-18 17:11:10,050 INFO [main] org.apache.nifi.bootstrap.Command javax.xml.xpath.XPathFactory:http://saxon.sf.net/jaxp/xpath/om = net.sf.saxon.xpath.XPathFactoryImpl
2019-11-18 17:11:10,050 INFO [main] org.apache.nifi.bootstrap.Command java.awt.graphicsenv = sun.awt.CGraphicsEnvironment
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command java.awt.headless = true
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command java.protocol.handler.pkgs = sun.net.www.protocol
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command sun.management.compiler = HotSpot 64-Bit Tiered Compilers
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command java.runtime.version = 11.0.4+11-LTS
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command user.name = tspann
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command java.net.preferIPv4Stack = true
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command path.separator = :
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command java.security.egd = file:/dev/urandom
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command org.jruby.embed.localvariable.behavior = persistent
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command os.version = 10.14.6
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command java.runtime.name = OpenJDK Runtime Environment
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command file.encoding = UTF-8
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command sun.net.http.allowRestrictedHeaders = true
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command jnidispatch.path = /var/folders/t5/xz5j50wx2rl8kd3021lkbn800000gn/T/jna--864347536/jna3349001211756681540.tmp
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command java.vm.name = OpenJDK 64-Bit Server VM
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command jna.platform.library.path = /usr/lib:/usr/lib
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command java.vendor.version = Corretto-11.0.4.11.1
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command jna.loaded = true
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command java.vendor.url.bug = https://github.com/corretto/corretto-11/issues/
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command jetty.git.hash = afcf563148970e98786327af5e07c261fda175d3
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command java.io.tmpdir = /var/folders/t5/xz5j50wx2rl8kd3021lkbn800000gn/T/
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command java.version = 11.0.4
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command user.dir = /Users/tspann/Documents/nifi-1.10.0
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command os.arch = x86_64
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command nifi.bootstrap.listen.port = 55105
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command java.vm.specification.name = Java Virtual Machine Specification
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command java.awt.printerjob = sun.lwawt.macosx.CPrinterJob
2019-11-18 17:11:10,053 INFO [main] org.apache.nifi.bootstrap.Command sun.os.patch.level = unknown
2019-11-18 17:11:10,053 INFO [main] org.apache.nifi.bootstrap.Command bridj.quiet = true
2019-11-18 17:11:10,053 INFO [main] org.apache.nifi.bootstrap.Command java.library.path = /Users/tspann/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
2019-11-18 17:11:10,053 INFO [main] org.apache.nifi.bootstrap.Command java.vendor = Amazon.com Inc.
2019-11-18 17:11:10,053 INFO [main] org.apache.nifi.bootstrap.Command java.vm.info = mixed mode
2019-11-18 17:11:10,053 INFO [main] org.apache.nifi.bootstrap.Command java.vm.version = 11.0.4+11-LTS
2019-11-18 17:11:10,053 INFO [main] org.apache.nifi.bootstrap.Command sun.io.unicode.encoding = UnicodeBig
2019-11-18 17:11:10,053 INFO [main] org.apache.nifi.bootstrap.Command java.class.version = 55.0



Using GrovePi with Raspberry Pi and MiNiFi Agents for Data Ingest to Parquet, Kudu, ORC, Kafka, Hive and Impala


Source Code:  https://github.com/tspannhw/minifi-grove-sensors

Acquiring sensor data from Grove sensors is easy using a GrovePi Hat and some compatible sensors.


Just before my talk at the Future of Data Meetup @ Bell Works in Holmdel, NJ, I thought I should ingest some data from a Grove sensor interface.

It's so easy a sleeping cat could do it.




So what does this device look like?  



I have a temperature and humidity sensor on there.




The ultrasonic distance sensor is on there too; that's for the next article.




Let's do this with minimal RAM.




That's a 64GB hard drive underneath in the white case with the RPI.





I need more data and BACON.



We design our MiNiFi agent flow in CEM/EFM: run the sensors and grab the JSON data stream they produce.


Apache NiFi 1.9.2 / CFM 1.0 receives HTTPS S2S events from the MiNiFi agent.




A simple flow to query and convert our JSON data, then store it to Kudu and HDFS (ORC) as well as push it to Kafka with a schema.




Let's read that Kafka message and store it to Parquet; we will push to MQTT and JMS in the next article.   This is our universal proxy/gateway.



We could infer a schema and not save it, but saving the schema to the Schema Registry makes SMM, Kafka, NiFi and other tools schema-aware, and it becomes easy to automatically query and convert between CSV/JSON/XML/Avro/Parquet and more.

Let's store the data in Parquet files on HDFS with an Impala table.   In Apache NiFi 1.10 there is a ParquetWriter for this.



Before we push to Kafka, let's create a topic for it with Cloudera SMM.
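
SMM handles this in its UI; a rough command-line equivalent is below, where the broker address, topic name, partition count and replication factor are all placeholders for this sketch:

# Equivalent CLI topic creation (placeholders; older Kafka versions use --zookeeper instead of --bootstrap-server).
kafka-topics.sh --bootstrap-server broker1:9092 --create --topic grovesensors --partitions 3 --replication-factor 1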



Let's build an Impala table for that Kudu data.



We can query our tables with ease as data is rapidly added.
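
For example, from impala-shell (the impalad hostname below is a placeholder for this sketch):

# Latest readings from the Kudu-backed table.
impala-shell -i impala-host:21000 -q "SELECT systemtime, temperature, humidity, cpu, memory FROM grovesensors ORDER BY systemtime DESC LIMIT 10"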





Let's Examine the Parquet Files that NiFi Generated
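
A quick way to see what NiFi wrote is to list the HDFS directory that backs the Parquet table (the path matches the table location created below):

# List the Parquet files NiFi landed in HDFS.
hdfs dfs -ls /tmp/groveparquet/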





Let's query that Parquet data with Impala in Hue.
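
If you are not in Hue, the same query works from impala-shell (the impalad hostname is a placeholder):

# Query the Parquet-backed table outside of Hue.
impala-shell -i impala-host:21000 -q "SELECT host_name, systemtime, temperature, humidity FROM grove_parquet LIMIT 10"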



Let's monitor that data in Kafka with Cloudera SMM.






That was easy: from device to enterprise cloud data store(s), with enterprise messaging, security, governance, lineage, data catalog, SDX, monitoring and more.   How easily can you ingest IoT data, query it mid-stream and store it in multiple data stores?   It took longer to write the article than to do the project and code.   All graphical, single sign-on, multiple schemas/versions/data types/engines, multiple OSs, edge, cloud and laptop.   Easy.

Table DDL


CREATE EXTERNAL TABLE IF NOT EXISTS grovesensors2 
(humidity STRING, uuid STRING, systemtime STRING, runtime STRING, cpu DOUBLE, id STRING, te STRING, host STRING, `end` STRING, 
macaddress STRING, temperature STRING, diskusage STRING, memory DOUBLE, ipaddress STRING, host_name STRING) 
STORED AS ORC
LOCATION '/tmp/grovesensors';

CREATE TABLE grovesensors (uuid STRING, `end` STRING, humidity STRING, systemtime STRING, runtime STRING, cpu DOUBLE, id STRING, te STRING,
host STRING, macaddress STRING, temperature STRING, diskusage STRING, memory DOUBLE, ipaddress STRING, host_name STRING,
PRIMARY KEY (uuid, `end`)
)
PARTITION BY HASH PARTITIONS 16
STORED AS KUDU
TBLPROPERTIES ('kudu.num_tablet_replicas' = '1');

hdfs dfs -mkdir -p /tmp/grovesensors
hdfs dfs -mkdir -p /tmp/groveparquet

CREATE EXTERNAL TABLE grove_parquet 
(
  diskusage STRING,
  memory DOUBLE,
  host_name STRING,
  systemtime STRING,
  macaddress STRING,
  temperature STRING,
  humidity STRING,
  cpu DOUBLE,
  uuid STRING,
  ipaddress STRING,
  host STRING,
  `end` STRING,
  te STRING,
  runtime STRING,
  id STRING
)
STORED AS PARQUET
LOCATION '/tmp/groveparquet/';
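
One operational note (a hedged sketch; the hostname is a placeholder): since NiFi writes new Parquet files straight into that HDFS location, Impala has to be told to pick them up before they show in queries.

# Make newly landed Parquet files visible to Impala.
impala-shell -i impala-host:21000 -q "REFRESH grove_parquet"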

Parquet Format



message org.apache.nifi.grove {
  optional binary diskusage (STRING);
  optional double memory;
  optional binary host_name (STRING);
  optional binary systemtime (STRING);
  optional binary macaddress (STRING);
  optional binary temperature (STRING);
  optional binary humidity (STRING);
  optional double cpu;
  optional binary uuid (STRING);
  optional binary ipaddress (STRING);
  optional binary host (STRING);
  optional binary end (STRING);
  optional binary te (STRING);
  optional binary runtime (STRING);
  optional binary id (STRING);
}


Migrating Apache Flume Flows to Apache NiFi: Kafka Source to HTTP REST Sink and HTTP REST Source to Kafka Sink


This is a simple use case of acting as a gateway between a REST API and Kafka.   We can do a lot more than that in NiFi.  We can be a Kafka consumer and producer, as well as POST REST calls and receive REST calls on configurable ports.  All with no code.

NiFi can act as a listener for HTTP requests and provide HTTP responses in a scriptable, full web server mechanism with Jetty.   Or it can listen for HTTP REST calls on a port and route those files anywhere: https://community.cloudera.com/t5/Community-Articles/Parsing-Web-Pages-for-Images-with-Apache-NiFi/ta-p/248415.  We can also do WebSockets: https://community.cloudera.com/t5/Community-Articles/An-Example-WebSocket-Application-in-Apache-NiFi-1-1/ta-p/248598 and https://community.cloudera.com/t5/Community-Articles/Accessing-Feeds-from-EtherDelta-on-Trades-Funds-Buys-and/ta-p/248316.

It is extremely easy to do this in NiFi.




Kafka Consumer to REST POST



HTTP REST to Kafka Producer
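
To test that side, you can POST a sample message at the NiFi listener. This sketch assumes a ListenHTTP-style processor on port 9999 with its default contentListener base path; adjust the port and path to whatever you configured.

# Post a sample JSON message to the NiFi HTTP listener that feeds the Kafka producer (port and path are assumptions).
curl -X POST -H "Content-Type: application/json" -d '{"symbol":"CLDR","price":10.10}' http://localhost:9999/contentListener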




Full Monitoring on Apache NiFi






A Very Common Use Case:  Ingesting Stock Feeds From REST to Kafka


