
Year and Decade End Report : 201*

A Year in Big Data 2019

This has been an amazing year for Big Data, streaming, tech, the cloud, AI, IoT and everything in between. I got to witness the merger of the two Big Data giants into one unstoppable cloud data machine from the inside. The sum of the new Cloudera is far greater than just Hortonworks + Cloudera. It has been a great year working with amazing engineers, leaders, clients, partners, prospects, community members, data scientists, analysts, marketing mavens and everyone else I got to see this year. I've been busy traveling the globe, spreading the good word and solving some tough problems.

In 2019, Edge2AI became something we could teach newbies and implement in a single day. The combination of MiNiFi + NiFi + Kafka + Kudu + Cloud is unstoppable. Once we added Flink later this year, the FLaNK stack became amazing. I see amazing stuff for this in the 20's. I got to use new projects like Kudu (awesome), Impala, Cloudera Manager and new tools from the Data in Motion team. Streams Messaging Manager became the best way to manage, monitor, create, alert on and use Kafka across clusters anywhere. This is now my favorite way to demo anything. So much transparency, awesome. Having the power of Apache Flink makes any problem solvable, even those that scale to thousands of nodes. Running just one node of Flink has been awesome. I am a Squirrel Dev now!

Strata, DataWorks Summit and NoSQL Day were awesome, but working with charities and non-profits solving real-world problems was amazing. Helping at NetHope is the highlight of my professional year. I am so thankful to the Cloudera Foundation for having me help. I am really impressed with the Cloudera Foundation, NetHope and everyone involved. I am hoping to speak at a few different conferences in 2020, but we'll see where Edge2AI takes me.

There's a lot to wrap up for 2019, so I have put most of it after this break.


IoT Series:  MiNiFi Agent on Raspberry Pi 4 with Enviro+ Hat For Environmental Monitoring and Analytics


Summary:  Our powerful edge device streams environmental sensor readings while also performing edge analytics with deep learning libraries and an enhanced edge VPU. We can perform complex running calculations on sensor data locally on the box before making potentially expensive network calls. We can also decide when to send out data based on heuristics, machine learning or simple if-then logic.

Use Case:   Monitor Environment.   Act Local, Report Global.


Stack:   FLaNK


Category:   AI, IoT, Edge2AI, Real-Time Streaming, Sensors, Cameras, Telemetry.


Hardware:  Intel Movidius NCS 2 VPU (Neural Compute Stick 2), Pimoroni Enviro Plus pHAT, Raspberry Pi 4 (4GB Edition).


Software:  Python 3 + Java + MiNiFi Agents + Cloudera Edge Flow Manager (EFM/CEM) + Apache NiFi.   Using Mm... FLaNK Stack.


Keywords:  Edge2AI, CV, AI, Deep Learning, Cloudera, NiFi, Raspberry Pi, Sensors, IoT, IIoT, Devices, Java, Agents, FLaNK Stack, VPU, Movidius.

Open Source Assets:  https://github.com/tspannhw/minifi-enviroplus


I am running a Python script that streams sensor data continuously to MQTT to be picked up by MiNiFi agents or NiFi.   For development I am just running my Python script with a shell script and nohup.


enviro.sh
python3 /opt/demo/enviro.py


nohup ./enviro.sh &
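
The publisher itself is only a few lines of paho-mqtt. Below is a minimal sketch of the idea, not the full enviro.py (the real script reads the actual Enviro+ sensors via Pimoroni's libraries); the broker host, topic name and threshold are illustrative assumptions. It also sketches the "act local, report global" idea from the summary: compute locally, and only publish when a reading moves enough to matter.

import json
import time
import paho.mqtt.client as mqtt  # pip3 install paho-mqtt

BROKER = "localhost"   # assumption: Mosquitto broker host
TOPIC = "enviro"       # assumption: topic the MiNiFi/NiFi consumer reads

client = mqtt.Client()
client.connect(BROKER, 1883, 60)
client.loop_start()

last_sent = None
while True:
    # The real enviro.py reads the Enviro+ sensors here (temperature,
    # pressure, humidity, lux, proximity, gas) and adds host metadata.
    reading = {"host": "rp4", "temperature": 33.59, "humidity": 7.79}
    # Simple if-then edge logic: only report when temperature moved >= 0.5 C.
    if last_sent is None or abs(reading["temperature"] - last_sent) >= 0.5:
        client.publish(TOPIC, json.dumps(reading))
        last_sent = reading["temperature"]
    time.sleep(0.5)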


Example Enviro Plus pHAT Sensor Data


 {
  "uuid" : "rpi4_uuid_xki_20191220215721",
  "ipaddress" : "192.168.1.243",
  "host" : "rp4",
  "host_name" : "rp4",
  "macaddress" : "dc:a6:32:03:a6:e9",
  "systemtime" : "12/20/2019 16:57:21",
  "cpu" : 0.0,
  "diskusage" : "46958.1 MB",
  "memory" : 6.3,
  "id" : "20191220215721_938f2137-5adb-4c22-867d-cdfbce6431a8",
  "temperature" : "33.590520852237226",
  "pressure" : "1032.0433707836428",
  "humidity" : "7.793797584651376",
  "lux" : "0.0",
  "proximity" : "0",
  "gas" : "Oxidising: 3747.82 Ohms\nReducing: 479652.17 Ohms\nNH3: 60888.05 Ohms"

}



We are also running a standard MiNiFi Java Agent 0.6 that runs a Python application for sensor reads, edge AI with Intel's OpenVINO and some other analytics.


test.sh


#!/bin/bash

# Timestamp used to name this capture's image file
DATE=$(date +"%Y-%m-%d_%H%M")

# Load the OpenVINO environment variables
source /opt/intel/openvino/bin/setupvars.sh

# Grab a still from the webcam, then run the OpenVINO script against it
fswebcam -q -r 1280x720 --no-banner /opt/demo/images/$DATE.jpg
python3 -W ignore /opt/intel/openvino/build/test.py /opt/demo/images/$DATE.jpg 2>/dev/null


test.py

https://github.com/tspannhw/minifi-enviroplus/blob/master/test.py



Example OpenVino Data


{"host": "rp4", "cputemp": "67", "ipaddress": "192.168.1.243", "endtime": "1577194586.66", "runtime": "0.00", "systemtime": "12/24/2019 08:36:26", "starttime": "12/24/2019 08:36:26", "diskfree": "46889.0", "memory": "15.1", "uuid": "20191224133626_55157415-1354-4137-8472-424779645fbe", "image_filename": "20191224133626_9317880e-ee87-485a-8627-c7088df734fc.jpg"}


In our flow I convert the JSON to Apache Avro; as you can see, the Avro schema is embedded in the file.
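
If you want to double-check one of those Avro files outside of NiFi, a minimal sketch with the fastavro library (my choice here, not something the flow requires; the filename is hypothetical) will print the embedded schema and the records:

from fastavro import reader  # pip3 install fastavro

# 'sensors.avro' is a hypothetical name for a file written out by the flow
with open("sensors.avro", "rb") as fo:
    avro_reader = reader(fo)
    print(avro_reader.writer_schema)   # the schema embedded in the file
    for record in avro_reader:
        print(record["temperature"], record["humidity"])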

The flow is very simple: consume MQTT messages from our broker on the topic that our field sensors publish to. We also ingest MiNiFi events through standard Apache NiFi HTTP(S) Site-to-Site (S2S). We route images to our image processor and sensor data straight to Kudu tables.





Now that the data is stored to Apache Kudu we can do our analytics.
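
For example, an ad-hoc query from Python via Impala. This is a minimal sketch assuming the impyla package, an Impala daemon on localhost:21050 and a Kudu-backed table named enviro; all of those names are illustrative:

from impala.dbapi import connect  # pip3 install impyla

# Assumptions: Impala daemon host/port and the Kudu table name
conn = connect(host="localhost", port=21050)
cursor = conn.cursor()
cursor.execute("SELECT host, systemtime, temperature, humidity "
               "FROM enviro ORDER BY systemtime DESC LIMIT 10")
for row in cursor.fetchall():
    print(row)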




Easy to Run an MQTT Broker



References:



Demo Info:


https://subscription.packtpub.com/book/application_development/9781787287815/1/ch01lvl1sec12/installing-a-mosquitto-broker-on-macos


Run Mosquitto MQTT on Local Machine (RPI, Mac, Win10, ...)


On OSX, brew install mosquitto


 /usr/local/etc/mosquitto/mosquitto.conf


To have launchd start mosquitto now and restart at login:
  brew services start mosquitto


Or, if you don't want/need a background service you can just run:
  mosquitto -c /usr/local/etc/mosquitto/mosquitto.conf


For Python, we need pip3 install paho-mqtt.
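
A quick way to confirm the broker is actually receiving readings is a few more lines of paho-mqtt; this sketch assumes a local broker and a topic named enviro (both illustrative):

import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    # Print each sensor payload as it arrives
    print(msg.topic, msg.payload.decode("utf-8"))

client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883, 60)
client.subscribe("enviro")   # assumption: the topic the sensors publish to
client.loop_forever()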


Run Sensors on Device that Pushes to MQTT

  • Python pushes a continuous stream of sensor data to MQTT
  • A MiNiFi agent reads from the broker
  • Send to Kafka and/or NiFi


Example Image Grabbed From Webcam in Dark Office (It's Christmas Eve!)








When ready, we can push to a CDP Data Warehouse in AWS.



With CDP, it's very easy to have a data environment in many clouds to store my live sensor data.



I can now use this data in Kudu tables from Cloudera Data Science Workbench for real data science, machine learning and insights.
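
In CDSW that can be a handful of lines of PySpark. This is a minimal sketch assuming the kudu-spark package is on the session's classpath; the Kudu master address and table name are illustrative:

from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col

spark = SparkSession.builder.appName("enviro-analytics").getOrCreate()

# Assumptions: Kudu master host/port and the Impala-created table name
df = (spark.read.format("org.apache.kudu.spark.kudu")
      .option("kudu.master", "kudu-master:7051")
      .option("kudu.table", "impala::default.enviro")
      .load())

# Average readings per host (cast in case the sensor fields arrived as strings)
df.groupBy("host").agg(
    avg(col("temperature").cast("double")).alias("avg_temperature"),
    avg(col("humidity").cast("double")).alias("avg_humidity")
).show()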




What do we do with all of this data? Check in soon for real-time analytics and dashboarding.
Easy Deep Learning in Apache NiFi with DJL


Custom Processor for Deep Learning



 Happy Mm.. FLaNK Day!


I have been experimenting with DJL, the awesome new Apache 2.0-licensed Java deep learning library. In NiFi I was trying to figure out a quick use case and demo, so I used my web camera processor to grab a still shot from my Powerbook webcam and send it to the processor. The results are sent to Slack.

Since it's the holidays, I think of my favorite holiday movies: The Matrix and Blade Runner. So I thought a Voight-Kampff test would be fun. Since I don't have a deep learning Q&A piece built yet, let's start by seeing if you look human; we'll call them 'person'. I am testing to see if I am a replicant. Sometimes it's hard to tell. Let's see if DJL thinks I am human.

See:   http://nautil.us/blog/-the-science-behind-blade-runners-voight_kampff-test



Okay, so at least it thinks I am a person. The classification of a Christmas tree is vaguely accurate.






It did not identify my giant French bread.

Building A New Custom Processor for Deep Learning




The hardest part was writing a good NiFi integration test. The DJL team provides some great examples, and it's really easy to plug into their models.

// Load the SSD object detection model from the DJL model zoo, then run it
ZooModel<BufferedImage, DetectedObjects> model =
                     MxModelZoo.SSD.loadModel(criteria, new ProgressBar());
Predictor<BufferedImage, DetectedObjects> predictor = model.newPredictor();
DetectedObjects detection = predictor.predict(img);

All the source is on GitHub and references the DJL sites and repos below.

Using a New Custom Processor as part of a Real-time Holiday Flow

We first add the DeepLearningProcessor to our canvas.



An example flow:
  • GetWebCameraProcessor:  grab an image from an attached web camera
  • UpdateAttribute:  add the media type for the image
  • DeepLearningProcessor:  run our DJL deep learning model from a model zoo
  • PutSlack:  put the DJL results in a text window in Slack
  • PostSlack:  send our DJL-altered image to Slack
  • Funnel:  send all failures to Valhalla



If we examine the provenance, we can see how long it took to run and some other interesting attributes.


We place the results of our image analysis in attributes while we return a new image that has a bounding box on the found object(s).




We now have a full classification workflow for real-time deep learning analysis on images; it could be used for Santa watching, security, memes and other important business purposes.


The initial release is available here:   https://github.com/tspannhw/nifi-djl-processor/releases/tag/v1.0
Using library and example code from the Deep Java Library (https://djl.ai/).



Source Code:   https://github.com/tspannhw/nifi-djl-processor/

And now for something completely different, Christmas Cats:

NiFi Toolkit - CLI - For NiFi 1.10

Along with the updated Apache NiFi server, the NiFi 1.10 release also updated the command-line interface with new and improved features. Let's check them out.

Cool Tools

s2s.sh - send data to Apache NiFi via the CLI.

Input is formatted as:
[{"attributes":{"key":"value"},"data":"stuff"}]

Examples

registry import-flow-version


Get Into Interactive Mode

./cli.sh




Get Parameter Contexts (simple or json format)

 nifi list-param-contexts -u http://localhost:8080 -ot simple



Export Parameter Context

nifi export-param-context -u http://localhost:8080 -verbose --paramContextId 8067d863-016e-1000-f0f7-265210d3e7dc 




Get Services

 nifi get-services -u http://localhost:8080


NiFi Dump

../bin/nifi.sh dump filedump.txt

NiFi home: /Users/tspann/Documents/nifi-1.10.0

Bootstrap Config File: /Users/tspann/Documents/nifi-1.10.0/conf/bootstrap.conf

2019-11-18 17:08:04,921 INFO [main] org.apache.nifi.bootstrap.Command Successfully wrote thread dump to /Users/tspann/Documents/nifi-1.10.0/filedump.txt

NiFi Diagnostics

../bin/nifi.sh diagnostics diag.txt

Java home:
NiFi home: /Users/tspann/Documents/nifi-1.10.0

Bootstrap Config File: /Users/tspann/Documents/nifi-1.10.0/conf/bootstrap.conf

2019-11-18 17:11:09,844 INFO [main] org.apache.nifi.bootstrap.Command Successfully wrote diagnostics information to /Users/tspann/Documents/nifi-1.10.0/diag.txt

2019-11-18 17:11:10,041 INFO [main] org.apache.nifi.bootstrap.Command gopherProxySet = false
2019-11-18 17:11:10,042 INFO [main] org.apache.nifi.bootstrap.Command awt.toolkit = sun.lwawt.macosx.LWCToolkit
2019-11-18 17:11:10,042 INFO [main] org.apache.nifi.bootstrap.Command java.specification.version = 11
2019-11-18 17:11:10,042 INFO [main] org.apache.nifi.bootstrap.Command sun.cpu.isalist =
2019-11-18 17:11:10,042 INFO [main] org.apache.nifi.bootstrap.Command sun.jnu.encoding = UTF-8
2019-11-18 17:11:10,042 INFO [main] org.apache.nifi.bootstrap.Command java.class.path = /Users/tspann/Documents/nifi-1.10.0/./conf:/Users/tspann/Documents/nifi-1.10.0/./lib/jetty-schemas-3.1.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/slf4j-api-1.7.26.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/stanford-english-corenlp-2018-02-27-models.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/jcl-over-slf4j-1.7.26.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/javax.servlet-api-3.1.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/logback-classic-1.2.3.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/nifi-properties-1.10.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/nifi-nar-utils-1.10.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/nifi-api-1.10.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/nifi-framework-api-1.10.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/jul-to-slf4j-1.7.26.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/logback-core-1.2.3.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/log4j-over-slf4j-1.7.26.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/nifi-runtime-1.10.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/java11/javax.annotation-api-1.3.2.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/java11/jaxb-core-2.3.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/java11/javax.activation-api-1.2.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/java11/jaxb-impl-2.3.0.jar:/Users/tspann/Documents/nifi-1.10.0/./lib/java11/jaxb-api-2.3.0.jar
2019-11-18 17:11:10,042 INFO [main] org.apache.nifi.bootstrap.Command java.vm.vendor = Amazon.com Inc.
2019-11-18 17:11:10,042 INFO [main] org.apache.nifi.bootstrap.Command javax.security.auth.useSubjectCredsOnly = true
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command sun.arch.data.model = 64
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command sun.font.fontmanager = sun.font.CFontManager
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command java.vendor.url = https://aws.amazon.com/corretto/
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command user.timezone = America/New_York
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command org.apache.nifi.bootstrap.config.log.dir = /Users/tspann/Documents/nifi-1.10.0/logs
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command os.name = Mac OS X
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command java.vm.specification.version = 11
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command nifi.properties.file.path = /Users/tspann/Documents/nifi-1.10.0/./conf/nifi.properties
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command sun.java.launcher = SUN_STANDARD
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command user.country = US
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command sun.boot.library.path = /Library/Java/JavaVirtualMachines/amazon-corretto-11.jdk/Contents/Home/lib
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command app = NiFi
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command sun.java.command = org.apache.nifi.NiFi
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command jdk.debug = release
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command org.apache.jasper.compiler.disablejsr199 = true
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command sun.cpu.endian = little
2019-11-18 17:11:10,043 INFO [main] org.apache.nifi.bootstrap.Command user.home = /Users/tspann
2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command user.language = en
2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command java.specification.vendor = Oracle Corporation
2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command java.version.date = 2019-07-16
2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command java.home = /Library/Java/JavaVirtualMachines/amazon-corretto-11.jdk/Contents/Home
2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command file.separator = /
2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command java.vm.compressedOopsMode = Zero based
2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command line.separator =

2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command java.specification.name = Java Platform API Specification
2019-11-18 17:11:10,044 INFO [main] org.apache.nifi.bootstrap.Command java.vm.specification.vendor = Oracle Corporation
2019-11-18 17:11:10,050 INFO [main] org.apache.nifi.bootstrap.Command javax.xml.xpath.XPathFactory:http://saxon.sf.net/jaxp/xpath/om = net.sf.saxon.xpath.XPathFactoryImpl
2019-11-18 17:11:10,050 INFO [main] org.apache.nifi.bootstrap.Command java.awt.graphicsenv = sun.awt.CGraphicsEnvironment
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command java.awt.headless = true
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command java.protocol.handler.pkgs = sun.net.www.protocol
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command sun.management.compiler = HotSpot 64-Bit Tiered Compilers
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command java.runtime.version = 11.0.4+11-LTS
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command user.name = tspann
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command java.net.preferIPv4Stack = true
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command path.separator = :
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command java.security.egd = file:/dev/urandom
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command org.jruby.embed.localvariable.behavior = persistent
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command os.version = 10.14.6
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command java.runtime.name = OpenJDK Runtime Environment
2019-11-18 17:11:10,051 INFO [main] org.apache.nifi.bootstrap.Command file.encoding = UTF-8
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command sun.net.http.allowRestrictedHeaders = true
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command jnidispatch.path = /var/folders/t5/xz5j50wx2rl8kd3021lkbn800000gn/T/jna--864347536/jna3349001211756681540.tmp
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command java.vm.name = OpenJDK 64-Bit Server VM
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command jna.platform.library.path = /usr/lib:/usr/lib
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command java.vendor.version = Corretto-11.0.4.11.1
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command jna.loaded = true
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command java.vendor.url.bug = https://github.com/corretto/corretto-11/issues/
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command jetty.git.hash = afcf563148970e98786327af5e07c261fda175d3
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command java.io.tmpdir = /var/folders/t5/xz5j50wx2rl8kd3021lkbn800000gn/T/
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command java.version = 11.0.4
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command user.dir = /Users/tspann/Documents/nifi-1.10.0
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command os.arch = x86_64
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command nifi.bootstrap.listen.port = 55105
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command java.vm.specification.name = Java Virtual Machine Specification
2019-11-18 17:11:10,052 INFO [main] org.apache.nifi.bootstrap.Command java.awt.printerjob = sun.lwawt.macosx.CPrinterJob
2019-11-18 17:11:10,053 INFO [main] org.apache.nifi.bootstrap.Command sun.os.patch.level = unknown
2019-11-18 17:11:10,053 INFO [main] org.apache.nifi.bootstrap.Command bridj.quiet = true
2019-11-18 17:11:10,053 INFO [main] org.apache.nifi.bootstrap.Command java.library.path = /Users/tspann/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
2019-11-18 17:11:10,053 INFO [main] org.apache.nifi.bootstrap.Command java.vendor = Amazon.com Inc.
2019-11-18 17:11:10,053 INFO [main] org.apache.nifi.bootstrap.Command java.vm.info = mixed mode
2019-11-18 17:11:10,053 INFO [main] org.apache.nifi.bootstrap.Command java.vm.version = 11.0.4+11-LTS
2019-11-18 17:11:10,053 INFO [main] org.apache.nifi.bootstrap.Command sun.io.unicode.encoding = UnicodeBig
2019-11-18 17:11:10,053 INFO [main] org.apache.nifi.bootstrap.Command java.class.version = 55.0



Ingest Salesforce Data into Hive Using Apache NiFi

TOOLS AND ACCOUNT USED: 

Salesforce JDBC driver : https://www.progress.com/jdbc/salesforce
Java : 1.8.0_221
OS : macOS Mojave 10.14.5
SFDC Developer account : https://developer.salesforce.com/signup - 15-day trial


Overall Flow :

  • Install Hive - Thrift, Metastore, and a Hive table schema that matches the SFDC Opportunity table.
  • Install NiFi - add the PutHive3Streaming NAR.
  • Install and configure drivers : SalesforceConnect and AvroReader.
  • Add processors - QueryDatabaseTable and PutHive3Streaming.

Install Salesforce JDBC and Hive drivers

  • Download DataDirect Salesforce JDBC driver from here.
  • Download Cloudera Hive driver from here.
  • To install the drivers, execute the .jar packages by running the following commands in the terminal, or just by double-clicking on the jar packages.


java -jar PROGRESS_DATADIRECT_JDBC_SF_ALL.jar
java -jar PROGRESS_DATADIRECT_JDBC_INSTALL.jar




Add Drivers to the Apache NiFi Classpath

Copy the drivers:

From - /home/user/Progress/DataDirect/Connect_for_JDBC_51/{sforce.jar,hive.jar}
To - /path/to/nifi/lib (e.g., /usr/local/Cellar/nifi/1.9.2/libexec/lib/)

Restart Apache NiFi



Build the Apache NiFi Flow

Launch Apache NiFi: http://localhost:8080/nifi

Add and Configure the JDBC drivers:

  • DBCPConnectionPool
  • AvroReader

Add DBCPConnectionPool

Choose DBCPConnectionPool as your controller service.

Add Properties below:

Database Connection URL = jdbc:datadirect:sforce://login.salesforce.com;SecurityToken=****

(see the NOTE on the SecurityToken below)

  • Database Driver Class Name : com.ddtek.jdbc.sforce.SForceDriver
  • Database User - Developer account username 
  • Password - the Password for the Developer account
  • Name - SalesforceConnect


Add the AvroReader controller :
This controller is used by the downstream processor (PutHive3Streaming) to read the Avro-format data
produced by the upstream processor (QueryDatabaseTable).



Add and Configure Processors


QueryDatabaseTable
PutHive3Streaming


QueryDatabaseTable

To read the data from Salesforce, use the QueryDatabaseTable processor, which supports
incremental data pull (its Maximum-value Columns property tracks the last value seen, so only new rows are fetched on each run).

Choose QueryDatabaseTable and add it onto the canvas as below.





Configure the QueryDatabaseTable processor:

Right-click on QueryDatabaseTable and choose a scheduling strategy (timer driven or CRON driven).

Under the Properties tab, add the following properties:


Database Connection Pooling Service : SalesforceConnect

Database Type : Generic

Table Name : Opportunity 

Apply and save the processor configuration.


PutHive3Streaming 


NOTE: PutHiveStreaming was used at first, but for the downstream Hive version 3.x,
@Matthew Burgess recommended the PutHive3Streaming processor.
Restart NiFi and PutHive3Streaming should be available for use. PutHiveStreaming kept throwing
an error that it was not able to connect to the Hive metastore URI, like the one below:




Configure PutHive3Streaming Processor 

Drag another processor onto the canvas and choose PutHive3Streaming as your processor from the list.
Under the Properties tab, add the following properties:

Record Reader = AvroReader
Hive Metastore URI = thrift://localhost:9083
Hive Configuration Resources = /usr/local/Cellar/hive/3.1.2/libexec/conf/hive-site.xml
Database Name = default 
Table Name = opportunity


             Properties: 


              Relationship termination settings:



Connect the QueryDatabaseTable processor to PutHive3Streaming.

Make sure to check the success relationship, then click Add/Apply.




Opportunity table schema in Hive:


create table opportunity2
(ID string,
ISDELETED boolean,
ACCOUNTID string,
ISPRIVATE boolean,
NAME string,
DESCRIPTION string,
STAGENAME string,
AMOUNT string,
PROBABILITY string,
EXPECTEDREVENUE string,
TOTALOPPORTUNITYQUANTITY double,
CLOSEDATE string,
TYPE string,
NEXTSTEP string,
LEADSOURCE string,
ISCLOSED boolean,
ISWON boolean,
FORECASTCATEGORY string,
FORECASTCATEGORYNAME string,
CAMPAIGNID string,
HASOPPORTUNITYLINEITEM boolean,
PRICEBOOK2ID string,
OWNERID string,
CREATEDDATE string,
CREATEDBYID string,
LASTMODIFIEDDATE string,
LASTMODIFIEDBYID string,
SYSTEMMODSTAMP string,
LASTACTIVITYDATE string,
FISCALQUARTER int,
FISCALYEAR int,
FISCAL string,
LASTVIEWEDDATE string,
LASTREFERENCEDDATE string,
HASOPENACTIVITY boolean,
HASOVERDUETASK boolean,
DELIVERYINSTALLATIONSTATUS__C string,
TRACKINGNUMBER__C string,
ORDERNUMBER__C string,
CURRENTGENERATORS__C string,
MAINCOMPETITORS__C string )
STORED AS ORC TBLPROPERTIES ('transactional'='true');

NOTES: 


SFDC Security Token :


This token is needed to connect to SFDC via JDBC.
Log in to the SFDC dev environment and reset it; the token will be sent in an email to the address attached to the account.



Error while connecting to SFDC without the security token.


Hive 3 ACID table issues :


The Hive version being used was 3.1.2. By default, the table created was not transactional, which led to the error below.


Error
2019-11-07 13:25:39,730 ERROR [Timer-Driven Process Thread-3] o.a.h.streaming.HiveStreamingConnection HiveEndPoint { metaStoreUri: thrift://localhost:9083, database: default, table: opportunity2 } must use an acid table. 
2019-11-07 13:25:39,731 ERROR [Timer-Driven Process Thread-3] o.a.n.processors.hive.PutHive3Streaming PutHive3Streaming[id=46a1d083-016e-1000-280f-5a026f1ceef7] PutHive3Streaming[id=46a1d083-016e-1000-280f-5a026f1ceef7] failed to process session due to java.lang.NullPointerException; Processor Administratively Yielded for 1 sec: java.lang.NullPointerException 


Note:{ metaStoreUri: thrift://localhost:9083, database: default, table: opportunity2 } must use an acid table.


Hive Command line properties set:

set hive.support.concurrency = true;
set hive.enforce.bucketing = true;
set hive.exec.dynamic.partition.mode = nonstrict;
set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on = true;
set hive.compactor.worker.threads = 1;


Only after this was creating the table with TBLPROPERTIES ('transactional'='true') successful.

END PRODUCT


Apache NiFi Flow:



HIVE Result: 

Beeline >  !connect jdbc:hive2://localhost:10000;
0: jdbc:hive2://localhost:10000> select opportunity2.id, opportunity2.accountid,
opportunity2.name, opportunity2.stagename, opportunity2.amount 
from opportunity2 limit 10;
