DataWorks Summit DC 2019 Report

While some lucky people were in DataWorks Summit Training, the rest of us were at NoSQL Day.

After NoSQL Day's end party, it was time for meetups, including Apache NiFi and Apache Kafka sessions! The Apache NiFi meetup was packed and had most of the Apache NiFi team on-site.

Tuesday May 21, 2019


Tracking Crime ... Phoenix/HBase/NiFi

Wednesday May 22, 2019

Expo Theatre 20 minute talk 1:35 pm - 
Apache Deep Learning 202

Thursday May 23, 2019

Cold Supply Chain Logistics using Sensors, Apache NiFi and the Hyperledger Fabric Blockchain Platform

1:35 - Expo Theatre 20 minute talk - Introduction to Apache NiFi

Edge to AI:  Analytics from Edge to Cloud with Efficient Movement of Machine Data

Reading OpenData JSON and Storing into Apache HBase / Phoenix Tables - Part 1

JSON Batch to Single Row Phoenix
I grabbed open data on crime from Philly's Open Data portal (after a free sign-up you get access to the JSON crime data). You can grab individual dates or ranges for thousands of records. I wanted to spool each JSON record as a separate HBase row. With the flexibility of Apache NiFi 1.0.0, I can specify run times via cron or other familiar setups. This is my master flow.
First I use GetHTTP to retrieve the JSON messages over SSL. I split the records up, store them as raw JSON in HDFS, send some of them via email, format them for Phoenix SQL, and store them in Phoenix/HBase. All with no coding, in a simple flow. For extra output, I can send them to a Riemann server for monitoring.
Setting up SSL for accessing HTTPS data like the Philly crime feed requires a little configuration and knowing which Java JRE you are using to run NiFi. You can run service nifi status to quickly see which JRE that is.
Split the Records
The Open Data set has many rows of data, let's split them up and pull out the attributes we want from the JSON.
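Outside NiFi, the split-and-extract step (what SplitJson plus EvaluateJsonPath do in the flow) amounts to something like this minimal Python sketch. The sample records and the split_records helper are illustrative assumptions, not part of the actual flow:

```python
import json

# A tiny batch shaped like the Philly crime records (fields from the table below)
batch = json.dumps([
    {"dc_dist": "18", "dc_key": "200918067518",
     "dispatch_date": "2009-10-02", "text_general_code": "Other Assaults"},
    {"dc_dist": "22", "dc_key": "200922001234",
     "dispatch_date": "2009-10-03", "text_general_code": "Thefts"},
])

def split_records(raw, wanted=("dc_key", "dc_dist", "dispatch_date")):
    """Split a JSON batch into one dict per record, keeping only wanted attributes."""
    return [{k: rec.get(k) for k in wanted} for rec in json.loads(raw)]

rows = split_records(batch)
# Each entry in rows becomes one HBase/Phoenix row in the real flow.
```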
Another part that requires specific formatting is setting up the Phoenix connection. Make sure you point to the correct driver and if you have security make sure that is set.
Load the Data (Upsert)
Once your data is loaded, you can quickly check it with /usr/hdp/current/phoenix-client/bin/ localhost:2181:/hbase-unsecure
The SQL for this data set is pretty straightforward.
  CREATE TABLE phillycrime (dc_dist varchar,
  dc_key varchar not null primary key, dispatch_date varchar, dispatch_date_time varchar, dispatch_time varchar, hour varchar, location_block varchar, psa varchar,
  text_general_code varchar, ucr_general varchar);

  {"dc_dist":"18","dc_key":"200918067518","dispatch_date":"2009-10-02","dispatch_date_time":"2009-10-02T14:24:00.000","dispatch_time":"14:24:00","hour":"14","location_block":"S 38TH ST / MARKETUT ST","psa":"3","text_general_code":"Other Assaults","ucr_general":"800"}

  upsert into phillycrime values ('18', '200918067518', '2009-10-02','2009-10-02T14:24:00.000','14:24:00','14', 'S 38TH ST / MARKETUT ST','3','Other Assaults','800');

  !tables
  !describe phillycrime
The DC_KEY is unique, so I used that as the Phoenix primary key. Now all the data I ingest will be added, and any repeats will safely update. Sometimes we may pull some of the same data again; that's okay, it will just upsert to the same value.
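The ReplaceText-to-PutSQL step in the flow essentially turns each record into a parameterized UPSERT like the one above. A small sketch of that translation (the upsert_sql helper and the column ordering are assumptions for illustration; Phoenix itself makes repeated upserts on the same DC_KEY idempotent):

```python
def upsert_sql(table, record):
    """Build a parameterized Phoenix UPSERT for one JSON record (illustrative sketch)."""
    cols = ", ".join(record)                 # column list from the record's keys
    marks = ", ".join("?" for _ in record)   # one placeholder per value
    return f"upsert into {table} ({cols}) values ({marks})", list(record.values())

record = {"dc_key": "200918067518", "dc_dist": "18",
          "text_general_code": "Other Assaults"}
sql, params = upsert_sql("phillycrime", record)
# Running the same statement twice just rewrites the row keyed by dc_key.
```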

Cloudera Edge Management Introduction

Using CEM - Adding a Processor to a Flow

Looking at Events From CEM

Designing a Java Flow

Configure A Stream Execution

Event Details

Example Apache NiFi Receiver 

CEM Design - Open Flow Screen

Configure a PutFile Processor

If you want to revert your current changes to a previous version

An Example Flow Java Agent

An Example CPP Flow

Example of Data received in NiFi from CPP Agent

How to simulate data in GenerateDataFlow

Receiving Agent Data

Agent Logs Showing C2 Activities

Publish Flow to Agents


You can download CEM and NiFi Registry from Cloudera. You need the Registry to be able to save and version the flows you will be deploying.

For a simple proof of concept or development test, you can set up both without needing a full-fledged database. You can use the H2 database while learning how to use the system.

I installed CEM on a few versions of Ubuntu and on CentOS 7.

The first thing you need to do is install NiFi Registry, run it, and create a bucket for EFM to use.

CEM Configuration Basics

conf/   - turn on nifi registry
Create a bucket

EFM Settings
# Web Server Properties
#  address: the hostname or ip address of the interface to bind to; to bind to all, use


New Features in MiNiFi 0.6.0 C++ Agent

Python Processors

These are great, but first you will need to make sure you have Python installed and know where your Python modules are:

python -c "import site; print(site.getsitepackages())"
python -m site
python -m site --user-site
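A MiNiFi C++ Python processor is just a script with a few well-known callbacks that the agent invokes. A minimal pass-through skeleton follows; the callback names and the injected REL_SUCCESS relationship are assumptions based on the 0.6.0 C++ agent's ExecutePythonProcessor interface, so verify them against your agent build:

```python
# Skeleton of a MiNiFi C++ Python processor (assumed ExecutePythonProcessor API).

def describe(processor):
    # Called once so the agent can register the processor's description.
    processor.setDescription("Pass incoming flow files through untouched")

def onInitialize(processor):
    # Called at initialization; declare capabilities here.
    processor.setSupportsDynamicProperties()

def onTrigger(context, session):
    # Called per scheduling cycle with a session for flow file I/O.
    flow_file = session.get()
    if flow_file is None:
        return
    # REL_SUCCESS is assumed to be injected into the script's scope by the agent.
    session.transfer(flow_file, REL_SUCCESS)
```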

You will need a precompiled C++ agent for your environment, or you can build it yourself. You can also choose the Java agent if you do not wish to compile C++; the C++ agent has a smaller footprint.

Configuring a MiNiFi Java Agent to Talk to EFM

# MiNiFi Command & Control Configuration
# C2 Properties
# Enabling C2 Uncomment each of the following options
# define those with missing options
## define protocol parameters
## heartbeat in milliseconds.  defaults to once a second
## define parameters about your agent
# Optional.  Defaults to a hardware based unique identifier
## Define TLS security properties for C2 communications

Configuring a MiNiFi C++ Agent to Talk to EFM



EFM Ports

EFM Server UI 10080
NiFi Registry 18080
CoAP 8989
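To verify those ports are reachable from an agent host, a quick TCP check works; this is a plain Python sketch, and the localhost target is an assumption for a single-node setup:

```python
import socket

# Default EFM-related ports from the list above; the target host is an assumption.
PORTS = {"EFM Server UI": 10080, "NiFi Registry": 18080, "CoAP": 8989}

def is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0

for name, port in PORTS.items():
    state = "open" if is_open("localhost", port) else "closed"
    print(f"{name}: {port} is {state}")
```

Note that CoAP traffic itself is typically UDP; this check only confirms a TCP listener, which is enough for the EFM UI and Registry ports.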



You will also want an Apache NiFi 1.9.x server to receive calls from the MiNiFi Agents.