

Showing posts from May, 2019

DataWorks Summit DC 2019 Report

While some lucky people were in DataWorks Summit training, others of us were at NoSQL Day. After the NoSQL Day closing party, it was time for meetups, including Apache NiFi and Apache Kafka sessions! The Apache NiFi meetup was packed and had most of the Apache NiFi team on-site.

Tuesday May 21, 2019: NoSQL Day, Tracking Crime ... Phoenix/HBase/NiFi
Wednesday May 22, 2019: Expo Theatre 20 minute talk, 1:35 pm, Apache Deep Learning 202
Thursday May 23, 2019: Cold Supply Chain Logistics using Sensors, Apache NiFi and the Hyperledger Fabric Blockchain Platform

Reading OpenData JSON and Storing into Apache HBase / Phoenix Tables - Part 1

JSON Batch to Single Row Phoenix

I grabbed open data on crime from Philly's Open Data; after a free sign-up you get access to JSON crime data. You can grab individual dates or ranges for thousands of records. I wanted to spool each JSON record as a separate HBase row. With the flexibility of Apache NiFi 1.0.0, I can specify run times via cron or another familiar setup.

This is my master flow. First I use GetHTTP to retrieve the JSON messages over SSL. Then I split the records up, store them as raw JSON in HDFS, send some of them via email, format them for Phoenix SQL, and store them in Phoenix/HBase. All with no coding and in a simple flow. For extra output, I can send them to a Riemann server for monitoring.

Setting up SSL for accessing HTTPS data like Philly Crime requires a little configuration and knowing which Java JRE you are using to run NiFi. You can run ser
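
Outside of NiFi, the split-and-format step described in this post can be approximated in a few lines of Python. This is only a rough sketch of the idea, not the flow itself: the table name `crime` and the columns `id` and `offense` are hypothetical placeholders, not the real schema of the Philly crime feed. It relies on the fact that Phoenix writes rows with `UPSERT INTO ... VALUES (...)`.

```python
import json


def split_records(batch_json):
    """Parse a JSON array and return one dict per record."""
    records = json.loads(batch_json)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records


def sql_literal(value):
    """Render a Python value as a SQL literal, doubling single quotes in strings."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def to_upsert(table, record, columns):
    """Build one Phoenix UPSERT statement for a single record."""
    values = ", ".join(sql_literal(record.get(col)) for col in columns)
    return f"UPSERT INTO {table} ({', '.join(columns)}) VALUES ({values})"


# Hypothetical sample batch; the real feed has different field names.
batch = json.dumps([
    {"id": 1, "offense": "Theft"},
    {"id": 2, "offense": "Driver's offense"},
])

statements = [
    to_upsert("crime", record, ["id", "offense"])
    for record in split_records(batch)
]

for statement in statements:
    print(statement)
```

Each record becomes one statement and therefore one HBase row. For anything beyond a demo, parameterized statements through a Phoenix client would be a safer choice than building SQL strings by hand.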

Cloudera Edge Management Introduction

Using CEM - Adding a Processor to a Flow
Looking at Events From CEM
Designing a Java Flow
Configure A Stream Execution
Event Details
Example Apache NiFi Receiver
CEM Design - Open Flow Screen
Configure a PutFile Processor
If you want to revert your current changes to a previous version
An Example Flow Java Agent
An Example CPP Flow
Example of Data received in NiFi from CPP Agent
How to simulate data in GenerateDataFlow
Receiving Agent Data
Agent Logs Showing C2 Activities
Publish Flow to Agents

CEM

You can download CEM and NiFi Registry from Cloudera. You need the Registry to be able to save and version the flows you will be deploying. For a simple proof of concept or development test, you can set up both without needing a full-fledged database. You can use the H2 database for learning how to use the system. I installed C