Year and Decade End Report : 201*

A Year in Big Data 2019

This has been an amazing year for Big Data, Streaming, Tech, The Cloud, AI, IoT and everything in between.   I got to witness, from the inside, the merger of the two Big Data giants into one unstoppable cloud data machine.   The sum of the new Cloudera is far greater than just Hortonworks + Cloudera.   It has been a great year working with amazing engineers, leaders, clients, partners, prospects, community members, data scientists, analysts, marketing mavens and everyone I have gotten to see this year.   It has been a busy year traveling the globe, spreading the good word and solving some tough problems.

In 2019, Edge2AI became something we could teach and implement in a single day to newbies.   The combination of MiNiFi + NiFi + Kafka + Kudu + Cloud is unstoppable.  Once we added Flink later this year, the FLaNK stack became amazing.   I see amazing things for this stack in the '20s.   I got to use new projects like Kudu (awesome), Impala, Cloudera Manager and new tools from the Data in Motion team.   Streams Messaging Manager became the best way to manage, monitor, create, alert on and use Kafka across clusters anywhere.   This is now my favorite way to demo anything.   So much transparency, awesome.   Having the power of Apache Flink makes almost any problem solvable, even those that scale to thousands of nodes.   Running just one node of Flink has been awesome.   I am a Squirrel Dev now!

Strata, DataWorks Summit and NoSQL Day were awesome, but working with charities and non-profits solving real-world problems was amazing.   Helping at NetHope is the highlight of my professional year.   I am so thankful to the Cloudera Foundation for having me help.   I am really impressed with the Cloudera Foundation, NetHope and everyone involved.  I am hoping to speak at a few different conferences in 2020, but we'll see where Edge2AI takes me.

There's a lot to wrap up for 2019, so I've put most of it after this break.


IoT Series: MiNiFi Agent on Raspberry Pi 4 with Enviro+ Hat For Environmental Monitoring and Analytics



Summary:  Our powerful edge device streams environmental sensor readings while also performing edge analytics with deep learning libraries and an enhanced edge VPU.   We can perform complex running calculations on sensor data locally on the box before making potentially expensive network calls.  We can also decide when to send out data based on heuristics, machine learning or simple if-then logic.

Use Case:   Monitor Environment.   Act Local, Report Global.


Stack:   FLaNK


Category:   AI, IoT, Edge2AI, Real-Time Streaming, Sensors, Cameras, Telemetry.


Hardware:  Intel Movidius NCS 2 VPU (Neural Compute Stick 2), Pimoroni Enviro Plus pHAT, Raspberry Pi 4 (4GB Edition).


Software:  Python 3 + Java + MiNiFi Agents + Cloudera Edge Flow Manager (EFM/CEM) + Apache NiFi.   Using Mm... FLaNK Stack.


Keywords:  Edge2AI, CV, AI, Deep Learning, Cloudera, NiFi, Raspberry Pi, Sensors, IoT, IIoT, Devices, Java, Agents, FLaNK Stack, VPU, Movidius.








Open Source Assets:  https://github.com/tspannhw/minifi-enviroplus


I am running a Python script that streams sensor data continuously to MQTT, to be picked up by MiNiFi agents or NiFi.   For development, I am just running my Python script with a shell script and nohup.


enviro.sh

#!/bin/bash
python3 /opt/demo/enviro.py

Run it in the background:

nohup ./enviro.sh &
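
For reference, here is a minimal sketch of what enviro.py might look like.   The broker host, topic name, and sensor values are assumptions for illustration; on the actual device the readings come from the Enviro+ libraries:

```python
import json
import socket
import time
import uuid
from datetime import datetime


def build_payload(readings):
    """Wrap raw sensor readings in the envelope fields shown in the sample record."""
    now = datetime.now()
    host = socket.gethostname()
    payload = {
        "uuid": "rpi4_uuid_%s" % now.strftime("%Y%m%d%H%M%S"),
        "host": host,
        "host_name": host,
        "systemtime": now.strftime("%m/%d/%Y %H:%M:%S"),
        "id": "%s_%s" % (now.strftime("%Y%m%d%H%M%S"), uuid.uuid4()),
    }
    payload.update(readings)
    return json.dumps(payload)


def run_publisher(broker="localhost", topic="enviro"):
    """Publish a reading every second; broker and topic are assumed values."""
    import paho.mqtt.client as mqtt  # pip3 install paho-mqtt

    client = mqtt.Client()
    client.connect(broker, 1883)
    while True:
        # On the Pi these values come from the Enviro+ sensor libraries;
        # placeholders here for illustration.
        readings = {"temperature": "33.59", "humidity": "7.79"}
        client.publish(topic, build_payload(readings))
        time.sleep(1)
```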


Example Enviro Plus pHAT Sensor Data


{
  "uuid" : "rpi4_uuid_xki_20191220215721",
  "ipaddress" : "192.168.1.243",
  "host" : "rp4",
  "host_name" : "rp4",
  "macaddress" : "dc:a6:32:03:a6:e9",
  "systemtime" : "12/20/2019 16:57:21",
  "cpu" : 0.0,
  "diskusage" : "46958.1 MB",
  "memory" : 6.3,
  "id" : "20191220215721_938f2137-5adb-4c22-867d-cdfbce6431a8",
  "temperature" : "33.590520852237226",
  "pressure" : "1032.0433707836428",
  "humidity" : "7.793797584651376",
  "lux" : "0.0",
  "proximity" : "0",
  "gas" : "Oxidising: 3747.82 Ohms\nReducing: 479652.17 Ohms\nNH3: 60888.05 Ohms"
}



We are also running a standard MiNiFi Java Agent 0.6, which executes a Python application for sensor reads, edge AI with Intel's OpenVINO, and some other analytics.


test.sh


#!/bin/bash

# Timestamp used for unique image filenames
DATE=$(date +"%Y-%m-%d_%H%M")
# Load the OpenVINO environment
source /opt/intel/openvino/bin/setupvars.sh
# Grab a still image from the webcam
fswebcam -q -r 1280x720 --no-banner /opt/demo/images/$DATE.jpg
# Run inference on the captured image
python3 -W ignore /opt/intel/openvino/build/test.py /opt/demo/images/$DATE.jpg 2>/dev/null


test.py

https://github.com/tspannhw/minifi-enviroplus/blob/master/test.py
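
The script captures an image, runs inference, and emits a JSON record of run metadata.   A minimal sketch of the record-building part (build_result_record is a hypothetical helper; field names are taken from the sample output record in this post, and the real script also adds CPU temperature, IP address, disk and memory stats):

```python
import json
import socket
import time
import uuid
from datetime import datetime


def build_result_record(image_filename, start_time):
    """Assemble the run-metadata record emitted after each inference."""
    end = time.time()
    now = datetime.now()
    return json.dumps({
        "host": socket.gethostname(),
        "endtime": "%.2f" % end,
        "runtime": "%.2f" % (end - start_time),
        "systemtime": now.strftime("%m/%d/%Y %H:%M:%S"),
        "starttime": datetime.fromtimestamp(start_time).strftime("%m/%d/%Y %H:%M:%S"),
        "uuid": "%s_%s" % (now.strftime("%Y%m%d%H%M%S"), uuid.uuid4()),
        "image_filename": image_filename,
    })
```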



Example OpenVino Data


{
  "host": "rp4",
  "cputemp": "67",
  "ipaddress": "192.168.1.243",
  "endtime": "1577194586.66",
  "runtime": "0.00",
  "systemtime": "12/24/2019 08:36:26",
  "starttime": "12/24/2019 08:36:26",
  "diskfree": "46889.0",
  "memory": "15.1",
  "uuid": "20191224133626_55157415-1354-4137-8472-424779645fbe",
  "image_filename": "20191224133626_9317880e-ee87-485a-8627-c7088df734fc.jpg"
}


In our flow, I convert the JSON to Apache Avro; as you can see, the Avro schema is embedded.







The flow is very simple: consume MQTT messages from our broker on the topic our field sensors publish to.   We also ingest MiNiFi events through standard Apache NiFi HTTP(S) Site-to-Site (S2S).   We route images to our image processor and sensor data straight to Kudu tables.





Now that the data is stored in Apache Kudu, we can do our analytics.
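
As a sketch, Kudu tables can be queried back out through Impala with the impyla Python client.   The table name, columns, and connection details below are assumptions based on the sensor payload, not the actual demo schema:

```python
# Hypothetical table and connection details for illustration;
# impyla (pip3 install impyla) talks to Impala, which queries Kudu.
QUERY = """
SELECT host, systemtime, temperature, humidity, pressure
FROM enviro_sensors
ORDER BY systemtime DESC
LIMIT 10
"""


def fetch_recent(host="impala-host", port=21050):
    """Pull the ten most recent sensor rows from the Kudu-backed table."""
    from impala.dbapi import connect  # deferred so the module loads without impyla
    conn = connect(host=host, port=port)
    try:
        cursor = conn.cursor()
        cursor.execute(QUERY)
        return cursor.fetchall()
    finally:
        conn.close()
```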




Easy to Run an MQTT Broker



References:



Demo Info:


https://subscription.packtpub.com/book/application_development/9781787287815/1/ch01lvl1sec12/installing-a-mosquitto-broker-on-macos


Run Mosquitto MQTT on Local Machine (RPI, Mac, Win10, ...)


On macOS:  brew install mosquitto


The configuration file lives at /usr/local/etc/mosquitto/mosquitto.conf


To have launchd start mosquitto now and restart at login:
  brew services start mosquitto


Or, if you don't want/need a background service you can just run:
  mosquitto -c /usr/local/etc/mosquitto/mosquitto.conf


For Python, we need paho-mqtt:  pip3 install paho-mqtt.
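
For a quick local smoke test, a minimal paho-mqtt subscriber can confirm the broker is relaying sensor messages.   The broker host and topic name here are assumptions:

```python
import json


def parse_sensor_message(payload_bytes):
    """Decode one MQTT message payload into a dict of sensor fields."""
    return json.loads(payload_bytes.decode("utf-8"))


def run_subscriber(broker="localhost", topic="enviro"):
    """Print readings as they arrive; broker and topic are assumed values."""
    import paho.mqtt.client as mqtt  # pip3 install paho-mqtt

    def on_message(client, userdata, msg):
        data = parse_sensor_message(msg.payload)
        print(msg.topic, data.get("temperature"))

    client = mqtt.Client()
    client.on_message = on_message
    client.connect(broker, 1883)
    client.subscribe(topic)
    client.loop_forever()
```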


Run Sensors on Device that pushes to MQTT


Python pushes continuous stream of sensor data to MQTT


MiNiFi Agent Reads Broker


Send to Kafka and/or NiFi


Example Image Grabbed From Webcam in Dark Office (It's Christmas Eve!)








When ready, we can push to a CDP Data Warehouse in AWS.



With CDP, it's very easy to have a data environment in many clouds to store my live sensor data.



I can now use this data in Kudu tables from Cloudera Data Science Workbench for real data science, machine learning and insights.




What do we do with all of this data?   Check in soon for real-time analytics and dashboarding.



Easy Deep Learning in Apache NiFi with DJL


Custom Processor for Deep Learning



 Happy Mm.. FLaNK Day!


I have been experimenting with the awesome new Apache 2.0-licensed Java deep learning library, DJL.   In NiFi, I was trying to figure out a quick use case and demo.   So I used my web camera processor to grab a still shot from my PowerBook webcam and send it to the processor.   The results are sent to Slack.

Since it's the holidays, I think of my favorite holiday movies:   The Matrix and Blade Runner.   So I thought a Voight-Kampff test would be fun.   Since I don't have a deep learning QA piece built yet, let's start by seeing if you look human.  We'll call them 'person'.   I am testing to see if I am a replicant.  Sometimes it's hard to tell.   Let's see if DJL thinks I am human.

See:   http://nautil.us/blog/-the-science-behind-blade-runners-voight_kampff-test



Okay, so at least it thinks I am a person.   The classification of a Christmas tree is vaguely accurate.






It did not identify my giant french bread.

Building A New Custom Processor for Deep Learning




The hardest part was writing a good NiFi integration test.   The DJL team provides some great examples, and it's really easy to plug into their models.

ZooModel<BufferedImage, DetectedObjects> model =
        MxModelZoo.SSD.loadModel(criteria, new ProgressBar());
Predictor<BufferedImage, DetectedObjects> predictor = model.newPredictor();
DetectedObjects detection = predictor.predict(img);

All the source is on GitHub and references the DJL sites and repos below.

Using a New Custom Processor as part of a Real-time Holiday Flow

We first add the DeepLearningProcessor to our canvas.



An example flow:
  • GetWebCameraProcessor:  grab an image from an attached web camera
  • UpdateAttribute:  add the media type for the image
  • DeepLearningProcessor:  run our DJL deep learning model from a model zoo
  • PutSlack:  put the DJL results in a text window in Slack
  • PostSlack:  send our DJL-altered image to Slack
  • Funnel:  send all failures to Valhalla



If we examine the provenance, we can see how long the processor took to run and some other interesting attributes.


We place the results of our image analysis in attributes, while we return a new image that has a bounding box drawn on the found object(s).




We now have a full classification workflow for real-time deep learning analysis on images; it could be used for Santa watching, security, memes and other important business purposes.


The initial release is available here:   https://github.com/tspannhw/nifi-djl-processor/releases/tag/v1.0
Using library and example code from the Deep Java Library (https://djl.ai/).



Source Code:   https://github.com/tspannhw/nifi-djl-processor/

And now for something completely different, Christmas Cats:












Princeton, New Jersey, USA - Meetup - 10 - December - 2019 - HBase, Flink, NiFi 1.10, MiNiFi, MXNet, Kafka, Kudu

10-Dec-2019 Meetup:  HBase, Cloud, Flink, NiFi, Kafka, IoT, AI


Come network, socialize, trade notes and learn about Cloudera's new Cloud offering for Operational Databases (powered by Apache HBase). Learn how developers are using Cloudera's OpDB to support mission-critical applications that need very high availability, resiliency and scalability. Learn how easy it is becoming to do the same in the Cloud and why it is uniquely suited to support cloud-native applications.



What you can expect:
6:00 – 6:45: Networking & Food
6:45 – 7:30: Presentations / Demo of Cloudera OpDB - HBase



7:30 - 7:45 Lightning Talk: Introduction to NiFi 1.10
7:45 - 8:15 Lightning Talk: Introduction to Mm.. FLaNK Stack
https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html



8:15 – 8:30: Ask Me Anything



What we will cover:
- An overview of the new Cloud offering and key capabilities of the database

- Delicious Food & Drinks

Hosted By PGA Fund at:
https://pga.fund/coworking-space/

Princeton Growth Accelerator
5 Independence Way, 4th Floor, Princeton, NJ

https://www.meetup.com/futureofdata-princeton/events/266496424/

See:  https://www.datainmotion.dev/2019/12/hbase-20-on-cdp-on-aws.html

For Code and Slides: 
https://github.com/tspannhw/HBase2
https://github.com/tspannhw/MmFLaNK
https://github.com/tspannhw/nifi-1.10-templates
https://www.slideshare.net/bunkertor/cloudera-operational-db-apache-hbase-apache-phoenix
https://github.com/tspannhw/stateless-examples
https://www.slideshare.net/bunkertor/mm-flank-stack-minifi-mxnet-flink-nifi-kudu-kafka
https://www.slideshare.net/bunkertor/introduction-to-apache-nifi-110