Using Apache NiFi with Apache MXNet GluonCV for YOLO 3 Deep Learning Workflows

Using GluonCV 0.3 with Apache MXNet 1.3
source code:
*Captured and Processed Image Available for Viewing in Stream in Apache NiFi 1.7.x
use case:
I need to easily monitor the contents of my security vault. It is a fixed number of known things.
What we need in the real world is a nice camera(s) (maybe four to eight depending on angles of the room), a device like an NVidia Jetson TX2, MiniFi 0.5 Java Agent, JDK 8, Apache MXNet, GluonCV, Lots of Python Libraries, a network connection and a simple workflow. Outside of my vault, I will need a server(s) or clusters to do the more advanced processing, though I could run it all on the local box. If the number of items or certain items I am watching are no longer in the screen, then we should send an immediate alert. That could be to an SMS, Email, Slack, Alert System or other means. We had most of that implemented below. If anyone wants to do the complete use case I can assist.
demo implementation:
I wanted to use the new YOLO 3 model which is part of the new 0.3 stream, so I installed a 0.3. This may be final by the time you read this. You can try to do a regular pip3.6 install -U gluoncv and see what you get.
  1. pip3.6 install -U gluoncv==0.3.0b20180924
Yolo v3 is a great pretrained model to use for object detection.
See: https://gluon-cv.mxnet.io/build/examples_detection/demo_yolo.html
The GluonCV Model Zoo is very rich and incredibly easy to use. So we just grab the model "yolo3_darknet53_voc" with an automatic one time download and we are ready to go. They provide easy to customize code to start with. I write my processed image and JSON results out for ingest by Apache NiFi. You will notice this is similar to what we did for the Open Computer Vision talks: https://community.hortonworks.com/articles/198939/using-apache-mxnet-gluoncv-with-apache-nifi-for-de.html
This is updated and even easier. I dropped the MQTT and just output image files and some JSON to read.
GluonCV makes working with Computer Vision extremely clean and easy.
why Apache NiFi For Deep Learning Workflows
Let me count the top five ways:
#1 Provenance - lets me see everything, everywhere, all the time with the data and the metadata.
#2 Configurable Queues - queues are everywhere and they are extremely configurable on size and priority. There's always backpressure and safety between every step. Sinks, Sources and steps can be offline as things happen in the real-world internet. Offline, online, wherever, I can recover and have full visibility into my flows as they spread between devices, servers, networks, clouds and nation-states.
#3 Security - secure at every level from SSL and data encryption. Integration with leading edge tools including Apache Knox, Apache Ranger and Apache Atlas. See: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_security/content/ch_enabling-knox-for-nifi.html
#4 UI - a simple UI to develop, monitor and manage incredibly complex flows including IoT, Deep Learning, Logs and every data source you can throw at it.
#5 Agents - MiniFi gives me two different agents for my devices or systems to stream data headless.
running gluoncv yolo3 model
I wrap my Python script in a shell script to throw away warnings and junk
  1. cd /Volumes/TSPANN/2018/talks/ApacheDeepLearning101/nifi-gluoncv-yolo3
  2. python3.6 -W ignore /Volumes/TSPANN/2018/talks/ApacheDeepLearning101/nifi-gluoncv-yolo3/yolonifi.py 2>/dev/null
List of Possible Objects We Can Detect
  1. ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow",
  2. "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train",
  3. "tvmonitor"]
I am going to train this with my own data for the upcoming INTERNET OF BEER, for the vault use case we would need your vault content pictures.
See: https://gluon-cv.mxnet.io/build/examples_datasets/detection_custom.html#sphx-glr-build-examples-datasets-detection-custom-py
Example Output in JSON
  1. {"imgname": "images/gluoncv_image_20180924190411_b90c6ba4-bbc7-4bbf-9f8f-ee5a6a859602.jpg", "imgnamep": "images/gluoncv_image_p_20180924190411_b90c6ba4-bbc7-4bbf-9f8f-ee5a6a859602.jpg", "class1": "tvmonitor", "pct1": "49.070724999999996", "host": "HW13125.local", "shape": "(1, 3, 512, 896)", "end": "1537815855.105193", "te": "4.199203014373779", "battery": 100, "systemtime": "09/24/2018 15:04:15", "cpu": 33.7, "diskusage": "49939.2 MB", "memory": 60.1, "id": "20180924190411_b90c6ba4-bbc7-4bbf-9f8f-ee5a6a859602"}
Example Processed Image Output
It found one generic person, we could train against a known set of humans that are allowed in an area or are known users.
nifi flows

Gateway Server (We could skip this, but aggregating multiple camera agents is useful)
Send the Flow to the Cloud
Cloud Server Site-to-Site
After we infer the schema of the data once, we don't need it again. We could derive the schema manually or from another tool, but this is easy. Once you are done, then you can delete the InferAvroSchema processor from your flow. I left mine in for your uses if you wish to start from this flow that is attached at the end of the article.
flow steps
Route When No Error to Merge Record Then Convert Those Aggregated Apache Avro Records into One Apache ORC file.
Then store it in an HDFS directory. Once complete their will be a DDL added to metadata that you can send to a PutHiveQL or manually create the table in Beeline or Zeppelin or Hortonworks Data Analytics Studio (https://hortonworks.com/products/dataplane/data-analytics-studio/).
schema: gluoncvyolo
  1. { "type" : "record", "name" : "gluoncvyolo", "fields" : [ { "name" : "imgname", "type" : "string", "doc" : "Type inferred from '\"images/gluoncv_image_20180924211055_8f3b9dac-5645-49aa-94e7-ee5176c3f55c.jpg\"'" }, { "name" : "imgnamep", "type" : "string", "doc" : "Type inferred from '\"images/gluoncv_image_p_20180924211055_8f3b9dac-5645-49aa-94e7-ee5176c3f55c.jpg\"'" }, { "name" : "class1", "type" : "string", "doc" : "Type inferred from '\"tvmonitor\"'" }, { "name" : "pct1", "type" : "string", "doc" : "Type inferred from '\"95.71207000000001\"'" }, { "name" : "host", "type" : "string", "doc" : "Type inferred from '\"HW13125.local\"'" }, { "name" : "shape", "type" : "string", "doc" : "Type inferred from '\"(1, 3, 512, 896)\"'" }, { "name" : "end", "type" : "string", "doc" : "Type inferred from '\"1537823458.559896\"'" }, { "name" : "te", "type" : "string", "doc" : "Type inferred from '\"3.580893039703369\"'" }, { "name" : "battery", "type" : "int", "doc" : "Type inferred from '100'" }, { "name" : "systemtime", "type" : "string", "doc" : "Type inferred from '\"09/24/2018 17:10:58\"'" }, { "name" : "cpu", "type" : "double", "doc" : "Type inferred from '12.0'" }, { "name" : "diskusage", "type" : "string", "doc" : "Type inferred from '\"48082.7 MB\"'" }, { "name" : "memory", "type" : "double", "doc" : "Type inferred from '70.6'" }, { "name" : "id", "type" : "string", "doc" : "Type inferred from '\"20180924211055_8f3b9dac-5645-49aa-94e7-ee5176c3f55c\"'" } ] }
Tabular data has fields with types and properties. Let's specify those for automated analysis, conversion and live stream SQL.
hive table schema: gluoncvyolo
  1. CREATE EXTERNAL TABLE IF NOT EXISTS gluoncvyolo (imgname STRING, imgnamep STRING, class1 STRING, pct1 STRING, host STRING, shape STRING, end STRING, te STRING, battery INT, systemtime STRING, cpu DOUBLE, diskusage STRING, memory DOUBLE, id STRING) STORED AS ORC;
  2.  
Apache NiFi generates tables for me in Apache Hive 3.x as Apache ORC files for fast performance.

hive acid table schema: gluoncvyoloacid
  1. CREATE TABLE gluoncvyoloacid
  2. (imgname STRING, imgnamep STRING, class1 STRING, pct1 STRING, host STRING, shape STRING, `end` STRING, te STRING, battery INT, systemtime STRING, cpu DOUBLE, diskusage STRING, memory DOUBLE, id STRING)
  3. STORED AS ORC TBLPROPERTIES ('transactional'='true')
I can just as easily insert or update data into Hive 3.x ACID 2 tables.
We have data, now query it. Easy, no install analytics with tables, Leafletjs, AngularJS, graphs, maps and charts.
nifi flow registry
To manage version control I am using the NiFi Registry which is great. In the newest version, 0.2, there is the ability to back it up with github! It's easy. Everything you need to know is in the doc and Bryan Bend's excellent post on the subject.
There were a few gotchas for me.
  • Use your own new github project with permissions and then clone it local git clone https://github.com/tspannhw/nifi-registry-github.git
  • Make sure github directory has permission and is empty (no readme or junk)
  • Make sure you put in the full directory path
  • Update your config like below:
  1. <flowPersistenceProvider>
  2. <class>org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider</class>
  3. <property name="Flow Storage Directory">/Users/tspann/Documents/nifi-registry-0.2.0/conf/nifi-registry-github</property>
  4. <property name="Remote To Push">origin</property>
  5. <property name="Remote Access User">tspannhw</property>
  6. <property name="Remote Access Password">generatethis</property>
  7. </flowPersistenceProvider>
This is my github directory to hold versions: https://github.com/tspannhw/nifi-registry-github
resources: