Using Apache NiFi with Apache MXNet GluonCV for YOLO 3 Deep Learning Workflows

March 18, 2019

Using GluonCV 0.3 with Apache MXNet 1.3

source code:

https://github.com/tspannhw/nifi-gluoncv-yolo3

Captured and processed image available for viewing in the stream in Apache NiFi 1.7.x

use case:

I need to easily monitor the contents of my security vault. It is a fixed number of known things.

What we need in the real world is a good camera or several (maybe four to eight, depending on the angles of the room), a device like an NVIDIA Jetson TX2, the MiNiFi 0.5 Java Agent, JDK 8, Apache MXNet, GluonCV, a number of Python libraries, a network connection and a simple workflow. Outside of my vault, I will need a server or cluster to do the more advanced processing, though I could run it all on the local box. If the number of items, or certain items I am watching, are no longer in the frame, then we should send an immediate alert. That could go to SMS, email, Slack, an alerting system or other means. Most of that is implemented below. If anyone wants to do the complete use case, I can assist.
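The alerting rule described above can be sketched in a few lines of Python. This is only an illustration, not part of the published flow: the function names and the expected inventory are hypothetical, and the detected class names are assumed to come from the model output.

```python
from collections import Counter

def missing_items(expected, detected):
    """Return {class_name: shortfall} for each expected class seen too few times."""
    seen = Counter(detected)
    return {name: count - seen[name]
            for name, count in expected.items()
            if seen[name] < count}

def alert_message(expected, detected):
    """Build an alert string, or return None when everything is accounted for."""
    missing = missing_items(expected, detected)
    if not missing:
        return None
    parts = ", ".join("{} x{}".format(n, c) for n, c in sorted(missing.items()))
    return "ALERT: missing from vault: " + parts

# Hypothetical inventory: two bottles and one monitor should always be visible.
EXPECTED = {"bottle": 2, "tvmonitor": 1}
print(alert_message(EXPECTED, ["bottle", "tvmonitor"]))
```

The returned message could then be routed to whichever notification channel is available; extra detections such as a person are ignored by this check.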

demo implementation:

I wanted to use the new YOLO 3 model, which is part of the new 0.3 stream, so I installed a 0.3 pre-release build. The release may be final by the time you read this. You can try a regular pip3.6 install -U gluoncv and see what you get.


pip3.6 install -U gluoncv==0.3.0b20180924

YOLO v3 is a great pretrained model to use for object detection.

See: https://gluon-cv.mxnet.io/build/examples_detection/demo_yolo.html

The GluonCV Model Zoo is very rich and incredibly easy to use. We just grab the model "yolo3_darknet53_voc" with an automatic one-time download and we are ready to go. They provide easy-to-customize code to start with. I write my processed image and JSON results out for ingest by Apache NiFi. You will notice this is similar to what we did for the Open Computer Vision talks: https://community.hortonworks.com/articles/198939/using-apache-mxnet-gluoncv-with-apache-nifi-for-de.html
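As a rough sketch of that step (not the actual yolonifi.py from the repository), the model can be loaded and run along the lines of the GluonCV YOLO demo. The helper names top_detection and detect are hypothetical, and the output keys simply mirror a few of the JSON fields used in this article.

```python
def top_detection(class_ids, scores, class_names):
    """Pick the highest-scoring detection from flat lists of ids and scores.

    Returns (class_name, percent), or None when nothing valid was detected.
    Padded entries in the model output carry a score of -1.
    """
    if not scores:
        return None
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] < 0:
        return None
    return class_names[int(class_ids[best])], float(scores[best]) * 100.0

def detect(image_path):
    """Run yolo3_darknet53_voc on one image and return a small result record."""
    # Imported lazily so the helper above works without MXNet installed.
    from gluoncv import model_zoo, data
    net = model_zoo.get_model("yolo3_darknet53_voc", pretrained=True)
    x, img = data.transforms.presets.yolo.load_test(image_path, short=512)
    class_ids, scores, boxes = net(x)
    ids = class_ids[0].asnumpy().ravel().tolist()
    confs = scores[0].asnumpy().ravel().tolist()
    top = top_detection(ids, confs, net.classes)
    record = {"imgname": image_path, "shape": str(x.shape)}
    if top is not None:
        record["class1"], record["pct1"] = top[0], str(top[1])
    return record
```

Calling detect on an image path triggers the one-time model download on first use; the returned dictionary can be serialized with json.dumps and written next to the processed image.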

This is updated and even easier. I dropped the MQTT and just output image files and some JSON to read.

GluonCV makes working with Computer Vision extremely clean and easy.

why Apache NiFi For Deep Learning Workflows

Let me count the top five ways:

#1 Provenance - lets me see everything, everywhere, all the time with the data and the metadata.

#2 Configurable Queues - queues are everywhere and they are extremely configurable on size and priority. There's always backpressure and safety between every step. Sinks, sources and steps can go offline, as happens on the real-world internet. Offline, online, wherever, I can recover and have full visibility into my flows as they spread between devices, servers, networks, clouds and nation-states.

#3 Security - secure at every level, from SSL to data encryption. Integration with leading-edge tools including Apache Knox, Apache Ranger and Apache Atlas. See: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_security/content/ch_enabling-knox-for-nifi.html

#4 UI - a simple UI to develop, monitor and manage incredibly complex flows including IoT, Deep Learning, Logs and every data source you can throw at it.

#5 Agents - MiNiFi gives me two different agents (Java and C++) for my devices or systems to stream data headless.

running gluoncv yolo3 model

I wrap my Python script in a shell script to throw away warnings and junk output:


cd /Volumes/TSPANN/2018/talks/ApacheDeepLearning101/nifi-gluoncv-yolo3 
python3.6  -W ignore /Volumes/TSPANN/2018/talks/ApacheDeepLearning101/nifi-gluoncv-yolo3/yolonifi.py 2>/dev/null

List of Possible Objects We Can Detect


["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", 
"diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", 
"tvmonitor"]

I am going to train this with my own data for the upcoming INTERNET OF BEER. For the vault use case, we would need pictures of your vault contents.

See: https://gluon-cv.mxnet.io/build/examples_datasets/detection_custom.html#sphx-glr-build-examples-datasets-detection-custom-py

Example Output in JSON


{"imgname": "images/gluoncv_image_20180924190411_b90c6ba4-bbc7-4bbf-9f8f-ee5a6a859602.jpg", "imgnamep": "images/gluoncv_image_p_20180924190411_b90c6ba4-bbc7-4bbf-9f8f-ee5a6a859602.jpg", "class1": "tvmonitor", "pct1": "49.070724999999996", "host": "HW13125.local", "shape": "(1, 3, 512, 896)", "end": "1537815855.105193", "te": "4.199203014373779", "battery": 100, "systemtime": "09/24/2018 15:04:15", "cpu": 33.7, "diskusage": "49939.2 MB", "memory": 60.1, "id": "20180924190411_b90c6ba4-bbc7-4bbf-9f8f-ee5a6a859602"}
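Several numeric values in this record (pct1, end, te) are written as strings. A minimal Python sketch for reading one such line and casting those fields before analysis might look like this; parse_result and the field list are illustrative choices, and the sample is a trimmed copy of the record above.

```python
import json

# Fields that arrive as strings but hold numbers (taken from the sample record).
NUMERIC_STRING_FIELDS = ("pct1", "end", "te")

def parse_result(line):
    """Parse one JSON result line and convert numeric strings to floats."""
    record = json.loads(line)
    for field in NUMERIC_STRING_FIELDS:
        if field in record:
            record[field] = float(record[field])
    return record

sample = ('{"class1": "tvmonitor", "pct1": "49.070724999999996", '
          '"te": "4.199203014373779", "battery": 100}')
record = parse_result(sample)
print(record["class1"], round(record["pct1"], 1))  # tvmonitor 49.1
```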

Example Processed Image Output

It found one generic person. We could train against a known set of humans who are allowed in an area or are known users.

nifi flows

Gateway Server (We could skip this, but aggregating multiple camera agents is useful)

Send the Flow to the Cloud

Cloud Server Site-to-Site

After we infer the schema of the data once, we don't need it again. We could derive the schema manually or from another tool, but this is easy. Once you are done, you can delete the InferAvroSchema processor from your flow. I left mine in, in case you wish to start from the flow that is attached at the end of the article.

flow steps

Route when there is no error to MergeRecord, then convert those aggregated Apache Avro records into one Apache ORC file.

Then store it in an HDFS directory. Once complete, there will be a DDL statement added to the metadata that you can send to PutHiveQL, or you can manually create the table in Beeline, Zeppelin or Hortonworks Data Analytics Studio (https://hortonworks.com/products/dataplane/data-analytics-studio/).

schema: gluoncvyolo


{ "type" : "record", "name" : "gluoncvyolo", "fields" : [ { "name" : "imgname", "type" : "string", "doc" : "Type inferred from '\"images/gluoncv_image_20180924211055_8f3b9dac-5645-49aa-94e7-ee5176c3f55c.jpg\"'" }, { "name" : "imgnamep", "type" : "string", "doc" : "Type inferred from '\"images/gluoncv_image_p_20180924211055_8f3b9dac-5645-49aa-94e7-ee5176c3f55c.jpg\"'" }, { "name" : "class1", "type" : "string", "doc" : "Type inferred from '\"tvmonitor\"'" }, { "name" : "pct1", "type" : "string", "doc" : "Type inferred from '\"95.71207000000001\"'" }, { "name" : "host", "type" : "string", "doc" : "Type inferred from '\"HW13125.local\"'" }, { "name" : "shape", "type" : "string", "doc" : "Type inferred from '\"(1, 3, 512, 896)\"'" }, { "name" : "end", "type" : "string", "doc" : "Type inferred from '\"1537823458.559896\"'" }, { "name" : "te", "type" : "string", "doc" : "Type inferred from '\"3.580893039703369\"'" }, { "name" : "battery", "type" : "int", "doc" : "Type inferred from '100'" }, { "name" : "systemtime", "type" : "string", "doc" : "Type inferred from '\"09/24/2018 17:10:58\"'" }, { "name" : "cpu", "type" : "double", "doc" : "Type inferred from '12.0'" }, { "name" : "diskusage", "type" : "string", "doc" : "Type inferred from '\"48082.7 MB\"'" }, { "name" : "memory", "type" : "double", "doc" : "Type inferred from '70.6'" }, { "name" : "id", "type" : "string", "doc" : "Type inferred from '\"20180924211055_8f3b9dac-5645-49aa-94e7-ee5176c3f55c\"'" } ] }

Tabular data has fields with types and properties. Let's specify those for automated analysis, conversion and live stream SQL.

hive table schema: gluoncvyolo


CREATE EXTERNAL TABLE IF NOT EXISTS gluoncvyolo (imgname STRING, imgnamep STRING, class1 STRING, pct1 STRING, host STRING, shape STRING, `end` STRING, te STRING, battery INT, systemtime STRING, cpu DOUBLE, diskusage STRING, memory DOUBLE, id STRING) STORED AS ORC;

Apache NiFi generates tables for me in Apache Hive 3.x as Apache ORC files for fast performance.
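To make the relationship between the inferred Avro schema and the Hive table explicit, here is a small Python sketch that derives a similar CREATE TABLE statement from a schema. It is not how NiFi produces its DDL; the type mapping and function name are assumptions for illustration, and every column name is backticked so that keywords such as end are safe, as in the ACID table definition in this article.

```python
import json

# Assumed mapping from Avro primitive types to Hive column types.
AVRO_TO_HIVE = {"string": "STRING", "int": "INT", "long": "BIGINT",
                "float": "FLOAT", "double": "DOUBLE", "boolean": "BOOLEAN"}

def avro_to_hive_ddl(schema_json, external=True):
    """Build a Hive CREATE TABLE ... STORED AS ORC statement from an Avro record schema."""
    schema = json.loads(schema_json)
    columns = ", ".join(
        "`{}` {}".format(field["name"], AVRO_TO_HIVE[field["type"]])
        for field in schema["fields"]
    )
    kind = "EXTERNAL TABLE" if external else "TABLE"
    return "CREATE {} IF NOT EXISTS {} ({}) STORED AS ORC".format(
        kind, schema["name"], columns)

# Trimmed version of the gluoncvyolo schema, limited to four fields.
SAMPLE = ('{"type": "record", "name": "gluoncvyolo", "fields": ['
          '{"name": "class1", "type": "string"}, {"name": "battery", "type": "int"}, '
          '{"name": "cpu", "type": "double"}, {"name": "end", "type": "string"}]}')
print(avro_to_hive_ddl(SAMPLE))
```

Only flat records with primitive field types are handled here; unions, nested records and logical types would need additional mapping rules.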

hive acid table schema: gluoncvyoloacid


CREATE TABLE gluoncvyoloacid
(imgname STRING, imgnamep STRING, class1 STRING, pct1 STRING, host STRING, shape STRING, `end` STRING, te STRING, battery INT, systemtime STRING, cpu DOUBLE, diskusage STRING, memory DOUBLE, id STRING)
STORED AS ORC TBLPROPERTIES ('transactional'='true');

I can just as easily insert or update data into Hive 3.x ACID 2 tables.

We have data, now query it. Easy, no-install analytics with tables, Leaflet.js, AngularJS, graphs, maps and charts.

nifi flow registry

To manage version control I am using the NiFi Registry, which is great. In the newest version, 0.2, there is the ability to back it up with GitHub! It's easy. Everything you need to know is in the docs and Bryan Bende's excellent post on the subject.

https://nifi.apache.org/docs/nifi-registry-docs/index.html

https://bryanbende.com/development/2018/06/20/apache-nifi-registry-0-2-0

There were a few gotchas for me.

Use your own new GitHub project with permissions, then clone it locally: git clone https://github.com/tspannhw/nifi-registry-github.git
Make sure the GitHub directory has the right permissions and is empty (no README or junk).
Make sure you put in the full directory path.
Update your config like below:


    <flowPersistenceProvider>
        <class>org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider</class>
        <property name="Flow Storage Directory">/Users/tspann/Documents/nifi-registry-0.2.0/conf/nifi-registry-github</property>
        <property name="Remote To Push">origin</property>
        <property name="Remote Access User">tspannhw</property>
        <property name="Remote Access Password">generatethis</property>
    </flowPersistenceProvider>
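The gotchas above can be checked before the registry is started for the first time. The following Python sketch is a hypothetical preflight helper, not part of NiFi Registry: it only verifies that the configured Flow Storage Directory is an absolute path, exists, is a git clone, and holds nothing besides the .git directory.

```python
import os

def check_flow_storage_dir(path):
    """Return a list of problems found with the Flow Storage Directory."""
    problems = []
    if not os.path.isabs(path):
        problems.append("path is not absolute")
    if not os.path.isdir(path):
        problems.append("directory does not exist")
        return problems
    entries = set(os.listdir(path))
    if ".git" not in entries:
        problems.append("not a git clone (no .git directory)")
    # Anything besides .git (a README, for example) counts as junk here.
    extras = sorted(entries - {".git"})
    if extras:
        problems.append("unexpected files: " + ", ".join(extras))
    return problems
```

An empty list means the directory passes these checks. Once the registry has saved flows into the directory, it will no longer be empty, so the last check only makes sense for the initial setup.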

This is my GitHub directory to hold versions: https://github.com/tspannhw/nifi-registry-github

resources:

https://github.com/tspannhw/UsingGluonCV
https://gluon.mxnet.io/chapter01_crashcourse/ndarray.html
https://gluon-cv.mxnet.io/build/examples_detection/demo_yolo.html#sphx-glr-build-examples-detection-demo-yolo-py
https://gluon-cv.mxnet.io/model_zoo/index.html#object-detection
https://community.hortonworks.com/articles/215271/iot-edge-processing-with-deep-learning-on-hdf-32-a-2.html
https://community.hortonworks.com/articles/198912/ingesting-apache-mxnet-gluon-deep-learning-results.html
