[FLaNK] Streaming EdgeAI on the new NVIDIA Jetson Nano 2GB with MiNiFi Agents To FLaNK Applications


Plug Into Community AI Apps:  https://youtu.be/2T8CG7lDkcU

I was not patient enough to shoot an unboxing video; I was too excited to get this superb machine running. The NVIDIA Jetson Nano 2GB is now available for only $59!


The 2GB version of the NVIDIA Jetson Nano is great; you really don't miss anything that was removed. I have copied over my MiNiFi agent and code from my other Jetson Nanos, Xavier NX and TX1, and it all works fine. The speed is fine for most needs, especially development and prototyping. I prefer the Xavier, but at this price you can't go wrong. I am definitely going to be getting Jetson Nanos instead of other devices for most IoT / Edge AI use cases. I have used my NVIDIA Jetson Nano 2GB for demos at a number of events, including ApacheCon, Beam Summit, Open Source Summit and AI Dev World.


I installed fswebcam to capture still images and build up a directory of them to process.
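A minimal Python sketch of that capture step, assuming fswebcam is installed and the webcam is the default /dev/video0. The `-r` (resolution) and `--no-banner` options are standard fswebcam flags; the `capture_` file prefix and the 1280x720 default are illustrative choices, not the settings used in this project. Keeping the command builder separate from the subprocess call lets you check it without a camera attached.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def build_capture_command(directory, when, resolution="1280x720"):
    """Return an fswebcam argument list that saves one timestamped JPEG in `directory`."""
    filename = Path(directory) / f"capture_{when:%Y%m%d%H%M%S}.jpg"
    # -r sets the capture resolution; --no-banner turns off fswebcam's caption overlay
    return ["fswebcam", "-r", resolution, "--no-banner", str(filename)]


def capture_still(directory="/opt/demo/images", resolution="1280x720"):
    """Grab one still from the default camera (needs fswebcam and a connected webcam)."""
    Path(directory).mkdir(parents=True, exist_ok=True)
    command = build_capture_command(directory, datetime.now(), resolution)
    subprocess.run(command, check=True)  # raises CalledProcessError if fswebcam fails
    return command[-1]  # path of the saved image
```

Calling `capture_still()` from cron or a simple loop builds up the image directory for later processing.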


You must install and run https://github.com/dusty-nv/jetson-inference. You get great libraries, tutorials, documentation and examples. I usually build my apps starting from one of these examples and use one of the excellent NVIDIA pre-built models. This greatly accelerates development and deployment of Edge AI applications, whether for IoT or other purposes. It works with standard Raspberry Pi plug-in cameras and the excellent Logitech USB webcams that I have used with all my other NVIDIA devices.

At this price point, there seems to be no reason every developer in every company shouldn't have one. It's a great place to test out Edge AI applications and run classifications at a decent speed. This is a real machine despite its low price.

I was facilitating data journeys at the NetHope Global Summit today, and I thought these $59 devices could be great for non-profits to use for data collection and analytics in the field.

https://www.nethopeglobalsummit.org/agenda-2020#sz-tab-44134

I am exploring some use cases to see if I can pre-build some easy applications that an NGO could just pick up and run with. Let's see what develops. A $59 GPU edge device enables new applications at an affordable cost. $59 won't buy me much in the cloud, but it does get me a powerful, small data collection device that runs ML, DL, cameras, MiNiFi agents, Python and Java. With 2 GB of fast RAM and a GPU, you are limited only by your imagination.


Example Application



Example Output Data:

{"uuid": "nano_uuid_cmq_20201026202757", "ipaddress": "192.168.1.169", "networktime": 47.7275505065918, "detectleft": 1.96746826171875, "detectconfidence": 52.86550521850586, "cputemp": "34.0", "gputemp": "30.0", "gputempf": "86", "cputempf": "93", "runtime": "169", "host": "nano5", "filename": "/opt/demo/images/out_iue_20201026202757.jpg", "host_name": "nano5", "macaddress": "00:e0:4c:49:d8:b7", "end": "1603744246.924455", "te": "169.4200084209442", "systemtime": "10/26/2020 16:30:46", "cpu": 9.9, "diskusage": "37100.4 MB", "memory": 91.5, "id": "20201026202757_64d69a82-88d8-45f8-be06-1b836cb6cc84"}
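Several values in this record arrive as JSON strings even though they are numbers (`cputemp`, `gputempf`, `runtime`, `end`, `te`), and `diskusage` carries its unit inline. A minimal Python sketch of normalizing one record on the consuming side, assuming one JSON object per line; the field list is read off the sample above, and the `diskusage_mb` key is a hypothetical addition.

```python
import json

# Fields that hold numbers but arrive as JSON strings in the sample record.
NUMERIC_STRING_FIELDS = ("cputemp", "gputemp", "cputempf", "gputempf", "runtime", "end", "te")


def normalize_record(line):
    """Parse one JSON record from the Nano and convert numeric strings to floats."""
    record = json.loads(line)
    for field in NUMERIC_STRING_FIELDS:
        if field in record:
            record[field] = float(record[field])
    # diskusage looks like "37100.4 MB"; keep the number under a separate (hypothetical) key
    if "diskusage" in record:
        record["diskusage_mb"] = float(record["diskusage"].split()[0])
    return record
```

Fields that are already numeric in the sample, such as `cpu`, `memory` and `detectconfidence`, pass through unchanged.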


Below is example output from running a Python script that detects objects in frames from a webcam (a low-end Logitech webcam, but you can also use a Raspberry Pi camera). In practice, this is best run continuously, writing log messages and images for MiNiFi agents to pick up and send to a server for routing, transformation and processing.


root@nano5:/opt/demo/minifi-jetson-nano# jetson_clocks 
root@nano5:/opt/demo/minifi-jetson-nano# python3 detect.py 
[gstreamer] initialized gstreamer, version 1.14.5.0
[gstreamer] gstCamera -- attempting to create device v4l2:///dev/video0
[gstreamer] gstCamera -- found v4l2 device: HD Webcam C615
[gstreamer] v4l2-proplist, device.path=(string)/dev/video0, udev-probed=(boolean)false, device.api=(string)v4l2, v4l2.device.driver=(string)uvcvideo, v4l2.device.card=(string)"HD\ Webcam\ C615", v4l2.device.bus_info=(string)usb-70090000.xusb-3.2, v4l2.device.version=(uint)264588, v4l2.device.capabilities=(uint)2216689665, v4l2.device.device_caps=(uint)69206017;
[gstreamer] gstCamera -- found 30 caps for v4l2 device /dev/video0
[gstreamer] [0] video/x-raw, format=(string)YUY2, width=(int)1920, height=(int)1080, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)5/1;
[gstreamer] [1] video/x-raw, format=(string)YUY2, width=(int)1600, height=(int)896, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 15/2, 5/1 };
[gstreamer] [2] video/x-raw, format=(string)YUY2, width=(int)1280, height=(int)720, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 10/1, 15/2, 5/1 };
[gstreamer] [3] video/x-raw, format=(string)YUY2, width=(int)960, height=(int)720, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [4] video/x-raw, format=(string)YUY2, width=(int)1024, height=(int)576, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [5] video/x-raw, format=(string)YUY2, width=(int)800, height=(int)600, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [6] video/x-raw, format=(string)YUY2, width=(int)864, height=(int)480, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [7] video/x-raw, format=(string)YUY2, width=(int)800, height=(int)448, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [8] video/x-raw, format=(string)YUY2, width=(int)640, height=(int)480, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [9] video/x-raw, format=(string)YUY2, width=(int)640, height=(int)360, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [10] video/x-raw, format=(string)YUY2, width=(int)432, height=(int)240, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [11] video/x-raw, format=(string)YUY2, width=(int)352, height=(int)288, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [12] video/x-raw, format=(string)YUY2, width=(int)320, height=(int)240, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [13] video/x-raw, format=(string)YUY2, width=(int)176, height=(int)144, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [14] video/x-raw, format=(string)YUY2, width=(int)160, height=(int)120, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [15] image/jpeg, width=(int)1920, height=(int)1080, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [16] image/jpeg, width=(int)1600, height=(int)896, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [17] image/jpeg, width=(int)1280, height=(int)720, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [18] image/jpeg, width=(int)960, height=(int)720, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [19] image/jpeg, width=(int)1024, height=(int)576, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [20] image/jpeg, width=(int)800, height=(int)600, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [21] image/jpeg, width=(int)864, height=(int)480, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [22] image/jpeg, width=(int)800, height=(int)448, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [23] image/jpeg, width=(int)640, height=(int)480, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [24] image/jpeg, width=(int)640, height=(int)360, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [25] image/jpeg, width=(int)432, height=(int)240, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [26] image/jpeg, width=(int)352, height=(int)288, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [27] image/jpeg, width=(int)320, height=(int)240, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [28] image/jpeg, width=(int)176, height=(int)144, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] [29] image/jpeg, width=(int)160, height=(int)120, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction){ 30/1, 24/1, 20/1, 15/1, 10/1, 15/2, 5/1 };
[gstreamer] gstCamera -- selected device profile:  codec=mjpeg format=unknown width=1280 height=720
[gstreamer] gstCamera pipeline string:
[gstreamer] v4l2src device=/dev/video0 ! image/jpeg, width=(int)1280, height=(int)720 ! jpegdec ! video/x-raw ! appsink name=mysink
[gstreamer] gstCamera successfully created device v4l2:///dev/video0
[gstreamer] opening gstCamera for streaming, transitioning pipeline to GST_STATE_PLAYING
[gstreamer] gstreamer changed state from NULL to READY ==> mysink
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter1
[gstreamer] gstreamer changed state from NULL to READY ==> jpegdec0
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter0
[gstreamer] gstreamer changed state from NULL to READY ==> v4l2src0
[gstreamer] gstreamer changed state from NULL to READY ==> pipeline0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter1
[gstreamer] gstreamer changed state from READY to PAUSED ==> jpegdec0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter0
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer changed state from READY to PAUSED ==> v4l2src0
[gstreamer] gstreamer changed state from READY to PAUSED ==> pipeline0
[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstreamer message new-clock ==> pipeline0
[gstreamer] gstreamer message stream-start ==> pipeline0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> jpegdec0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> v4l2src0
[gstreamer] gstCamera -- onPreroll
[gstreamer] gstCamera -- map buffer size was less than max size (1382400 vs 1382407)
[gstreamer] gstCamera recieve caps:  video/x-raw, format=(string)I420, width=(int)1280, height=(int)720, interlace-mode=(string)progressive, multiview-mode=(string)mono, multiview-flags=(GstVideoMultiviewFlagsSet)0:ffffffff:/right-view-first/left-flipped/left-flopped/right-flipped/right-flopped/half-aspect/mixed-mono, pixel-aspect-ratio=(fraction)1/1, chroma-site=(string)mpeg2, colorimetry=(string)1:4:0:0, framerate=(fraction)30/1
[gstreamer] gstCamera -- recieved first frame, codec=mjpeg format=i420 width=1280 height=720 size=1382407
RingBuffer -- allocated 4 buffers (1382407 bytes each, 5529628 bytes total)
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysink
[gstreamer] gstreamer message async-done ==> pipeline0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> pipeline0
RingBuffer -- allocated 4 buffers (14745600 bytes each, 58982400 bytes total)
jetson.inference -- detectNet loading build-in network 'ssd-mobilenet-v2'

detectNet -- loading detection network model from:
          -- model        networks/SSD-Mobilenet-v2/ssd_mobilenet_v2_coco.uff
          -- input_blob   'Input'
          -- output_blob  'NMS'
          -- output_count 'NMS_1'
          -- class_labels networks/SSD-Mobilenet-v2/ssd_coco_labels.txt
          -- threshold    0.500000
          -- batch_size   1

[TRT]    TensorRT version 7.1.3
[TRT]    loading NVIDIA plugins...
[TRT]    Registered plugin creator - ::GridAnchor_TRT version 1
[TRT]    Registered plugin creator - ::NMS_TRT version 1
[TRT]    Registered plugin creator - ::Reorg_TRT version 1
[TRT]    Registered plugin creator - ::Region_TRT version 1
[TRT]    Registered plugin creator - ::Clip_TRT version 1
[TRT]    Registered plugin creator - ::LReLU_TRT version 1
[TRT]    Registered plugin creator - ::PriorBox_TRT version 1
[TRT]    Registered plugin creator - ::Normalize_TRT version 1
[TRT]    Registered plugin creator - ::RPROI_TRT version 1
[TRT]    Registered plugin creator - ::BatchedNMS_TRT version 1
[TRT]    Could not register plugin creator -  ::FlattenConcat_TRT version 1
[TRT]    Registered plugin creator - ::CropAndResize version 1
[TRT]    Registered plugin creator - ::DetectionLayer_TRT version 1
[TRT]    Registered plugin creator - ::Proposal version 1
[TRT]    Registered plugin creator - ::ProposalLayer_TRT version 1
[TRT]    Registered plugin creator - ::PyramidROIAlign_TRT version 1
[TRT]    Registered plugin creator - ::ResizeNearest_TRT version 1
[TRT]    Registered plugin creator - ::Split version 1
[TRT]    Registered plugin creator - ::SpecialSlice_TRT version 1
[TRT]    Registered plugin creator - ::InstanceNormalization_TRT version 1
[TRT]    detected model format - UFF  (extension '.uff')
[TRT]    desired precision specified for GPU: FASTEST
[TRT]    requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]    native precisions detected for GPU:  FP32, FP16
[TRT]    selecting fastest native precision for GPU:  FP16
[TRT]    attempting to open engine cache file /usr/local/bin/networks/SSD-Mobilenet-v2/ssd_mobilenet_v2_coco.uff.1.1.7103.GPU.FP16.engine
[TRT]    loading network plan from engine cache... /usr/local/bin/networks/SSD-Mobilenet-v2/ssd_mobilenet_v2_coco.uff.1.1.7103.GPU.FP16.engine
[TRT]    device GPU, loaded /usr/local/bin/networks/SSD-Mobilenet-v2/ssd_mobilenet_v2_coco.uff
[TRT]    Deserialize required 2384046 microseconds.
[TRT]    
[TRT]    CUDA engine context initialized on device GPU:
[TRT]       -- layers       117
[TRT]       -- maxBatchSize 1
[TRT]       -- workspace    0
[TRT]       -- deviceMemory 35449344
[TRT]       -- bindings     3
[TRT]       binding 0
                -- index   0
                -- name    'Input'
                -- type    FP32
                -- in/out  INPUT
                -- # dims  3
                -- dim #0  3 (SPATIAL)
                -- dim #1  300 (SPATIAL)
                -- dim #2  300 (SPATIAL)
[TRT]       binding 1
                -- index   1
                -- name    'NMS'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  3
                -- dim #0  1 (SPATIAL)
                -- dim #1  100 (SPATIAL)
                -- dim #2  7 (SPATIAL)
[TRT]       binding 2
                -- index   2
                -- name    'NMS_1'
                -- type    FP32
                -- in/out  OUTPUT
                -- # dims  3
                -- dim #0  1 (SPATIAL)
                -- dim #1  1 (SPATIAL)
                -- dim #2  1 (SPATIAL)
[TRT]    
[TRT]    binding to input 0 Input  binding index:  0
[TRT]    binding to input 0 Input  dims (b=1 c=3 h=300 w=300) size=1080000
[TRT]    binding to output 0 NMS  binding index:  1
[TRT]    binding to output 0 NMS  dims (b=1 c=1 h=100 w=7) size=2800
[TRT]    binding to output 1 NMS_1  binding index:  2
[TRT]    binding to output 1 NMS_1  dims (b=1 c=1 h=1 w=1) size=4
[TRT]    
[TRT]    device GPU, /usr/local/bin/networks/SSD-Mobilenet-v2/ssd_mobilenet_v2_coco.uff initialized.
[TRT]    W = 7  H = 100  C = 1
[TRT]    detectNet -- maximum bounding boxes:  100
[TRT]    detectNet -- loaded 91 class info entries
[TRT]    detectNet -- number of object classes:  91
detected 0 objects in image

[TRT]    ------------------------------------------------
[TRT]    Timing Report /usr/local/bin/networks/SSD-Mobilenet-v2/ssd_mobilenet_v2_coco.uff
[TRT]    ------------------------------------------------
[TRT]    Pre-Process   CPU   0.07802ms  CUDA   0.48875ms
[TRT]    Network       CPU  45.52254ms  CUDA  44.93750ms
[TRT]    Post-Process  CPU   0.03193ms  CUDA   0.03177ms
[TRT]    Total         CPU  45.63248ms  CUDA  45.45802ms
[TRT]    ------------------------------------------------

[TRT]    note -- when processing a single image, run 'sudo jetson_clocks' before
                to disable DVFS for more accurate profiling/timing measurements

[image] saved '/opt/demo/images/out_kfy_20201030195943.jpg'  (1280x720, 4 channels)

[TRT]    ------------------------------------------------
[TRT]    Timing Report /usr/local/bin/networks/SSD-Mobilenet-v2/ssd_mobilenet_v2_coco.uff
[TRT]    ------------------------------------------------
[TRT]    Pre-Process   CPU   0.07802ms  CUDA   0.48875ms
[TRT]    Network       CPU  45.52254ms  CUDA  44.93750ms
[TRT]    Post-Process  CPU   0.03193ms  CUDA   0.03177ms
[TRT]    Total         CPU  45.63248ms  CUDA  45.45802ms
[TRT]    ------------------------------------------------

[gstreamer] gstCamera -- stopping pipeline, transitioning to GST_STATE_NULL
[gstreamer] gstCamera -- pipeline stopped

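We are using an enhanced version of the example script, detect.py. To capture a webcam image for detection, it creates the camera with:   camera = jetson.utils.gstCamera(width, height, camera)

At roughly 45 ms of network time per frame (see the timing report above), this is plenty fast and gives us the results and data we want.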
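The enhanced detect.py itself is not shown here. A minimal sketch of the same capture, detect, save and log cycle, assuming the legacy `jetson.inference` / `jetson.utils` Python bindings from jetson-inference (the ones that provide `gstCamera` and `CaptureRGBA`). The `detectconfidence`, `detectleft` and `networktime` keys mirror the sample output; scaling confidence to 0-100 is an assumption based on the sample value of 52.87, and `detectlabel`, the log path and the image file name are hypothetical. The Jetson imports live inside `run()` so the record-building helpers can be tested on any machine.

```python
import json
import socket
from datetime import datetime


def detections_to_record(detections, class_desc, host, filename, network_time_ms):
    """Build a flat, JSON-ready dict describing the highest-confidence detection.

    `detections` are objects with ClassID, Confidence and Left attributes, as
    returned by detectNet.Detect(); `class_desc` maps a class ID to its label.
    """
    record = {"host": host, "filename": filename, "networktime": network_time_ms}
    if detections:
        best = max(detections, key=lambda d: d.Confidence)
        record["detectlabel"] = class_desc(best.ClassID)  # hypothetical field name
        # assumption: the sample's detectconfidence is Confidence scaled to 0-100
        record["detectconfidence"] = best.Confidence * 100.0
        record["detectleft"] = best.Left
    return record


def append_json_line(path, record):
    """Append one record per line so a MiNiFi agent can tail the file."""
    with open(path, "a") as out:
        out.write(json.dumps(record) + "\n")


def run(iterations=10, log_path="/opt/demo/logs/detections.json", image_dir="/opt/demo/images"):
    """Capture, detect, save and log `iterations` frames (runs only on a Jetson)."""
    import jetson.inference  # provided by jetson-inference on the device
    import jetson.utils

    net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
    camera = jetson.utils.gstCamera(1280, 720, "/dev/video0")
    host = socket.gethostname()
    for _ in range(iterations):
        img, width, height = camera.CaptureRGBA(zeroCopy=1)
        detections = net.Detect(img, width, height)  # also draws boxes onto img by default
        jetson.utils.cudaDeviceSynchronize()  # finish GPU work before saving img from the CPU
        filename = f"{image_dir}/out_{datetime.now():%Y%m%d%H%M%S%f}.jpg"
        jetson.utils.saveImageRGBA(filename, img, width, height)
        record = detections_to_record(
            detections, net.GetClassDesc, host, filename, net.GetNetworkTime()
        )
        append_json_line(log_path, record)
```

The network and camera are created once, outside the loop, because loading the TensorRT engine takes seconds (about 2.4 s to deserialize in the log above), while each frame takes tens of milliseconds.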





References:

2020 Events - Slides, Githubs, and Videos

 

2020 Events: https://www.linkedin.com/pulse/2020-streaming-edge-ai-events-tim-spann/


  • Lightning Talk - Using the Mm FLaNK Stack for Edge AI (Flink, NiFi, Kafka, Kudu) 0.1



Sept 29 - Oct 1 - ApacheCon https://apachecon.com/acna2020/

I have some talks here, and I am bringing in some superstars to assist me! It's a dream team of speakers that I will be collaborating with, and I will release names as we get closer. I will be covering Apache MXNet, Apache NiFi, MiNiFi, Apache Flink, Apache Kafka, Apache Hue and Apache Kudu. I would be surprised if Apache Spark, Apache Hadoop, Apache Hive, Apache HBase, Apache Phoenix, Apache Zeppelin and Apache Livy did not come up as well.

  • Incrementally Streaming RDBMS Data to Your DataLake Automagically


  • Apache Deep Learning 301


  • Using the Mm FLaNK Stack for Edge AI (Apache MXNet, Apache Flink, Apache NiFi, Apache Kafka, Apache Kudu) 0.2


  • Utilizing Apache NiFi and MiNiFi for EdgeAI IoT at Scale


  • Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine Data


  • Real-Time Stock Processing With Apache NiFi, Apache Flink and Apache Kafka 




Overview: https://speakerdeck.com/tspannhw/2020-conference-talk-preview

October 9 - Ukraine DevOps Stage 2020 11am Ukraine Time - Apache NiFi Talk

https://www.datainmotion.dev/2020/05/cloudera-flow-management-101-lets-build.html

https://devopsstage.com/speakers/timothy-spann/



Oct 19 - 22 - Flink Forward - Mm FLaNK Stack for Edge AI (Flink, NiFi, Kafka, Kudu)

Oct 22 - 1pm EST - https://www.flink-forward.org/global-2020/conference-program#using-the-mm-flank-stack-for-edge-ai--flink--nifi--kafka--kudu--

There are a few more coming this year, including NetHope, OSS and the Big Data Conference.

Top 25 Use Cases of Cloudera Flow Management Powered by Apache NiFi


Cloudera Flow Management (CFM) has proven immensely popular for solving so many different use cases that I thought I would list the top twenty-five I have seen recently.

If you have never used CFM or Apache NiFi before, please check out these two quick resources: https://github.com/tspannhw/EverythingApacheNiFi and https://nifi.apache.org/docs/nifi-docs/

21-25

25.   Ingesting Data into Kafka in the Public Cloud

https://docs.cloudera.com/cdf-datahub/7.2.2/nifi-kafka-ingest/topics/cdf-datahub-fm-kafka-ingest-overview.html

24.  Cybersecurity Data Collection and Filtering

https://www.datainmotion.dev/2020/10/monitoring-mac-laptops-with-apache-nifi.html

23.  Ingesting Data into Hive in the Public Cloud

https://docs.cloudera.com/cdf-datahub/7.2.2/nifi-hive-ingest/topics/cdf-datahub-nifi-hive-ingest.html

22. Ingesting Data into HBase in the Public Cloud

https://docs.cloudera.com/cdf-datahub/7.2.2/nifi-hbase-ingest/topics/cdf-datahub-nifi-hbase-ingest.html

21. Ingesting Data into Kudu in the Public Cloud

https://docs.cloudera.com/cdf-datahub/7.2.2/nifi-kudu-ingest/topics/cdf-datahub-nifi-kudu-ingest.html


16-20

20.  Ingesting Data into ADLS Storage

https://docs.cloudera.com/cdf-datahub/7.2.2/nifi-azure-ingest/topics/cdf-datahub-fm-adls-ingest-overview.html

19.   Populating Solr Indexes

https://www.datainmotion.dev/2020/04/building-search-indexes-with-apache.html

18.  Hadoop Data to Kafka

https://www.datainmotion.dev/2020/04/read-apache-impala-apache-kudu-tables.html

17.   Deep Learning And Machine Learning Pipelines

https://www.datainmotion.dev/2019/12/easy-deep-learning-in-apache-nifi-with.html

16.  Intercepting JMS and SOA

https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache_42.html


11-15

15.    Edge ML Model Integration

14.   Migrate Data from On-Premise Private Cloud to Public Cloud

13.   Converting XML to JSON

12.   MQTT to HDFS

11.   Ingesting REST Endpoints (Bulk)

6-10

10.  Ingesting Data into AWS S3 Buckets

9.  Ingest REST Endpoints

8.  Ingesting SaaS Products Like Salesforce

7.   Automating Manual Tasks

6.  Ingesting Social Media Data

Top 5

5.  Logs, Logs, Logs

https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache_35.html

https://www.datainmotion.dev/2019/08/migrating-apache-flume-flows-to-apache.html

4.  FLaNK Streaming Data Pipeline (Any Data to Kafka to Flink SQL)

https://www.flankstack.dev/

3.   IoT - MiNiFi Agents Ingest, Store and Forward

https://www.datainmotion.dev/2020/02/edgeai-google-coral-with-coral.html

https://community.cloudera.com/t5/Community-Articles/IoT-Series-Sensors-Utilizing-Breakout-Garden-Hat-Part-2/ta-p/249380

2. Pseudo-CDC / Database Ingest

https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache_15.html

1.  Doing 1,000 different ingest, conversion, routing and transformation flows

The most common use case is doing a lot of things with a lot of data, including documents, XML, JSON, Avro, Parquet, CSV, PDF, images, video, Mongo documents, logs and more. Rarely do I see someone solve just one problem with NiFi and say, "that was enough." One simple use case leads to another and another, and before you know it every cron job, script, ETL, ELT and big data op is touched by NiFi. Keep it up; Cloudera will make it even easier soon. Also check out NiFi Stateless for more job- or event-oriented tasks like File to Kafka, Kafka to Kafka and more.

https://community.cloudera.com/t5/Community-Articles/Scanning-Documents-into-Data-Lakes-via-Tesseract-MQTT-Python/ta-p/248492


Running Flink SQL Against Kafka Using a Schema Registry Catalog

[FLaNK]:  Running Apache Flink SQL Against Kafka Using a Schema Registry Catalog



There are a few things you can do when sending data from Apache NiFi to Apache Kafka to maximize its availability to Flink SQL queries through the catalogs.


AvroWriter



JSONReader




Producing Kafka Messages


Make sure you set the Record Writer to AvroRecordSetWriter and set a Message Key Field.






A great way to work with Flink SQL is to connect it to the Cloudera Schema Registry. It lets you define your schema once, then use it in Apache NiFi, Apache Kafka Connect, Apache Spark and Java microservices.
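Defining a schema once means writing it down as an Avro record schema. A minimal Python sketch that infers a flat, all-nullable Avro schema from one sample record, such as the Jetson output earlier in this document; the record name and namespace are hypothetical, and an inferred schema is only a starting point (numeric strings like `cputemp` stay `string` unless you convert them first).

```python
import json


def avro_type(value):
    """Map a Python value from a parsed JSON record to an Avro primitive type name."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if value is None:
        return "null"
    return "string"


def infer_avro_schema(record, name, namespace):
    """Build a flat Avro record schema (as a dict) from one sample JSON record.

    Every field is a union with "null" listed first and a default of null, so
    records that omit a field still validate against the schema.
    """
    fields = []
    for key, value in record.items():
        base = avro_type(value)
        field_type = "null" if base == "null" else ["null", base]
        fields.append({"name": key, "type": field_type, "default": None})
    return {"type": "record", "name": name, "namespace": namespace, "fields": fields}


if __name__ == "__main__":
    sample = {"host": "nano5", "cpu": 9.9, "cputemp": "34.0", "detectconfidence": 52.86550521850586}
    print(json.dumps(infer_avro_schema(sample, "jetsonnano", "dev.example.edge"), indent=2))
```

Putting `"null"` first in each union matters because Avro requires a union field's default to match the first branch.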

Setup



Make sure you set up an HDFS home directory for the user running Flink (root in this example), since Flink keeps history and other important information in HDFS.

HADOOP_USER_NAME=hdfs hdfs dfs -mkdir /user/root

HADOOP_USER_NAME=hdfs hdfs dfs -chown root:root /user/root


sql-env.yaml:

configuration:
  execution.target: yarn-session
catalogs:
  - name: registry
    type: cloudera-registry
    # Registry Client standard properties
    registry.properties.schema.registry.url: http://edge2ai-1.dim.local:7788/api/v1
    # registry.properties.key:
    # Registry Client SSL properties
    # Kafka Connector properties
    connector.properties.bootstrap.servers: edge2ai-1.dim.local:9092
    connector.startup-mode: earliest-offset
  - name: kudu
    type: kudu
    kudu.masters: edge2ai-1.dim.local:7051

CLI:

flink-sql-client embedded -e sql-env.yaml


We now have access to the Kudu and Schema Registry catalogs of tables. This lets us start querying, joining and filtering any of these tables without having to recreate or redefine them.


SELECT * FROM events;

Code:






Automating the Building, Migration, Backup, Restore and Testing of Streaming Applications



One of the main things you will want to do as you restore flows from backup or migrate them between clusters is apply the appropriate parameters.

To do that, import the parameter contexts and then connect them to the correct process group(s).

nifi-toolkit-1.12.0/bin/cli.sh nifi import-param-context -u http://edge2ai-1.dim.local:8080 -i parameter.json

Note that sensitive values can be encrypted, so the NiFi operator or developer doesn't have to see keys or other protected values.
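If you generate parameter files per environment rather than exporting them by hand, a small script can build them. A minimal Python sketch, assuming the JSON layout written by `nifi export-param-context`: a top-level `name` plus a `parameters` list of `name`/`description`/`sensitive`/`value` entries. That layout is an assumption here, so compare it against an export from your own NiFi version before relying on it. The context and parameter names are hypothetical; entries marked `sensitive` are the ones NiFi stores as protected values.

```python
import json


def build_parameter_context(name, parameters, sensitive_names=()):
    """Build a parameter context document for `nifi import-param-context -i`.

    Assumed layout (check against an export from your cluster): a top-level
    "name" and a "parameters" list of name/description/sensitive/value entries.
    """
    return {
        "name": name,
        "parameters": [
            {
                "name": key,
                "description": "",
                "sensitive": key in sensitive_names,  # NiFi protects sensitive values
                "value": value,
            }
            for key, value in sorted(parameters.items())  # stable order for diffs
        ],
    }


def write_parameter_file(path, context):
    """Write the context as pretty-printed JSON for the NiFi CLI."""
    with open(path, "w") as out:
        json.dump(context, out, indent=2)
```

For example, `write_parameter_file("parameter.json", build_parameter_context("edge2ai", {...}, {"db_password"}))` produces a file for the `import-param-context` command above.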


See an example script:

https://github.com/tspannhw/ApacheConAtHome2020/blob/main/scripts/setupnifi.sh

Resources