
Simple Apache NiFi Operations Dashboard

This is an evolving work in progress, so please get involved; everything is open source. Milind Pandit and I are working on a project to build something useful for teams: a rich, at-a-glance dashboard to analyze flows, check current cluster state, and start and stop flows.
Apache NiFi and its related tools provide a lot of data to aggregate, sort, categorize, search and, eventually, run machine learning analytics on.
There are a lot of tools that come out of the box that solve parts of these problems. Ambari Metrics, Grafana and Log Search provide a ton of data and analysis abilities. You can find all your errors easily in Log Search and see nice graphs of what is going on in Ambari Metrics and Grafana.
What is cool with Apache NiFi is that it has Site-to-Site reporting tasks for sending all the provenance, analytics, metrics and operational data you need wherever you want it, including to Apache NiFi itself! This is Monitoring Driven Development (MDD).

Monitoring Driven Development (MDD)
In this little proof-of-concept, we grab some of these flows, process them in Apache NiFi, and then store them in Apache Hive 3 tables for analytics. We should probably push the data to HBase for aggregates and Druid for time series; we will see as this expands.
There are also other data access options, including the NiFi REST API and the NiFi Python APIs.
Bootstrap Notifier
Reporting Tasks
  • AmbariReportingTask (global, per process group)
  • MonitorDiskUsage (FlowFile, content and provenance repositories)
  • MonitorMemory
  • MonitorActivity
The NiFi Python APIs (nipyapi) are especially useful for doing things like purging connections.
Purge it!
  • nipyapi.canvas.purge_connection(con_id)
  • nipyapi.canvas.purge_process_group(process_group, stop=False)
  • nipyapi.canvas.delete_process_group(process_group, force=True, refresh=True)
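As a sketch of how the purge helpers above might be driven, a dashboard could gate purging on queue depth. The threshold, host URL, and connection id below are hypothetical, not from the source:

```python
# Sketch: decide whether a connection's queue is deep enough to purge,
# then call the nipyapi helper listed above. The cutoff is an assumption.
PURGE_THRESHOLD = 10000  # hypothetical queue-depth cutoff

def should_purge(queued_count: int, threshold: int = PURGE_THRESHOLD) -> bool:
    """Return True when a connection's queued FlowFile count exceeds the cutoff."""
    return queued_count > threshold

# Example usage (requires a reachable NiFi instance, so it is left commented):
# import nipyapi
# nipyapi.config.nifi_config.host = 'http://localhost:8080/nifi-api'  # assumed URL
# if should_purge(59952):
#     nipyapi.canvas.purge_connection(con_id)  # con_id is hypothetical
```

Keeping the decision logic separate from the nipyapi call makes the threshold easy to tune without touching the cluster.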


Use Cases

Example Metrics Data
[ {
  "appid" : "nifi",
  "instanceid" : "7c84501d-d10c-407c-b9f3-1d80e38fe36a",
  "hostname" : "princeton1.field.hortonworks.com",
  "timestamp" : 1539411679652,
  "loadAverage1min" : 0.93,
  "availableCores" : 16,
  "FlowFilesReceivedLast5Minutes" : 14,
  "BytesReceivedLast5Minutes" : 343779,
  "FlowFilesSentLast5Minutes" : 0,
  "BytesSentLast5Minutes" : 0,
  "FlowFilesQueued" : 59952,
  "BytesQueued" : 294693938,
  "BytesReadLast5Minutes" : 241681,
  "BytesWrittenLast5Minutes" : 398753,
  "ActiveThreads" : 2,
  "TotalTaskDurationSeconds" : 273,
  "TotalTaskDurationNanoSeconds" : 273242860763,
  "jvmuptime" : 224997,
  "jvmheap_used" : 5.15272616E8,
  "jvmheap_usage" : 0.9597700387239456,
  "jvmnon_heap_usage" : -5.1572632E8,
  "jvmthread_statesrunnable" : 11,
  "jvmthread_statesblocked" : 2,
  "jvmthread_statestimed_waiting" : 26,
  "jvmthread_statesterminated" : 0,
  "jvmthread_count" : 242,
  "jvmdaemon_thread_count" : 125,
  "jvmfile_descriptor_usage" : 0.0709,
  "jvmgcruns" : null,
  "jvmgctime" : null
} ]
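Records like this are easy to post-process before they land in Hive. As a minimal sketch, here is how a dashboard might flag hosts with high JVM heap usage; the 0.90 alert level is an assumption, and the inline JSON is a subset of the fields from the record above:

```python
import json

# Subset of the SiteToSite metrics record shown above.
metrics_json = '''[{"appid": "nifi",
  "hostname": "princeton1.field.hortonworks.com",
  "timestamp": 1539411679652,
  "jvmheap_usage": 0.9597700387239456,
  "FlowFilesQueued": 59952}]'''

HEAP_ALERT = 0.90  # hypothetical alert threshold

def heap_alerts(records):
    """Yield (hostname, usage) for records whose heap usage exceeds HEAP_ALERT."""
    for rec in records:
        if rec.get("jvmheap_usage", 0) > HEAP_ALERT:
            yield rec["hostname"], rec["jvmheap_usage"]

alerts = list(heap_alerts(json.loads(metrics_json)))
# alerts == [("princeton1.field.hortonworks.com", 0.9597700387239456)]
```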
Example Status Data
{
  "statusId" : "a63818fe-dbd2-44b8-af53-eaa27fd9ef05",
  "timestampMillis" : "2018-10-18T20:54:38.218Z",
  "timestamp" : "2018-10-18T20:54:38.218Z",
  "actorHostname" : "princeton1.field.hortonworks.com",
  "componentType" : "RootProcessGroup",
  "componentName" : "NiFi Flow",
  "parentId" : null,
  "platform" : "nifi",
  "application" : "NiFi Flow",
  "componentId" : "7c84501d-d10c-407c-b9f3-1d80e38fe36a",
  "activeThreadCount" : 1,
  "flowFilesReceived" : 1,
  "flowFilesSent" : 0,
  "bytesReceived" : 1661,
  "bytesSent" : 0,
  "queuedCount" : 18,
  "bytesRead" : 0,
  "bytesWritten" : 1661,
  "bytesTransferred" : 16610,
  "flowFilesTransferred" : 10,
  "inputContentSize" : 0,
  "outputContentSize" : 0,
  "queuedContentSize" : 623564,
  "activeRemotePortCount" : null,
  "inactiveRemotePortCount" : null,
  "receivedContentSize" : null,
  "receivedCount" : null,
  "sentContentSize" : null,
  "sentCount" : null,
  "averageLineageDuration" : null,
  "inputBytes" : null,
  "inputCount" : 0,
  "outputBytes" : null,
  "outputCount" : 0,
  "sourceId" : null,
  "sourceName" : null,
  "destinationId" : null,
  "destinationName" : null,
  "maxQueuedBytes" : null,
  "maxQueuedCount" : null,
  "queuedBytes" : null,
  "backPressureBytesThreshold" : null,
  "backPressureObjectThreshold" : null,
  "isBackPressureEnabled" : null,
  "processorType" : null,
  "averageLineageDurationMS" : null,
  "flowFilesRemoved" : null,
  "invocations" : null,
  "processingNanos" : null
}

Example Failure Data
[ {
  "objectId" : "34c3249c-4a42-41ce-b94e-3563409ad55b",
  "platform" : "nifi",
  "project" : null,
  "bulletinId" : 28321,
  "bulletinCategory" : "Log Message",
  "bulletinGroupId" : "0b69ea51-7afb-32dd-a7f4-d82b936b37f9",
  "bulletinGroupName" : "Monitoring",
  "bulletinLevel" : "ERROR",
  "bulletinMessage" : "QueryRecord[id=d0258284-69ae-34f6-97df-fa5c82402ef3] Unable to query StandardFlowFileRecord[uuid=cd305393-f55a-40f7-8839-876d35a2ace1,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1539633295746-10, container=default, section=10], offset=95914, length=322846],offset=0,name=783936865185030,size=322846] due to Failed to read next record in stream for StandardFlowFileRecord[uuid=cd305393-f55a-40f7-8839-876d35a2ace1,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1539633295746-10, container=default, section=10], offset=95914, length=322846],offset=0,name=783936865185030,size=322846] due to -40: org.apache.nifi.processor.exception.ProcessException: Failed to read next record in stream for StandardFlowFileRecord[uuid=cd305393-f55a-40f7-8839-876d35a2ace1,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1539633295746-10, container=default, section=10], offset=95914, length=322846],offset=0,name=783936865185030,size=322846] due to -40",
  "bulletinNodeAddress" : null,
  "bulletinNodeId" : "91ab706b-5d92-454e-bc7a-6911d155fdca",
  "bulletinSourceId" : "d0258284-69ae-34f6-97df-fa5c82402ef3",
  "bulletinSourceName" : "QueryRecord",
  "bulletinSourceType" : "PROCESSOR",
  "bulletinTimestamp" : "2018-10-18T20:54:39.179Z"
} ]
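Bulletin messages like the one above are far too verbose to show on a dashboard tile, so it helps to reduce each record to a few display fields. A minimal sketch using the field names from the example record; the 120-character truncation is an arbitrary display choice:

```python
# Sketch: reduce a verbose bulletin record to the fields worth dashboarding.
def summarize_bulletin(b: dict) -> dict:
    """Pick out level, source, group, timestamp and a truncated message."""
    return {
        "level": b["bulletinLevel"],
        "source": f'{b["bulletinSourceName"]} ({b["bulletinSourceType"]})',
        "group": b["bulletinGroupName"],
        "when": b["bulletinTimestamp"],
        # Long stack-trace-style messages get truncated for display.
        "message": b["bulletinMessage"][:120],
    }

# Abbreviated version of the example failure record above.
bulletin = {"bulletinLevel": "ERROR", "bulletinSourceName": "QueryRecord",
            "bulletinSourceType": "PROCESSOR", "bulletinGroupName": "Monitoring",
            "bulletinTimestamp": "2018-10-18T20:54:39.179Z",
            "bulletinMessage": "QueryRecord[id=d0258284] Unable to query ..."}
summary = summarize_bulletin(bulletin)
# summary["source"] == "QueryRecord (PROCESSOR)"
```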

Apache Hive 3 Tables
CREATE EXTERNAL TABLE IF NOT EXISTS failure (statusId STRING, timestampMillis BIGINT, `timestamp` STRING, actorHostname STRING, componentType STRING, componentName STRING, parentId STRING, platform STRING, `application` STRING, componentId STRING, activeThreadCount BIGINT, flowFilesReceived BIGINT, flowFilesSent BIGINT, bytesReceived BIGINT, bytesSent BIGINT, queuedCount BIGINT, bytesRead BIGINT, bytesWritten BIGINT, bytesTransferred BIGINT, flowFilesTransferred BIGINT, inputContentSize BIGINT, outputContentSize BIGINT, queuedContentSize BIGINT, activeRemotePortCount BIGINT, inactiveRemotePortCount BIGINT, receivedContentSize BIGINT, receivedCount BIGINT, sentContentSize BIGINT, sentCount BIGINT, averageLineageDuration BIGINT, inputBytes BIGINT, inputCount BIGINT, outputBytes BIGINT, outputCount BIGINT, sourceId STRING, sourceName STRING, destinationId STRING, destinationName STRING, maxQueuedBytes BIGINT, maxQueuedCount BIGINT, queuedBytes BIGINT, backPressureBytesThreshold BIGINT, backPressureObjectThreshold BIGINT, isBackPressureEnabled STRING, processorType STRING, averageLineageDurationMS BIGINT, flowFilesRemoved BIGINT, invocations BIGINT, processingNanos BIGINT) STORED AS ORC
LOCATION '/failure';

CREATE EXTERNAL TABLE IF NOT EXISTS bulletin (objectId STRING, platform STRING, project STRING, bulletinId BIGINT, bulletinCategory STRING, bulletinGroupId STRING, bulletinGroupName STRING, bulletinLevel STRING, bulletinMessage STRING, bulletinNodeAddress STRING, bulletinNodeId STRING, bulletinSourceId STRING, bulletinSourceName STRING, bulletinSourceType STRING, bulletinTimestamp STRING) STORED AS ORC
LOCATION '/error';

CREATE EXTERNAL TABLE IF NOT EXISTS memory (objectId STRING, platform STRING, project STRING, bulletinId BIGINT, bulletinCategory STRING, bulletinGroupId STRING, bulletinGroupName STRING, bulletinLevel STRING, bulletinMessage STRING, bulletinNodeAddress STRING, bulletinNodeId STRING, bulletinSourceId STRING, bulletinSourceName STRING, bulletinSourceType STRING, bulletinTimestamp STRING) STORED AS ORC
LOCATION '/memory';

-- backpressure
CREATE EXTERNAL TABLE IF NOT EXISTS status (statusId STRING, timestampMillis BIGINT, `timestamp` STRING, actorHostname STRING, componentType STRING, componentName STRING, parentId STRING, platform STRING, `application` STRING, componentId STRING, activeThreadCount BIGINT, flowFilesReceived BIGINT, flowFilesSent BIGINT, bytesReceived BIGINT, bytesSent BIGINT, queuedCount BIGINT, bytesRead BIGINT, bytesWritten BIGINT, bytesTransferred BIGINT, flowFilesTransferred BIGINT, inputContentSize BIGINT, outputContentSize BIGINT, queuedContentSize BIGINT, activeRemotePortCount BIGINT, inactiveRemotePortCount BIGINT, receivedContentSize BIGINT, receivedCount BIGINT, sentContentSize BIGINT, sentCount BIGINT, averageLineageDuration BIGINT, inputBytes BIGINT, inputCount BIGINT, outputBytes BIGINT, outputCount BIGINT, sourceId STRING, sourceName STRING, destinationId STRING, destinationName STRING, maxQueuedBytes BIGINT, maxQueuedCount BIGINT, queuedBytes BIGINT, backPressureBytesThreshold BIGINT, backPressureObjectThreshold BIGINT, isBackPressureEnabled STRING, processorType STRING, averageLineageDurationMS BIGINT, flowFilesRemoved BIGINT, invocations BIGINT, processingNanos BIGINT) STORED AS ORC
LOCATION '/status';
