Kafka Replication with Cloudera Streams Replication Manager
Apache NiFi 1.12 Released! 18-August-2020
Release Notes
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.12.0
Issues
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12346778
Major Feature List
https://twitter.com/pvillard31/status/1296469452180119553
Release Date: August 18, 2020.
Major Features:
- New processor to write scripted record transforms live in the flow (ScriptedTransformRecord)
- Expose a REST Endpoint for easy metric scraping by Prometheus
- Ability to specify group level flow file concurrency - for instance run a single flow file end to end for traditional job handling
- Improved several capabilities related to Azure service interaction including ADLS Gen2
- Improved AMQP and MQTT support as well as JMS improvements
- Support for latest Kafka 2.6 clients
- Search UI Improvements
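Here is a minimal sketch of what scraping that new metrics endpoint could look like from a script instead of from Prometheus itself. The path /nifi-api/flow/metrics/prometheus is my understanding of the 1.12 REST API and should be checked against the REST docs for your version. The host and port, the function names, and the sample metric names are all hypothetical.

```python
from typing import Dict


def metrics_url(base_url: str) -> str:
    """Build the metrics URL. The path is an assumption; verify it in your NiFi REST docs."""
    return base_url.rstrip("/") + "/nifi-api/flow/metrics/prometheus"


def parse_prometheus_text(body: str) -> Dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}.

    Simplification: assumes samples carry no trailing timestamp, so the
    value is whatever follows the last space on the line.
    """
    samples: Dict[str, float] = {}
    for raw in body.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        samples[series.strip()] = float(value)
    return samples


# Hypothetical usage against an unsecured local NiFi:
#   from urllib.request import urlopen
#   text = urlopen(metrics_url("http://localhost:8080")).read().decode("utf-8")
#   print(parse_prometheus_text(text))
```

In practice you would point a Prometheus scrape job at the endpoint; a parser like this is only handy for quick checks from a script.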
I will be posting a few demos and test drives soon.
ScriptedTransformRecord
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-scripting-nar/1.12.0/org.apache.nifi.processors.script.ScriptedTransformRecord/index.html
Deleting Schemas From Cloudera Schema Registry
Did the user really ask for Exactly Once? Fault Tolerance
Exactly Once Requirements
- Apache Kafka - must have Exactly-Once selected, transactions enabled and correct driver.
- HDFS BucketingSink
- Apache Kafka
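To make the Kafka requirement a bit more concrete, here is a rough sketch of a sanity check over client settings. It assumes the standard Kafka client property names (transactional.id, enable.idempotence, isolation.level). The function name and the example values are hypothetical, and it does not cover the "correct driver" part of the requirement.

```python
from typing import Dict, List


def exactly_once_problems(producer: Dict[str, str], consumer: Dict[str, str]) -> List[str]:
    """Return a list of settings that would prevent transactional, exactly-once delivery."""
    problems: List[str] = []
    # Transactions require a transactional.id on the producer.
    if not producer.get("transactional.id"):
        problems.append("producer: transactional.id is not set")
    # Idempotence must not be switched off when transactions are used.
    if str(producer.get("enable.idempotence", "true")).lower() == "false":
        problems.append("producer: enable.idempotence is disabled")
    # Downstream readers should only see committed transactional messages.
    if consumer.get("isolation.level") != "read_committed":
        problems.append("consumer: isolation.level is not read_committed")
    return problems


# Hypothetical example values:
print(exactly_once_problems(
    {"transactional.id": "demo-tx-1"},
    {"isolation.level": "read_committed"},
))
```

An empty list means none of these three settings is in the way; it does not prove the whole pipeline is exactly-once end to end.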
Reference
FLaNK in the Cloud!!!! Huge Cloudera Data Platform Public Cloud Updates - July 2020 - Data Flow Releases
- Data source reading from Kafka
- Data sinks writing to Kafka, HBase and Kudu
- Apache Atlas integration
- SQL/Table API and SQL Client
- Table connectors
- Kafka
- Kudu
- Hive (through catalog)
Sizing Your Apache NiFi Cluster For Production Workloads
Report on This: Apache NiFi 1.11.4 - Monitor All The Things
The easiest way to grab monitoring data is via the NiFi REST API. Everything in the NiFi UI is done through REST calls, which you can also call programmatically. Please read the NiFi docs; they are linked directly from your running NiFi application and are also available on the web. They are very thorough and have all the information you could want: https://nifi.apache.org/docs/nifi-docs/. If you are not running NiFi 1.11.4, I recommend that you upgrade. This version is supported by Cloudera on multiple platforms.
NiFi REST API
https://nifi.apache.org/docs/nifi-docs/rest-api/
There's also an awesome Python wrapper for that REST API: https://pypi.org/project/nipyapi/
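If you just want a quick look without installing anything, the REST API can be called with the Python standard library. This is a minimal sketch that assumes an unsecured NiFi at http://localhost:8080 and the /nifi-api/flow/status resource. The field names I read from the response (controllerStatus, activeThreadCount, flowFilesQueued, bytesQueued) are from my reading of the REST docs, so verify them against your version. The helper names are hypothetical.

```python
import json
from typing import Any, Dict
from urllib.request import urlopen


def flow_status_url(base_url: str) -> str:
    """Build the flow status URL for a NiFi base URL (assumed unsecured)."""
    return base_url.rstrip("/") + "/nifi-api/flow/status"


def fetch_json(url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_status(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Pick a few queue and thread counters out of the status payload."""
    status = payload.get("controllerStatus", {})
    wanted = ("activeThreadCount", "flowFilesQueued", "bytesQueued")
    return {name: status.get(name) for name in wanted}


# Hypothetical usage against a local NiFi:
#   print(summarize_status(fetch_json(flow_status_url("http://localhost:8080"))))
```

On a secured cluster you would need to add authentication, which is one of the things nipyapi handles for you.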
Also, in NiFi flow programming, every time you produce data to Kafka you get metadata back in FlowFile attributes. You can push those attributes directly to a Kafka topic if you want.
So after your PublishKafkaRecord_2_0 (1.11.4) processor, on the success relationship, read the attributes holding the number of records and other data, convert them with AttributesToJSON, and push the result to another topic. You may want a MergeRecord in there to aggregate a few of those together.
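To show the shape of the data that would land on the second topic, here is a small Python sketch of the attribute-to-JSON step. It only imitates the idea of AttributesToJSON and is not how the processor is implemented. The attribute values are made up, and msg.count is the record-count attribute as I understand PublishKafkaRecord_2_0 to write it, so check the processor docs for your version.

```python
import json
from typing import Dict, Iterable


def attributes_to_json(attributes: Dict[str, str], keep: Iterable[str]) -> str:
    """Serialize selected FlowFile attributes as a flat JSON object.

    Attributes that are missing are emitted as empty strings here (an assumption
    about the default behavior).
    """
    selected = {name: attributes.get(name, "") for name in keep}
    return json.dumps(selected, sort_keys=True)


# Hypothetical attributes on a FlowFile routed to success:
flowfile_attributes = {
    "filename": "demo.avro",
    "uuid": "00000000-0000-0000-0000-000000000000",
    "msg.count": "250",
}
print(attributes_to_json(flowfile_attributes, ["filename", "msg.count", "kafka.topic"]))
```

In the real flow you configure this on the processor rather than writing code; the sketch is only to show what a downstream consumer of the metrics topic should expect to parse.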
If you are interested in Kafka metrics, record counts, or monitoring, then you must use Cloudera Streams Messaging Manager. It provides a full web UI, monitoring tool, alerts, REST API, and everything you need for monitoring every producer, consumer, broker, cluster, topic, message, offset, and Kafka component.
The best way to get NiFi stats is to use the NiFi Reporting Tasks. I like the SQL Reporting Task.
SQL Reporting Tasks are very powerful and use standard SQL, in the style of SELECT * FROM JVM_METRICS. See my article:
https://www.datainmotion.dev/2020/04/sql-reporting-task-for-cloudera-flow.html
Monitoring Articles
https://www.datainmotion.dev/2019/04/monitoring-number-of-of-flow-files.html
https://www.datainmotion.dev/2019/03/apache-nifi-operations-and-monitoring.html
Other Resources
https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache_9.html
https://www.datainmotion.dev/2019/08/using-cloudera-streams-messaging.html
https://dev.to/tspannhw/apache-nifi-and-nifi-registry-administration-3c92
https://dev.to/tspannhw/using-nifi-cli-to-restore-nifi-flows-from-backups-18p9
https://nifi.apache.org/docs/nifi-docs/html/toolkit-guide.html
https://www.datainmotion.dev/p/links.html
https://www.tutorialspoint.com/apache_nifi/apache_nifi_monitoring.htm
Using Cloudera Data Platform with Flow Management and Streams on Azure
Apache NiFi on Azure CDP Data Hub
- Streams Messaging Heavy Duty for AWS
- Streams Messaging Heavy Duty for Azure
- Flow Management Heavy Duty for AWS
- Flow Management Heavy Duty for Azure
- Apache Kafka 2.4.1
- Cloudera Schema Registry 0.8.1
- Cloudera Streams Messaging Manager 2.1.0
- Apache NiFi 1.11.4
- Apache NiFi Registry 0.5.0
NiFi and Kafka are autoconfigured to work with Apache Atlas under our environment's Data Lake SDX. We can browse through the lineage for all the Kafka topics we use.
https://www.cloudera.com/about/enterprise-data-cloud.html
https://docs.cloudera.com/cdf-datahub/7.2.0/release-notes/topics/cdf-datahub-whats-new.html
https://dzone.com/articles/lets-build-a-simple-ingest-to-cloud-data-warehouse
https://www.datainmotion.dev/2020/06/no-more-spaghetti-flows.html
The Rise of the Mega Edge (FLaNK)
Explore Enterprise Apache Flink with Cloudera Streaming Analytics - CSA 1.2
- Kafka
- Kudu
- Hive (through catalog)
- JSON
- Avro
- CSV
mvn archetype:generate \
-DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-java \
-DarchetypeVersion=1.10.0
Using Apache Kafka Using Cloudera Data Platform Data Center 7.1.1
Unboxing the Most Amazing Edge AI Device Part 1 of 3 - NVIDIA Jetson Xavier NX
- https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-xavier-nx/
- https://elinux.org/Jetson_Zoo
- https://ngc.nvidia.com/catalog/containers/nvidia:l4t-ml
- https://www.nvidia.com/en-us/deep-learning-ai/education/?ncid=so-dis-dldlwsd1-72342
- https://developer.nvidia.com/embedded/jetpack
- https://elinux.org/Jetson_Nano#Cameras
- https://developer.nvidia.com/embedded/community/jetson-projects
- https://github.com/neuralet/neuralet/tree/master/applications/smart-distancing
- https://developer.nvidia.com/embedded/downloads
- https://www.jetsonhacks.com/
- https://docs.nvidia.com/jetson/jetpack/introduction/
- https://devblogs.nvidia.com/bringing-cloud-native-agility-to-edge-ai-with-jetson-xavier-nx/
- https://github.com/tspannhw/minifi-jetson-nano
- https://community.cloudera.com/t5/Community-Articles/Edge-Data-Processing-with-Jetson-Nano-Part-3-AI-Integration/ta-p/93642
- https://www.slideshare.net/bunkertor/iot-edge-data-processing-with-nvidia-jetson-nano-oct-3-2019
- https://dzone.com/articles/edge-data-processing-with-jetson-nano
- https://github.com/dusty-nv/jetson-inference/blob/master/python/examples/segnet-camera.py
- https://github.com/tspannhw/nvidiajetsontx1-mxnet/blob/master/classify.py
- https://github.com/tspannhw/ApacheDeepLearning202
- https://github.com/tspannhw/OpenSourceComputerVision
sudo /usr/sbin/nvpmodel -q