Apache NiFi 1.12 Released! 18-August-2020
Major Feature List
Release Date: August 18, 2020.
- New processor to write scripted record transforms live in the flow (ScriptedTransformRecord)
- Expose a REST Endpoint for easy metric scraping by Prometheus
- Ability to specify group level flow file concurrency - for instance run a single flow file end to end for traditional job handling
- Improved several capabilities related to Azure service interaction including ADLS Gen2
- Improved AMQP and MQTT support as well as JMS improvements
- Support for latest Kafka 2.6 clients
- Search UI Improvements
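As a rough sketch of what scraping the new Prometheus endpoint could look like from a script, the snippet below fetches the text exposition output and parses it into a dictionary. The path /nifi-api/flow/metrics/prometheus, the unsecured base URL, and the metric names in the usage comment are assumptions; check them against the REST API docs of your own NiFi 1.12 instance.

```python
from typing import Dict
from urllib.request import urlopen

# Assumed path for the NiFi 1.12 Prometheus endpoint; verify against your NiFi REST docs.
PROMETHEUS_PATH = "/nifi-api/flow/metrics/prometheus"


def fetch_metrics_text(base_url: str, timeout: float = 10.0) -> str:
    """Fetch raw Prometheus text from an unsecured NiFi node (base_url is hypothetical)."""
    with urlopen(base_url.rstrip("/") + PROMETHEUS_PATH, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_prometheus_text(text: str) -> Dict[str, float]:
    """Parse Prometheus text exposition lines into {series_with_labels: value}."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            idx = line.rindex("}")
            series, rest = line[: idx + 1], line[idx + 1:].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if rest:
            # The first token after the series is the value; a timestamp may follow.
            samples[series] = float(rest[0])
    return samples


# Usage (not run here; needs a live NiFi node):
# metrics = parse_prometheus_text(fetch_metrics_text("http://localhost:8080"))
```

In practice Prometheus itself would scrape this endpoint directly; a parser like this is only useful for quick ad hoc checks.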
I will be posting a few demos and test drives soon.
Deleting Schemas From Cloudera Schema Registry
Exactly Once Requirements
- Apache Kafka - must have Exactly-Once selected, transactions enabled and correct driver.
- HDFS BucketingSink
- Apache Kafka
FLaNK in the Cloud!!!! Huge Cloudera Data Platform Public Cloud Updates - July 2020 - Data Flow Releases
- Data source reading from Kafka
- Data sinks writing to Kafka, HBase and Kudu
- Apache Atlas integration
- SQL/Table API and SQL Client
- Table connectors
- Hive (through catalog)
Sizing Your Apache NiFi Cluster For Production Workloads
The easiest way to grab monitoring data is via the NiFi REST API. Everything in the NiFi UI is done through REST calls, which you can also make programmatically. Please read the NiFi docs; they are linked directly from your running NiFi application and are also available on the web. They are very thorough and have all the information you could want: https://nifi.apache.org/docs/nifi-docs/. If you are not running NiFi 1.11.4, I recommend you upgrade. This is supported by Cloudera on multiple platforms.
NiFi REST API
There's also an awesome Python wrapper for that REST API: https://pypi.org/project/nipyapi/
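For a quick look without any wrapper, here is a minimal sketch using only the Python standard library. It assumes an unsecured NiFi at a hypothetical base URL and reads the controller status from /nifi-api/flow/status; the field names are the ones I would expect in that response, so confirm them against the REST docs for your version. A secured cluster would also need authentication, which is not shown.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body (unsecured NiFi assumed)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_flow_status(payload):
    """Pick a few monitoring numbers out of a /nifi-api/flow/status response."""
    status = payload.get("controllerStatus", {})
    return {
        "active_threads": status.get("activeThreadCount", 0),
        "flowfiles_queued": status.get("flowFilesQueued", 0),
        "bytes_queued": status.get("bytesQueued", 0),
    }


# Usage (not run here; the host and port are hypothetical):
# payload = fetch_json("http://localhost:8080/nifi-api/flow/status")
# print(summarize_flow_status(payload))
```

The same pattern works for other endpoints such as /nifi-api/system-diagnostics; only the keys you read from the response change.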
Also, in NiFi flow programming, every time you produce data to Kafka you get metadata back in FlowFile attributes. You can push those attributes directly to a Kafka topic if you want.
So after your PublishKafkaRecord_2_0 (1.11.4) processor, on the success relationship, read the attributes for the number of records and other data, convert them with AttributesToJSON, and push the result to another topic. You may want a MergeRecord in there to aggregate a few of those together.
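To make the idea concrete, here is a small sketch of what a downstream consumer could do with those JSON messages. It assumes each message is the flat JSON object AttributesToJSON emits (all values are strings), that the record count is in the msg.count attribute written by PublishKafkaRecord, and that the topic name is available under a key such as kafka.topic. That last key is a placeholder; use whatever attribute your flow actually sets.

```python
import json
from collections import defaultdict


def aggregate_publish_counts(json_records, topic_key="kafka.topic"):
    """Sum msg.count per topic from AttributesToJSON-style messages.

    topic_key is a hypothetical attribute name; adjust it to your flow.
    """
    totals = defaultdict(int)
    for raw in json_records:
        attrs = json.loads(raw)
        topic = attrs.get(topic_key, "unknown")
        # Attribute values arrive as strings, so convert before adding.
        totals[topic] += int(attrs.get("msg.count", 0))
    return dict(totals)


# Example messages as they might appear on the metadata topic.
messages = [
    '{"kafka.topic": "sensors", "msg.count": "25"}',
    '{"kafka.topic": "sensors", "msg.count": "10"}',
    '{"kafka.topic": "logs", "msg.count": "7"}',
]
print(aggregate_publish_counts(messages))
```

Inside NiFi itself, MergeRecord plays a similar role by batching several of these small records together before they are published.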
If you are interested in Kafka metrics, record counts, or monitoring, then use Cloudera Streams Messaging Manager. It provides a full Web UI, monitoring tool, alerts, REST API, and everything you need for monitoring every producer, consumer, broker, cluster, topic, message, offset, and Kafka component.
The best way to get NiFi stats is to use the NiFi Reporting Tasks; I like the SQL Reporting Task.
SQL Reporting Tasks are very powerful and use standard SELECT * FROM JVM_METRICS style reporting, see my article:
Using Cloudera Data Platform with Flow Management and Streams on Azure
Apache NiFi on Azure CDP Data Hub
- Streams Messaging Heavy Duty for AWS
- Streams Messaging Heavy Duty for Azure
- Flow Management Heavy Duty for AWS
- Flow Management Heavy Duty for Azure
- Apache Kafka 2.4.1
- Cloudera Schema Registry 0.8.1
- Cloudera Streams Messaging Manager 2.1.0
- Apache NiFi 1.11.4
- Apache NiFi Registry 0.5.0
NiFi and Kafka are autoconfigured to work with Apache Atlas under our environment's Data Lake SDX. We can browse through the lineage for all the Kafka topics we use.
- Hive (through catalog)
mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-quickstart-java -DarchetypeVersion=1.10.0
- https://github.com/tspannhw/OpenSourceComputerVision
sudo /usr/sbin/nvpmodel -q