Apache NiFi 1.12 Released! 18-August-2020

Release Notes

https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.12.0

Issues

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12346778

Major Feature List

https://twitter.com/pvillard31/status/1296469452180119553


Release Date: August 18, 2020. 

Major Features:

  • New processor to write scripted record transforms live in the flow (ScriptedTransformRecord)
  • Expose a REST endpoint for easy metric scraping by Prometheus (see the sketch after this list)
  • Ability to specify group-level flow file concurrency - for instance, running a single flow file end to end for traditional job handling
  • Improved several capabilities related to Azure service interaction, including ADLS Gen2
  • Improved AMQP, MQTT, and JMS support
  • Support for the latest Kafka 2.6 clients
  • Search UI improvements
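
For the Prometheus item, here is a rough sketch of scraping the new endpoint with Java 11's built-in HttpClient. It assumes an unsecured NiFi at localhost:8080; adjust host, port, and TLS for a real install.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ScrapeNiFiMetrics {
    public static void main(String[] args) throws Exception {
        // New in NiFi 1.12: flow metrics in Prometheus text format over REST
        String url = "http://localhost:8080/nifi-api/flow/metrics/prometheus";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // Prometheus-format metric lines
    }
}

In practice you would point a Prometheus scrape job at that same URL rather than polling it yourself.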

I will be posting a few demos and test drives soon.


ScriptedTransformRecord


https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-scripting-nar/1.12.0/org.apache.nifi.processors.script.ScriptedTransformRecord/index.html


Deleting Schemas From Cloudera Schema Registry


It is very easy to delete schemas from Cloudera Schema Registry if you need to do so. I recommend downloading them first so you have a backup.

Let's look at our schema.

Well, let's get rid of that junk.

Here is the documentation for CDF Datahub in CDP Public Cloud.



Example

curl -X DELETE "http://MYSERVERHASACOOLNAME.DEV:7788/api/v1/schemaregistry/schemas/junk" -H "accept: application/json"

Where junk is the name of my schema.





You could call this REST API from NiFi, a DevOps tool, or a simple curl command as shown above.

Knox and other security layers may apply.
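
If you are scripting it from a DevOps tool, a minimal sketch with Java 11's built-in HttpClient looks like this; the host, port, and schema name mirror the curl example above, and no Knox or Kerberos handling is shown.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DeleteSchema {
    public static void main(String[] args) throws Exception {
        // Same hypothetical host, port, and schema name as the curl example above
        String url = "http://MYSERVERHASACOOLNAME.DEV:7788/api/v1/schemaregistry/schemas/junk";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .DELETE()
                .header("accept", "application/json")
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Status: " + response.statusCode()); // expect 200 on success
    }
}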


Did the user really ask for Exactly Once? Fault Tolerance

Exactly Once Requirements

Exactly once is tricky and can cause performance degradation; if your user can live with at least once, always go with that. Having data sinks like Kudu, where you can do an upsert, makes exactly once less necessary.
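
To make the upsert point concrete, here is a rough sketch with the Apache Kudu Java client; the master address, table name, and columns are invented for illustration. Because an upsert is idempotent on the primary key, replaying the same record after a failure overwrites the row rather than duplicating it, so at-least-once delivery upstream is usually good enough.

import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.PartialRow;
import org.apache.kudu.client.Upsert;

public class KuduUpsertExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical Kudu master address and table/column names
        try (KuduClient client = new KuduClient.KuduClientBuilder("kudu-master:7051").build()) {
            KuduTable table = client.openTable("sensor_readings");
            KuduSession session = client.newSession();

            // UPSERT is idempotent on the primary key: replaying this record
            // after a failure overwrites the row instead of duplicating it
            Upsert upsert = table.newUpsert();
            PartialRow row = upsert.getRow();
            row.addString("sensor_id", "sensor-42");   // primary key column
            row.addLong("event_time", 1597785600000L); // primary key column
            row.addDouble("temperature", 21.5);
            session.apply(upsert);

            session.flush();
            session.close();
        }
    }
}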

https://docs.cloudera.com/csa/1.2.0/datastream-connectors/topics/csa-kafka.html

Apache Flink, Apache NiFi Stateless, and Apache Kafka can all take part in exactly-once delivery.

For CDF Stream Processing and Analytics with Apache Flink 1.10 Streaming:

Both Kafka sources and sinks can be used with exactly-once processing guarantees when checkpointing is enabled.
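
As a minimal sketch (plain Flink DataStream API, nothing CDF-specific), enabling checkpointing looks like this; the 60-second interval is an arbitrary choice:

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingSetup {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 60 seconds in exactly-once mode (the default mode);
        // without checkpointing, the transactional Kafka sink cannot commit
        // and the exactly-once guarantee does not apply
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
    }
}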


End-to-End Guaranteed Exactly-Once Record Delivery

The Data Source and Data Sink need to support exactly-once state semantics and take part in checkpointing.


Data Sources
  • Apache Kafka - must use the correct client/connector; source offsets are tracked in checkpoints, and consumers reading transactional output should set isolation.level=read_committed.

Data Sinks
  • HDFS BucketingSink
  • Apache Kafka - must have exactly-once selected and transactions enabled: pass Semantic.EXACTLY_ONCE to the producer (see the sketch below).
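
And a minimal sketch of that sink configuration, assuming the universal Flink Kafka connector shipped with Flink 1.10; the broker address and topic name are invented:

import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ExactlyOnceKafkaSink {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // transactions commit when checkpoints complete

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker1:9092"); // hypothetical broker
        // Must not exceed the broker's transaction.max.timeout.ms
        props.setProperty("transaction.timeout.ms", "900000");

        FlinkKafkaProducer<String> producer = new FlinkKafkaProducer<>(
                "output-topic", // hypothetical default topic
                new KafkaSerializationSchema<String>() {
                    @Override
                    public ProducerRecord<byte[], byte[]> serialize(String element, Long timestamp) {
                        return new ProducerRecord<>("output-topic",
                                element.getBytes(StandardCharsets.UTF_8));
                    }
                },
                props,
                FlinkKafkaProducer.Semantic.EXACTLY_ONCE); // transactional writes

        env.fromElements("a", "b", "c").addSink(producer);
        env.execute("Exactly-once Kafka sink sketch");
    }
}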


