Tracking Satellites with Apache NiFi

Thanks to https://www.n2yo.com/ for awesome data feeds.

Again, these types of ingests are so easy in Apache NiFi.

Step 1, schedule when we want these. There is a limit of 1,000 calls per hour, so let's keep it to 4 calls a minute for each of the three REST end points.

Let's get satellite information on right above me.

We set parameters for: your latitude, your longitude, your apikey and then just change up bits of the REST URL. Note for this one we are using SSL, so make sure you have an SSL context.

Now we have three streams of JSON data that has lat and long, so we can plot this on a map with Cloudera Visual Apps, storing our data in Impala tables in Kudu.

Some example data:

{

"info" : {

"satname" : "SPACE STATION",

"satid" : 25544,

"transactionscount" : "5"

"positions" : [ {

"satlatitude" : 37.46839338,

"satlongitude" : 95.12767402,

"sataltitude" : 422.01,

"azimuth" : 8.37,

"elevation" : -49.35,

"ra" : 290.4714551,

"dec" : 0.06300703,

"timestamp" : 1602242926,

"eclipsed" : false

}, {

"satlatitude" : 37.4278686,

"satlongitude" : 95.18684731,

"sataltitude" : 422.01,

"azimuth" : 8.32,

"elevation" : -49.37,

"ra" : 290.50535165,

"dec" : 0.04159856,

"timestamp" : 1602242927,

"eclipsed" : false

} ]

}

Unveiling the NVIDIA Jetson Nano 2GB and Other NVIDIA GTC 2020 Announcements

Unveiling the NVIDIA Jetson Nano 2GB and Other NVIDIA GTC 2020 Announcements

NVIDIA Jetson Nano 2GB Press Release

https://nvidianews.nvidia.com/news/nvidia-unveils-jetson-nano-2gb-the-ultimate-ai-and-robotics-starter-kit-for-students-educators-robotics-hobbyists

I have given this one a test run, it has all the features you like for Jetson, with just 2 GB less RAM and 2 less USB ports. This is a very affordable device to do cool apps.

128-core NVIDIA MaxwellTM
64-bit Quad-core ARM A57 (1.43 GHz)
2 GB 64-bit LPDDR4 (25.6 GB/s bandwidth)
Gigabit Ethernet
1x USB 3.0 Type A ports, 2x USB 2.0 Type A ports, 1x USB 2.0

Micro-B
HDMI
WiFi
GPIOs, I2C, I2S, SPI, PWM, UART
1x MIPI CSI-2 connector
MicroSD Connector
12-pin header (Power and related signals, UART)
100mm x 80mm x 29mm
USB-C Port for Power

Depending where you or or how you buy the package you may need to buy a power supply and USB WiFi.

All of my existing workloads have been working fine in the 2GB version, but with a very nice cost saving. The setup is easy, the system is fast, I have to highly recommend anyone looking for a quick way to do Edge AI and other edge workloads a try. This could be a decent machine for learning. I hooked mine up to a monitor, keyboard and mouse and I could use it right away for edge development and also as a basic desktop. Nice work! I might need to get 11 more of these. These will run MiNiFi agents, Python and Deep Learning classifications at ease.

NVIDIA didn't stop with the ultimate low-cost edge device, they have some serious enterprise updates as well:

Cloudera superchargers the Enterprise Data Cloud with NVIDIA

https://blog.cloudera.com/cloudera-supercharges-the-enterprise-data-cloud-with-nvidia/

There seems to be a ton more news coming at this virtual event, so I recommend attending and watching for more detailed posts on new things coming out.

Product page:

https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-nano/education-projects/

Unboxing video:

https://youtu.be/dVGEtWYkP2c

NVIDIA Jetson Developer Community AI Projects:

https://youtu.be/2T8CG7lDkcU

Open-source projects on Jetson Nano 2GB:

https://youtu.be/fIESu365Sb0

Dev Blog:

https://developer.nvidia.com/blog/ultimate-starter-ai-computer-jetson-nano-2gb-developer-kit/

DevOps: Working with Parameter Contexts in Apache NiFi 1.11.4+

nifi list-param-contexts -u http://localhost:8080 -ot simple

# Id Name Description

- ------------------------------------ -------------- -----------

1 3a801ff4-1f73-1836-b59c-b9fbc79ab030 backupregistry

2 7184b9f4-0171-1000-4627-967e118f3037 health

3 3a801faf-1f87-1836-54ba-3d913fa223ad retail

4 3a801fde-1f73-1836-957b-a9f4d2c9b73d sensors

#> nifi export-param-context -u http://localhost:8080 -verbose --paramContextId 3a801faf-1f87-1836-54ba-3d913fa223ad

{

"name" : "retail",

"description" : "",

"parameters" : [ {

"parameter" : {

"name" : "allquery",

"description" : "",

"sensitive" : false,

"value" : "SELECT * FROM FLOWFILE"

}

}, {

"parameter" : {

"name" : "allrecordssql",

"description" : "",

"sensitive" : false,

"value" : "SELECT * FROM FLOWFILE"

}

}, {

"parameter" : {

"name" : "energytopic",

"description" : "",

"sensitive" : false,

"value" : "energy"

}

}, {

"parameter" : {

"name" : "importantsql",

"description" : "",

"sensitive" : false,

"value" : "SELECT * FROM FLOWFILE\nWHERE kernel_logs like '%SIGKILL%'"

}

}, {

"parameter" : {

"name" : "itempricetable",

"description" : "",

"sensitive" : false,

"value" : "impala::default.itemprice"

}

}, {

"parameter" : {

"name" : "itsgettingHotInHere",

"description" : "",

"sensitive" : false,

"value" : "SELECT * FROM\nFLOWFILE\nWHERE CAST (temp_f as DOUBLE) > 80\nAND UPPER(location) LIKE '%NJ%'"

}

You can now move that to another server and import. nifi import-param-context.

bin/cli.sh nifi list-param-contexts -u http://localhost:8080 -ot json

Use simple for a simple table list.

bin/cli.sh nifi export-param-context -u http://localhost:8080 --paramContextId 3a801ff4-1f73-1836-b59c-b9fbc79ab030 -ot json -o backupregistry.json

Example Shell Script

/Users/tspann/Downloads/nifi-toolkit-1.12.0/bin/cli.sh nifi export-param-context -u http://localhost:8080 --paramContextId a13e3764-134c-16f0-7c35-312b7ee4b182 -ot json -o financial.json

/Users/tspann/Downloads/nifi-toolkit-1.12.0/bin/cli.sh nifi export-param-context -u http://localhost:8080 --paramContextId 7184b9f4-0171-1000-4627-967e118f3037 -ot json -o health.json

/Users/tspann/Downloads/nifi-toolkit-1.12.0/bin/cli.sh nifi export-param-context -u http://localhost:8080 --paramContextId 3a801faf-1f87-1836-54ba-3d913fa223ad -ot json -o retail.json

/Users/tspann/Downloads/nifi-toolkit-1.12.0/bin/cli.sh nifi export-param-context -u http://localhost:8080 --paramContextId 3a801fde-1f73-1836-957b-a9f4d2c9b73d -ot json -o sensors.json

/Users/tspann/Downloads/nifi-toolkit-1.12.0/bin/cli.sh nifi export-param-context -u http://localhost:8080 --paramContextId 3a801ff4-1f73-1836-b59c-b9fbc79ab030 -ot json -o backupregistry.json

Reference

http://nifi.apache.org/docs/nifi-docs/html/toolkit-guide.html#nifi_CLI

https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.5.1/versioning-a-dataflow/content/parameters-in-versioned-flows.html

Using Google Forms As a A Data Source for NiFi Flows

Setup a Google Developers Account

Use or Create an API Key For Sheets at Developer Console

For Your Google Sheet (If not OAuth, You Need to Make it Visible via URL)

Or you will face PERMISSION_DENIED

Enable Google Sheets API

https://console.developers.google.com/apis/api/sheets.googleapis.com/overview?project=YOURPROJECTID

View Metrics

https://console.developers.google.com/apis/api/sheets.googleapis.com/overview?project=YOURPROJECTISCOOL

Access The Data Via NIFI

https://sheets.googleapis.com/v4/spreadsheets/YOURGOOGLESHEET?includeGridData=true&key=YOURKEY

References:

https://community.cloudera.com/t5/Community-Articles/Streaming-Ingest-of-Google-Sheets-with-HDF-2-0/ta-p/247764

Using DJL.AI For Deep Learning BERT Q&A in NiFi DataFlows

Introduction:

I will be talking about this processor at Apache Con @ Home 2020 in my "Apache Deep Learning 301" talk with Dr. Ian Brooks.

Sometimes you want your Deep Learning Easy and in Java, so let's do that with DJL in a custom Apache NiFi processor running in CDP Data Hubs. This one does BERT QA.

To use the processor feed in a paragraph to analyze via the paragraph parameter in the NiFi processor. Also feed in a question, like Why? or something very specific like asking the date or an event.

The pretrained model is BERT QA model using PyTorch. the NiFi Processor Source:

https://github.com/tspannhw/nifi-djlqa-processor

Grab the Recent Release NAR to install to your NiFi lib directories:

https://github.com/tspannhw/nifi-djlqa-processor/releases/tag/1.2

Example Run

Demo Data Source

https://newsapi.org/v2/everything?q=cloudera&apiKey=REGISTERFORAKEY

Reference:

Deep Learning Note:

BERT QA Model

Tip

Make sure you have 1-2 GB of RAM extra for your NiFi instance for running each DJL processor. If you have a lot of text, run more nodes and/or RAM. Make sure you have at least 8 cores per Deep Learning process. I prefer JDK 11 for this.

Using DJL.AI For Deep Learning Based Sentiment Analysis in NiFi DataFlow

Introduction:

I will be talking about this processor at Apache Con @ Home 2020 in my "Apache Deep Learning 301" talk with Dr. Ian Brooks.

Sometimes you want your Deep Learning Easy and in Java, so let's do that with DJL in a custom Apache NiFi processor running in CDP Data Hubs.

Grab the Source:

https://github.com/tspannhw/nifi-djlsentimentanalysis-processor

Grab the Recent Release NAR to install to your NiFi lib directories:

https://github.com/tspannhw/nifi-djlsentimentanalysis-processor/releases/tag/1.2

Example Run

probnegative

0.99

No value set

probnegativeperc

99.44

No value set

probpositive

0.01

No value set

probpositiveperc

0.56

No value set

rawclassification

[class: "Negative", probability: 0.99440, class: "Positive", probability: 0.00559]

Demo Data Source

https://newsapi.org/v2/everything?q=cloudera&apiKey=REGISTERFORAKEY

Reference:

Deep Learning Note:

The pretrained model is DistilBERT model trained by HuggingFace using PyTorch.

Tip

Cloudera Edge Flow Updates - Cloudera Edge Management (CEM) 1.2.1

Cloudera Edge Flow Updates

Support for RetryFlowFile processor for MiNiFi C++
Support for MergeContent processor for MiNiFi C++
New attributes introduced in ConsumeWindowsEventLog to add timezone information
MiNiFi Java preserves NiFi properties overrides during flow updates

References

Cloudera Streams Messaging Manager Swagger Docs (For Kafka Monitoring, Management, Kafka Connect)

Note that the port is 8585 and not the SMM port which is often 9991.

YOURSERVER:8585/swagger

See:

https://docs.cloudera.com/smm/2.0.0/rest-api-reference/index.html#/Application_context_related_operations

Apache Flink REST and Metrics

After seeing Caito Scherr's amazing talk, I want to build up some useful dashboards. My first step is exploring all the available APIs in my CSA/Flink environment. The easiest way to discover them was I turned on Developer Console in Chrome while using the Flink Dashboard which is a great dashboard in it's own right. But it is not focused on some key metrics that some customers are asking about in a very easy to read format for end-users.

Some URLs I have been using:

Overview of Flink Cluster

http://FLINKCLUSTER:8078/overview

{"taskmanagers":1,"slots-total":2,"slots-available":1,"jobs-running":1,"jobs-finished":0,"jobs-cancelled":3,"jobs-failed":0,"flink-version":"1.10.0-csa1.2.0.0","flink-commit":"664a5f5"}

Overview of All Flink Jobs

http://FLINKCLUSTER:8078/jobs/overview

{"jobs":[{"jid":"7c01884b74ff981a896307c4a06f2b15","name":"default: select * from itemprice","state":"CANCELED","start-time":1599576455857,"end-time":1599576486876,"duration":31019,"last-modification":1599576486876,"cluster":null,"tasks":{"total":1,"created":0,"scheduled":0,"deploying":0,"running":0,"finished":0,"canceling":0,"canceled":1,"failed":0,"reconciling":0}},{"jid":"faeb308856db337ce628af5fea24b895","name":"default: insert into krogerprices\nselect upc,updatedate,itemdescription,brandname,CAST(price as float) as price, UUID() as uuid\nfrom itemprice\nwhere originstore = 'kroger'","state":"CANCELED","start-time":1599674296089,"end-time":1599766705456,"duration":92409367,"last-modification":1599766705456,"cluster":null,"tasks":{"total":1,"created":0,"scheduled":0,"deploying":0,"running":0,"finished":0,"canceling":0,"canceled":1,"failed":0,"reconciling":0}},{"jid":"5d6ae4f72ab9fca3cea28ba6d4905ca7","name":"default: select * from krogerprices","state":"CANCELED","start-time":1599576795487,"end-time":1599579517485,"duration":2721998,"last-modification":1599579517485,"cluster":null,"tasks":{"total":1,"created":0,"scheduled":0,"deploying":0,"running":0,"finished":0,"canceling":0,"canceled":1,"failed":0,"reconciling":0}},{"jid":"ec80a32b6ab59d96f649f5b3e493ec67","name":"Streaming WordCount","state":"FINISHED","start-time":1599571302659,"end-time":1599571318768,"duration":16109,"last-modification":1599571318768,"cluster":null,"tasks":{"total":5,"created":0,"scheduled":0,"deploying":0,"running":0,"finished":5,"canceling":0,"canceled":0,"failed":0,"reconciling":0}},{"jid":"ad949e727a8c0267c9f2550c6a9b6000","name":"default: select * from itemprice","state":"CANCELED","start-time":1599676984684,"end-time":1599677004620,"duration":19936,"last-modification":1599677004620,"cluster":null,"tasks":{"total":1,"created":0,"scheduled":0,"deploying":0,"running":0,"finished":0,"canceling":0,"canceled":1,"failed":0,"reconciling":0}},{"jid":"8c1a6903b81e7b926b7105720e24aee8","name":"default: insert into krogerprices\nselect upc,updatedate,itemdescription,brandname,CAST(price as float) as price, UUID() as uuid\nfrom itemprice\nwhere originstore = 'kroger'","state":"CANCELED","start-time":1599576540243,"end-time":1599582887998,"duration":6347755,"last-modification":1599582887998,"cluster":null,"tasks":{"total":1,"created":0,"scheduled":0,"deploying":0,"running":0,"finished":0,"canceling":0,"canceled":1,"failed":0,"reconciling":0}},{"jid":"4cb8a5983b0bd3a14fe90618e17e2488","name":"default: select * from krogerprices","state":"CANCELED","start-time":1599674323425,"end-time":1599676592438,"duration":2269013,"last-modification":1599676592438,"cluster":null,"tasks":{"total":1,"created":0,"scheduled":0,"deploying":0,"running":0,"finished":0,"canceling":0,"canceled":1,"failed":0,"reconciling":0}},{"jid":"01988557ccd71cbab899ded9babab606","name":"default: insert into krogerprices\nselect upc,updatedate,itemdescription,brandname,CAST(price as float) as price, UUID() as uuid\nfrom itemprice\nwhere originstore = 'kroger'","state":"RUNNING","start-time":1599673893701,"end-time":-1,"duration":103791030,"last-modification":1599673903811,"cluster":{"url":"http://ec2-3-86-165-80.compute-1.amazonaws.com:8088/proxy/application_1599570933443_0003/","originalUrl":"http://ec2-3-86-165-80.compute-1.amazonaws.com:35981","id":"application_1599570933443_0003","hostAndPort":"ec2-3-86-165-80.compute-1.amazonaws.com:35981"},"tasks":{"total":1,"created":0,"scheduled":0,"deploying":0,"running":1,"finished":0,"canceling":0,"canceled":0,"failed":0,"reconciling":0}},{"jid":"7c7932678f193f51c32cd3a2ebff6d59","name":"default: select * from itemprice","state":"CANCELED","start-time":1599573232967,"end-time":1599576425840,"duration":3192873,"last-modification":1599576425840,"cluster":null,"tasks":{"total":1,"created":0,"scheduled":0,"deploying":0,"running":0,"finished":0,"canceling":0,"canceled":1,"failed":0,"reconciling":0}}]}

Flink Job Details

http://FLINKCLUSTER:8078/jobs/7c01884b74ff981a896307c4a06f2b15

{"jid":"7c01884b74ff981a896307c4a06f2b15","name":"default: select * from itemprice","isStoppable":false,"state":"CANCELED","start-time":1599576455857,"end-time":1599576486876,"duration":31019,"now":1599576486888,"timestamps":{"FAILING":0,"CANCELED":1599576486876,"RECONCILING":0,"SUSPENDED":0,"CREATED":1599576455857,"CANCELLING":1599576486855,"FAILED":0,"RESTARTING":0,"FINISHED":0,"RUNNING":1599576455905},"vertices":[{"id":"cbc357ccb763df2852fee8c4fc7d55f2","name":"Source: KafkaTableSource(upc, originstore, updatedate, longdescription, itemdescription, brandname, displayimage, price, msrp, tpr) -> SourceConversion(table=[registry.default_database.itemprice, source: [KafkaTableSource(upc, originstore, updatedate, longdescription, itemdescription, brandname, displayimage, price, msrp, tpr)]], fields=[upc, originstore, updatedate, longdescription, itemdescription, brandname, displayimage, price, msrp, tpr]) -> SinkConversionToTuple2 -> Sink: SQL Client Stream Collect Sink","parallelism":1,"status":"CANCELED","start-time":1599576455941,"end-time":1599576486876,"duration":30935,"tasks":{"RUNNING":0,"FAILED":0,"RECONCILING":0,"FINISHED":0,"CANCELING":0,"CANCELED":1,"SCHEDULED":0,"CREATED":0,"DEPLOYING":0},"metrics":{"read-bytes":0,"read-bytes-complete":true,"write-bytes":0,"write-bytes-complete":true,"read-records":0,"read-records-complete":true,"write-records":0,"write-records-complete":true}}],"status-counts":{"RUNNING":0,"FAILED":0,"RECONCILING":0,"FINISHED":0,"CANCELING":0,"CANCELED":1,"SCHEDULED":0,"CREATED":0,"DEPLOYING":0},"plan":{"jid":"7c01884b74ff981a896307c4a06f2b15","name":"default: select * from itemprice","nodes":[{"id":"cbc357ccb763df2852fee8c4fc7d55f2","parallelism":1,"operator":"","operator_strategy":"","description":"Source: KafkaTableSource(upc, originstore, updatedate, longdescription, itemdescription, brandname, displayimage, price, msrp, tpr) -&gt; SourceConversion(table=[registry.default_database.itemprice, source: [KafkaTableSource(upc, originstore, updatedate, longdescription, itemdescription, brandname, displayimage, price, msrp, tpr)]], fields=[upc, originstore, updatedate, longdescription, itemdescription, brandname, displayimage, price, msrp, tpr]) -&gt; SinkConversionToTuple2 -&gt; Sink: SQL Client Stream Collect Sink","optimizer_properties":{}}]}}

Flink Cluster Configuration

http://FLINKCLUSTER:8078/config

{"refresh-interval":10000,"timezone-name":"Coordinated Universal Time","timezone-offset":0,"flink-version":"1.10.0-csa1.2.0.0","flink-revision":"664a5f5 @ 29.04.2020 @ 14:13:26 UTC","features":{"web-submit":false,"global-dashboard":true}}

Jobs - Specific Job - Checkpoints

http://YARNCLUSTER:8088/proxy/application_1599570933443_0003/jobs/01988557ccd71cbab899ded9babab606/checkpoints

{"counts":{"restored":0,"total":0,"in_progress":0,"completed":0,"failed":0},"summary":{"state_size":{"min":0,"max":0,"avg":0},"end_to_end_duration":{"min":0,"max":0,"avg":0},"alignment_buffered":{"min":0,"max":0,"avg":0}},"latest":{"completed":null,"savepoint":null,"failed":null,"restored":null},"history":[]}

Job Manager Configuration

http://YARNCLUSTER:8088/proxy/application_1599570933443_0003/jobmanager/config

[{"key":"yarn.flink-dist-jar","value":"file:/opt/cloudera/parcels/FLINK-1.10.0-csa1.2.0.0-cdh7.1.1.0-326-2901802/lib/flink/lib/flink-dist_2.11-1.10.0-csa1.2.0.0.jar"},{"key":"high-availability.cluster-id","value":"application_1599570933443_0003"},{"key":"jobmanager.rpc.address","value":"ec2-3-86-165-80.compute-1.amazonaws.com"},{"key":"high-availability.storageDir","value":"hdfs:///user/flink/ha"},{"key":"state.backend.rocksdb.timer-service.factory","value":"ROCKSDB"},{"key":"io.tmp.dirs","value":"/yarn/nm/usercache/root/appcache/application_1599570933443_0003"},{"key":"historyserver.cli.fallback","value":"true"},{"key":"parallelism.default","value":"1"},{"key":"execution.buffer-timeout","value":"100"},{"key":"jobmanager.heap.size","value":"1073741824"},{"key":"execution.checkpointing.mode","value":"EXACTLY_ONCE"},{"key":"taskmanager.memory.process.size","value":"2 gb"},{"key":"web.port","value":"0"},{"key":"state.backend.local-recovery","value":"true"},{"key":"state.backend.rocksdb.memory.managed","value":"true"},{"key":"yarn.tags","value":"flink"},{"key":"state.backend.incremental","value":"true"},{"key":"taskmanager.memory.network.fraction","value":"0.1"},{"key":"yarn.container-start-command-template","value":"%java% %jvmmem% %jvmopts% -DyarnContainerId=$CONTAINER_ID %logging% %class% %args% %redirects%"},{"key":"web.tmpdir","value":"/tmp/flink-web-37be6c08-f457-468a-9c4b-5fde92c042b1"},{"key":"jobmanager.rpc.port","value":"33566"},{"key":"execution.checkpointing.timeout","value":"60000"},{"key":"high-availability.zookeeper.quorum","value":"ec2-3-86-165-80.compute-1.amazonaws.com:2181"},{"key":"taskmanager.memory.managed.fraction","value":"0.4"},{"key":"rest.address","value":"ec2-3-86-165-80.compute-1.amazonaws.com"},{"key":"state.backend","value":"FILESYSTEM"},{"key":"logging.configuration.file","value":"/etc/flink/conf/log4j.properties"},{"key":"execution.checkpointing.max-concurrent-checkpoints","value":"1"},{"key":"high-availability.zookeeper.client.acl","value":"open"},{"key":"historyserver.web.address","value":"ec2-3-86-165-80.compute-1.amazonaws.com"},{"key":"state.checkpoints.num-retained","value":"3"},{"key":"historyserver.web.port","value":"8078"},{"key":"pipeline.auto-watermark-interval","value":"200"},{"key":"state.savepoints.dir","value":"hdfs:///user/flink/savepoints"},{"key":"pipeline.generic-types","value":"true"},{"key":"yarn.maximum-failed-containers","value":"100"},{"key":"yarn.application-attempts","value":"5"},{"key":"taskmanager.numberOfTaskSlots","value":"2"},{"key":"state.backend.rocksdb.memory.write-buffer-ratio","value":"0.5"},{"key":"jobmanager.archive.fs.dir","value":"hdfs:///user/flink/applicationHistory"},{"key":"execution.target","value":"yarn-per-job"},{"key":"pipeline.object-reuse","value":"false"},{"key":"internal.io.tmpdirs.use-local-default","value":"true"},{"key":"state.backend.rocksdb.memory.high-prio-pool-ratio","value":"0.1"},{"key":"taskmanager.memory.network.max","value":"2147483648"},{"key":"execution.attached","value":"false"},{"key":"internal.cluster.execution-mode","value":"NORMAL"},{"key":"execution.checkpointing.externalized-checkpoint-retention","value":"RETAIN_ON_CANCELLATION"},{"key":"high-availability","value":"ZOOKEEPER"},{"key":"execution.checkpointing.min-pause","value":"0"},{"key":"execution.checkpointing.snapshot-compression","value":"false"},{"key":"state.checkpoints.dir","value":"hdfs:///user/flink/checkpoints"}]

Job Manager - Log
http://YARNCLUSTER:8088/proxy/application_1599570933443_0003/jobmanager/log
2020-09-11 13:37:42,586 INFO  org.apache.flink.yarn.YarnResourceManager                     - Disconnect job manager 9c2e3f25dae1d548dc730941d6484cbb@akka.tcp://flink@ec2-3-86-165-80.compute-1.amazonaws.com:33566/user/jobmanager_7 for job 67a2de8fb291333bbd90b334f8f83def from the resource manager.
2020-09-11 13:37:42,586 INFO  org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  - Stopping ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/67a2de8fb291333bbd90b334f8f83def/job_manager_lock'}.
2020-09-11 13:37:42,591 INFO  org.apache.flink.runtime.jobmanager.ZooKeeperJobGraphStore    - Removed job graph 67a2de8fb291333bbd90b334f8f83def from ZooKeeper.

Job Manager - Standard Out
http://YARNCLUSTER:8088/proxy/application_1599570933443_0003/jobmanager/stdout

Task Managers
http://YARNCLUSTER:8088/proxy/application_1599570933443_0003/taskmanagers

{"taskmanagers":[{"id":"container_1599570933443_0003_04_000002","path":"akka.tcp://flink@ec2-3-86-165-80.compute-1.amazonaws.com:38435/user/taskmanager_0","dataPort":41920,"timeSinceLastHeartbeat":1599832789681,"slotsNumber":2,"freeSlots":1,"hardware":{"cpuCores":16,"physicalMemory":133838598144,"freeMemory":669515776,"managedMemory":665719939}}]}

Task Managers - Container
http://YARNCLUSTER:8088/proxy/application_1599570933443_0003/taskmanagers/container_1599570933443_0003_04_000002

{"id":"container_1599570933443_0003_04_000002","path":"akka.tcp://flink@ec2-3-86-165-80.compute-1.amazonaws.com:38435/user/taskmanager_0","dataPort":41920,"timeSinceLastHeartbeat":1599832809720,"slotsNumber":2,"freeSlots":1,"hardware":{"cpuCores":16,"physicalMemory":133838598144,"freeMemory":669515776,"managedMemory":665719939},"metrics":{"heapUsed":200512944,"heapCommitted":664272896,"heapMax":664272896,"nonHeapUsed":145389616,"nonHeapCommitted":149569536,"nonHeapMax":780140544,"directCount":5111,"directUsed":168414439,"directMax":168414435,"mappedCount":0,"mappedUsed":0,"mappedMax":0,"memorySegmentsAvailable":5079,"memorySegmentsTotal":5079,"garbageCollectors":[{"name":"PS_Scavenge","count":91,"time":453},{"name":"PS_MarkSweep","count":4,"time":358}]}}

Task Managers - Container - Log

http://YARNCLUSTER:8088/proxy/application_1599570933443_0003/taskmanagers/container_1599570933443_0003_04_000002/log

2020-09-09 17:58:17,456 INFO  org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer  - Starting FlinkKafkaInternalProducer (1/1) to produce into default topic krogerprices
2020-09-09 17:58:17,465 INFO  org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase  - Consumer subtask 0 has no restore state.
2020-09-09 17:58:17,475 INFO  org.apache.flink.kafka.shaded.org.apache.kafka.clients.consumer.ConsumerConfig  - ConsumerConfig values:

Task Managers - Container - Standard Out

http://YARNCLUSTER:8088/proxy/application_1599570933443_0003/taskmanagers/container_1599570933443_0003_04_000002/stdout

Talks

For the definitive way to read and use these, see Caito's awesome talk here:

https://www.youtube.com/watch?v=MQQ7qaKKKc8

Command Line (https://docs.cloudera.com/csa/1.2.0/job-lifecycle/topics/csa-supported-cli.html)

flink list

20/09/11 13:45:34 INFO cli.CliFrontend: Successfully retrieved list of jobs

------------------ Running/Restarting Jobs -------------------

09.09.2020 17:51:33 : 01988557ccd71cbab899ded9babab606 : default: insert into krogerprices

select upc,updatedate,itemdescription,brandname,CAST(price as float) as price, UUID() as uuid

from itemprice

where originstore = 'kroger' (RUNNING)

--------------------------------------------------------------

Data and Streaming News September 2020

Apache NiFi 1.12 Released! 18-August-2020

Release Notes

https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.12.0

Issues

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12346778

Major Feature List

https://twitter.com/pvillard31/status/1296469452180119553

Release Date: August 18, 2020.

Major Features:

New processor to write scripted record transforms live in the flow (ScriptedTransformRecord)
Expose a REST Endpoint for easy metric scraping by Prometheus
Ability to specify group level flow file concurrency - for instance run a single flow file end to end for traditional job handling
Improved several capabilities related to Azure service interaction including ADLS Gen2
Improved AMQP and MQTT support as well as JMS improvements
Support for latest Kafka 2.6 clients
Search UI Improvements

I will be posting a few demos and test drives soon.

ScriptedTransformRecord

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-scripting-nar/1.12.0/org.apache.nifi.processors.script.ScriptedTransformRecord/index.html

Deleting Schemas From Cloudera Schema Registry

It is very easy to delete schemas from Cloudera Schema Registry if you need to do so. I recommend downloading them and having a backup first.

Let's look at our schema

Well let's get rid of that junk.

Here is the documentation For CDF Datahub in CDP Public Cloud

https://docs.cloudera.com/cdf-datahub/7.0.2/using-schema-registry/topics/csp-deleting-schemas.html

Example

curl -X DELETE "http://MYSERVERHASACOOLNAME.DEV:7788/api/v1/schemaregistry/schemas/junk" -H "accept: application/json"

Where junk is the name of my schema.

You could call this REST API from NiFi, a DevOps tool or just a simple CURL like listed above.

Knox and other security may apply.

Did the user really ask for Exactly Once? Fault Tolerance

Exactly Once Requirements

It is very tricky and can cause performance degradation, if your user could just use at least once, then always go with that. Having data sinks like Kudu where you can do an upsert makes exactly once less needed.

https://docs.cloudera.com/csa/1.2.0/datastream-connectors/topics/csa-kafka.html

Apache Flink, Apache NiFi Stateless and Apache Kafka can participate in that.

For CDF Stream Processing and Analytics with Apache Flink 1.10 Streaming:

Both Kafka sources and sinks can be used with exactly once processing guarantees when checkpointing is enabled.

End-to-End Guaranteed Exactly-Once Record Delivery

The Data Source and Data Sink to need to support exactly-once state semantics and take part in checkpointing.

Data Sources

Apache Kafka - must have Exactly-Once selected, transactions enabled and correct driver.

Select: Semantic.EXACTLY_ONCE

Data Sinks

HDFS BucketingSink
Apache Kafka

For Kafka, please check the timeouts sync up to checkpoints. https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/kafka.html#kafka-producers-and-fault-tolerance

Reference

FLaNK in the Cloud!!!! Huge Cloudera Data Platform Public Cloud Updates - July 2020 - Data Flow Releases

FLaNK in the Cloud!!!!

Huge Cloudera Data Platform Public Cloud Updates

July 2020 - Data Flow Releases

With the promotion of Cloud Runtime 7.2.1 to Public Cloud, the CDF team is pleased to announce three key and very important updates that were also promoted to production today and available to customers. These are:

https://docs.cloudera.com/runtime/7.2.1/index.html

https://docs.cloudera.com/runtime/7.2.1/kafka-connector-reference/topics/kafka-connector-reference-hdfs.html

https://docs.cloudera.com/runtime/7.2.1/cctrl-rest-api-reference/index.html

https://docs.cloudera.com/runtime/7.2.1/smm-rest-api-reference/index.html

https://docs.cloudera.com/runtime/7.2.1/srm-rest-api-reference/index.html

https://docs.cloudera.com/runtime/7.2.1/cctrl-overview/topics/cctrl-how-it-works.html

Let's see what's new in 7.2.1!

https://docs.cloudera.com/cdf-datahub/7.2.1/release-notes/topics/cdf-datahub-whats-new.html

Flow (GA) - 2.0.3 is optimized for public cloud!

https://docs.cloudera.com/cdf-datahub/7.2.1/index.html

https://docs.cloudera.com/cdf-datahub/7.2.1/nifi-security/topics/cdf-datahub-nifi-security-predefined-nifiregistry-policies.html

Streams (GA) - Now with Apache Kafka v2.5!