
Posts

Commonly Used TCP/IP Ports in Streaming

Cloudera CDF and HDF Ports
NiFi and Friends
FLaNK Extended Stack


Note: 
All of these ports can be changed by administrators or in version updates. Also, if you are running Apache Knox, as in Cloudera Data Platform Public Cloud, these ports may be changed or hidden. This list is based only on the version of CDF I am running and its defaults. It does not include the standard Cloudera ports for Cloudera Manager, Hadoop, Atlas, Ranger and other necessary and fun services.

Cloudera Flow Management (CFM Powered by Apache NiFi)
Cloudera NiFi HTTP: 8080 or 9090
Cloudera NiFi HTTPS: 8443 or 9443
Cloudera NiFi RIP Socket: 10443 or 50999
Cloudera NiFi Node Protocol: 11443
Cloudera NiFi Load Balancing: 6342
Cloudera NiFi Registry: 18080
Cloudera NiFi Registry SSL: 18433
Cloudera NiFi Certificate Authority: 10443

Cloudera Edge Flow Management (CEM Powered by Apache NiFi - MiNiFi)
Cloudera EFM HTTP: 10080
Cloudera EFM CoAP: 8989

Cloudera Stream Processing (CSP Powered by Apache Kafka)
Cloudera Kafka: 9092
Clouder…
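Since any of these defaults may have been changed, it can help to check which ones are actually listening on a given host. The following is a minimal sketch using only the Python standard library. The names `port_is_open`, `scan` and `DEFAULT_PORTS` are illustrative and not part of any Cloudera tool, and the host `localhost` is an assumption. Only a few TCP ports from the list above are included; CoAP normally runs over UDP, so the EFM CoAP port is left out of this TCP probe.

```python
import socket

# A few defaults copied from the list above. Administrators may have changed them.
DEFAULT_PORTS = {
    "NiFi HTTP": 8080,
    "NiFi HTTPS": 8443,
    "NiFi Registry": 18080,
    "EFM HTTP": 10080,
    "Kafka": 9092,
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable or unresolvable host.
        return False


def scan(host, ports=DEFAULT_PORTS):
    """Probe each named port on host and return {name: reachable}."""
    return {name: port_is_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute one of your own nodes.
    for name, reachable in scan("localhost").items():
        print(f"{name}: {'open' if reachable else 'closed or filtered'}")
```

A closed result only means the TCP connection failed from where the script ran. A firewall, Knox gateway or non-default configuration can all produce the same answer.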

Cloudera Edge Management 1.1.0 Release

Let's Query Kafka with Hive

I can hop into beeline and build an external Hive table to access my Cloudera CDF Kafka cluster, whether it is in the public cloud in CDP DataHub, on-premises in HDF or CDF, or in CDP-DC.
I just have to set the KafkaStorageHandler, the Kafka topic name and my bootstrap servers (usually port 9092). Now I can use that table to do ETL/ELT, populating Hive tables from Kafka topics or Kafka topics from Hive tables. This is a quick and easy way to do data engineering.
This is a good way to augment CDP Data Engineering with Spark, CDP DataHub with NiFi, CDP DataHub with Kafka and Kafka Streams, and various Sqoop or Python utilities you may have in your environment.
For real-time continuous queries on Kafka with SQL, you can use Flink SQL: https://www.datainmotion.dev/2020/05/flank-low-code-streaming-populating.html


Example Table Create
CREATE EXTERNAL TABLE <tableName>   (`uuid` STRING, `systemtime` STRING , `temperaturef` STRING , `pressure` DOUBL…
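As a rough sketch of how the storage handler, topic name and bootstrap servers fit into one statement, here is a small Python helper that assembles the DDL as a string you could paste into beeline. The function name `kafka_table_ddl`, the table name `sensor_readings`, the topic `sensors` and the broker `broker1:9092` are all hypothetical. The three columns are the ones visible in the example above, and the rest of that column list is not reproduced here.

```python
def kafka_table_ddl(table, topic, bootstrap_servers, columns):
    """Build a CREATE EXTERNAL TABLE statement backed by Hive's KafkaStorageHandler.

    columns is a list of (name, hive_type) pairs.
    """
    cols = ", ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        "STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'\n"
        "TBLPROPERTIES (\n"
        f'  "kafka.topic" = "{topic}",\n'
        f'  "kafka.bootstrap.servers" = "{bootstrap_servers}"\n'
        ")"
    )


if __name__ == "__main__":
    # Hypothetical table, topic and broker names, for illustration only.
    print(
        kafka_table_ddl(
            "sensor_readings",
            "sensors",
            "broker1:9092",
            [("uuid", "STRING"), ("systemtime", "STRING"), ("temperaturef", "STRING")],
        )
    )
```

The helper only formats text and does no escaping, so it is suitable for trusted inputs in a notebook or script rather than for anything user-facing.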

Cloudera Flow Management 101: Let's Build a Simple REST Ingest to Cloud Datawarehouse With LowCode? Powered by Apache NiFi

Use NiFi to call REST API, transform, route and store the data
Pick any REST API of your choice; I have walked through this one, which grabs a number of weather station reports. Weather or not we have good weather, we can query it anyway.
We are going to use a GenerateFlowFile processor to feed our REST calls.
[ {"url":"http://weather.gov/xml/current_obs/CWAV.xml"}, {"url":"http://weather.gov/xml/current_obs/KTTN.xml"}, {"url":"http://weather.gov/xml/current_obs/KEWR.xml"}, {"url":"http://weather.gov/xml/current_obs/KEWR.xml"}, {"url":"http://weather.gov/xml/current_obs/CWDK.xml"}, {"url":"http://weather.gov/xml/current_obs/CWDZ.xml"}, {"url":"http://weather.gov/xml/current_obs/CWFJ.xml"}, {"url":"http://weather.gov/xml/current_obs/PAEC.xml"}, {"url":"http://weather.gov/xml/current_obs/PAYA.xml"}, {"url":&qu…
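If you would rather generate that JSON than type it by hand, a short script can build it from a list of station codes. This is a sketch under a few assumptions: `station_payload` and `BASE_URL` are made-up names, the station codes are only the ones visible in the list above, and dropping repeated codes is a choice made here, not something the flow requires.

```python
import json

# URL pattern taken from the entries shown above.
BASE_URL = "http://weather.gov/xml/current_obs/{station}.xml"


def station_payload(stations):
    """Return a JSON array of {"url": ...} objects, skipping repeated station codes."""
    unique = list(dict.fromkeys(stations))  # de-duplicate, keeping first-seen order
    return json.dumps([{"url": BASE_URL.format(station=code)} for code in unique])


if __name__ == "__main__":
    stations = ["CWAV", "KTTN", "KEWR", "KEWR", "CWDK", "CWDZ", "CWFJ", "PAEC", "PAYA"]
    print(station_payload(stations))
```

The printed array can be pasted into the custom text of the GenerateFlowFile processor.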