
FLaNK Stack For 19 February 2024



Monday Feb 19, 2024 is Presidents Day

FLaNK Stack Weekly

Tim Spann @PaaSDev

Get your new Apache NiFi for Dummies!


Building Realtime AI Applications with Apache Flink



Please join my meetup group NJ/NYC/Philly/Virtual.


**This is Issue #125**


NYC Traffic?? (NiFi, Kafka, Flink)

Subways and Transit Updates in Real-Time

Open Source Data Infrastructure Meetup - Feb 2024

Catalogs in Flink SQL: A Primer


Unlocking Financial Data with Real-Time Pipelines (OSACon 2023)

The Never Landing Stream


February 8, 2024 Meetup


Feb 2024: Webinar

Feb 20, 2024: 12-1PM EST. Virtual. Azure Data Tech Groups: DBA Fundamentals Group

Feb 22, 2024: NYC. AI Camp Meetup.

Feb 28, 2024: NYC. Cloudera Meetup. Flink

Feb 29, 2024: Virtual. Conf42 Python.

Soon, 2024: Princeton. TigerLabs New Location. Meetup. GenAI.

March 15, 2024: TCF Pro. Princeton, NJ. IEEE Information Technology Professional Conference at the Trenton Computer Festival, Friday, March 15th, 2024.

April 2024: XtremeJ 2024. Virtual.

April 11, 2024: Conf42 LLM. Virtual.

May 8-9, 2024: Data Summit 2024. Boston, MA.

Cloudera Events

More Events:




© 2020-2024 Tim Spann

Using Cloudera Data Platform with Flow Management and Streams on Azure


Today I will walk you through using Cloudera Data Platform (CDP) with Flow Management and Streams on Azure Cloud. To see a streaming demo video, please join my webinar (or watch it on demand) at Streaming Data Pipelines with CDF in Azure. I'll share some additional how-to videos on using Apache NiFi and Apache Kafka in Azure very soon.

Apache NiFi on Azure CDP Data Hub
Sensors to ADLS/HDFS and Kafka

In the process group above, we use QueryRecord to filter JSON records, keeping only those where the temperature in Fahrenheit is over 80 degrees. We then pick out a few attributes from each record to display and send them to a Slack channel.
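As a rough sketch of what that filter does, assuming hypothetical field names `sensor_id`, `temperature_f` and `ts`, the Python below applies the same "over 80 degrees" predicate and projection outside NiFi. The SQL in the comment shows the shape a QueryRecord query could take (QueryRecord exposes the incoming records as a table named `FLOWFILE`); it is not the exact query from the demo.

```python
import json

# A QueryRecord query along these lines (field names are hypothetical)
# would keep only hot readings and project a few columns:
#   SELECT sensor_id, temperature_f, ts FROM FLOWFILE WHERE temperature_f > 80
HOT_THRESHOLD_F = 80.0


def filter_hot_readings(json_text: str) -> list:
    """Keep records strictly hotter than the threshold and project a few fields."""
    records = json.loads(json_text)
    return [
        {"sensor_id": r["sensor_id"], "temperature_f": r["temperature_f"], "ts": r["ts"]}
        for r in records
        if r["temperature_f"] > HOT_THRESHOLD_F
    ]


def slack_message(record: dict) -> str:
    """Format one filtered record as the text of a Slack message."""
    return (f"Sensor {record['sensor_id']} reported "
            f"{record['temperature_f']:.1f}F at {record['ts']}")


if __name__ == "__main__":
    sample = json.dumps([
        {"sensor_id": "s1", "temperature_f": 72.5, "ts": "2020-06-01T12:00:00Z", "humidity": 40},
        {"sensor_id": "s2", "temperature_f": 85.1, "ts": "2020-06-01T12:00:05Z", "humidity": 35},
    ])
    for rec in filter_hot_readings(sample):
        print(slack_message(rec))
```

Only the second sample record passes the filter, and fields such as `humidity` are dropped by the projection, as a `SELECT` of named columns would do.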

To act as a Kafka producer, set a Record Reader for the incoming data type (JSON in my case) and a Record Writer for the format to send to the sensors topic. In this case we kept it as JSON, but we could convert to Avro; I usually do that if I am going to read the topic with Cloudera Kafka Connect.
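The reader/writer split can be sketched in plain Python: one function plays the Record Reader (JSON in), another plays the Record Writer (JSON out), and each record becomes its own message on the `sensors` topic. The `send` callable is a hypothetical stand-in for a real Kafka client call (with confluent-kafka that role is played by `Producer.produce`), the `sensor_id` key field is an assumption, and the Avro schema only illustrates what an Avro writer would need; none of this is NiFi's internal implementation.

```python
import json
from typing import Callable

# Hypothetical stand-in for a Kafka client's send call: (topic, key, value).
SendFn = Callable[[str, bytes, bytes], None]

# Illustrative Avro schema for the same records (field names are assumptions);
# an Avro Record Writer backed by a schema registry would use a schema like this.
SENSOR_AVRO_SCHEMA = {
    "type": "record",
    "name": "SensorReading",
    "fields": [
        {"name": "sensor_id", "type": "string"},
        {"name": "temperature_f", "type": "double"},
        {"name": "ts", "type": "string"},
    ],
}


def read_json_records(payload: bytes) -> list:
    """Record Reader role: parse a JSON array (or a single object) into records."""
    data = json.loads(payload)
    return data if isinstance(data, list) else [data]


def write_json_record(record: dict) -> bytes:
    """Record Writer role: serialize one record as compact UTF-8 JSON."""
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


def publish(payload: bytes, send: SendFn, topic: str = "sensors") -> int:
    """Send each record as its own message, keyed by the (assumed) sensor_id field."""
    records = read_json_records(payload)
    for record in records:
        key = str(record.get("sensor_id", "")).encode("utf-8")
        send(topic, key, write_json_record(record))
    return len(records)


if __name__ == "__main__":
    sent = []
    count = publish(b'[{"sensor_id": "s1", "temperature_f": 81.0}]',
                    lambda t, k, v: sent.append((t, k, v)))
    print(count, sent)
```

Passing the send function in keeps the sketch testable without a broker; swapping `write_json_record` for an Avro serializer is the code-level analogue of changing the Record Writer.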

Security is automagic and requires little work from you in NiFi. I enter my CDP username and password, and the SSL context is set up for me when I create my Data Hub.
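For comparison, a client outside NiFi has to supply those settings itself. The sketch below assumes the Data Hub's Kafka brokers accept `SASL_SSL` with the `PLAIN` mechanism and your CDP workload credentials; the broker address and CA path are hypothetical placeholders. The keys are standard librdkafka (confluent-kafka) configuration properties, but check your own Data Hub's connection details before relying on them.

```python
# Assumption: brokers accept SASL_SSL + PLAIN with CDP workload credentials.
# Keys are standard librdkafka / confluent-kafka configuration properties.
def kafka_client_config(brokers: list, username: str, password: str,
                        ca_file: str) -> dict:
    """Build roughly the settings that NiFi's SSL context and CDP login provide."""
    if not brokers:
        raise ValueError("at least one broker is required")
    return {
        "bootstrap.servers": ",".join(brokers),
        "security.protocol": "SASL_SSL",  # TLS-encrypted connection plus SASL login
        "sasl.mechanism": "PLAIN",        # username/password authentication
        "sasl.username": username,
        "sasl.password": password,
        "ssl.ca.location": ca_file,       # CA certificate used to verify the brokers
    }


if __name__ == "__main__":
    # Hypothetical broker, user and CA path; substitute your Data Hub's values.
    conf = kafka_client_config(["broker1.example.com:9093"], "my-workload-user",
                               "********", "/path/to/ca.pem")
    print(conf["security.protocol"], conf["bootstrap.servers"])
```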

When writing to our Real-Time Data Mart (Apache Kudu), I enter the Kudu servers I copied from the Kudu Data Mart Hardware page, then my table name and login info. I recommend the UPSERT operation with a JSON Record Reader.
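One reason UPSERT is a safe default for streaming writes: replayed or updated records for an existing primary key overwrite the row instead of failing the batch. The in-memory table below is a hypothetical illustration of that difference between INSERT and UPSERT, keyed on an assumed `sensor_id` column; it replaces whole rows and does not model how Kudu treats columns omitted from an upsert.

```python
class KeyedTable:
    """Hypothetical in-memory table illustrating INSERT vs UPSERT on a primary key."""

    def __init__(self, key: str):
        self.key = key
        self.rows = {}

    def insert(self, row: dict) -> None:
        """INSERT fails when the primary key already exists."""
        k = row[self.key]
        if k in self.rows:
            raise KeyError(f"duplicate primary key: {k!r}")
        self.rows[k] = dict(row)

    def upsert(self, row: dict) -> None:
        """UPSERT inserts a new key or replaces the existing row (whole-row sketch)."""
        self.rows[row[self.key]] = dict(row)


if __name__ == "__main__":
    table = KeyedTable("sensor_id")
    table.upsert({"sensor_id": "s1", "temperature_f": 81.0})
    table.upsert({"sensor_id": "s1", "temperature_f": 82.5})  # same key: overwrite
    print(table.rows)
```

Running the same two writes with `insert` would raise on the second one, which is the failure mode UPSERT avoids when NiFi retries or replays records.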

For real use cases, you will need to spin up:

Public Cloud Data Hubs:
  • Streams Messaging Heavy Duty for AWS
  • Streams Messaging Heavy Duty for Azure
  • Flow Management Heavy Duty for AWS
  • Flow Management Heavy Duty for Azure

Component versions:
  • Apache Kafka 2.4.1
  • Cloudera Schema Registry 0.8.1
  • Cloudera Streams Messaging Manager 2.1.0
  • Apache NiFi 1.11.4
  • Apache NiFi Registry 0.5.0

Demo Source Code:

Let's configure our Data Hubs in CDP in an Azure environment. It takes a few clicks and some naming, and then it builds.

Under the Azure Portal

In Azure, we can examine the files we uploaded to the Azure object store.
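When looking for those files from Hadoop-compatible tools rather than the portal, ADLS Gen2 locations are addressed with URIs of the form `abfs[s]://<container>@<account>.dfs.core.windows.net/<path>`. The helper below builds such a URI; the container, storage account and path in the usage line are hypothetical.

```python
def abfs_uri(container: str, account: str, path: str, secure: bool = True) -> str:
    """Build an ADLS Gen2 URI: abfs[s]://<container>@<account>.dfs.core.windows.net/<path>."""
    scheme = "abfss" if secure else "abfs"  # abfss = TLS-encrypted access
    return f"{scheme}://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical container, storage account and file path.
    print(abfs_uri("data", "mystorageacct", "/sensors/2020/06/readings.json"))
```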

Under the Data Lake SDX

NiFi and Kafka are automatically configured to work with Apache Atlas under our environment's Data Lake SDX. We can browse through the lineage for all the Kafka topics we use.
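The same lineage is also reachable programmatically through Atlas's v2 REST API, which offers a basic search endpoint (`/api/atlas/v2/search/basic`, filtered here on the `kafka_topic` type) and a lineage endpoint (`/api/atlas/v2/lineage/{guid}`). The sketch only builds the requests with the standard library; the base URL, credentials and GUID are hypothetical, and in CDP the Atlas URL and authentication method (often via a gateway) depend on your environment.

```python
import base64
import urllib.parse
import urllib.request


def atlas_request(base_url: str, path: str, params: dict,
                  user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request to the Atlas REST API with basic auth."""
    url = base_url.rstrip("/") + path
    query = urllib.parse.urlencode(params)
    if query:
        url += "?" + query
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}", "Accept": "application/json"})


def kafka_topic_search(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Basic search for all entities of the kafka_topic type."""
    return atlas_request(base_url, "/api/atlas/v2/search/basic",
                         {"typeName": "kafka_topic"}, user, password)


def lineage(base_url: str, guid: str, user: str, password: str,
            depth: int = 3) -> urllib.request.Request:
    """Upstream and downstream lineage for one entity GUID."""
    return atlas_request(base_url, f"/api/atlas/v2/lineage/{urllib.parse.quote(guid)}",
                         {"direction": "BOTH", "depth": depth}, user, password)


if __name__ == "__main__":
    req = kafka_topic_search("https://atlas.example.com", "my-user", "********")
    print(req.full_url)  # send with urllib.request.urlopen(req) against a real Atlas
```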

We can also see the flow for NiFi, HDFS and Kudu.


We can examine all of our Kafka infrastructure, including brokers, topics, consumers, producers, latency and messages. We can also create and update topics.

Cloudera Manager

We still have access to traditional tools like Cloudera Manager for managing server configuration.

Under Real-Time Data Mart

We can view tables, create tables and query them. Apache Hue is a great tool for accessing data in my Real-Time Data Mart Data Hub.
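Creating a Kudu table from Hue's Impala editor uses Impala's `CREATE TABLE ... PRIMARY KEY ... PARTITION BY HASH ... STORED AS KUDU` syntax. The helper below generates such a statement from a column list and is only a sketch: the `sensors` table, its columns and the partition count are hypothetical. It puts the primary key columns first because Kudu requires key columns to lead the schema.

```python
def kudu_table_ddl(table: str, columns: list, primary_key: list,
                   partitions: int = 4) -> str:
    """Generate Impala DDL for a hash-partitioned Kudu table."""
    if not primary_key:
        raise ValueError("Kudu tables require a primary key")
    by_name = dict(columns)
    missing = [k for k in primary_key if k not in by_name]
    if missing:
        raise ValueError(f"primary key columns not in column list: {missing}")
    # Kudu requires the primary key columns to come first, in key order.
    ordered = [(k, by_name[k]) for k in primary_key] + [
        (name, typ) for name, typ in columns if name not in primary_key]
    cols_sql = ",\n  ".join(f"{name} {typ}" for name, typ in ordered)
    pk_sql = ", ".join(primary_key)
    return (f"CREATE TABLE {table} (\n  {cols_sql},\n  PRIMARY KEY ({pk_sql})\n)\n"
            f"PARTITION BY HASH ({pk_sql}) PARTITIONS {partitions}\n"
            "STORED AS KUDU;")


if __name__ == "__main__":
    print(kudu_table_ddl(
        "sensors",
        [("temperature_f", "DOUBLE"), ("sensor_id", "STRING"), ("ts", "STRING")],
        ["sensor_id", "ts"]))
```

The printed statement can be pasted into Hue; hash partitioning on the key spreads writes from many sensors across tablets.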

We can also look at table details in the Impala UI.

©2020 Timothy Spann