Showing posts with the label azure

Using Cloudera Data Platform with Flow Management and Streams on Azure

Today I am going to walk you through using Cloudera Data Platform (CDP) with Flow Management and Streams on Azure. To see a streaming demo video, please join my webinar (or watch it on demand) at Streaming Data Pipelines with CDF in Azure. I'll share some additional how-to videos on using Apache NiFi and Apache Kafka in Azure very soon.

Apache NiFi on Azure CDP Data Hub Sensors to ADLS/HDFS and Kafka

In the above process group we use QueryRecord to segment JSON records and pick only the ones where the temperature in Fahrenheit is over 80 degrees. We then pick out a few attributes from each record to display and send them to a Slack channel.

To become a Kafka producer, you set a Record Reader for the incoming type, which is JSON in my case, and then set a Record Writer for the type to send to the sensors topic. In this case we kept it as JSON, but we could convert to Avro. I usually do that
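The filtering step described for QueryRecord (keep only records over 80 degrees Fahrenheit, then keep a few attributes) can be illustrated outside NiFi with a minimal Python sketch. This is not the NiFi configuration from the post: the field names `sensor_id`, `temperature_f`, and `ts` are assumptions, since the excerpt does not show the sensor schema.

```python
import json

# Hypothetical field names; the post does not show the actual sensor schema.
TEMP_FIELD = "temperature_f"
KEEP_FIELDS = ("sensor_id", "temperature_f", "ts")


def hot_records(records, threshold=80.0):
    """Return trimmed copies of records whose Fahrenheit temperature exceeds threshold."""
    selected = []
    for rec in records:
        try:
            temp = float(rec[TEMP_FIELD])
        except (KeyError, TypeError, ValueError):
            continue  # skip records with a missing or non-numeric temperature
        if temp > threshold:
            # Keep only the few attributes we want to display downstream.
            selected.append({k: rec[k] for k in KEEP_FIELDS if k in rec})
    return selected


def to_json_lines(records):
    """Serialize records as one JSON object per line, e.g. for a chat message body."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)
```

In NiFi itself the same selection would be written as a SQL query on the QueryRecord processor rather than as code; the sketch only shows the logic being applied to each record.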

Streaming Data with Cloudera Data Flow (CDF) Into Public Cloud (CDP)

At Cloudera Now NYC, I showed a demo, running in AWS, of streaming data from MQTT sensors and Twitter. Today I am going to walk you through some of the details and give you the tools to build your own streaming demos in CDP Public Cloud. If you missed that event, you can watch a recording here. Let's get streaming!

Let's log in. I use Okta for single sign-on (SSO), which makes this easy.

Cloudera Flow Management (CFM) Apache NiFi is officially available in the CDP Public Cloud, so get started here. We will be following the guide (https://docs.cloudera.com/cdf-datahub/7.1.0/howto-data-ingest.html). We are running CDF DataHub on CDP 7.1.0.

There are a lot of data engineering and streaming tasks I can accomplish with a few clicks. I can bring up a virtual data warehouse and use tools like Apache Hue and Data Analytics Studio to examine databases and tables and run quer