
Showing posts with the label cdf

Deleting Schemas From Cloudera Schema Registry

It is very easy to delete schemas from Cloudera Schema Registry if you need to do so. I recommend downloading them and keeping a backup first. Let's look at our schema. Well, let's get rid of that junk. Here is the documentation for CDF Data Hub in CDP Public Cloud: https://docs.cloudera.com/cdf-datahub/7.0.2/using-schema-registry/topics/csp-deleting-schemas.html

Example: curl -X DELETE "http://MYSERVERHASACOOLNAME.DEV:7788/api/v1/schemaregistry/schemas/junk" -H "accept: application/json"

Where junk is the name of my schema. You could call this REST API from NiFi, a DevOps tool, or a simple curl like the one listed above. Knox and other security may apply.
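The backup-then-delete workflow above can be sketched in Python. This is a minimal sketch, assuming the Schema Registry REST endpoint shown in the curl example; the host name and the schema name "junk" are placeholders from that example, and Knox or other security layers may require additional authentication:

```python
# Sketch: back up a schema's metadata, then delete it from Schema Registry.
# The host/port and schema name are placeholders from the blog example;
# adjust for your cluster. Knox and other security may apply.
import json
import urllib.request

BASE = "http://MYSERVERHASACOOLNAME.DEV:7788/api/v1/schemaregistry"

def schema_url(name: str) -> str:
    # Same endpoint used by the curl example above.
    return f"{BASE}/schemas/{name}"

def backup_then_delete(name: str) -> None:
    # Download the schema metadata first (the recommended backup step)...
    with urllib.request.urlopen(schema_url(name)) as resp:
        backup = json.load(resp)
    with open(f"{name}-backup.json", "w") as f:
        json.dump(backup, f, indent=2)
    # ...then issue the DELETE, as in the curl example.
    req = urllib.request.Request(
        schema_url(name),
        method="DELETE",
        headers={"accept": "application/json"},
    )
    urllib.request.urlopen(req)

# e.g. backup_then_delete("junk")
```

The same call could just as easily be made from a NiFi InvokeHTTP processor or a DevOps tool, as the post notes.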

Using Cloudera Data Platform with Flow Management and Streams on Azure

Today I am going to walk you through using Cloudera Data Platform (CDP) with Flow Management and Streams on Azure Cloud. To see a streaming demo video, please join my webinar (or see it on demand) at Streaming Data Pipelines with CDF in Azure. I'll share some additional how-to videos on using Apache NiFi and Apache Kafka in Azure very soon.

Apache NiFi on Azure CDP Data Hub: Sensors to ADLS/HDFS and Kafka

In the above process group we use QueryRecord to segment JSON records and pick only the ones where the temperature in Fahrenheit is over 80 degrees; then we pick out a few attributes from the record to display and send them to a Slack channel. To become a Kafka producer, you set a Record Reader for the type coming in (JSON in my case) and a Record Writer for the type to send to the sensors topic. In this case we kept it as JSON, but we could convert to Avro. I usually do that
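The QueryRecord step described above (keep only records over 80 degrees Fahrenheit, then project a few attributes) can be sketched as plain Python for clarity. The field names here are hypothetical, since the post does not show the actual sensor schema:

```python
# Rough Python equivalent of the QueryRecord filter described above:
# keep only JSON records whose Fahrenheit temperature exceeds 80,
# then project just the attributes we want to show in Slack.
# Field names (sensor_id, temp_f) are hypothetical placeholders.

def hot_readings(records, threshold=80.0):
    selected = []
    for rec in records:
        if rec.get("temp_f", 0.0) > threshold:
            # Project a few attributes from the full record.
            selected.append({"sensor_id": rec["sensor_id"],
                             "temp_f": rec["temp_f"]})
    return selected

readings = [
    {"sensor_id": "s1", "temp_f": 72.5, "humidity": 40},
    {"sensor_id": "s2", "temp_f": 85.1, "humidity": 38},
]
print(hot_readings(readings))  # only s2 passes the 80-degree filter
```

In NiFi itself this is a single SQL query against the flow file's records rather than custom code, which is one reason QueryRecord is preferable to scripting processors.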

No More Spaghetti Flows

You may have heard of spaghetti code: https://en.wikipedia.org/wiki/Spaghetti_code. For Apache NiFi I have seen some (and have made some of them myself in the past); I call them Spaghetti Flows. Let's avoid them. When you are first building a flow, it often meanders and has lots of extra steps, extra UpdateAttribute processors, and random routes. This applies whether you are running on-premises, in CDP, or in other stateful NiFi clusters (or single nodes). The following videos from Mark Payne are a must-watch before you write any NiFi flows.

Apache NiFi Anti-Patterns with Mark Payne: https://www.youtube.com/watch?v=RjWstt7nRVY https://www.youtube.com/watch?v=v1CoQk730qs https://www.youtube.com/watch?v=JbUjYr6Kd3I https://github.com/tspannhw/EverythingApacheNiFi

Do Not: Do not put 1,000 flows on one workspace. If your flow has hundreds of steps, this is a flow smell; investigate why. Do not use ExecuteProcess, ExecuteScript, or a lot of Groovy scripts as a default; look for existing processor

Streaming Data with Cloudera Data Flow (CDF) Into Public Cloud (CDP)

At Cloudera Now NYC, I showed a demo on streaming data from MQTT sensors and Twitter that was running in AWS. Today I am going to walk you through some of the details and give you the tools to build your own streaming demos in CDP Public Cloud. If you missed that event, you can watch a recording here.

Let's get streaming! Let's log in; I use Okta for Single Sign-On (SSO), which makes this so easy. Cloudera Flow Management (CFM) Apache NiFi is officially available in the CDP Public Cloud, so get started here. We will be following the guide (https://docs.cloudera.com/cdf-datahub/7.1.0/howto-data-ingest.html). We are running CDF DataHub on CDP 7.1.0. There are a lot of data engineering and streaming tasks I can accomplish with a few clicks. I can bring up a virtual data warehouse and use tools like Apache Hue and Data Analytics Studio to examine databases and tables and run quer
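The MQTT sensor feed driving the demo carries JSON readings into NiFi. A minimal publisher sketch is below; the broker host, topic, and payload fields are hypothetical (the post does not show the actual sensor schema), and publishing assumes the third-party paho-mqtt client:

```python
# Sketch of an MQTT sensor publisher like the one feeding the demo.
# Broker host, topic, and payload fields are hypothetical placeholders.
import json
import time

def sensor_payload(sensor_id: str, temp_f: float) -> str:
    # JSON record similar to what NiFi's ConsumeMQTT processor would receive.
    return json.dumps({"sensor_id": sensor_id,
                       "temp_f": temp_f,
                       "ts": int(time.time())})

def publish_reading(host: str, topic: str, payload: str) -> None:
    # Third-party dependency: pip install paho-mqtt (1.x-style client API).
    import paho.mqtt.client as mqtt
    client = mqtt.Client()
    client.connect(host)
    client.publish(topic, payload)
    client.disconnect()

# e.g. publish_reading("mqtt-broker.example.com", "sensors",
#                      sensor_payload("s1", 82.4))
```

From there, a ConsumeMQTT processor in NiFi can pull these readings in and hand them to the same QueryRecord-style filtering shown in the earlier demo.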