
Posts

Technical Preview - Cloudera DataFlow

Import Flows Built in CDP Data Hub Flow Management
https://docs.cloudera.com/dataflow/cloud/quick-start/topics/cdf-qs-definition.html

Deploy Flows
https://docs.cloudera.com/dataflow/cloud/quick-start/topics/cdf-qs-deploy.html

Import a Quick Flow (a rough Python sketch of the Kafka filter pattern these quick flows implement appears below)
https://docs.cloudera.com/dataflow/cloud/qs-flow-definitions/topics/cdf-import-quick-flow.html
https://docs.cloudera.com/dataflow/cloud/qs-flow-definitions/topics/cdf-qf-kafka-kafka.html
https://docs.cloudera.com/dataflow/cloud/qs-flow-definitions/topics/cdf-qf-kafka-filter-kafka.html
https://docs.cloudera.com/dataflow/cloud/qs-flow-definitions/topics/cdf-qf-kafka-to-s3-parquet.html

Monitoring
https://docs.cloudera.com/dataflow/cloud/kpi-overview/topics/cdf-introduction-to-kpi.html

Top DataFlow Resources
https://docs.cloudera.com/dataflow/cloud/index.html
https://docs.cloudera.com/dataflow/cloud/overview/topics/cdf-overview.html
https://docs.cloudera.com/dataflow/cloud/overview/topics/cdf-architecture.html
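The quick flows above are imported and deployed as-is in Cloudera DataFlow, so no code is required. For intuition only, here is a minimal Python sketch of the same Kafka-to-filter-to-Kafka pattern expressed outside of NiFi with the confluent-kafka client. The broker address, topic names, and filter condition are assumptions for illustration, not values taken from the quick flow definitions.

```python
# Minimal sketch of the Kafka -> filter -> Kafka pattern that the
# cdf-qf-kafka-filter-kafka quick flow implements in NiFi.
# Broker, topics, and the filter condition below are hypothetical examples.
import json
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "broker:9092",   # assumption: your Kafka broker
    "group.id": "filter-demo",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "broker:9092"})

consumer.subscribe(["source-topic"])      # assumption: source topic name

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())
        # Keep only events matching a simple condition (stand-in for the
        # filter step in the NiFi quick flow).
        if event.get("status") == "ACTIVE":
            producer.produce("filtered-topic", json.dumps(event).encode("utf-8"))
finally:
    consumer.close()
    producer.flush()
```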

How about Some Free Cloud Training?

CDP Private Cloud Fundamentals
https://www.cloudera.com/about/training/courses/cdp-private-cloud-fundamentals.html

CDP Private Cloud Fundamentals (Cloudera, IBM, and Red Hat)
Our CDP Private Cloud Fundamentals OnDemand course provides a solid introduction to CDP Private Cloud. In addition to learning what CDP Private Cloud is and how it fits into the Enterprise Data Cloud vision, you'll find out about its architecture and how it uses cloud-native design elements such as containerization in order to overcome limitations of the traditional bare metal cluster architecture. Following a summary of the system requirements, the course concludes with a demonstration of a CDP Private Cloud installation.
https://www.cloudera.com/about/training/courses/cloudera-ibm-redhat-cdp-pvc-fundamentals.html

Cloudera Essentials for CDP
https://www.cloudera.com/about/training/courses/cloudera-essentials-for-cdp.html

Introduction to Cloudera Manager
https://www.cloude

Processing Fixed Width and Complex Files

Pointers

The first decision you will have to make is whether the file is structured at all.

If it is a known type like CSV, JSON, Avro, XML, or Parquet, then just use a record reader.
If it's semi-structured, like a log file, GrokReader or ExtractGrok may work.
If it's CSV-like, you may be able to tweak the CSV reader to make it work (say, header or no header), or try one of the two CSV parsers NiFi has (Jackson or Apache Commons).
If it's a format like PDF, Word, Excel, or RTF, I have a custom processor that uses Apache Tika that should be able to parse it into text. Once it is text you can probably work with it.

A minimal fixed-width parsing sketch follows the example links below.

Examples
https://community.cloudera.com/t5/Support-Questions/How-to-parse-w-fixed-width-instead-of-char-delimited/td-p/102597
https://community.cloudera.com/t5/Support-Questions/Best-way-to-parse-Fixed-width-file-using-Nifi-Kindly-help/m-p/177637
https://community.cloudera.com/t5/Support-Questions/Split-one-Nifi-flo
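For fixed-width files specifically, the core idea, whether you do it in NiFi or in a script, is to slice each line at known column offsets. Here is a minimal Python sketch; the field names and widths are a hypothetical layout for illustration, so substitute the spec for your own file.

```python
# Parse a fixed-width file into dicts by slicing each line at known offsets.
# The layout below (field name, width) is a hypothetical example.
LAYOUT = [("customer_id", 10), ("product_code", 8), ("amount", 12)]

def parse_line(line: str) -> dict:
    record, pos = {}, 0
    for name, width in LAYOUT:
        # Take the next fixed-width slice and trim the padding spaces.
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record

with open("fixed_width.txt", encoding="utf-8") as f:
    records = [parse_line(line.rstrip("\n")) for line in f if line.strip()]

for r in records:
    print(r)
```

The community threads linked above discuss NiFi-native ways to get the same result inside a flow.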

Price Comparisons Using Retail REST APIs with Apache NiFi, Kafka and Flink SQL

Part 1: NiFi REST (a rough Python sketch of this step appears below)
Part 2: Kafka - Flink SQL
Part 3: Cloudera Visual Apps
Part 4: Smart Shelf Updates - MiNiFi Agents
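As a rough illustration of Part 1, here is a small Python sketch that polls a retail pricing REST endpoint, compares the returned prices, and publishes the records to Kafka for downstream Flink SQL. In the series this ingestion happens inside NiFi; the endpoint URL, response fields, SKUs, and topic name below are hypothetical placeholders, not the actual retail APIs used.

```python
# Sketch of the "poll a retail REST API, compare prices, publish to Kafka"
# idea from Parts 1 and 2. All URLs, fields, and names are hypothetical.
import json
import requests
from confluent_kafka import Producer

API_URL = "https://api.example-retailer.com/v1/products/{sku}/price"  # assumption
producer = Producer({"bootstrap.servers": "broker:9092"})              # assumption

def fetch_price(sku: str) -> dict:
    resp = requests.get(API_URL.format(sku=sku), timeout=10)
    resp.raise_for_status()
    body = resp.json()
    return {"sku": sku, "store": body.get("store"), "price": body.get("price")}

skus = ["1234567", "7654321"]                 # assumption: example SKUs
quotes = [fetch_price(sku) for sku in skus]

# Cheapest quote in this batch; downstream, Flink SQL over the Kafka topic
# can run the same comparison continuously.
cheapest = min(quotes, key=lambda q: q["price"] or float("inf"))
print("cheapest:", cheapest)

for q in quotes:
    producer.produce("retail-prices", json.dumps(q).encode("utf-8"))
producer.flush()
```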

MiNiFi Agent Update March 2021

Cloudera Agent Availability
https://docs.cloudera.com/cem/1.2.2/release-notes/topics/cem-minifi-cpp-agent-updates.html
https://docs.cloudera.com/cem/1.2.2/release-notes/topics/cem-minifi-cpp-download-locations.html

Getting Started
https://docs.cloudera.com/cem/1.2.2/minifi-agent-quick-start/topics/cem-install-and-start-minifi-cpp.html

MiNiFi (C++) Version cpp-0.9.0
Release Date: 1 March 2021

Highlights of the 0.9.0 release include:
Added support for a RocksDB-based content repository for better performance
Added SQL extension
Improved task scheduling
Various C2 improvements
Bug fixes and improvements to TailFile, ConsumeWindowsEventLog, MergeContent, CompressContent, PublishKafka, and InvokeHTTP
Implemented RetryFlowFile and smart handling of loopback connections
Added a way to encrypt sensitive config properties and the flow configuration
Implemented full S3 support
Reduced memory footprint when working with many flow files

Build Notes:
It is advised that you use the bootstrap.sh when not bu