Cloudera SQL Stream Builder (SSB) - Update Your FLaNK Stack

Cloudera SQL Stream Builder (SSB) Released!

CSA 1.3.0 is now available with Apache Flink 1.12 and SQL Stream Builder! Check out this white paper for an overview. You can get full details on the stream processing and analytics offerings available from Cloudera here.


This is an awesome way to query Kafka topics with continuous SQL that is deployed to scalable Flink nodes on YARN or Kubernetes. We can also easily define functions in JavaScript to enhance, enrich, and augment our data streams. There is no Java to write and no heavy deploys or build scripts; we can build, test, and deploy these advanced streaming applications entirely from a secure browser interface.
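As a taste of those JavaScript functions, here is a minimal sketch of the kind of scalar logic you can attach to a stream. The function name is made up, and the way SSB binds arguments to a function is version-specific, so treat this as the logic only and check the SSB documentation for the exact registration convention:

```javascript
// Illustrative scalar function: convert a Celsius reading (arriving as a
// string field in the stream) to Fahrenheit, rounded to one decimal place.
// The name and argument wiring are hypothetical, not SSB's exact API.
function toFahrenheit(tempC) {
  return Math.round((parseFloat(tempC) * 9 / 5 + 32) * 10) / 10;
}

toFahrenheit("100"); // 212
```

Once registered, a function like this can be called directly inside your continuous SQL, just like a built-in.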


References:



Example Queries:


SELECT location, max(temp_f) as max_temp_f, avg(temp_f) as avg_temp_f,
                 min(temp_f) as min_temp_f
FROM weather2 
GROUP BY location


SELECT HOP_END(eventTimestamp, INTERVAL '1' SECOND, INTERVAL '30' SECOND) as windowEnd,
       count(`close`) as closeCount,
       sum(cast(`close` as float)) as closeSum,
       avg(cast(`close` as float)) as closeAverage,
       min(`close`) as closeMin,
       max(`close`) as closeMax,
       sum(case when `close` > 14 then 1 else 0 end) as stockGreaterThan14 
FROM stocksraw
WHERE symbol = 'CLDR'
GROUP BY HOP(eventTimestamp, INTERVAL '1' SECOND, INTERVAL '30' SECOND)
                                                         

SELECT scada2.uuid, scada2.systemtime, scada2.temperaturef, scada2.pressure,
       scada2.humidity, scada2.lux, scada2.proximity, scada2.oxidising,
       scada2.reducing, scada2.nh3, scada2.gasko,
       energy2.`current`, energy2.voltage, energy2.`power`, energy2.`total`, energy2.fanstatus
FROM energy2 JOIN scada2 ON energy2.systemtime = scada2.systemtime
                                                 


SELECT symbol, uuid, ts, dt, `open`, `close`, high, volume, `low`, `datetime`,
       'new-high' as message, 'nh' as alertcode,
       CAST(CURRENT_TIMESTAMP AS BIGINT) as alerttime
FROM stocksraw st
WHERE symbol is not null
  AND symbol <> 'null'
  AND trim(symbol) <> ''
  AND CAST(`close` as DOUBLE) >
      (SELECT MAX(CAST(`close` as DOUBLE))
       FROM stocksraw s
       WHERE s.symbol = st.symbol);



SELECT  * 
FROM statusevents
WHERE lower(description) like '%fail%'



SELECT
  sensor_id as device_id,
  HOP_END(sensor_ts, INTERVAL '1' SECOND, INTERVAL '30' SECOND) as windowEnd,
  count(*) as sensorCount,
  sum(sensor_6) as sensorSum,
  avg(cast(sensor_6 as float)) as sensorAverage,
  min(sensor_6) as sensorMin,
  max(sensor_6) as sensorMax,
  sum(case when sensor_6 > 70 then 1 else 0 end) as sensorGreaterThan70
FROM iot_enriched_source
GROUP BY
  sensor_id,
  HOP(sensor_ts, INTERVAL '1' SECOND, INTERVAL '30' SECOND)



SELECT title, description, pubDate, `point`, `uuid`, `ts`, eventTimestamp
FROM transcomevents


Source Code:



Example SQL Stream Builder Run

We log in, then build our Kafka data source(s), unless they were predefined.

Next we build a few virtual table sources for the Kafka topics we are going to read from. If the data is JSON, we can let SSB detect the schema for us, or we can connect to Cloudera Schema Registry to resolve the schema for Avro data.

We can then define virtual table sinks to Kafka or webhooks.

We then run a SQL query with a few easily tuned parameters, and if we like the results we can create a materialized view.
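The materialized view is served over REST as JSON, so any client can consume the latest query results. As a hedged sketch (the endpoint URL, API key, and field names below are placeholders, not real SSB values; the field names echo the weather2 aggregation query above), a client could fetch a page of rows and reduce over them like this:

```javascript
// Placeholder endpoint; SSB generates the real materialized view URL and API key.
const MV_URL = "https://ssb-host:18131/api/v1/query/weather/summary?key=API_KEY";

// Reduce a page of materialized view rows (a JSON array) to the hottest location.
// Field names are hypothetical, mirroring the weather aggregation above.
function hottestLocation(rows) {
  return rows.reduce((best, row) =>
    row.max_temp_f > best.max_temp_f ? row : best);
}

// Usage (fetch is available in modern Node.js and browsers):
// fetch(MV_URL).then(r => r.json()).then(rows => console.log(hottestLocation(rows)));
```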


Technical Preview - Cloudera DataFlow



Import Flows Built in CDP DataHub Flow Management

https://docs.cloudera.com/dataflow/cloud/quick-start/topics/cdf-qs-definition.html


Deploy Flows

https://docs.cloudera.com/dataflow/cloud/quick-start/topics/cdf-qs-deploy.html

Import a Quick Flow

https://docs.cloudera.com/dataflow/cloud/qs-flow-definitions/topics/cdf-import-quick-flow.html

Monitoring




Top DataFlow Resources


Example Flows





How about Some Free Cloud Training?



 

CDP Private Cloud Fundamentals




Cloudera, IBM, and Red Hat

Our CDP Private Cloud Fundamentals OnDemand course provides a solid introduction to CDP Private Cloud. In addition to learning what CDP Private Cloud is and how it fits into the Enterprise Data Cloud vision, you'll find out about its architecture and how it uses cloud-native design elements such as containerization to overcome the limitations of the traditional bare-metal cluster architecture. Following a summary of the system requirements, the course concludes with a demonstration of a CDP Private Cloud installation.


https://www.cloudera.com/about/training/courses/cloudera-ibm-redhat-cdp-pvc-fundamentals.html


Cloudera Essentials for CDP




Introduction to Cloudera Manager

https://www.cloudera.com/about/training/courses/introduction-to-cloudera-manager.html


Introduction to Cloudera Data Warehouse: Self-Service Analytics in the Cloud with CDP




Introduction to Cloudera Machine Learning




Introduction to Apache Impala


Enriching Data with PySpark









Introduction to Hive




Demo Jam: Build a Flow with Apache NiFi



Semantic Analysis



Apache YuniKorn



Introduction to Apache Ozone



Importing Data into the Cloud with Apache NiFi






Processing Fixed Width and Complex Files



Pointers

The first decision you will have to make is whether the data is structured at all. If it is a known type like CSV, JSON, Avro, XML, or Parquet, then just use a record reader.

If it's semi-structured, like a log file, GrokReader or ExtractGrok may work.

If it's CSV-like, you may be able to tweak the CSVReader to cope (say, header or no header), or try either of the two CSV parsers NiFi ships (Jackson or Apache Commons CSV).

If it's a format like PDF, Word, Excel, or RTF, I have a custom processor that uses Apache Tika to parse it into text. Once it is text, you can probably work with it.
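When nothing else fits, ExecuteScript (or ExecuteStreamCommand) can slice fixed-width lines by column offsets. A hedged sketch of that slicing logic follows; the three-field layout, offsets, and field names are hypothetical, not from any real file spec:

```javascript
// Hypothetical fixed-width layout:
//   cols 0-9  = id (zero-padded), cols 10-29 = name, cols 30-37 = amount
function parseFixedWidth(line) {
  return {
    id: line.slice(0, 10).trim(),
    name: line.slice(10, 30).trim(),
    amount: parseFloat(line.slice(30, 38).trim()),
  };
}

// Build a sample line with the same widths and parse it back.
const line = "42".padStart(10, "0") + "Acme Widgets".padEnd(20) + "123.45".padStart(8);
const record = parseFixedWidth(line);
// record.id → "0000000042", record.name → "Acme Widgets", record.amount → 123.45
```

The same slicing idea works inside an ExecuteScript processor on each flow file line, emitting JSON that the record-oriented processors below can then handle.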


Examples



Documentation




Processors To Use For File Manipulation

  • AttributesToCSV
  • AttributesToJSON
  • ConvertExcelToCSVProcessor 
  • ConvertRecord
  • ConvertText
  • CSVReader
  • EvaluateJSONPath
  • EvaluateXPath
  • EvaluateXQuery
  • ExecuteScript
  • ExecuteStreamCommand
  • ExtractGrok
  • ExtractText
  • FlattenJson
  • ForkRecord
  • GrokReader
  • JsonPathReader
  • JsonTreeReader
  • JoltTransformJSON
  • JoltTransformRecord
  • LookupAttribute
  • LookupRecord
  • MergeContent
  • MergeRecord
  • ModifyBytes
  • ParseSyslog*
  • PartitionRecord
  • QueryRecord
  • ReaderLookup
  • ReplaceText
  • ReplaceTextWithMapping
  • ScriptedReader
  • ScriptedRecordSink
  • ScriptedTransformRecord
  • SegmentContent
  • SplitContent
  • SplitJson
  • SplitRecord
  • SplitText
  • SplitXml
  • SyslogReader
  • TransformXml
  • UnpackContent
  • UpdateAttribute
  • UpdateRecord
  • ValidCsv
  • ValidateRecord
  • ValidateXml

Custom Processors

Helper Projects, SDK, Libraries and Services




Price Comparisons Using Retail REST APIs with Apache NiFi, Kafka and Flink SQL



Part 1: NiFi REST
Part 2: Kafka and Flink SQL
Part 3: Cloudera Visual Apps
Part 4: Smart Shelf Updates - MiNiFi Agents


MiNiFi Agent Update March 2021

 Cloudera Agent Availability

Getting Started

MiNiFi (C++)

Version cpp-0.9.0

Release Date: March 2021

Highlights of the 0.9.0 release include:

  • Added support for RocksDB-based content repository for better performance
  • Added SQL extension
  • Improved task scheduling
  • Various C2 improvements
  • Bug fixes and improvements to TailFile, ConsumeWindowsEventLog, MergeContent, CompressContent, PublishKafka, and InvokeHTTP
  • Implemented RetryFlowFile and smart handling of loopback connections
  • Added a way to encrypt sensitive config properties and the flow configuration
  • Implemented full S3 support
  • Reduced memory footprint when working with many flow files

Build Notes:

It is advised that you use bootstrap.sh when not building on Windows.


https://cwiki.apache.org/confluence/display/MINIFI/Release+Notes#ReleaseNotes-Versioncpp-0.9.0


Download Now as Source or Pre-Built for Your Platform

https://nifi.apache.org/minifi/download.html