FLaNK: Flink SQL Preview

From the Flink web dashboard, we can see how our insert is doing and watch the joins and records pass quickly through our tiny cluster.

As part of the May 7th, 2020 Virtual Meetup, I was working with Flink SQL to put together a quick demo for the meetup introduction, and I found out how easy it is to do some cool stuff. This was inspired by my Streaming Hero, Abdelkrim, who wrote an amazing article on Flink SQL use cases.

As part of our time series meetup, I have a few streams of data coming from one device. A MiNiFi Java agent sends them to NiFi for some transformation, routing and processing, and they are then sent to Apache Flink for final processing. I decided to join Kafka topics with Flink SQL.

Let's create Flink Tables:

This table will be used to insert the joined events from both source Kafka topics.

CREATE TABLE global_sensor_events (
    uuid STRING,
    systemtime STRING,
    temperaturef STRING,
    pressure DOUBLE,
    humidity DOUBLE,
    lux DOUBLE,
    proximity INT,
    oxidising DOUBLE,
    reducing DOUBLE,
    nh3 DOUBLE,
    gasko STRING,
    `current` INT,
    voltage INT,
    `power` INT,
    `total` INT,
    fanstatus STRING
) WITH (
    'connector.type' = 'kafka',
    'connector.version' = 'universal',
    'connector.topic' = 'global_sensor_events',
    'connector.startup-mode' = 'earliest-offset',
    -- The broker property was stripped from the post; the key is assumed and the value is a placeholder.
    'connector.properties.bootstrap.servers' = '<kafka-broker-host>:<port>',
    'connector.properties.group.id' = 'flink-sql-global-sensor_join',
    'format.type' = 'json'
);

This table will hold Kafka topic messages from our energy reader.

CREATE TABLE energy (
    uuid STRING,
    systemtime STRING,
    `current` INT,
    voltage INT,
    `power` INT,
    `total` INT,
    swver STRING,
    hwver STRING,
    type STRING,
    model STRING,
    mac STRING,
    deviceId STRING,
    alias STRING,
    devname STRING,
    iconhash STRING,
    relaystate INT,
    ontime INT,
    activemode STRING,
    feature STRING,
    updating INT,
    rssi INT,
    ledoff INT,
    latitude INT,
    longitude INT,
    `day` INT,
    `index` INT,
    zonestr STRING,
    tzstr STRING,
    dstoffset INT,
    host STRING,
    currentconsumption INT,
    devicetime STRING,
    ledon STRING,
    fanstatus STRING,
    `end` STRING,
    cpu INT,
    memory INT,
    diskusage STRING
) WITH (
    'connector.type' = 'kafka',
    'connector.version' = 'universal',
    'connector.topic' = 'energy',
    'connector.startup-mode' = 'earliest-offset',
    -- The broker property was stripped from the post; the key is assumed and the value is a placeholder.
    'connector.properties.bootstrap.servers' = '<kafka-broker-host>:<port>',
    'connector.properties.group.id' = 'flink-sql-energy-consumer',
    'format.type' = 'json'
);

The scada table holds events from our sensors.

CREATE TABLE scada (
    uuid STRING,
    systemtime STRING,
    amplitude100 DOUBLE,
    amplitude500 DOUBLE,
    amplitude1000 DOUBLE,
    lownoise DOUBLE,
    midnoise DOUBLE,
    highnoise DOUBLE,
    amps DOUBLE,
    ipaddress STRING,
    host STRING,
    host_name STRING,
    macaddress STRING,
    endtime STRING,
    runtime STRING,
    starttime STRING,
    cpu DOUBLE,
    cpu_temp STRING,
    diskusage STRING,
    memory DOUBLE,
    temperature STRING,
    adjtemp STRING,
    adjtempf STRING,
    temperaturef STRING,
    pressure DOUBLE,
    humidity DOUBLE,
    lux DOUBLE,
    proximity INT,
    oxidising DOUBLE,
    reducing DOUBLE,
    nh3 DOUBLE,
    gasko STRING
) WITH (
    'connector.type' = 'kafka',
    'connector.version' = 'universal',
    'connector.topic' = 'scada',
    'connector.startup-mode' = 'earliest-offset',
    -- The broker property was stripped from the post; the key is assumed and the value is a placeholder.
    'connector.properties.bootstrap.servers' = '<kafka-broker-host>:<port>',
    'connector.properties.group.id' = 'flink-sql-scada-consumer',
    'format.type' = 'json'
);
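
The table definitions above use the older 'connector.type' style of property keys. If you are on Flink 1.11 or later, the Kafka connector also accepts a shorter set of option keys. Here is a rough sketch of the sink table in that style. The table name global_sensor_events_v2 and the broker address are placeholders I made up for illustration, so treat this as a starting point rather than the exact definition used in the demo.

```sql
-- Sketch only: same columns as global_sensor_events, newer option keys (Flink 1.11+).
-- The table name and the broker address are hypothetical placeholders.
CREATE TABLE global_sensor_events_v2 (
    uuid STRING,
    systemtime STRING,
    temperaturef STRING,
    pressure DOUBLE,
    humidity DOUBLE,
    lux DOUBLE,
    proximity INT,
    oxidising DOUBLE,
    reducing DOUBLE,
    nh3 DOUBLE,
    gasko STRING,
    `current` INT,
    voltage INT,
    `power` INT,
    `total` INT,
    fanstatus STRING
) WITH (
    'connector' = 'kafka',
    'topic' = 'global_sensor_events',
    'properties.bootstrap.servers' = '<kafka-broker-host>:<port>',
    'properties.group.id' = 'flink-sql-global-sensor_join',
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'json'
);
```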

This is the magic part:

INSERT INTO global_sensor_events
SELECT
    scada.uuid,
    scada.systemtime,
    scada.temperaturef,
    scada.pressure,
    scada.humidity,
    scada.lux,
    scada.proximity,
    scada.oxidising,
    scada.reducing,
    scada.nh3,
    scada.gasko,
    energy.`current`,
    energy.voltage,
    energy.`power`,
    energy.`total`,
    energy.fanstatus
FROM energy,
    scada
WHERE
    scada.systemtime = energy.systemtime;

So we join two Kafka topics and use some of their fields to populate a third Kafka topic that we defined above.
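
Before writing to the third topic, it can help to preview the join on its own in the SQL client. This is a small sketch of my own, using an explicit INNER JOIN and only a handful of columns from the scada and energy tables. The column selection is just an illustration, not the full list the insert needs.

```sql
-- Preview the join without writing anything to the sink topic.
-- Matches rows from both topics on the same systemtime string.
SELECT
    scada.uuid,
    scada.systemtime,
    scada.temperaturef,
    energy.voltage,
    energy.`power`
FROM scada
INNER JOIN energy
    ON scada.systemtime = energy.systemtime;
```

Since systemtime is a STRING in both tables, this is a plain equality join, so only events that carry exactly the same timestamp string on both sides will show up in the result.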

With Cloudera, it is easy to monitor our streaming Kafka events with Streams Messaging Manager (SMM).

For context, this is where the data comes from:
