Flink SQL Preview

FLaNK:  Flink SQL Preview







From our Web Flink Dashboard, we can see how our insert is doing and view the joins and records passing quickly through our tiny cluster.










As part of the May 7th, 2020 Virtual Meetup, I was doing some work with Flink SQL to show for a quick demo as the introduction to the meetup and I found out how easy it was to do some cool stuff.   This was inspired by my Streaming Hero, Abdelkrim, who wrote this amazing article on Flink SQL use cases:   https://towardsdatascience.com/event-driven-supply-chain-for-crisis-with-flinksql-be80cb3ad4f9

As part of our time series meetup, I have a few streams of data coming from one device from a MiNiFi Java agent to NiFi for some transformation, routing and processing and then sent to Apache Flink for final processing.   I decided to join Kafka topics with Flink SQL.   


Let's create Flink Tables:

This table will be used to insert the joined events from both source Kafka topics.

CREATE TABLE global_sensor_events (
 uuid STRING, 
systemtime STRING ,  
temperaturef STRING , 
pressure DOUBLE, 
humidity DOUBLE, 
lux DOUBLE, 
proximity int, 
oxidising DOUBLE , 
reducing DOUBLE, 
nh3 DOUBLE , 
gasko STRING,
`current` INT, 
voltage INT ,
`power` INT,
`total` INT,
fanstatus STRING
) WITH (
'connector.type'    = 'kafka',
'connector.version' = 'universal',
'connector.topic'    = 'global_sensor_events',
'connector.startup-mode' = 'earliest-offset',
'connector.properties.bootstrap.servers' = 'tspann-princeton0-cluster-0.general.fuse.l42.cloudera.com:9092',
'connector.properties.group.id' = 'flink-sql-global-sensor_join',
'format.type' = 'json'
);


This table will hold Kafka topic messages from our energy reader.

CREATE TABLE energy (
uuid STRING, 
systemtime STRING,  
        `current` INT, 
voltage INT, 
`power` INT, 
`total` INT, 
swver STRING, 
hwver STRING,
type STRING, 
model STRING, 
mac STRING, 
deviceId STRING, 
hwId STRING, 
fwId STRING, 
oemId STRING,
alias STRING, 
devname STRING, 
iconhash STRING, 
relaystate INT, 
ontime INT, 
activemode STRING, 
feature STRING, 
updating INT, 
rssi INT, 
ledoff INT, 
latitude INT, 
longitude INT, 
`day` INT, 
`index` INT, 
zonestr STRING, 
tzstr STRING, 
dstoffset INT, 
host STRING, 
currentconsumption INT, 
devicetime STRING, 
ledon STRING, 
fanstatus STRING, 
`end` STRING, 
te STRING, 
cpu INT, 
memory INT, 
diskusage STRING
) WITH (
'connector.type'    = 'kafka',
'connector.version' = 'universal',
'connector.topic'    = 'energy',
'connector.startup-mode' = 'earliest-offset',
'connector.properties.bootstrap.servers' = 'tspann-princeton0-cluster-0.general.fuse.l42.cloudera.com:9092',
'connector.properties.group.id' = 'flink-sql-energy-consumer',
'format.type' = 'json'
);


The scada table holds events from our sensors.

CREATE TABLE scada (
uuid STRING, 
systemtime STRING,  
amplitude100 DOUBLE, 
        amplitude500 DOUBLE, 
amplitude1000 DOUBLE, 
lownoise DOUBLE, 
midnoise DOUBLE,
        highnoise DOUBLE, 
amps DOUBLE, 
ipaddress STRING, 
host STRING, 
host_name STRING,
        macaddress STRING, 
endtime STRING, 
runtime STRING, 
starttime STRING, 
        cpu DOUBLE, 
cpu_temp STRING, 
diskusage STRING, 
memory DOUBLE, 
id STRING, 
temperature STRING, 
adjtemp STRING, 
adjtempf STRING, 
temperaturef STRING, 
pressure DOUBLE, 
humidity DOUBLE, 
lux DOUBLE, 
proximity INT, 
oxidising DOUBLE, 
reducing DOUBLE, 
nh3 DOUBLE, 
gasko STRING
) WITH (
'connector.type'    = 'kafka',
'connector.version' = 'universal',
'connector.topic'    = 'scada',
'connector.startup-mode' = 'earliest-offset',
'connector.properties.bootstrap.servers' = 'tspann-princeton0-cluster-0.general.fuse.l42.cloudera.com:9092',
'connector.properties.group.id' = 'flink-sql-scada-consumer',
'format.type' = 'json'
);


This is the magic part:

INSERT INTO global_sensor_events 
SELECT 
scada.uuid, 
scada.systemtime ,  
scada.temperaturef , 
scada.pressure , 
scada.humidity , 
scada.lux , 
scada.proximity , 
scada.oxidising , 
scada.reducing , 
scada.nh3 , 
scada.gasko,
energy.`current`, 
energy.voltage ,
energy.`power` ,
energy.`total`,
energy.fanstatus

FROM energy,
     scada
WHERE
    scada.systemtime = energy.systemtime;

So we join two Kafka topics and use some of their fields to populate a third Kafka topic that we defined above.

With Cloudera, it is so easy to monitor our streaming Kafka events with SMM.


For context, this is where the data comes from: