Let's Query Kafka with Hive

May 19, 2020

Let's Query Kafka with Hive

Let's Query Kafka with Hive

I can hop into beeline and build an external Hive table to access my Cloudera CDF Kafka cluster whether it is in the public cloud in CDP DataHub, on-premise in HDF or CDF or in CDP-DC.

I just have to set my KafkaStorageHandler, Kafka Topic Name and my bootstrap servers (usually port 9092). Now I can use that table to do ELT/ELT for populating Hive tables or populating Kafka topics from Hive tables. This is a nice and easy way to do data engineering on the quick and easy.

This is a good item to augment CDP Data Engineering with Spark, CDP DataHub with NiFi, CDP DataHub with Kafka and KafkaStreams and various SQOOP or Python utilities you may have in your environment.

For real-time continuous queries on Kafka with SQL, you can use Flink SQL. https://www.datainmotion.dev/2020/05/flank-low-code-streaming-populating.html

Example Table Create

CREATE EXTERNAL TABLE <tableName>

(`uuid` STRING, `systemtime` STRING , `temperaturef` STRING , `pressure` DOUBLE,`humidity` DOUBLE, `lux` DOUBLE, `proximity` int, `oxidising` DOUBLE , `reducing` DOUBLE, `nh3` DOUBLE , `gasko` STRING,`current` INT, `voltage` INT ,`power` INT, `total` INT,`fanstatus` STRING)

STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'

TBLPROPERTIES

("kafka.topic" = "<TopicName>",

"kafka.bootstrap.servers"="<ServerName>:9092");

show tables;

describe extended kafka_table;

select *

from kafka_table;

I can browse my Kafka topics with Cloudera SMM to see what the data is and why I want to load or need to load.

For more information take a look at the documentation for Integrating Hive and Kafka at Cloudera below:

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/integrating-hive/content/hive_kafka_query_table.html
https://docs.cloudera.com/runtime/7.0.3/integrating-hive-and-bi/topics/hive-kafka-integration.html
https://docs.cloudera.com/runtime/7.1.0/integrating-hive-and-bi/topics/hive-kafka-integration.html
https://docs.cloudera.com/runtime/7.1.0/integrating-hive-and-bi/topics/hive_ingest_kafka_data_into_hive.html

Search This Blog

Data In Motion

Let's Query Kafka with Hive

Popular Posts

Integration Testing for Apache NiFi Development

Apache NiFi Operations and Monitoring 101

Ingesting Drone Data From DJII Ryze Tello Drones Part 1 - Setup and Practice