Seamless Integration: Unleashing the Power of Real-Time Groceries with NiFi, Kafka, Flink and jQuery



Real-Time Retail Grocery Store with Apache MiNiFi, Apache NiFi, Apache Kafka, Apache Flink, Apache Kudu, Apache Ozone, Apache Iceberg, HTML, jQuery, DataTables.


In today’s example, I need to ingest grocery items for some analytics, so let’s read them via a secured REST API. To follow along, you will need to sign up for your own free key to see this interesting data. Who doesn’t want to ingest bananas with NiFi? Maybe I just really need to know about eggs and sugar. A friend of mine, Brent, wanted me to build this during lockdown to determine the current price of a basket of household goods. I built it, but it sat on the shelf because most of the APIs I wanted were not available. Thanks to the innovative retail work at Kroger, we can now get the data at speed with their API. Let’s explore. This is the first article in a series: later I will cover storage to Ozone (on S3), Iceberg and Kudu, as well as in-store data collection, updating shelf pricing in real time (Raspberry Pi with E-Ink) and more retail use cases.

We are ingesting data from Kroger.

You will need to sign up to get your own credentials for this cool data.
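Once you have a client ID and secret, they are exchanged for a short-lived access token before each product lookup. Here is a minimal Node.js 18+ sketch of that OAuth2 client-credentials exchange. The token URL and the `product.compact` scope are assumptions based on Kroger's public developer documentation, so confirm them against your own developer account; the function names are hypothetical.

```javascript
// Minimal sketch of the OAuth2 client-credentials exchange (Node.js 18+).
// ASSUMPTION: token URL and "product.compact" scope come from Kroger's public
// developer docs; confirm them in your own developer account.
const TOKEN_URL = "https://api.kroger.com/v1/connect/oauth2/token";

// Build the request without sending it, so it is easy to inspect and test.
function buildTokenRequest(clientId, clientSecret, scope = "product.compact") {
  // HTTP Basic credentials: base64("clientId:clientSecret")
  const basic = Buffer.from(`${clientId}:${clientSecret}`).toString("base64");
  const body = new URLSearchParams({ grant_type: "client_credentials", scope });
  return {
    url: TOKEN_URL,
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      Authorization: `Basic ${basic}`,
    },
    body: body.toString(),
  };
}

// Send the request with the global fetch and return the bearer token.
async function fetchAccessToken(clientId, clientSecret) {
  const { url, ...init } = buildTokenRequest(clientId, clientSecret);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Token request failed: HTTP ${res.status}`);
  }
  const json = await res.json();
  return json.access_token; // standard OAuth2 response field
}
```

Each product call then sends the token in an `Authorization: Bearer <token>` header, which is the standard OAuth2 bearer scheme.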

We picked a basket of items to watch from the stores.

Some of my basket items: Organic Banana, Velveeta Original Cheese, Imperial Cane Sugar, Oreo Team USA Chocolate Sandwich Cookies, Kroger® Fat Free Skim Milk, Land O Lakes® Salted Butter Sticks, Kroger® 1 lb. Lean Ground Beef Chuck Roll 80/20, Stouffer’s® Macaroni & Cheese Frozen Meal and Eggo Homestyle Frozen Waffles.

Apache NiFi Flow Walkthrough

We have a few interesting flows for working with retail data. The first one ingests product information from Kroger.

Every 10 seconds, the flow picks a “random” item from our basket, calls the OAuth authentication endpoint and uses its token.
We call that item’s product detail endpoint and split out the record so that the images go to a separate topic; for the main flow, we build a new pricing object.
Image URLs are sent to their own itemimage Kafka topic.
We add a schema, validate our record and push it to the item topic.
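In NiFi these steps are processors rather than code, but the per-item logic can be sketched in plain JavaScript to make it concrete. This is an illustration, not the author’s flow: the product-response shape (`data.brand`, `data.categories`, `data.countryOrigin`, `data.images[].sizes[]`, `data.items[0].price.regular/promo`), the output field names and the msrp/price mapping are all assumptions, and `pickRandomItem`/`splitProduct` are hypothetical helpers.

```javascript
// Minimal sketch of the per-item steps; in NiFi these are processors, not code.
// ASSUMPTIONS: the Kroger product-detail shape used here (data.brand,
// data.categories, data.countryOrigin, data.images[].sizes[], data.items[0].price)
// and the output field names/msrp-price mapping are illustrative only.

// Pick a "random" item from the basket; rand is injectable for testing.
function pickRandomItem(basket, rand = Math.random) {
  if (basket.length === 0) throw new Error("Basket is empty");
  return basket[Math.floor(rand() * basket.length)];
}

// Split one product-detail response into a pricing record (item topic)
// and one record per image URL (itemimage topic).
function splitProduct(response, updatedate = new Date().toISOString()) {
  const p = response.data;
  const imgs = p.images || [];
  const first = (p.items && p.items[0]) || {};
  const regular = first.price ? first.price.regular : undefined;
  const promo = first.price ? first.price.promo : undefined;

  // Flatten every size of every image into its own record.
  const images = imgs.flatMap((img) =>
    (img.sizes || []).map((s) => ({
      productid: p.productId,
      perspective: img.perspective,
      size: s.size,
      url: s.url,
    }))
  );

  // Hypothetical display image: first size of the featured image, else of the first image.
  const shown = imgs.find((img) => img.featured) || imgs[0];
  const displayimage = shown && shown.sizes && shown.sizes[0] ? shown.sizes[0].url : "";

  const item = {
    productid: p.productId,
    brandname: p.brand || "",
    itemdescription: p.description || "",
    category: (p.categories || [])[0] || "",
    countryorigin: p.countryOrigin || "",
    itemsize: first.size || "",
    msrp: regular ?? 0.0,
    // Prefer a positive promo price; otherwise fall back to regular, then 0.0
    // (similar in spirit to COALESCE(price, 0.00) in the SQL).
    price: promo > 0 ? promo : (regular ?? 0.0),
    displayimage,
    updatedate,
  };
  return { item, images };
}
```

Injecting `rand` keeps the “random” pick deterministic in tests; in the flow, `item` would go on to schema validation and the item topic, while each entry in `images` goes to itemimage.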

Kafka Topics Shown in Streams Messaging Manager

Apache Kafka Item Image

Querying Kafka Topic for Items

NiFi Calcite SQL — To Transform and Enrich Item Price Stream

SELECT brandname,category,countryorigin,'${date}' as itemdate,displayimage,
msrp,originstore,cast(COALESCE(price, 0.00) as float) as price,

Flink SQL to Browse Data

An example of querying Apache Kafka topics with Apache Flink SQL via the Schema Registry catalog.

select brandname, item, itemdescription, itemsize, 
price, category,
updatedate, longdescrption, displayimage
from `sr1`.`default_database`.`item`

I want to make this data available to Jupyter notebooks and HTML pages. I will also add some notebooks and Cloudera Data Visualization.

The easiest way to do this is to create a materialized view in SQL Stream Builder, which makes my query results available as JSON over REST.

Once I see the results of my query, I am good to go. Let’s build an HTML view of this data.

Let’s make sure our materialized view is loaded in the raw REST feed.
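That check can also be scripted. Here is a minimal sketch, assuming the materialized view endpoint returns a JSON array of row objects; `MV_URL` is a hypothetical placeholder for the endpoint (and API key) shown in your SQL Stream Builder materialized view, and `checkFeed`/`loadFeed` are hypothetical helpers.

```javascript
// Minimal sketch (Node.js 18+ or a browser): load the materialized view's
// REST feed and sanity-check it before building pages on top of it.
// ASSUMPTION: MV_URL is a placeholder; copy the real endpoint (and API key)
// from your SQL Stream Builder materialized view.
const MV_URL = "https://YOUR-SSB-HOST/YOUR-MATERIALIZED-VIEW-ENDPOINT";

// A subset of the columns selected in the Flink SQL query.
const EXPECTED_COLUMNS = ["brandname", "itemdescription", "itemsize", "price", "category", "updatedate", "displayimage"];

// Return a list of problems; an empty list means the feed looks usable.
function checkFeed(rows, expected = EXPECTED_COLUMNS) {
  if (!Array.isArray(rows)) return ["response is not a JSON array"];
  if (rows.length === 0) return ["materialized view returned no rows yet"];
  const first = rows[0];
  if (typeof first !== "object" || first === null) return ["rows are not JSON objects"];
  return expected.filter((col) => !(col in first)).map((col) => `missing column: ${col}`);
}

// Fetch the feed and fail loudly if it does not look like our query results.
async function loadFeed(url = MV_URL) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} from materialized view`);
  const rows = await res.json();
  const problems = checkFeed(rows);
  if (problems.length > 0) throw new Error(problems.join("; "));
  return rows;
}
```

An empty array usually just means the Flink job has not produced rows yet, so it is reported separately from a missing column.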

Let’s use DataTables and jQuery to build a dynamic HTML table view.
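A minimal sketch of the DataTables side, assuming the page loads the materialized view’s JSON array directly. Setting `ajax.dataSrc` to an empty string tells DataTables that the response is a plain array of row objects. The `#items` element id, the column titles, the sort order, the page length and the endpoint URL are hypothetical; the `data` keys match the Flink SQL columns.

```javascript
// Minimal sketch: DataTables options that load the materialized view's JSON
// array into an HTML table. ajax.dataSrc: "" tells DataTables the response is
// a plain array of row objects. The "#items" element id, column titles and
// page settings are hypothetical; the data keys match the Flink SQL columns.
function imgTag(url) {
  // Escape double quotes so the URL cannot break out of the src attribute.
  return url ? `<img src="${String(url).replace(/"/g, "&quot;")}" width="60">` : "";
}

function buildTableOptions(url) {
  const columns = [
    ["brandname", "Brand"],
    ["itemdescription", "Description"],
    ["itemsize", "Size"],
    ["price", "Price"],
    ["category", "Category"],
    ["updatedate", "Updated"],
  ].map(([data, title]) => ({ data, title }));

  // Show the image URL as a thumbnail; sort/filter on the raw URL text.
  columns.push({
    data: "displayimage",
    title: "Image",
    orderable: false,
    render: (value, type) => (type === "display" ? imgTag(value) : value || ""),
  });

  return {
    ajax: { url, dataSrc: "" },
    columns,
    order: [[3, "desc"]], // column index 3 is price: most expensive first
    pageLength: 25,
  };
}

// In the browser, once jQuery and DataTables are loaded:
if (typeof window !== "undefined" && window.jQuery) {
  window.jQuery(function ($) {
    $("#items").DataTable(buildTableOptions("https://YOUR-SSB-HOST/YOUR-MATERIALIZED-VIEW-ENDPOINT"));
  });
}
```

Because the browser fetches the feed directly, the page must be served from the same origin as the REST endpoint, or the endpoint must allow cross-origin requests (CORS).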

All of the code is available on GitHub for you to use with your own data or your own basket to load from Kroger.

Here are the final results:

ChatGPT gave me some good ideas.

Using the Kroger REST API for groceries with Apache NiFi is a great idea for several reasons. Let’s explore the benefits of combining these two technologies:

Data Integration: Apache NiFi is a powerful data integration tool that enables the seamless flow and transformation of data between various systems. By integrating the Kroger REST API with NiFi, you can easily fetch, process, and distribute grocery-related data from Kroger’s services to your desired destinations.

Real-Time Data: The Kroger REST API provides real-time access to a wide range of grocery-related information, including product details, prices, availability, and more. By leveraging NiFi’s capabilities, you can constantly monitor and retrieve the latest data from Kroger’s API, ensuring that you have up-to-date information at all times.

Scalability: NiFi is designed to handle high volumes of data and can scale horizontally to accommodate increased workloads. This scalability makes it well-suited for processing large quantities of grocery data fetched from the Kroger API. You can configure NiFi to handle parallel processing, data partitioning, and load balancing, ensuring efficient data flow even during peak times.

Data Transformation and Enrichment: NiFi provides a wide range of processors and functions that facilitate data transformation and enrichment. You can use NiFi’s processors to extract specific data from the Kroger API responses, apply transformations, perform calculations, and enrich the data with additional information. This capability allows you to tailor the data from the Kroger API to suit your specific requirements.

Data Quality and Reliability: NiFi offers extensive data quality monitoring and control capabilities. You can implement data validation rules, perform data cleansing operations, and apply data governance practices to ensure the accuracy and reliability of the grocery data received from the Kroger API. NiFi also provides features like provenance tracking, error handling, and data lineage, which help you maintain data integrity throughout the data flow.

Integration with Other Systems: NiFi supports integration with a wide range of systems and platforms, including databases, data lakes, messaging systems, and cloud services. By combining the Kroger REST API with NiFi, you can seamlessly integrate the fetched grocery data with your existing data infrastructure, enabling further analysis, reporting, and integration with downstream systems.

Workflow Orchestration: NiFi allows you to design complex data workflows through its intuitive graphical user interface. You can create workflows that fetch data from the Kroger API, apply various transformations, perform validations, and route the data to different destinations based on predefined rules. This workflow orchestration capability simplifies the data integration process and provides better control over data flow.

By utilizing the Kroger REST API with Apache NiFi, you can leverage the strengths of both technologies to build a robust and scalable grocery data integration pipeline. This combination empowers you to access real-time grocery data, apply transformations, ensure data quality, and integrate with other systems effectively.