Mastering Data Streaming Pipelines

FLaNK Stack 04 March 2024



FLaNK Stack Weekly

Tim Spann @PaaSDev

https://pebble.is/PaaSDev

https://vimeo.com/flankstack

https://www.youtube.com/@FLaNK-Stack

https://www.threads.net/@tspannhw

https://medium.com/@tspann/subscribe

https://www.cloudera.com/campaign/apache-nifi-for-dummies.html

https://ossinsight.io/analyze/tspannhw

CODE + COMMUNITY

Please join my meetup groups (NJ, NYC, Philly, and virtual).

http://www.meetup.com/futureofdata-princeton/

https://www.meetup.com/futureofdata-newyork/

https://www.meetup.com/futureofdata-philadelphia/


**This is Issue #127**

https://github.com/tspannhw/FLiPStackWeekly

https://www.cloudera.com/solutions/dim-developer.html

Project Updates

Apache Kafka 3.7.0

https://kafka.apache.org/blog#apache_kafka_370_release_announcement

Courses

https://www.youtube.com/watch?v=mEsleV16qdo&ab_channel=freeCodeCamp.org

Articles

Yet another Python Processor https://medium.com/@tspann/yet-another-python-processor-45aaae6fe406

Streaming Street Cams to YoLo v8 with Python and NiFi to MinIO (S3) https://medium.com/@tspann/streaming-street-cams-to-yolo-v8-with-python-and-nifi-to-minio-s3-3277e73723ce

Meetup Report 28 Feb 2024 https://medium.com/@tspann/report-28-feb-2024-building-realtime-ai-applications-with-apache-flink-76edb957b996

Using OLLAMA with Mistral and Apache NiFi https://medium.com/@tspann/using-ollama-with-mistral-and-apache-nifi-720c17f5ff12

Python to Apache Iceberg https://medium.com/@tspann/python-to-apache-iceberg-s-5d642e1170ae https://www.youtube.com/watch?v=pRTNQ2Ddu88

Using Google Gemma https://medium.com/@tspann/google-gemma-for-real-time-lightweight-open-llm-inference-88efe98e580f

NYC Traffic?? (NiFi, Kafka, Flink) https://medium.com/@tspann/nyc-traffic-are-you-kidding-me-6d3fa853903b

Subways and Transit Updates in Real-Time https://medium.com/@tspann/subways-and-transit-updates-in-real-time-30c104c359ef

Open Source Data Infrastructure Meetup - Feb 2024 https://medium.com/@tspann/open-source-data-infrastructure-meetup-feb-2024-9e8048666828

https://towardsdatascience.com/all-public-transport-leads-to-utrecht-not-rome-bb9674600e81

https://datavolo.io/2024/02/collecting-logs-with-apache-nifi-and-opentelemetry/

https://zilliz.com/learn/milvus-vector-database-quickstart

https://exceptionfactory.com/posts/2024/02/26/building-opentelemetry-collection-in-apache-nifi-with-netty/

https://echarts.apache.org/handbook/en/get-started/

https://www.decodable.co/blog/flink-sql-and-the-joy-of-jars?

https://techcrunch.com/2024/02/28/diffusion-transformers-are-the-key-behind-openais-sora-and-theyre-set-to-upend-genai/

https://www-bleepingcomputer-com.cdn.ampproject.org/c/s/www.bleepingcomputer.com/news/security/malicious-ai-models-on-hugging-face-backdoor-users-machines/amp/

https://www.philschmid.de/dpo-align-llms-in-2024-with-trl?

https://www.infoq.com/articles/architecting-java-persistence-patterns-and-strategies/

https://docs.cloudera.com/cdp-public-cloud-preview-features/cloud/dw-hue-sql-ai-assistant/dw-hue-sql-ai-assistant.pdf

https://medium.com/@yogi_r/relationship-graphs-using-llm-with-retrieval-augmented-generation-rag-and-vector-database-d3f12c914ade

https://news.samsung.com/global/samsungs-new-microsd-cards-bring-high-performance-and-capacity-for-the-new-era-in-mobile-computing-and-on-device-ai?cid=sem-mktg-pfs-mob-us-other-na-01312024-141981-

https://gonzoml.substack.com/p/big-post-about-big-context

https://ben11kehoe.medium.com/the-end-of-programming-will-look-a-lot-like-programming-8b877c8efef8

https://apiiro.com/blog/malicious-code-campaign-github-repo-confusion-attack/

https://vickiboykis.com/2024/02/28/gguf-the-long-way-around/

https://newsroom.ibm.com/2024-02-29-IBM-Announces-Availability-of-Open-Source-Mistral-AI-Model-on-watsonx,-Expands-Model-Choice-to-Help-Enterprises-Scale-AI-with-Trust-and-Flexibility

https://thenewstack.io/the-new-monitoring-for-services-that-feed-from-llms/?

https://nagarajtantri.medium.com/chaining-multiple-http-apis-via-apache-nifi-72c4d14c072d

https://webchick.hashnode.dev/no-one-gives-a-bleep-about-your-devrel-community-programs-and-what-to-do-about-it-2-collaboration

https://webchick.tech/no-one-gives-a-bleep-about-your-devrel-community-programs-and-what-to-do-about-it-1-organizational-alignment

Videos

Streaming Traffic Cameras https://www.youtube.com/watch?v=85ECRGJBEQU&ab_channel=DatainMotion-HowToBeaStreamingEngineer

Joining Three Kafka Topics in Flink SQL https://youtu.be/NI2n7uQJiP0?si=0aAFrkhOdqzZKisw

Continuous SQL with Kafka and Flink https://www.youtube.com/watch?v=0Fb8ggZlPrQ&ab_channel=stevecantrell

Building Real-time Pipelines: A Case Study by Transit Data https://www.youtube.com/watch?v=VjmC4J7KZgw&t=2s&ab_channel=Aiven

https://www.youtube.com/watch?v=29JnbO6LL6g

https://www.youtube.com/watch?v=0cdGwP3Shxs

https://www.youtube.com/watch?v=H7uUDLo_XI0

Feb 22, 2024 NYC Meetup

https://www.slideshare.net/slideshows/2024-feb-ai-meetup-nyc-genaillmsmldata-codeless-generative-ai-pipelines/266444687

Feb 28, 2024 NYC Flink Meetup

https://www.slideshare.net/slideshows/2024-february-28-nyc-meetup-unlocking-financial-data-with-realtime-pipelines/266539528

Feb 29, 2024 Conf42 Python 2024

https://www.slideshare.net/slideshows/conf42python-using-apache-nifi-apache-kafka-risingwave-and-apache-iceberg-with-stock-data-and-llm/266521940

https://www.slideshare.net/slideshows/conf42pythonbuilding-apache-nifi-20-python-processors/266522007

https://www.youtube.com/watch?v=awxzG7laWx4&ab_channel=Conf42

https://www.youtube.com/watch?v=FD16_oZ65Ug&ab_channel=Conf42

Events

March 11, 2024: Princeton. Meetup. GenAI. https://www.meetup.com/applied-generative-artificial-intelligence-applications/ https://23orchard.com/

March 15, 2024: TCF Pro. Princeton, NJ. IEEE Information Technology Professional Conference at the Trenton Computer Festival. https://princetonacm.acm.org/tcfpro/

March 27, 2024: Startup Grind. Jersey City https://www.startupgrind.com/events/details/startup-grind-princeton-presents-startup-grind-princeton-amp-nj-big-data-alliance-generative-ai-reverse-pitch/

March 28, 2024: Pinot + NiFi + Flink + Kafka Meetup NYC https://www.meetup.com/real-time-analytics-meetup-ny/events/299290822/

April 2024: XtremeJ 2024. Virtual. https://xtremej.dev/2023/schedule/

April 8-11, 2024: NLIT Summit. Seattle. https://www.fbcinc.com/e/nlit/default.aspx

April 11, 2024: Conf42 LLM. Virtual. https://www.conf42.com/llms2024

April 2024: AI Meetup NJ https://www.meetup.com/nj-gai/

May 8-9, 2024: Data Summit 2024. Boston, MA. https://www.dbta.com/DataSummit/2024/default.aspx

Cloudera Events https://www.cloudera.com/about/events.html

More Events: https://www.linkedin.com/pulse/schedule-2024-tim-spann--y4coe

Code

Models

Tools

Notable Tools

Commands Du Jour

© 2020-2024 Tim Spann

Streaming Cameras with YOLOv8

Apache NiFi, Python, YOLOv8, MinIO, S3, Images, Cameras, New York City

We can add Ultralytics YOLOv8, which is easy to run, to perform object detection on the camera images we ingest from New York City. The code is simple: we load the pretrained model, then call predict with the image and a few parameters.

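from ultralytics import YOLO
import shutil
import sys

# Remove earlier prediction output so the annotated image always lands in
# runs/detect/predict (otherwise Ultralytics creates predict2, predict3, ...).
# ignore_errors=True avoids a crash on the first run, when the folder is missing.
shutil.rmtree('runs/detect', ignore_errors=True)

# Load a model
model = YOLO('yolov8n.pt') # pretrained YOLOv8n model

# Local path of the camera image, passed as the first argument by the shell script
source = sys.argv[1]

# Detect at 320 px with a 0.5 confidence threshold and save an annotated copy
results = model.predict(source, stream=False, save=True, imgsz=320, conf=0.5)

# Print each result as JSON so the calling step can read it from stdout
for r in results:
    print(r.tojson())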

Output

[
  {
    "name": "car",
    "class": 2,
    "confidence": 0.5163618922233582,
    "box": {
      "x1": 188.54917907714844,
      "y1": 141.74185180664062,
      "x2": 204.51304626464844,
      "y2": 154.35519409179688
    }
  }
]
Annotation added by YOLOv8
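Because the script prints its detections as JSON, a downstream step can summarize them before forwarding them to Kafka or Slack. The sketch below is a hypothetical helper (`summarize_detections` is not part of Ultralytics or NiFi) that assumes the output shape shown above: a JSON array of objects with `name`, `confidence`, and `box` keys. It counts detections per class and computes each box's width and height.

```python
import json
from collections import Counter


def summarize_detections(raw_json, min_confidence=0.5):
    """Summarize YOLOv8 JSON output (hypothetical helper, not a library API).

    Assumes a JSON array of objects with "name", "confidence" and
    "box" (x1, y1, x2, y2) keys, as in the sample output above.
    """
    counts = Counter()
    sizes = []
    for det in json.loads(raw_json):
        if det["confidence"] < min_confidence:
            continue  # drop detections below the chosen threshold
        counts[det["name"]] += 1
        box = det["box"]
        # Width and height of the bounding box in pixels
        sizes.append((det["name"], box["x2"] - box["x1"], box["y2"] - box["y1"]))
    return {"total": sum(counts.values()), "by_class": dict(counts), "sizes": sizes}


# The single detection from the sample output above
sample = """[{"name": "car", "class": 2, "confidence": 0.5163618922233582,
  "box": {"x1": 188.54917907714844, "y1": 141.74185180664062,
          "x2": 204.51304626464844, "y2": 154.35519409179688}}]"""

summary = summarize_detections(sample)
print(summary["total"], summary["by_class"])
```

Raising `min_confidence` above the 0.5 used in `predict` gives a stricter filter without rerunning the model.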

NiFi Flow

NiFi Detailed Steps

We invoke an HTTP URL from NY Open Data to get a list of all camera URLs and send the camera metadata to a Kafka topic. We then call each webcam URL to get the image and save it to MinIO. Next, we save the image locally in a temporary file so YOLOv8 can analyze it. Finally, we retrieve the annotated image and send it to Slack.

Execute Shell Script Passing Argument to Python 3
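The temporary-file step can be sketched in plain Python. This is a minimal sketch of the idea, not the NiFi flow itself: it assumes the detection script above is saved under a hypothetical name such as `yolo_detect.py`, writes the downloaded image bytes to a temporary file, passes that path as the only argument (as the shell-script step does), and returns what the script prints.

```python
import os
import subprocess
import sys
import tempfile


def run_detector(image_bytes, script_path, suffix=".jpg"):
    """Save image bytes to a temp file and run a detection script on it.

    script_path is assumed to take the image path as argv[1], like the
    YOLOv8 script above (e.g. a hypothetical yolo_detect.py).
    """
    # delete=False so the file is closed and readable by the child process
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(image_bytes)
        image_path = tmp.name
    try:
        completed = subprocess.run(
            [sys.executable, script_path, image_path],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if the script fails
        )
        return completed.stdout
    finally:
        os.remove(image_path)  # always clean up the temporary image
```

Ultralytics may also print progress lines to the console, so a consumer of this output may need to pick out the JSON portion before parsing it.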

METADATA

{
  "Latitude" : 41.51472,
  "Longitude" : -74.0733,
  "ID" : "Skyline-9873",
  "Name" : "I-87 MP 060.40 NB Just North of Interchange 17 (Newburgh/I-84)",
  "DirectionOfTravel" : "Northbound",
  "RoadwayName" : "I-87 - NYS Thruway",
  "Url" : "https://511ny.org/map/Cctv/9873--43",
  "VideoUrl" : "https://s58.nysdot.skyvdn.com:443/rtplive/TA_046/playlist.m3u8",
  "Disabled" : false,
  "Blocked" : false
}
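Each camera record carries `Disabled` and `Blocked` flags, so one option is to skip those cameras before fetching images or publishing metadata. The sketch below assumes the list endpoint returns a JSON array of records shaped like the one above; the filtering rule, the hypothetical `active_cameras` name, and the use of `ID` as a Kafka message key are our assumptions, not necessarily what the NiFi flow does.

```python
import json


def active_cameras(raw_json):
    """Yield (key, record) pairs for cameras that are neither disabled nor blocked.

    Assumes a JSON array of records shaped like the metadata example above.
    The ID is returned as a candidate Kafka message key (an assumption).
    """
    for cam in json.loads(raw_json):
        if cam.get("Disabled") or cam.get("Blocked"):
            continue  # skip cameras that are not currently usable
        yield cam["ID"], {
            "id": cam["ID"],
            "name": cam["Name"],
            "url": cam["Url"],
            "video_url": cam.get("VideoUrl"),
            "lat": cam["Latitude"],
            "lon": cam["Longitude"],
        }


sample = json.dumps([
    {"Latitude": 41.51472, "Longitude": -74.0733, "ID": "Skyline-9873",
     "Name": "I-87 MP 060.40 NB Just North of Interchange 17 (Newburgh/I-84)",
     "Url": "https://511ny.org/map/Cctv/9873--43",
     "VideoUrl": "https://s58.nysdot.skyvdn.com:443/rtplive/TA_046/playlist.m3u8",
     "Disabled": False, "Blocked": False},
    # Hypothetical disabled camera, for illustration only
    {"Latitude": 0.0, "Longitude": 0.0, "ID": "example-disabled",
     "Name": "placeholder", "Url": "https://example.com",
     "Disabled": True, "Blocked": False},
])

for key, value in active_cameras(sample):
    print(key, json.dumps(value))
```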

RESOURCES