Showing posts with label AI. Show all posts
Showing posts with label AI. Show all posts

FLaNK. Stack Weekly 23 October 2023

 

23-October-2023

img

FLiPN-FLaNK Stack Weekly

Tim Spann @PaaSDev

https://pebble.is/PaaSDev

https://vimeo.com/flankstack

https://www.youtube.com/@FLaNK-Stack

https://www.threads.net/@tspannhw

https://medium.com/@tspann/subscribe

Get your new Apache NiFi for Dummies!

https://www.cloudera.com/campaign/apache-nifi-for-dummies.html

https://ossinsight.io/analyze/tspannhw

CODE + COMMUNITY

Please join my meetup group NJ/NYC/Philly/Virtual.

http://www.meetup.com/futureofdata-princeton/

https://www.meetup.com/futureofdata-newyork/

https://www.meetup.com/futureofdata-philadelphia/

**This is Issue #108 **

https://github.com/tspannhw/FLiPStackWeekly

https://www.linkedin.com/pulse/schedule-2023-tim-spann-/

https://www.cloudera.com/solutions/dim-developer.html

https://www.cloudera.com/products/dataflow/nifi-dataflow-calculator.html?utm_source=twitter&keyplay=data-flow&utm_campaign=FY24-Q2_Content_Globl_Nifi_SS_Tool_Promos&cid=UNGATED&utm_medium=social-organic&pid=11590424099

https://community.cloudera.com/t5/Community-Articles/New-Cloudera-AMP-with-Amazon-Bedrock-Integration-Now/ta-p/377071?utm_medium=social-organic&pid=11547807644

Articles

https://dzone.com/articles/real-time-analytics-1

https://blog.cloudera.com/5-key-takeaways-from-current2023/

https://www.infoq.com/news/2023/10/reddit-rev2

https://blog.cloudera.com/accelerating-cost-reduction-ai-making-an-impact-on-financial-services/

http://funnifi.blogspot.com/2022/12/using-updatedatabaserecord-for-schema.html

https://github.com/colinmorris/pejorative-compounds

https://shamun.dev/posts/cdc

https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance#MigrationGuidance-Migratingto1.23.0

Videos

https://www.youtube.com/watch?v=91jG5HkwCTU&t=1s&pp=ygUOdGltIHNwYW5uIG5pZmk%3D

https://www.youtube.com/watch?v=hPLFltbLcik

https://www.youtube.com/watch?v=PP_b0eJ5UVY

Events

October 26, 2023: Future of Data NYC. Meetup. Hybrid. https://www.meetup.com/futureofdata-newyork/events/295516928/

https://www.meetup.com/real-time-analytics-meetup-ny/events/296510127/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link

October 26, 2023: Cloudera Now EMEA. Virtual. https://www.cloudera.com/about/events/cloudera-now-cdp/emea.html

November 1, 2023: Open Source Finance Forum. Virtual. https://events.linuxfoundation.org/open-source-finance-forum-new-york/

November 1, 2023 3PM EST: AI Dev World. Hybrid https://aidevworld.com/conference/

https://sched.co/1RoAO

November 2, 2023: Evolve. NYC https://www.cloudera.com/about/events/evolve/new-york.html#register

November 2, 2023: Iceberg Meetup https://attend.cloudera.com/icebergforum?utm_medium=social-organic&pid=11580463293

November 7, 2023: XtremeJ 2023. Virtual. https://xtremej.dev/2023/schedule/

November 8, 2023: Flink Forward, Seattle. https://www.flink-forward.org/seattle-2023

November 21, 2023: JCon World. Virtual.

jcon2023

https://sched.co/1RRWm

jcon

November 22, 2023: Big Data Conference. Hybrid
https://bigdataconference.eu/ https://events.pinetool.ai/3079/#sessions/101077

November 23, 2023: Data Science Summit. Hybrid. EU https://dssconf.pl/en/

Cloudera Events https://www.cloudera.com/about/events.html

More Events: https://www.linkedin.com/pulse/schedule-2023-tim-spann-/

Code

Tools

© 2020-2023 Tim Spann

The Rise of the Mega Edge (FLaNK)

At one point edge devices were cheap, low energy and low powered.   They may have some old WiFi and a single core CPU running pretty slow.    Now power, memory, GPUs, custom processors and substantial power has come to the edge.

Sitting on my desk is the NVidia Xaver NX which is the massively powerful machine that can easily be used for edge computing while sporting 8GB of fast RAM, a 384 NVIDIA CUDA® cores and 48 Tensor cores GPU, a 6 core 64-bit ARM CPU and is fast.   This edge device would make a great workstation and is now something that can be affordably deployed in trucks, plants, sensors and other Edge and IoT applications.  


Next that titan device is the inexpensive hobby device, the Raspberry Pi 4 that now sports 8 GB of LPDDR4 RAM, 4 core 64-bit ARM CPU and is speedy!   It can also be augmented with a Google Coral TPU or Intel Movidius 2 Neural Compute Stick.   


These boxes come with fast networking, bluetooth and the modern hardware running in small edge devices that can now deployed en masse.    Enabling edge computing, fast data capture, smart processing and integration with servers and cloud services.    By adding Apache NiFi's subproject MiNiFi C++ and Java agents we can easily integrate these powerful devices into a Streaming Data Pipeline.   We can now build very powerful flows from edge to cloud with Apache NiFi, Apache Flink, Apache Kafka  (FLaNK) and Apache NiFi - MiNiFi.    I can run AI, Deep Learning, Machine Learning including Apache MXNet, DJL, H2O, TensorFlow, Apache OpenNLP and more at any and all parts of my data pipeline.   I can push models to my edge device that now has a powerful GPU/TPU and adequate CPU, networking and RAM to do more than simple classification.    The NVIDIA Jetson Xavier NX will run multiple real-time inference streams at 60 fps on multiple cameras.  

I can run live SQL against these events at every segment of the data pipeline and combine with machine learning, alert checks and flow programming.   It's now easy to build and deploy applications from edge to cloud.

I'll be posting some examples in my next article showing some simple examples.

By next year, 12 or 16 GB of RAM may be a common edge device RAM, perhaps 2 CPUs with 8 cores, multiple GPUs and large fast SSD storage.   My edge swarm may be running much of my computing power as my flows running elastically on public and private cloud scale up and down based on demand in real-time.


Unboxing the Most Amazing Edge AI Device Part 1 of 3 - NVIDIA Jetson Xavier NX

Unboxing the Most Amazing Edge AI Device 

Fast, Intuitive, Powerful and Easy.
Part 1 of 3
NVIDIA Jetson Xavier NX


This is the first of a series on articles on using the Jetson Xavier NX Developer kit for EdgeAI applications.   This will include running various TensorFlow, Pytorch, MXNet and other frameworks.  I will also show how to use this amazing device with Apache projects including the FLaNK Stack of Apache Flink, Apache Kafka, Apache NiFi, Apache MXNet and Apache NiFi - MiNiFi.

These are not words that one would usually use to define AI, Deep Learning, IoT or Edge Devices.    They are now.    There is a new tool for making what was incredibly slow and difficult to something that you can easily get your hands on and develop with.  Supporting running multiple models simultaneously in containers with fast frame rates is not something I thought you could affordably run in robots and IoT devices.    Now it is and this will drive some amazingly smart robots, drones, self-driving machines and applications that are not yet in prototypes.

Out of the box, this machine is sleek, light weight and ready to go.   And now with built-in fast WiFi, yet another great upgrade!   I added a 256GB SSD Hard drive and it took seconds and a few quick Linux commands.   It's running Ubuntu 18.04 LTS which supports all the deep learning and python libraries you need and runs well.     It has a powerful fan already attached and judging by the fast spinning when I was running benchmarks it probably needs it.   










It was super easy to get working, just plugged in a USB mouse and keyboard and HDMI monitor. 

I ran the benchmarks and was massively impressed with the FPS that can be processed.   This machine has some serious power.  Basically, this device you are going to locate at the edge in a robot, drone, car or other edge point could be your desktop machine.

I ran a few graphics demos and tests to validate everything once my keyboard, mouse and HDMI monitor were connected.   The abilities are awesome.   I can see why NVIDIA GPUs are amazing for gaming.   


The specifications for the edge device are very impressive.   The 8GB of RAM makes this feel like a powerful desktop and not a low powered edge device.  




I ran the benchmarks and they were smoking fast.   I can see using this as a workstation as the FPS were nice as you can see below.




In part 2, I am going to show how to run some edge AI workloads at tremendous speed and stream the results and images to your cloud or big data environments using Apache open source frameworks including Apache Flink, Apache NiFi - MiNiFi and Apache Kafka.

In part 3, We will push the processing capabilities and amp up the workloads and test all the impressive features of this new killer edge device.

There is so many great tutorials and learning materials available for the NVIDIA Xavier NX.     I have found that all my work for Jetson Nano has been working here, only faster.  So this is great, I'll have a few interesting demos and run throughs and a video in the follow up articles.   

I added a standard USB hub and a Logitech C270 USB Web Camera which worked perfectly.   I will use that in the follow up articles and some edge applications.

Tutorials and Guides

References:

I highly recommend all AI, Deep Learning, IoT, IIoT, Edge and streaming developers obtain one or more of these developer kits.

This is a powerful machine in a small box.   From edge applications to robotics to smart devices to anything that needs powerful processing at the edge, this is your device.  A fast CPU, fast GPU and all the interfaces you need.  This should be part of any project.   Joining my NVIDIA Jetson Nano you now have some great affordable options for Edge AI applications.   It is amazing to test drive the performance of this device.   I will also be showing this at my online meetups, so join me or watch the video on Youtube later.

===

Jetson Xavier NX Developer Kit features:
 
Power:  10W (Max efficiency) | 15W (Max performance)
NVIDIA Volta architecture with 384 NVIDIA CUDA® cores and 48 Tensor cores
6-core NVIDIA Carmel ARM®v8.2 64-bit CPU 6 MB L2 + 4 MB L3
2x NVDLA Engines
8 GB 128-bit LPDDR4x @ 51.2GB/s
2x 4K @ 30 | 6x 1080p @ 60 | 14x 1080p @ 30 (H.265/H.264)
Gigabit Ethernet, M.2 Key E (WiFi/BT included), M.2 Key M (NVMe)
HDMI 
4x USB 3.1, USB 2.0 Micro-B
2x 4K @ 60 !  If you lower the resolution, it scales up the numbers.

The Jetson Xavier NX Developer Kit is now available for $399 US at NVIDIA.com and from channel partners worldwide.    I would recommend acquiring some ASAP before current supplies wane and you may have to wait.




ODPI's OpenDS4All - Open Source Data Science Content To Teach the World

OpenDS4All




Start learning now:


ODPI has officially announced this recently and it looks great.

There is a ton of amazing materials including slides, notes, documentation, homework, exercises and Jupyter notebooks covering Data Wrangling, Data Science, the Basics and Apache Spark.   

taxonomy

This“starter set” of training materials can help you build a Data Science program for yourself, your company, your university or your non-profit.    I am going to bring some of these to my meetups and hopefully can help give back with new materials, updates and suggestions.

These are college level materials developed by the University of Pennsylvania and open source via the ODPI with IBM leading.   The code and slides look great.   I can see these helping to enable the world adding another million desperately needed Data Scientists, Data Engineers and Data Science Enabled professionals.

I have been running some of this via Cloudera Machine Learning in my CDP cluster in AWS and it works great.   This is really well made.   I am hoping to create a module on Streaming Data Science to contribute.



EdgeAI: Google Coral with Coral Environmental Sensors and TPU With NiFi and MiNiFi (Updated EFM)

EdgeAI:   Google Coral with Coral Environmental Sensors and TPU With NiFi and MiNiFi


Building MiNiFi IoT Apps with the new Cloudera EFM 


It is very easy to build a drag and drop EdgeAI application with EFM and then push to all your MiNiFi agents.


Cloudera Edge Management CEM-1.1.1
Download the newest CEM today!









NiFi Flow Receiving From MiNiFi Java Agent


In a cluster in my CDP-DC Cluster I consume Kafka messages sent from my remote NiFi gateway to publish alerts to Kafka and push records to Apache HBase and Apache Kudu.  We filter our data with Streaming SQL.


We can use SQL to route, create aggregates like averages, chose a subset of fields and limit data returned.   Using the power of Apache Calcite, Streaming SQL in NiFi is a game changer against Record Data Types including CSV, XML, Avro, Parquet, JSON and Grokable text.   Read and write different formats and convert when your SQL is done.   Or just to SELECT * FROM FLOWFILE to get everything.  



We can see this flow from Atlas as we trace the data lineage and provenance from Kafka topic.



We can search Atlas for Kafka Topics.



From coral Kafka topic to NiFi to Kudu.


Details on Coral Kafka Topic


Examining the Hive Metastore Data on the Coral Kudu Table


NiFi Flow Details in Atlas


Details on Alerts Topic
'


Statistics from Atlas





Example Web Camera Image



 Example JSON Record

[{"cputemp":59,"id":"20200221190718_2632409e-f635-48e7-9f32-aa1333f3b8f9","temperature":"39.44","memory":91.1,"score_1":"0.29","starttime":"02/21/2020 14:07:13","label_1":"hair spray","tempf":"102.34","diskusage":"50373.5 MB","message":"Success","ambient_light":"329.92","host":"coralenv","cpu":34.1,"macaddress":"b8:27:eb:99:64:6b","pressure":"102.76","score_2":"0.14","ip":"127.0.1.1","te":"5.10","systemtime":"02/21/2020 14:07:18","label_2":"syringe","humidity":"10.21"}]


Querying Kudu results in Hue


Pushing Alerts to Slack from NiFi





I am running on Apache NiFi 1.11.1 and wanted to point out a new feature.   Download flow:   Will download the highlighted flow/pgroup as JSON.




Looking at NiFi counters to monitor progress:

We can see how easy it is to ingest IoT sensor data and run AI algorithms on Coral TPUs.



Shell (coralrun.sh)


#!/bin/bash
DATE=$(date +"%Y-%m-%d_%H%M%S")
fswebcam -q -r 1280x720 /opt/demo/images/$DATE.jpg
python3 -W ignore /opt/demo/test.py --image /opt/demo/images/$DATE.jpg 2>/dev/null


Kudu Table DDL

https://github.com/tspannhw/table-ddl


Python 3 (test.py)


import time
import sys
import subprocess
import os
import base64
import uuid
import datetime
import traceback
import base64
import json
from time import gmtime, strftime
import math
import random, string
import time
import psutil
import uuid 
from getmac import get_mac_address
from coral.enviro.board import EnviroBoard
from luma.core.render import canvas
from PIL import Image, ImageDraw, ImageFont
import os
import argparse
from edgetpu.classification.engine import ClassificationEngine

# Importing socket library 
import socket 

start = time.time()
starttf = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')

def ReadLabelFile(file_path):
    with open(file_path, 'r') as f:
        lines = f.readlines()
    ret = {}
    for line in lines:
        pair = line.strip().split(maxsplit=1)
        ret[int(pair[0])] = pair[1].strip()
    return ret

# Google Example Code
def update_display(display, msg):
    with canvas(display) as draw:
        draw.text((0, 0), msg, fill='white')

def getCPUtemperature():
    res = os.popen('vcgencmd measure_temp').readline()
    return(res.replace("temp=","").replace("'C\n",""))

# Get MAC address of a local interfaces
def psutil_iface(iface):
    # type: (str) -> Optional[str]
    import psutil
    nics = psutil.net_if_addrs()
    if iface in nics:
        nic = nics[iface]
        for i in nic:
            if i.family == psutil.AF_LINK:
                return i.address
# /opt/demo/examples-camera/all_models  
row = { }
try:
#i = 1
#while i == 1:
    parser = argparse.ArgumentParser()
    parser.add_argument('--image', help='File path of the image to be recognized.', required=True)
    args = parser.parse_args()
    # Prepare labels.
    labels = ReadLabelFile('/opt/demo/examples-camera/all_models/imagenet_labels.txt')

    # Initialize engine.
    engine = ClassificationEngine('/opt/demo/examples-camera/all_models/inception_v4_299_quant_edgetpu.tflite')

    # Run inference.
    img = Image.open(args.image)

    scores = {}
    kCount = 1

    # Iterate Inference Results
    for result in engine.ClassifyWithImage(img, top_k=5):
        scores['label_' + str(kCount)] = labels[result[0]]
        scores['score_' + str(kCount)] = "{:.2f}".format(result[1])
        kCount = kCount + 1    

    enviro = EnviroBoard()
    host_name = socket.gethostname()
    host_ip = socket.gethostbyname(host_name)
    cpuTemp=int(float(getCPUtemperature()))
    uuid2 = '{0}_{1}'.format(strftime("%Y%m%d%H%M%S",gmtime()),uuid.uuid4())
    usage = psutil.disk_usage("/")
    end = time.time()
    row.update(scores)
    row['host'] = os.uname()[1]
    row['ip'] = host_ip
    row['macaddress'] = psutil_iface('wlan0')
    row['cputemp'] = round(cpuTemp,2)
    row['te'] = "{0:.2f}".format((end-start))
    row['starttime'] = starttf
    row['systemtime'] = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')
    row['cpu'] = psutil.cpu_percent(interval=1)
    row['diskusage'] = "{:.1f} MB".format(float(usage.free) / 1024 / 1024)
    row['memory'] = psutil.virtual_memory().percent
    row['id'] = str(uuid2)
    row['message'] = "Success"
    row['temperature'] = '{0:.2f}'.format(enviro.temperature)
    row['humidity'] = '{0:.2f}'.format(enviro.humidity)
    row['tempf'] = '{0:.2f}'.format((enviro.temperature * 1.8) + 32)    
    row['ambient_light'] = '{0}'.format(enviro.ambient_light)
    row['pressure'] = '{0:.2f}'.format(enviro.pressure)
    msg = 'Temp: {0}'.format(row['temperature'])
    msg += 'IP: {0}'.format(row['ip'])
    update_display(enviro.display, msg)
#    i = 2
except:
    row['message'] = "Error"
print(json.dumps(row)) 

Source Code:



Sensors / Devices / Hardware:

  • Humdity-HDC2010 humidity sensor
  • Light-OPT3002 ambient light sensor
  • Barometric-BMP280 barometric pressure sensor
  • PS3 Eye Camera and Microphone USB
  • Raspberry Pi 3B+
  • Google Coral Environmental Sensor Board
  • Google Coral USB Accelerator TPU

References: