Posts

Ingesting All The Weather Data With Apache NiFi


Step By Step NiFi Flow
1. GenerateFlowFile - build a schedule matching when NOAA updates weather
2. InvokeHTTP - download all weather ZIP
3. CompressContent - decompress ZIP
4. UnpackContent - extract files from ZIP
5. RouteOnAttribute - just give us ones that are airports (${filename:startsWith('K')}). Optional.
6. QueryRecord - XMLReader to JsonRecordSetWriter. Query: SELECT * FROM FLOWFILE WHERE NOT location LIKE '%Unknown%'. This removes some locations that are not identified. Optional.
7. Send it somewhere for storage. Could use PutKudu, PutORC, PutHDFS, PutHiveStreaming, PutHBaseRecord, PutDatabaseRecord, PublishKafkaRecord_2_* or others.
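For readers who want to see the same logic outside NiFi, here is a minimal Python sketch of the flow above. It is not how NiFi implements these processors. The function names (`download_zip`, `xml_to_record`, `airport_records`) are hypothetical, the `.xml` check is an extra guard that is not part of the flow, and nested elements such as `image` are skipped to keep the sketch short.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# URL taken from the post; everything else here is an illustrative assumption.
ALL_OBS_URL = "https://w1.weather.gov/xml/current_obs/all_xml.zip"


def download_zip(url=ALL_OBS_URL):
    """Roughly the InvokeHTTP step: fetch the ZIP as raw bytes."""
    with urlopen(url, timeout=60) as resp:
        return resp.read()


def xml_to_record(xml_bytes):
    """Flatten one observation XML document into a dict.

    Only leaf elements directly under the root are kept; nested
    elements (for example <image>) are skipped in this sketch.
    """
    root = ET.fromstring(xml_bytes)
    return {child.tag: (child.text or "").strip()
            for child in root if len(child) == 0}


def airport_records(zip_bytes):
    """Unpack the ZIP, keep 'K' files, and drop unidentified locations."""
    records = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            filename = name.rsplit("/", 1)[-1]
            # RouteOnAttribute equivalent: ${filename:startsWith('K')}
            if not filename.startswith("K") or not filename.endswith(".xml"):
                continue
            rec = xml_to_record(zf.read(name))
            # QueryRecord equivalent: WHERE NOT location LIKE '%Unknown%'
            # (a missing location is dropped too, as a SQL NULL would be)
            location = rec.get("location")
            if location is None or "Unknown" in location:
                continue
            records.append(rec)
    return records
```

Calling `airport_records(download_zip())` would return a list of dicts that can be serialized with `json.dumps` and sent to whatever storage you prefer.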







URL For All US Data
invokehttp.request.url https://w1.weather.gov/xml/current_obs/all_xml.zip


Example Record As Converted JSON
[ {   "credit" : "NOAA's National Weather Service",   "credit_URL" : "http://weather.gov/",   "image" : {     "url" :…

Apache Flink SQL Demo (FLaNK Series)

Using Cloudera Data Platform with Flow Management and Streams on Azure

Today I will walk you through using Cloudera Data Platform (CDP) with Flow Management and Streams on Azure. To see a streaming demo video, please join my webinar (or watch it on demand) at Streaming Data Pipelines with CDF in Azure. I'll share some additional how-to videos on using Apache NiFi and Apache Kafka in Azure very soon.





In the above process group we use QueryRecord to segment the JSON records and keep only those where the temperature in Fahrenheit is over 80 degrees. We then pick out a few attributes from each record and send them to a Slack channel.
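The filtering described above can be sketched in plain Python. This is only an illustration of the logic, not the QueryRecord configuration from the flow. The field names `temp_f`, `location` and `observation_time`, and the helper names `hot_records` and `slack_text`, are assumptions chosen for the example.

```python
def hot_records(records, threshold_f=80.0,
                keep=("location", "temp_f", "observation_time")):
    """Keep records hotter than threshold_f and project a few fields.

    Field names are hypothetical; adjust them to the actual schema.
    """
    selected = []
    for rec in records:
        try:
            temp = float(rec.get("temp_f"))
        except (TypeError, ValueError):
            # Missing or non-numeric temperature: skip the record.
            continue
        if temp > threshold_f:
            selected.append({key: rec.get(key) for key in keep})
    return selected


def slack_text(rec):
    """Format one selected record as a short message line."""
    return f"{rec['location']}: {rec['temp_f']} F at {rec['observation_time']}"
```

Each string produced by `slack_text` could then be posted to a channel by whatever Slack integration you use.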
To become a Kafka producer, you set a Record Reader for the incoming type (JSON in my case) and a Record Writer for the type to send to the sensors topic. In this case we kept it as JSON, but we could convert to Avro. I usually do that if I am going to be reading it with Cloudera Kafka Connect.
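As a rough analogy for the reader/writer pairing described above, the sketch below parses a JSON array (the "reader" side) and re-serializes each record as its own JSON message (the "writer" side). The function `publish_records` and its `send` callback are hypothetical; `send` stands in for whatever Kafka client call you use, so the sketch runs without a broker.

```python
import json


def publish_records(json_array_text, send, topic="sensors"):
    """Split a JSON array into one message per record and hand each to send().

    send is any callable taking (topic, payload_bytes); it is an
    assumption of this sketch, not part of NiFi or a specific Kafka client.
    Returns the number of records sent.
    """
    records = json.loads(json_array_text)  # reader side: JSON in
    count = 0
    for rec in records:
        # writer side: JSON out, UTF-8 encoded, one record per message
        payload = json.dumps(rec, sort_keys=True).encode("utf-8")
        send(topic, payload)
        count += 1
    return count
```

Swapping the writer side for an Avro serializer would be the equivalent of changing the Record Writer in the flow.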


Our security…

The Rise of the Mega Edge (FLaNK)

At one point edge devices were cheap, low energy and low powered. They might have had some old WiFi and a single-core CPU running pretty slowly. Now memory, GPUs, custom processors and substantial power have come to the edge.
Sitting on my desk is the NVIDIA Jetson Xavier NX, a massively powerful machine that can easily be used for edge computing. It sports 8GB of fast RAM, a GPU with 384 NVIDIA CUDA® cores and 48 Tensor cores, and a 6-core 64-bit ARM CPU, and it is fast. This edge device would make a great workstation and is now something that can be affordably deployed in trucks, plants, sensors and other Edge and IoT applications.
https://www.datainmotion.dev/2020/06/unboxing-most-amazing-edge-ai-device.html
Next to that titan of a device is the inexpensive hobby device, the Raspberry Pi 4, which now sports 8 GB of LPDDR4 RAM and a 4-core 64-bit ARM CPU, and is speedy! It can also be augmented with a Google Coral TPU or Intel Movidius 2 Neural Compute Stick.
https://dzone.com/articles/efm-series-…

Explore Enterprise Apache Flink with Cloudera Streaming Analytics - CSA 1.2

What's New in Cloudera Streaming Analytics
https://docs.cloudera.com/csa/1.2.0/release-notes/topics/csa-what-new.html
https://docs.cloudera.com/csa/1.2.0/index.html
Try out the tutorials now:   https://github.com/cloudera/flink-tutorials
So let's get our Apache Flink on. As part of my FLaNK Stack series, I'll show you some fun things we can do with Apache Flink + Apache Kafka + Apache NiFi.
We will look at some of the updates in Apache Flink 1.10, including the SQL Client and API.
We are working with Apache Flink 1.10, Apache NiFi 1.11.4 and Apache Kafka 2.4.1.
The SQL features are strong and we will take a look at what we can do.
https://docs.cloudera.com/csa/1.2.0/release-notes/topics/csa-supported-sql.html
Table connectors: Kafka, Kudu, Hive (through catalog)
Data formats (Kafka): JSON, Avro, CSV
Using Hive Catalog with Flink SQL: https://docs.cloudera.com/csa/1.2.0/flink-sql-table-api/topics/csa-hive-catalog.html
Use Kudu Cat…

Using Apache Kafka Using Cloudera Data Platform Data Center 7.1.1

Unboxing the Most Amazing Edge AI Device Part 1 of 3 - NVIDIA Jetson Xavier NX

Unboxing the Most Amazing Edge AI Device 
Fast, Intuitive, Powerful and Easy. Part 1 of 3 NVIDIA Jetson Xavier NX

This is the first in a series of articles on using the Jetson Xavier NX Developer Kit for EdgeAI applications. This will include running TensorFlow, PyTorch, MXNet and other frameworks. I will also show how to use this amazing device with Apache projects including the FLaNK Stack of Apache Flink, Apache Kafka, Apache NiFi, Apache MXNet and Apache NiFi - MiNiFi.
These are not words that one would usually use to define AI, Deep Learning, IoT or Edge Devices. They are now. There is a new tool that turns what was incredibly slow and difficult into something that you can easily get your hands on and develop with. Running multiple models simultaneously in containers with fast frame rates is not something I thought you could affordably do in robots and IoT devices. Now it is, and this will drive some amazingly smart robots, drones, self-driving machines a…