FLaNK Stack Weekly for 08 May 2023

 

08-May-2023

FLiPN-FLaNK Stack Weekly

Tim Spann @PaaSDev

We have a few more days left of the NiFi Event.

Join me and the NiFi creators! https://attend.cloudera.com/nificommitters0503?internal_keyplay=data-flow&internal_campaign=FY24-Q2_Webinar_Cloudera_AMER_NiFi_Meet_the_Committers&cid=7012H000001ZNXBQA4&internal_link=p07

https://www.youtube.com/watch?v=W1zho5yzm5M&ab_channel=DatainMotion

CODE + COMMUNITY

Please join my meetup group NJ/NYC/Philly/Virtual.

http://www.meetup.com/futureofdata-princeton/

https://www.meetup.com/futureofdata-sanfrancisco/events/292453316/

https://www.meetup.com/futureofdata-newyork/

https://www.meetup.com/futureofdata-philadelphia/

This is Issue #82

https://github.com/tspannhw/FLiPStackWeekly

https://www.linkedin.com/pulse/schedule-2023-tim-spann-/

Videos

https://www.youtube.com/watch?v=ENhT0w44wVc

https://www.youtube.com/watch?v=ptjRobC1FSw

https://www.youtube.com/watch?v=b1sdDmlwAsk&t=1351s

https://www.youtube.com/watch?v=biKribaFD_s

Articles

https://medium.com/@tspann/deploy-your-nifi-flow-to-k8-in-aws-faa85b0e620e

https://medium.com/@tspann/ingest-from-iceberg-tables-with-cloudera-dataflow-2dc3bb30096f

https://github.com/tspannhw/FLaNK-TravelAdvisory/blob/main/steps.md

https://community.cloudera.com/t5/Community-Articles/Best-in-Flow-Event/ta-p/368947

https://github.com/tspannhw/FLaNK-TravelAdvisory

https://www.cloudera.com/solutions/dim-developer.html

https://www.datainmotion.dev/2023/04/cloudera-data-flow-readyflows.html

https://docs.cloudera.com/cdp-public-cloud/cloud/release-summaries/topics/announcement-202304.html

https://www.datainmotion.dev/2023/05/cloud-tools-guidance-how-to-build-data_18.html

https://digma.ai/blog/java-developer-vs-chatgpt-part-i-writing-a-spring-boot-microservice/

https://dzone.com/articles/chataws-deploy-aws-resources-seamlessly-chatgpt

https://medium.com/cloudera-inc/building-an-effective-nifi-flow-partitionrecord-b342a8efc50c

https://www.baeldung.com/spring-boot-chatgpt-api-openai

Recent Talks

https://www.slideshare.net/bunkertor/meetup-streaming-data-pipeline-development

https://www.slideshare.net/bunkertor/rtas-2023-building-a-realtime-iot-application

Events

https://www.youtube.com/watch?v=Ws7YmAHE1O8

https://www.cloudera.com/about/events/evolve.html

https://web.cvent.com/event/7598f981-2f7e-4915-b662-bd7be9b5f48d/summary?RefId=homepage_impact24

May 3-10, 2023: Special Once in a Lifetime Event. Virtual.

May 9, 2023: Garden State Java User Group. In-Person. New Jersey https://gsjug.org/. Modern Data Streaming Pipelines with Java, NiFi, Flink, Kafka. https://gsjug.org/meetings/2023/may2023.html https://www.meetup.com/garden-state-java-user-group/events/293229660/

May 10-12, 2023: Open Source Summit North America. Virtual https://events.linuxfoundation.org/open-source-summit-north-america/

May 11, 2023: https://www.meetup.com/futureofdata-siliconvalley/events/292962395/

May 17-18, 2023: IBM Event. Raleigh, NC.

May 23, 2023: Pulsar Summit Europe. Virtual https://pulsar-summit.org/

May 24-25, 2023: Big Data Fest. Virtual. https://sessionize.com/big-data-fest-by-softserve/

June 14: 12PM EDT Cloudera Now - Virtual https://www.cloudera.com/about/events/cloudera-now-cdp.html?internal_keyplay=ALL&internal_campaign=FY24-Q2_AMER_CNOW_Q2_WEB_EP_P07_2023-06-14&cid=7012H000001ZLmyQAG&internal_link=p07

June 26-28, 2023: NLIT Summit. Milwaukee. https://www.fbcinc.com/e/nlit/default.aspx

June 28, 2023: NiFi Meetup. Milwaukee and Hybrid. https://www.meetup.com/futureofdata-princeton/events/292976004/

July 19, 2023: 2-Hours to Data Innovation: Data Flow https://www.cloudera.com/about/events/hands-on-lab-series-2-hours-to-data-innovation.html

October 18, 2023: 2-Hours to Data Innovation: Data Flow https://www.cloudera.com/about/events/hands-on-lab-series-2-hours-to-data-innovation.html

Cloudera Events https://www.cloudera.com/about/events.html

More Events: https://www.linkedin.com/pulse/schedule-2023-tim-spann-/

Tools

https://github.com/Textualize/frogmouth

https://github.com/mlc-ai/mlc-llm

https://github.com/chronark/highstorm

https://github.com/charmbracelet/glow

https://github.com/NorSoulx/vscode-openai-code-analyzer

brew install pipx

https://stablediffusionweb.com/#demo

https://github.com/1rgs/jsonformer

https://github.com/cashapp/pranadb

https://github.com/Deci-AI/super-gradients/blob/master/YOLONAS.md

https://github.com/mishalhossin/Discord-Chatbot-Gpt4Free

https://github.com/taviso/123elf

https://github.com/juftin/browsr

https://github.com/jina-ai/thinkgpt

https://github.com/csgoh/roadmapper

https://github.com/AIGC-Audio/AudioGPT

https://github.com/cloudera/CML_AMP_LLM_Chatbot_Augmented_with_Enterprise_Data

© 2020-2023 Tim Spann

FLaNK-TravelAdvisory

 

Travel Advisory - RSS Processing - Apache NiFi - Apache Kafka - Apache Flink - SQL

Overview

[Image: flow overview]

Final Flow

[Image: final flow]

Adding Processors to the Designer

Here I list most of the available processors:

https://www.datainmotion.dev/2023/04/dataflow-processors.html

Flow Parameters

Go to Parameters and enter every parameter you will need for the flow.

You can add all the ones listed below.

[Image: list of flow parameters]

Flow Walk Through

If you are loading my pre-built flow, you will see the details for the process group in the configuration palette when you enter it.

We add an InvokeHTTP processor and set its parameters.

Now we can add a parameter for the HTTP URL for Travel Advisories.

Connect InvokeHTTP to QueryRecord. Name your connection for monitoring later.

In QueryRecord, we convert the XML (RSS) to JSON. You will need the RSSXMLReader and TravelJsonRecordSetWriter services.
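As a rough illustration of the RSS-to-JSON conversion described above, here is a minimal Python sketch that uses only the standard library. It is not how NiFi's record readers and writers are implemented. The sample feed, its field values, and the function name `rss_items_to_records` are assumptions made up for this example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the real advisory feed carries more fields per item.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Travel Advisories</title>
    <item>
      <title>Example Country - Level 1</title>
      <link>https://example.com/advisory/1</link>
      <pubDate>Mon, 08 May 2023 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Another Country - Level 2</title>
      <link>https://example.com/advisory/2</link>
      <pubDate>Mon, 08 May 2023 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def rss_items_to_records(rss_text):
    """Return one dict per <item> element, keyed by child tag name."""
    root = ET.fromstring(rss_text)
    records = []
    for item in root.iter("item"):
        # Each child element (title, link, pubDate, ...) becomes a field.
        records.append({child.tag: (child.text or "").strip() for child in item})
    return records


records = rss_items_to_records(SAMPLE_RSS)
print(json.dumps(records, indent=2))
```

The sketch flattens each item into string fields only. Nested elements and XML attributes, which a real reader schema can map, are ignored here.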

Connect QueryRecord to SplitJson if there are no errors.

For SplitJson, we set the JsonPath Expression to $.*.*.item
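To show what the expression $.*.*.item selects, here is a small hand-rolled Python sketch. It does not use a JSONPath library and only approximates the behavior for this one expression. The document shape (`rss` then `channel`) and the helper names are assumptions for illustration, since the actual JSON produced by the flow is not shown.

```python
def wildcard_children(node):
    """Values that a JSONPath wildcard (*) would visit under this node."""
    if isinstance(node, dict):
        return list(node.values())
    if isinstance(node, list):
        return list(node)
    return []


def split_on_items(document):
    """Approximate $.*.*.item, then emit each array element separately."""
    pieces = []
    for level_one in wildcard_children(document):
        for level_two in wildcard_children(level_one):
            if isinstance(level_two, dict) and "item" in level_two:
                matched = level_two["item"]
                # Only arrays are split; other matches are ignored in this sketch.
                if isinstance(matched, list):
                    pieces.extend(matched)
    return pieces


# Hypothetical document shape, assumed for this example only.
document = {
    "rss": {
        "channel": {
            "title": "Travel Advisories",
            "item": [{"title": "A"}, {"title": "B"}],
        }
    }
}

pieces = split_on_items(document)
print(pieces)
```

Each element of `pieces` stands in for one outgoing FlowFile.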

We then connect SplitJson to SplitRecord.

For SplitRecord, we set the Record Reader to JSON_Reader_InferRoot, the Record Writer to TravelJsonRecordSetWriter, and Records Per Split to 1.

We connect SplitRecord to EvaluateJsonPath.

We set the Destination to flowfile-attribute and the Return Type to json, then add several new fields:

  • description - $.description
  • guid - $.guid
  • identifier - $.identifier
  • link - $.link
  • pubdate - $.pubDate
  • title - $.title
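The field mappings above can be pictured as a lookup from attribute name to a top-level JSON key. The following Python sketch assumes every path has the simple form `$.name`. The function name and the sample record are hypothetical. Rendering non-string values as JSON text is a simplification, not a statement about the processor's exact output.

```python
import json

# Attribute name -> JsonPath, copied from the list above.
ATTRIBUTE_PATHS = {
    "description": "$.description",
    "guid": "$.guid",
    "identifier": "$.identifier",
    "link": "$.link",
    "pubdate": "$.pubDate",
    "title": "$.title",
}


def evaluate_simple_paths(record, paths):
    """Copy top-level fields of a record into a dict of string attributes."""
    attributes = {}
    for name, path in paths.items():
        if not path.startswith("$."):
            raise ValueError(f"only simple $.name paths are supported: {path}")
        key = path[2:]
        if key in record:
            value = record[key]
            # Strings are kept as-is; anything else is serialized as JSON text.
            attributes[name] = value if isinstance(value, str) else json.dumps(value)
    return attributes


# Hypothetical single advisory record.
record = {
    "title": "Example Country - Level 1",
    "link": "https://example.com/advisory/1",
    "pubDate": "Mon, 08 May 2023 00:00:00 GMT",
    "guid": "https://example.com/advisory/1",
    "identifier": " EX,advisory ",
    "description": "Exercise normal precautions.",
    "category": [{"domain": "Threat-Level"}],
}

attributes = evaluate_simple_paths(record, ATTRIBUTE_PATHS)
print(attributes)
```

Note that the attribute is named `pubdate` while the JSON key is `pubDate`, matching the mapping in the list.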

We connect EvaluateJsonPath to SplitJson.

For SplitJson, we set the JsonPath Expression to $.category

We connect SplitJson to UpdateRecord.

In UpdateRecord, we set Record Reader to JSON_Reader_InferRoot and Record Writer to TravelJsonRecordSetWriter. We set Replacement Value Strategy to Literal Value.

We add new fields for our new record format.

  • /advisoryId - ${filename}
  • /description - ${description}
  • /domain - ${identifier:trim()}
  • /guid - ${guid}
  • /link - ${link}
  • /pubdate - ${pubdate}
  • /title - ${title}
  • /ts - ${now():toNumber()}
  • /uuid - ${uuid}
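The field assignments above amount to building a new record from FlowFile attributes. Here is a hedged Python sketch of that mapping. The function name `build_advisory_record` and the sample attribute values are assumptions. `${now():toNumber()}` is represented as milliseconds since the epoch, and `${identifier:trim()}` as a whitespace strip.

```python
import time


def build_advisory_record(attributes, now_ms=None):
    """Build the new record from FlowFile attributes (all strings)."""
    if now_ms is None:
        # Stand-in for ${now():toNumber()}: current time in epoch milliseconds.
        now_ms = int(time.time() * 1000)
    return {
        "advisoryId": attributes.get("filename", ""),
        "description": attributes.get("description", ""),
        # Stand-in for ${identifier:trim()}.
        "domain": attributes.get("identifier", "").strip(),
        "guid": attributes.get("guid", ""),
        "link": attributes.get("link", ""),
        "pubdate": attributes.get("pubdate", ""),
        "title": attributes.get("title", ""),
        "ts": now_ms,
        "uuid": attributes.get("uuid", ""),
    }


# Hypothetical attribute values for one FlowFile.
sample_attributes = {
    "filename": "advisory-0001",
    "description": "Exercise normal precautions.",
    "identifier": "  EX,advisory  ",
    "guid": "https://example.com/advisory/1",
    "link": "https://example.com/advisory/1",
    "pubdate": "Mon, 08 May 2023 00:00:00 GMT",
    "title": "Example Country - Level 1",
    "uuid": "00000000-0000-0000-0000-000000000001",
}

advisory = build_advisory_record(sample_attributes, now_ms=1683504000000)
print(advisory)
```

In the real flow the writer's schema decides the final type of each field. This sketch keeps `ts` as an integer and everything else as a string.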

Next, we connect UpdateRecord to our Slack sub-process group.

The other branch flows from UpdateRecord to Write to Kafka.

For PublishKafka2RecordCDP, there are a lot of parameters to set, which is why we recommend starting with a ReadyFlow.

We need to:

  • Set our Kafka Brokers and Destination Topic Name.
  • Use JSON_Reader_InferRoot for the reader and AvroRecordSetWriterHWX for the writer.
  • Turn transactions off.
  • Choose Guarantee Replicated Delivery.
  • Choose Use Content as Record Value.
  • Use SASL_SSL/Plain security.
  • Set Username to your login user id or machine user, then set the associated password.
  • Map the SSL Context to the built-in Default NiFi SSL Context Service.
  • Set uuid as the Message Key Field.
  • Set client.id to a unique Kafka producer id.
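For readers more familiar with Kafka client code than with NiFi property sheets, the Kafka settings described above correspond roughly to the producer configuration sketched below. The keys are librdkafka-style property names of the kind used by the confluent-kafka Python client; that choice of client is an assumption, and these are not the NiFi processor's own property names. The function name and all sample values are hypothetical. No Kafka library is imported, so the sketch only builds and checks the dictionary.

```python
def build_producer_config(brokers, username, password, client_id):
    """Return a producer config dict mirroring the settings in the walkthrough."""
    values = {
        "brokers": brokers,
        "username": username,
        "password": password,
        "client_id": client_id,
    }
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return {
        "bootstrap.servers": brokers,
        # SASL_SSL/Plain security.
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.username": username,
        "sasl.password": password,
        # Guarantee Replicated Delivery: wait for all in-sync replicas.
        "acks": "all",
        # A unique Kafka producer id.
        "client.id": client_id,
        # Transactions are off, so no transactional.id is set.
    }


# Hypothetical values for illustration only.
config = build_producer_config(
    brokers="broker1.example.com:9093,broker2.example.com:9093",
    username="machine-user",
    password="change-me",
    client_id="travel-advisory-producer-1",
)
print(sorted(config))
```

The record reader, record writer, and message key field have no direct equivalent in a client config. In client code they would be handled by how each message is serialized and keyed.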

We also send messages about our travel advisories to Slack.

We only need one processor to send to Slack.

We connect input to our PutSlack processor.

For PutSlack, set the Webhook URL to the one from your Slack group admin, set the text to the content from the ingest, set the channel to the one mapped in the webhook, and set a username for your bot.
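Outside of NiFi, posting to a Slack incoming webhook looks roughly like the Python sketch below, which uses only the standard library. The function names, the webhook URL placeholder, and the sample message are assumptions. Depending on the type of webhook, Slack may ignore the `channel` and `username` overrides.

```python
import json
import urllib.request


def build_slack_payload(text, channel, username):
    """Build the JSON body for a Slack incoming webhook."""
    return {"text": text, "channel": channel, "username": username}


def post_to_slack(webhook_url, payload):
    """POST the payload to the webhook and return the HTTP status code."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical message; nothing is sent until post_to_slack is called.
payload = build_slack_payload(
    text="Example Country - Level 1: Exercise normal precautions.",
    channel="#travel-advisories",
    username="advisory-bot",
)
print(json.dumps(payload))

# To actually send, supply the webhook URL from your Slack admin:
# post_to_slack("https://hooks.slack.com/services/...", payload)
```

Keeping payload construction separate from the network call makes the message format easy to check without contacting Slack.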

Flow Services

[Image: flow services]

All these services need to be set.

© 2023 Tim Spann https://datainmotion.dev/