https://www.cloudera.com/products/dataflow/connectors.html
FLaNK Stack Weekly for 08 May 2023
08-May-2023
FLiPN-FLaNK Stack Weekly
Tim Spann @PaaSDev
We have a few more days left of the NiFi Event.
Join me and the NiFi creators! https://attend.cloudera.com/nificommitters0503?internal_keyplay=data-flow&internal_campaign=FY24-Q2_Webinar_Cloudera_AMER_NiFi_Meet_the_Committers&cid=7012H000001ZNXBQA4&internal_link=p07
https://www.youtube.com/watch?v=W1zho5yzm5M&ab_channel=DatainMotion
CODE + COMMUNITY
Please join my meetup group NJ/NYC/Philly/Virtual.
http://www.meetup.com/futureofdata-princeton/
https://www.meetup.com/futureofdata-sanfrancisco/events/292453316/
https://www.meetup.com/futureofdata-newyork/
https://www.meetup.com/futureofdata-philadelphia/
This is Issue #82
https://github.com/tspannhw/FLiPStackWeekly
https://www.linkedin.com/pulse/schedule-2023-tim-spann-/
Videos
https://www.youtube.com/watch?v=ENhT0w44wVc
https://www.youtube.com/watch?v=ptjRobC1FSw
https://www.youtube.com/watch?v=b1sdDmlwAsk&t=1351s
https://www.youtube.com/watch?v=biKribaFD_s
Articles
https://medium.com/@tspann/deploy-your-nifi-flow-to-k8-in-aws-faa85b0e620e
https://medium.com/@tspann/ingest-from-iceberg-tables-with-cloudera-dataflow-2dc3bb30096f
https://github.com/tspannhw/FLaNK-TravelAdvisory/blob/main/steps.md
https://community.cloudera.com/t5/Community-Articles/Best-in-Flow-Event/ta-p/368947
https://github.com/tspannhw/FLaNK-TravelAdvisory
https://www.cloudera.com/solutions/dim-developer.html
https://www.datainmotion.dev/2023/04/cloudera-data-flow-readyflows.html
https://docs.cloudera.com/cdp-public-cloud/cloud/release-summaries/topics/announcement-202304.html
https://www.datainmotion.dev/2023/05/cloud-tools-guidance-how-to-build-data_18.html
https://digma.ai/blog/java-developer-vs-chatgpt-part-i-writing-a-spring-boot-microservice/
https://dzone.com/articles/chataws-deploy-aws-resources-seamlessly-chatgpt
https://medium.com/cloudera-inc/building-an-effective-nifi-flow-partitionrecord-b342a8efc50c
https://www.baeldung.com/spring-boot-chatgpt-api-openai
Recent Talks
https://www.slideshare.net/bunkertor/meetup-streaming-data-pipeline-development
https://www.slideshare.net/bunkertor/rtas-2023-building-a-realtime-iot-application
Events
https://www.youtube.com/watch?v=Ws7YmAHE1O8
https://www.cloudera.com/about/events/evolve.html
https://web.cvent.com/event/7598f981-2f7e-4915-b662-bd7be9b5f48d/summary?RefId=homepage_impact24
May 3-10, 2023: Special Once in a Lifetime Event. Virtual.
May 9, 2023: Garden State Java User Group. In-Person. New Jersey https://gsjug.org/. Modern Data Streaming Pipelines with Java, NiFi, Flink, Kafka. https://gsjug.org/meetings/2023/may2023.html https://www.meetup.com/garden-state-java-user-group/events/293229660/
May 10-12, 2023: Open Source Summit North America. Virtual https://events.linuxfoundation.org/open-source-summit-north-america/
May 11, 2023 https://www.meetup.com/futureofdata-siliconvalley/events/292962395/
May 17-18, 2023: IBM Event. Raleigh, NC.
May 23, 2023: Pulsar Summit Europe. Virtual https://pulsar-summit.org/
May 24-25, 2023: Big Data Fest. Virtual. https://sessionize.com/big-data-fest-by-softserve/
June 14: 12PM EDT Cloudera Now - Virtual https://www.cloudera.com/about/events/cloudera-now-cdp.html?internal_keyplay=ALL&internal_campaign=FY24-Q2_AMER_CNOW_Q2_WEB_EP_P07_2023-06-14&cid=7012H000001ZLmyQAG&internal_link=p07
June 26-28, 2023: NLIT Summit. Milwaukee.
https://www.fbcinc.com/e/nlit/default.aspx
June 28, 2023: NiFi Meetup. Milwaukee and Hybrid. https://www.meetup.com/futureofdata-princeton/events/292976004/
July 19, 2023: 2-Hours to Data Innovation: Data Flow https://www.cloudera.com/about/events/hands-on-lab-series-2-hours-to-data-innovation.html
October 18, 2023: 2-Hours to Data Innovation: Data Flow https://www.cloudera.com/about/events/hands-on-lab-series-2-hours-to-data-innovation.html
Cloudera Events https://www.cloudera.com/about/events.html
More Events: https://www.linkedin.com/pulse/schedule-2023-tim-spann-/
Tools
https://github.com/Textualize/frogmouth
https://github.com/mlc-ai/mlc-llm
https://github.com/chronark/highstorm
https://github.com/charmbracelet/glow
https://github.com/NorSoulx/vscode-openai-code-analyzer
brew install pipx
https://stablediffusionweb.com/#demo
https://github.com/1rgs/jsonformer
https://github.com/cashapp/pranadb
https://github.com/Deci-AI/super-gradients/blob/master/YOLONAS.md
https://github.com/mishalhossin/Discord-Chatbot-Gpt4Free
https://github.com/taviso/123elf
https://github.com/juftin/browsr
https://github.com/jina-ai/thinkgpt
https://github.com/csgoh/roadmapper
https://github.com/AIGC-Audio/AudioGPT
https://github.com/cloudera/CML_AMP_LLM_Chatbot_Augmented_with_Enterprise_Data
© 2020-2023 Tim Spann
FLaNK-TravelAdvisory
Travel Advisory - RSS Processing - Apache NiFi - Apache Kafka - Apache Flink - SQL
Overview
Final Flow
Adding Processors to the Designer
Here I list most of the processors available
https://www.datainmotion.dev/2023/04/dataflow-processors.html
Flow Parameters
Go to parameters and enter all you will need for the flow.
You can add all the ones listed below.
Flow Walk Through
If you are loading my pre-built flow, you will see the details for the process group in the configuration palette when you enter it.
We add an InvokeHTTP processor and set the parameters.
Now we can add a parameter for the HTTP URL for Travel Advisories.
Connect InvokeHTTP to QueryRecord. Name your connection for monitoring later.
In QueryRecord, we convert XML (RSS) to JSON. You will need RSSXMLReader and TravelJsonRecordSetWriter.
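To make the XML-to-JSON step concrete, here is a minimal Python sketch of roughly what the reader and writer pair does to each RSS item. The sample feed, its values, and the function name rss_items_to_json are assumptions made for illustration; the actual flow does this inside NiFi with the services named above.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample feed, shaped like a typical RSS 2.0 document.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Travel Advisories</title>
    <item>
      <title>Example Country - Level 1</title>
      <link>https://example.com/advisory/1</link>
      <guid>https://example.com/advisory/1</guid>
      <pubDate>Mon, 08 May 2023 00:00:00 GMT</pubDate>
      <description>Exercise normal precautions.</description>
    </item>
  </channel>
</rss>"""


def rss_items_to_json(rss_text):
    """Return a JSON array string with one object per <item> element."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        # Each child tag becomes a key; text is stripped of surrounding whitespace.
        items.append({child.tag: (child.text or "").strip() for child in item})
    return json.dumps(items, indent=2)


print(rss_items_to_json(SAMPLE_RSS))
```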
Connect QueryRecord to SplitJson if no errors.
For SplitJson, we set the JsonPath Expression to $.*.*.item
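The path $.*.*.item reads as: start at the root, step through any child, step through any child again, then take the field named item. A rough Python sketch of that selection follows. The document shape (an array holding one record with a channel field) is an assumption about what QueryRecord emits, and select_items is a hypothetical helper, not a NiFi API.

```python
import json


def children(node):
    """Yield the direct children a JsonPath wildcard (*) would match."""
    if isinstance(node, dict):
        yield from node.values()
    elif isinstance(node, list):
        yield from node


def select_items(doc):
    """Collect every value found at $.*.*.item."""
    matches = []
    for level1 in children(doc):
        for level2 in children(level1):
            if isinstance(level2, dict) and "item" in level2:
                matches.append(level2["item"])
    return matches


# Assumed shape: one record whose channel holds the list of items.
doc = [{"channel": {"title": "Travel Advisories",
                    "item": [{"title": "A"}, {"title": "B"}]}}]

# SplitJson would emit each array element as its own flow file.
for array in select_items(doc):
    for element in array:
        print(json.dumps(element))
```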
We then connect SplitJson to SplitRecord.
For SplitRecord we set the Record Reader to JSON_Reader_InferRoot, the Record Writer to TravelJsonRecordSetWriter and records per split to 1.
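Setting records per split to 1 means every outgoing flow file carries exactly one record. As a small illustration of that behaviour, assuming the records are already parsed into a Python list (split_records is a hypothetical name):

```python
def split_records(records, records_per_split=1):
    """Return consecutive chunks holding at most records_per_split records each."""
    if records_per_split < 1:
        raise ValueError("records_per_split must be at least 1")
    return [
        records[start:start + records_per_split]
        for start in range(0, len(records), records_per_split)
    ]


# Hypothetical records for illustration.
advisories = [{"title": "A"}, {"title": "B"}, {"title": "C"}]
for chunk in split_records(advisories):
    print(chunk)
```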
We connect SplitRecord to EvaluateJsonPath.
We set the Destination to flowfile-attribute, Return Type to json and add several new fields.
- description - $.description
- guid - $.guid
- identifier - $.identifier
- link - $.link
- pubdate - $.pubDate
- title - $.title
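The field list above can be sketched in Python as a lookup from attribute name to JsonPath. This is only an illustration that handles simple top-level paths of the form $.name; the sample record and the function name evaluate_paths are assumptions. Note that the attribute is named pubdate while the path reads the case-sensitive field pubDate.

```python
# Attribute name -> JsonPath, copied from the list above.
FIELD_PATHS = {
    "description": "$.description",
    "guid": "$.guid",
    "identifier": "$.identifier",
    "link": "$.link",
    "pubdate": "$.pubDate",
    "title": "$.title",
}


def evaluate_paths(record, field_paths=FIELD_PATHS):
    """Return flow file attributes for simple top-level paths like $.name."""
    attributes = {}
    for attr_name, path in field_paths.items():
        key = path[2:]  # strip the leading "$."
        if key in record:
            attributes[attr_name] = str(record[key])
    return attributes


# Hypothetical record for illustration.
record = {
    "title": "Example Country - Level 1",
    "link": "https://example.com/advisory/1",
    "guid": "https://example.com/advisory/1",
    "pubDate": "Mon, 08 May 2023 00:00:00 GMT",
    "identifier": " XX ",
    "description": "Exercise normal precautions.",
}
print(evaluate_paths(record))
```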
We connect EvaluateJsonPath to SplitJson.
For SplitJson we set the JsonPath Expression to $.category
We connect SplitJson to UpdateRecord.
In UpdateRecord, we set Record Reader to JSON_Reader_InferRoot and Record Writer to TravelJsonRecordSetWriter. We set Replacement Value Strategy to Literal Value.
We add new fields for our new record format.
- /advisoryId - ${filename}
- /description - ${description}
- /domain - ${identifier:trim()}
- /guid - ${guid}
- /link - ${link}
- /pubdate - ${pubdate}
- /title - ${title}
- /ts - ${now():toNumber()}
- /uuid - ${uuid}
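For readers less familiar with NiFi Expression Language, the field mappings above behave roughly like the Python sketch below: trim() strips surrounding whitespace and now():toNumber() yields the current time as epoch milliseconds. The attribute values and the function name build_advisory_record are made up for illustration.

```python
import time
import uuid


def build_advisory_record(attributes):
    """Build the new record from flow file attributes, one key per field above."""
    return {
        "advisoryId": attributes["filename"],        # ${filename}
        "description": attributes["description"],    # ${description}
        "domain": attributes["identifier"].strip(),  # ${identifier:trim()}
        "guid": attributes["guid"],                  # ${guid}
        "link": attributes["link"],                  # ${link}
        "pubdate": attributes["pubdate"],            # ${pubdate}
        "title": attributes["title"],                # ${title}
        "ts": int(time.time() * 1000),               # ${now():toNumber()}, epoch ms
        "uuid": attributes["uuid"],                  # ${uuid}
    }


# Hypothetical attribute values for illustration only.
sample_attributes = {
    "filename": "advisory-0001",
    "description": "Exercise normal precautions.",
    "identifier": " XX ",
    "guid": "https://example.com/advisory/1",
    "link": "https://example.com/advisory/1",
    "pubdate": "Mon, 08 May 2023 00:00:00 GMT",
    "title": "Example Country - Level 1",
    "uuid": str(uuid.uuid4()),
}
record = build_advisory_record(sample_attributes)
print(record["domain"], record["ts"])
```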
Next we connect UpdateRecord to our Slack Sub-Processor Group
The other branch flows from UpdateRecord to Write to Kafka.
For PublishKafka2RecordCDP, there are a lot of parameters to set, which is why we recommend starting with a ReadyFlow. We need to set:
- Kafka Brokers
- Destination Topic Name
- Record Reader to JSON_Reader_InferRoot
- Record Writer to AvroRecordSetWriterHWX
- transactions off
- Guarantee Replicated Delivery
- Use Content as Record Value
- SASL_SSL/Plain security
- Username to your login user id or machine user, and then the associated password
- SSL Context Service to the built-in Default NiFi SSL Context Service
- Message Key Field to uuid
- client.id to a unique Kafka producer id
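For comparison, the Kafka settings described above correspond roughly to the plain client properties in the sketch below. This is an assumption-laden illustration, not the processor's configuration: the key names follow the librdkafka style used by the confluent-kafka Python client, acks=all is my reading of Guarantee Replicated Delivery, no transactional.id is set because transactions are off, and the broker address, credentials, and helper names are placeholders.

```python
def build_producer_config(brokers, username, password, client_id):
    """Return producer properties mirroring the processor settings above."""
    return {
        "bootstrap.servers": brokers,     # Kafka Brokers
        "client.id": client_id,           # unique Kafka producer id
        "acks": "all",                    # assumed match for Guarantee Replicated Delivery
        "security.protocol": "SASL_SSL",  # SASL over TLS
        "sasl.mechanism": "PLAIN",        # username/password authentication
        "sasl.username": username,
        "sasl.password": password,
    }


def message_key(record):
    """Use the record's uuid field as the Kafka message key."""
    return str(record["uuid"]).encode("utf-8")


# Placeholder values for illustration only.
config = build_producer_config(
    brokers="broker-1.example.com:9093",
    username="machine-user",
    password="change-me",
    client_id="travel-advisory-producer-1",
)
print(sorted(config))
```

A dictionary like this could be handed to a Kafka client such as confluent_kafka.Producer, with the Avro serialization handled separately.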
We also send messages to Slack about our travel advisories.
We only need one processor to send to Slack.
We connect input to our PutSlack processor.
For PutSlack, we need to set the Webhook URL to the one from your Slack group admin, set the text to the content from the ingest, set your channel to the channel mapped in the webhook, and set a username for your bot.
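Outside NiFi, the same Slack call amounts to an HTTP POST of a small JSON payload to the webhook URL. The sketch below uses only the Python standard library. The function names and sample values are hypothetical, and the channel and username overrides are, as far as I know, only honored by legacy incoming webhooks, so treat those two fields as optional.

```python
import json
import urllib.request


def build_slack_payload(text, channel, username):
    """Assemble the JSON body for a Slack incoming webhook."""
    return {"text": text, "channel": channel, "username": username}


def post_to_slack(webhook_url, payload):
    """POST the payload to the webhook and return the HTTP status code."""
    data = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical values; the webhook URL comes from your Slack group admin.
payload = build_slack_payload(
    text="Example Country - Level 1: Exercise normal precautions.",
    channel="#travel-advisories",
    username="travel-advisory-bot",
)
print(json.dumps(payload))
```

Calling post_to_slack(webhook_url, payload) with a real webhook URL would send the message.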
Flow Services
All these services need to be set.
© 2023 Tim Spann https://datainmotion.dev/