Best in Flow Competition Tutorials Part 1


Authors: Michael Kohs, George Vetticaden, Timothy Spann

Date: 04/18/2023

Last Updated: 5/1/2023



Useful Data Assets

Setting Your Workload Password

Creating a Kafka Topic

Use Case Walkthrough

1. Reading and filtering a stream of syslog data

2. Writing critical syslog events to Apache Iceberg for analysis

3. Resize image flow deployed as serverless function




Use Case Walkthrough for Competition


Notice


This document assumes that you have registered for an account, activated it, and logged into the CDP Sandbox. It is intended only for authorized users who have attended the webinar and read the training materials.


A short guide and references are listed here.


Competition Resources


Login to the Cluster


https://login.cdpworkshops.cloudera.com/auth/realms/se-workshop-5/protocol/saml/clients/cdp-sso 

Kafka Broker connection string 


  • oss-kafka-demo-corebroker2.oss-demo.qsm5-opic.cloudera.site:9093,

  • oss-kafka-demo-corebroker1.oss-demo.qsm5-opic.cloudera.site:9093,

  • oss-kafka-demo-corebroker0.oss-demo.qsm5-opic.cloudera.site:9093
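
Kafka clients generally take the brokers as one comma-separated bootstrap list rather than as separate entries. The following is a minimal Python sketch of assembling that string from the three brokers above. The helper name bootstrap_servers is hypothetical and not part of any Kafka library; it only cleans up and joins the entries.

```python
# Broker host:port pairs copied from the list above.
BROKERS = [
    "oss-kafka-demo-corebroker2.oss-demo.qsm5-opic.cloudera.site:9093",
    "oss-kafka-demo-corebroker1.oss-demo.qsm5-opic.cloudera.site:9093",
    "oss-kafka-demo-corebroker0.oss-demo.qsm5-opic.cloudera.site:9093",
]


def bootstrap_servers(brokers):
    """Join host:port entries into one comma-separated string (hypothetical helper)."""
    # Drop surrounding whitespace and any trailing comma left over from copy and paste.
    cleaned = [b.strip().rstrip(",") for b in brokers if b.strip()]
    for entry in cleaned:
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
    return ",".join(cleaned)


print(bootstrap_servers(BROKERS))
```

The resulting string can be pasted into whichever property your client or processor uses for the broker list.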



Kafka Topics


  • syslog_json

  • syslog_avro

  • syslog_critical


Schema Registry Hostname


  • oss-kafka-demo-master0.oss-demo.qsm5-opic.cloudera.site


Schema Name


  • syslog

  • syslog_avro

  • syslog_transformed



Syslog Filter Rule


  • SELECT * FROM FLOWFILE WHERE severity <= 2
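
In syslog, lower severity numbers are more urgent: 0 is Emergency, 1 is Alert, and 2 is Critical, so the rule above keeps only those three levels. The sketch below mirrors that predicate in plain Python to show which records pass. It is an illustration only, not how the flow evaluates the query; the sample events and the body field are hypothetical, and only the severity field comes from the rule.

```python
# Syslog severity levels kept by the rule `severity <= 2`.
SEVERITY_NAMES = {0: "Emergency", 1: "Alert", 2: "Critical"}


def is_critical(event, threshold=2):
    """Return True when the event's severity is at or below the threshold.

    Records with a missing or non-numeric severity are dropped, which is an
    assumption made for this sketch.
    """
    try:
        return int(event["severity"]) <= threshold
    except (KeyError, TypeError, ValueError):
        return False


# Hypothetical sample events; real syslog records carry more fields.
events = [
    {"severity": 0, "body": "kernel panic"},
    {"severity": 2, "body": "disk failure"},
    {"severity": 5, "body": "user login"},
    {"body": "no severity field"},
]

critical = [e for e in events if is_critical(e)]
for e in critical:
    print(SEVERITY_NAMES[int(e["severity"])], e["body"])
```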


Access Key and Private Key for Machine User in DataFlow Function


  • Access Key: eda9f909-d1c2-4934-bad7-95ec6e326de8

  • Private Key: eon6eFzLlxZI/gpU0dWtht21DI60MkSQZjIzeWSGBSI=


The keys above are needed if you want to deploy a DataFlow Function that you build during the Best in Flow Competition.



Your Workload User Name and Password


  1. Click your name at the bottom left corner of the screen to open a menu.

  2. Click Profile to go to your user profile page, which lists your account details.




If the Workload Password entry does not say that it is currently set, or if you have forgotten your password, follow the steps below to reset it. Your user ID is shown on the profile page under Workload User Name.


Setting Workload Password

You need to define a workload password, which is used to access non-SSO interfaces. You may read more about it here. Keep the password somewhere safe. If you forget it, you can repeat this process and define another one.

  1. From the Home Page, click on your User Name (Ex: tim) at the lower left corner.

  2. Click on the Profile option.


  3. Click the Set Workload Password option.

  4. Enter a suitable password in both Password and Confirm Password.

  5. Click the Set Workload Password button.


Check that you see the confirmation message Workload password is currently set. Alternatively, look next to Workload Password for the text (Workload password is currently set). Save the password you configured, as well as your Workload User Name, for later use.




Create a Kafka Topic


The tutorials require you to create an Apache Kafka topic to send your data to. The steps below show how to create that topic. You will also need these steps to create topics for any custom applications you build for the competition.


  1. Navigate to Data Hub Clusters from the Home Page.

Info:   You can always navigate back to the Home Page by clicking the app switcher icon at the top left of your screen.

  2. Navigate to the oss-kafka-demo cluster.

  3. Navigate to Streams Messaging Manager.

Info:   Streams Messaging Manager (SMM) is a tool for working with Apache Kafka.

  4. In SMM, click the Topic button, the round icon third from the top, to open the Topic browser.

  5. Click Add New to build a new topic.

  6. Enter the name of your topic prefixed with your Workload User Name, ex:   <<replace_with_userid>>_syslog_critical.

  7. Create the topic with these settings: 3 partitions, cleanup.policy: delete, availability: maximum.

  8. After successfully creating the topic, close the tab that opened when you navigated to Streams Messaging Manager.


Congratulations! You have built a new topic.   
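
The naming convention from the steps above, your Workload User Name followed by the topic name, can be sketched in Python. This is an illustration under assumptions: the helper topic_for_user and the TOPIC_SETTINGS dictionary are hypothetical names, the code does not connect to Kafka or SMM, and the "availability" choice is left out because it is an SMM UI setting. The check reflects Kafka's rule that topic names may contain only ASCII letters, digits, '.', '_' and '-', up to 249 characters.

```python
import re

# Characters and length Kafka accepts for a topic name.
LEGAL_TOPIC = re.compile(r"[A-Za-z0-9._-]{1,249}")

# Settings named in the steps above, kept here only for reference.
TOPIC_SETTINGS = {"partitions": 3, "cleanup.policy": "delete"}


def topic_for_user(workload_user, base="syslog_critical"):
    """Build '<workload_user>_<base>' and reject names Kafka would refuse."""
    name = f"{workload_user}_{base}"
    if not LEGAL_TOPIC.fullmatch(name) or name in (".", ".."):
        raise ValueError(f"illegal Kafka topic name: {name!r}")
    return name


# 'tim' is the example user name used earlier in this document.
print(topic_for_user("tim"), TOPIC_SETTINGS)
```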









FLiPN-FLaNK Stack Weekly for 30 April 2023

 


Tim Spann @PaaSDev

It was great seeing everyone at the Real-Time Analytics Summit and the meetup in San Francisco. Now let's get current on NiFi and build some new Data Flows!

May 3, 2023!!!! Join me and the NiFi creators! https://attend.cloudera.com/nificommitters0503?internal_keyplay=data-flow&internal_campaign=FY24-Q2_Webinar_Cloudera_AMER_NiFi_Meet_the_Committers&cid=7012H000001ZNXBQA4&internal_link=p07

Cool NiFi 2.0 Stuff -> https://issues.apache.org/jira/browse/NIFI-10757





CODE + COMMUNITY

Please join my meetup group NJ/NYC/Philly/Virtual.

http://www.meetup.com/futureofdata-princeton/

https://www.meetup.com/futureofdata-sanfrancisco/events/292453316/

https://www.meetup.com/futureofdata-newyork/

https://www.meetup.com/futureofdata-philadelphia/


This is Issue #81

https://github.com/tspannhw/FLiPStackWeekly

https://www.linkedin.com/pulse/schedule-2023-tim-spann-/

Apache NiFi 2.0

NiFi 2.0 Python Demo

NiFi build for 2.0.0-SNAPSHOT allowing Python Processors: https://drive.google.com/file/d/1xAuao9rV8F_CQBLqWLWp7P12iZpuuUEP/view?usp=share_link And some sample processors: https://drive.google.com/drive/folders/1VCtNQmThAHL44-t2ORdav9YPIHMvCk_b

https://www.youtube.com/watch?v=9Oi_6nFmbPg&ab_channel=NiFiNotes

MiNiFi Updates

CEM MiNiFi C++ Agent 1.23.04: adds support for FetchOPCProcessor and PutOPCProcessor to get and push data over OPC-UA, starts shipping the Prometheus extension for metrics export, and lets the Expression Language toDate function parse RFC 3339 dates. https://docs.cloudera.com/cem/1.5.1/release-notes-minifi-cpp/topics/cem-minifi-cpp-agent-updates.html https://docs.cloudera.com/cem/1.5.1/release-notes-minifi-cpp/topics/cem-minifi-cpp-download-locations.html

Data Trends

https://www.thoughtworks.com/radar/languages-and-frameworks?blipid=202210050

https://github.com/sdv-dev/SDV

Videos

https://www.youtube.com/watch?v=lqxPyHYzGQ0&ab_channel=DatainMotion

https://www.youtube.com/watch?v=4RoMOQtqKC0

https://www.youtube.com/watch?v=yKFS8-A14Tg&ab_channel=DatainMotion

Articles

https://www.cloudera.com/solutions/dim-developer.html

https://www.datainmotion.dev/2023/04/cloudera-data-flow-readyflows.html

https://www.datainmotion.dev/2023/04/dataflow-processors.html

https://funnifi.blogspot.com/2023/04/transform-json-string-field-into-record.html

http://funnifi.blogspot.com/2023/04/using-jslttransformjson-alternative-to.html

https://docs.cloudera.com/cdp-public-cloud/cloud/getting-started/topics/cdp-deploy_cdp_using_terraform.html

https://streamnative.io/blog/introducing-oxia-scalable-metadata-and-coordination?

Recent Talks

https://www.slideshare.net/bunkertor/meetup-streaming-data-pipeline-development

https://www.slideshare.net/bunkertor/rtas-2023-building-a-realtime-iot-application

Events

https://www.youtube.com/watch?v=Ws7YmAHE1O8

https://www.cloudera.com/about/events/evolve.html

https://web.cvent.com/event/7598f981-2f7e-4915-b662-bd7be9b5f48d/summary?RefId=homepage_impact24

May 3, 2023: Meet the Committers. Virtual https://attend.cloudera.com/nificommitters0503

May 3-10, 2023: Special Once in a Lifetime Event. Virtual.


May 9, 2023: Garden State Java User Group. In-Person. New Jersey https://gsjug.org/. Modern Data Streaming Pipelines with Java, NiFi, Flink, Kafka. https://gsjug.org/meetings/2023/may2023.html https://www.meetup.com/garden-state-java-user-group/events/293229660/

May 10-12, 2023: Open Source Summit North America. Virtual https://events.linuxfoundation.org/open-source-summit-north-america/

May 17-18, 2023: IBM Event. Raleigh, NC.

May 23, 2023: Pulsar Summit Europe. Virtual https://pulsar-summit.org/


May 24-25, 2023: Big Data Fest. Virtual. https://sessionize.com/big-data-fest-by-softserve/

June 14, 2023: Cloudera Now. Virtual, 12PM EDT. https://www.cloudera.com/about/events/cloudera-now-cdp.html?internal_keyplay=ALL&internal_campaign=FY24-Q2_AMER_CNOW_Q2_WEB_EP_P07_2023-06-14&cid=7012H000001ZLmyQAG&internal_link=p07

June 26-28, 2023: NLIT Summit. Milwaukee. https://www.fbcinc.com/e/nlit/default.aspx

June 28, 2023: NiFi Meetup. Milwaukee and Hybrid. https://www.meetup.com/futureofdata-princeton/events/292976004/


July 19, 2023: 2-Hours to Data Innovation: Data Flow https://www.cloudera.com/about/events/hands-on-lab-series-2-hours-to-data-innovation.html

October 18, 2023: 2-Hours to Data Innovation: Data Flow https://www.cloudera.com/about/events/hands-on-lab-series-2-hours-to-data-innovation.html

Cloudera Events https://www.cloudera.com/about/events.html

More Events: https://www.linkedin.com/pulse/schedule-2023-tim-spann-/

Code

https://flightaware.com/adsb/stats/site/180330

https://huggingface.co/chat/conversation/644b0761bde5eee46bf58eb2

https://github.com/streamnative/oxia

Tools

https://github.com/xdgrulez/kash.py

https://cheat.sh/

https://github.com/kuasar-io/kuasar/releases

https://github.com/jkfran/killport

https://github.com/StanfordBDHG/HealthGPT

https://orbstack.dev/

https://github.com/dynobo/normcap

https://github.com/faustomorales/keras-ocr

https://www.dragonflydb.io/

https://github.com/gventuri/pandas-ai

https://mrsk.dev/

https://github.com/termux/termux-app

https://www.youtube.com/watch?v=GsUKTs-J7jQ&ab_channel=DatainMotion

https://lineageos.org/

https://github.com/h2oai/h2o-llmstudio

https://github.com/sdv-dev/SDV

https://github.com/karpathy/nanoGPT

© 2020-2023 Tim Spann