FLaNK Stack for 3rd July 2023
3-July-2023
HOLIDAY!
FLiPN-FLaNK Stack Weekly
Tim Spann @PaaSDev
My friend wrote an awesome new book on streaming. I highly recommend picking up a copy!
https://leanpub.com/streamprocessingwithapacheflink/c/ucQ5dLcZYAo2
CODE + COMMUNITY
Please join my meetup group NJ/NYC/Philly/Virtual.
http://www.meetup.com/futureofdata-princeton/
https://www.meetup.com/futureofdata-newyork/
https://www.meetup.com/futureofdata-philadelphia/
**This is Issue #92**
https://github.com/tspannhw/FLiPStackWeekly
https://www.linkedin.com/pulse/schedule-2023-tim-spann-/
Videos
https://www.youtube.com/watch?v=8NrK69WrRq0&ab_channel=PlainSchwarz
Talks
https://www.slideshare.net/bunkertor/meetup-streaming-data-pipeline-development-258709707
https://www.slideshare.net/bunkertor/big-data-fest-building-modern-data-streaming-apps
https://www.youtube.com/live/1xFha8va7pg?feature=share
Articles
https://exceptionfactory.com/posts/2023/07/01/streamlining-apache-nifi-cluster-state-migration/
https://medium.com/@tspann/cdc-not-cat-data-capture-e43713879c03
https://medium.com/@tspann/functions-anywhere-faas-ee92ecedb248
https://blog.cloudera.com/fraud-detection-with-cloudera-stream-processing-part-1/
https://siliconangle.com/2023/06/27/cloudera-expands-apache-iceberg-support-private-clouds/
https://debezium.io/blog/2023/06/22/towards-exactly-once-delivery/
https://dev.to/thedanicafine/so-you-want-to-speak-at-a-technical-conference-responding-to-a-cfp-54m6
https://www.vox.com/climate/23769186/bad-air-quality-index-wildfires-pollution
https://marcushellberg.dev/java-ecosystem-trends-report-2023
https://hazelcast.com/blog/enriching-kafka-applications-with-contextual-data/
Documentation
https://docs.cloudera.com/csa/1.10.0/how-to-ssb/topics/csa-ssb-kafka-kudu-join.html
https://docs.cloudera.com/runtime/7.2.17/index.html
Events
https://attend.cloudera.com/ameropendatalakehousewithcdpon?lid=7vxyhds3tlv7
July 19, 2023: 2-Hours to Data Innovation: Data Flow https://www.cloudera.com/about/events/hands-on-lab-series-2-hours-to-data-innovation.html
October 18, 2023: 2-Hours to Data Innovation: Data Flow https://www.cloudera.com/about/events/hands-on-lab-series-2-hours-to-data-innovation.html
Cloudera Events https://www.cloudera.com/about/events.html
More Events: https://www.linkedin.com/pulse/schedule-2023-tim-spann-/
Code
https://github.com/cloudera/CML_AMP_LLM_Chatbot_Augmented_with_Enterprise_Data/tree/main
NiFi Code
https://github.com/georgevetticaden/evernote-ai-chatbot
Tools
- https://saurabhs.org/advanced-macos-commands
- https://github.com/poloclub/wizmap
- https://high-qr-code-generator.com/
- https://github.com/salesforce/xGen
- https://erichartford.com/openorca
- https://neal.fun/password-game/
- https://github.com/Kanaries/graphic-walker
- https://orbstack.dev/
- https://github.com/apache/parquet-format/blob/master/Encryption.md
- https://github.com/Stability-AI/generative-models
- https://github.com/CASIA-IVA-Lab/FastSAM
- https://github.com/imgly/background-removal-js
- https://github.com/ooguz/papyrus
- https://github.com/configu/configu
- https://www.pinecone.io/
- https://github.com/orf/gping
- https://rust-lang.github.io/mdBook/
© 2020-2023 Tim Spann
FLaNK Stack Weekly for 26 June 2023
26-June-2023
FLiPN-FLaNK Stack Weekly
Tim Spann @PaaSDev
My friend wrote an awesome new book on streaming. I highly recommend picking up a copy!
https://leanpub.com/streamprocessingwithapacheflink/c/ucQ5dLcZYAo2
Join me in person for steak & stack or virtually for FLaNK Stack
https://www.meetup.com/futureofdata-princeton/events/292976004/
Wednesday, June 28, 2023, 6:00 PM to 8:00 PM EDT
The Capital Grille, 310 W Wisconsin Ave, Milwaukee, WI
Also live streamed to YouTube.
This will be a hybrid event with a Zoom option. The in-person event will be in Milwaukee.
In this interactive session, Tim will lead participants through how to best build streaming data pipelines. He will cover how to build applications from some common use cases and highlight tips, tricks, best practices and patterns. He will show how to build the easy way and then dive deep into the underlying open source technologies including Apache NiFi, Apache Flink, Apache Kafka and Apache Iceberg.
If you wish to follow along, please download open source projects beforehand. You can also download this helpful streaming platform:
https://docs.cloudera.com/csp-ce/latest/installation/topics/csp-ce-installing-ce.html
All source code and slides will be shared for those interested in building their own FLaNK Apps.
https://www.flankstack.dev/
https://www.thecapitalgrille.com/locations/wi/milwaukee/milwaukee/8027
Hardware For FLaNK
The amazing team at Ampere Computing sent us a 2U Mt Jade.
https://amperecomputing.com/en/systems/altra/2u-mt-jade-2s-nvme
We will be running AI, IoT, MiNiFi, NiFi, Kafka, Flink, Pulsar, Spark, Iceberg, Ozone, HBase, Kudu, Hive, Impala, Jupyter and other workloads on it.
Updates
CDF-PC 2.5 on CDP Public Cloud
https://docs.cloudera.com/dataflow/cloud/deploy-flows/topics/cdf-flow-deployment-autoscaling.html
New Advanced UIs:
- The Flow Designer now supports the advanced configuration UI for UpdateAttribute.
- The Flow Designer now supports the advanced configuration UI for JoltTransformJson.
- New Canvas navigation: The Flow Designer now supports Birdseye and Zoom controls.
- New troubleshooting: The Flow Designer now supports Processor Diagnostics with an active Test Session.
- Multi-Select: The Flow Designer now supports multi-selection on the canvas and bulk actions for Start, Stop, Enable, Disable, Move, Change parent group, Copy/Paste, and Delete.
New ReadyFlows for this release:
- CDW Ingest
- CDP Kafka to Snowflake
- Slack to S3
- Updated Confluent Cloud to Snowflake using new Snowpipe processors
CODE + COMMUNITY
Please join my meetup group NJ/NYC/Philly/Virtual.
http://www.meetup.com/futureofdata-princeton/
https://www.meetup.com/futureofdata-newyork/
https://www.meetup.com/futureofdata-philadelphia/
**This is Issue #91**
You may notice a jump in issue numbers. LinkedIn says we had already published 89, so I am assuming two other articles got assimilated. I will go with this, since 90 is a better number.
https://github.com/tspannhw/FLiPStackWeekly
https://www.linkedin.com/pulse/schedule-2023-tim-spann-/
Courses
https://www.cloudera.com/about/training/courses/apache-nifi-anti-patterns.html
Videos
https://www.youtube.com/watch?v=H1SYOuLcUTI&ab_channel=Ververica
https://www.youtube.com/watch?app=desktop&v=8cZJ9CyLYyI
Conference Videos
Hail Hydrate! From Stream to Lake https://www.youtube.com/watch?v=IBpqa8re--o&ab_channel=PowerShell.org
Articles
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
https://dzone.com/articles/apache-nifi-10-cheatsheet
https://www.linkedin.com/posts/excalidraw_re-keying-a-kafka-topic-activity-7077942003837100033-KfnM/
https://medium.com/@tspann/functions-anywhere-faas-ee92ecedb248
Events
June 26-28, 2023: NLIT Summit. Milwaukee.
https://www.fbcinc.com/e/nlit/default.aspx
June 28, 2023: NiFi Meetup. Milwaukee and Hybrid. https://www.meetup.com/futureofdata-princeton/events/292976004/
July 19, 2023: 2-Hours to Data Innovation: Data Flow https://www.cloudera.com/about/events/hands-on-lab-series-2-hours-to-data-innovation.html
October 18, 2023: 2-Hours to Data Innovation: Data Flow https://www.cloudera.com/about/events/hands-on-lab-series-2-hours-to-data-innovation.html
Cloudera Events https://www.cloudera.com/about/events.html
More Events: https://www.linkedin.com/pulse/schedule-2023-tim-spann-/
Code
https://github.com/polyzos/stream-processing-with-apache-flink
NiFi Code
Tools
- https://devpod.sh/
- https://github.com/owenthereal/ccat
- https://github.com/SkalskiP/top-cvpr-2023-papers
- https://github.com/vercel-labs/ai
- https://github.com/arwes/arwes
- https://linearmouse.app/
- https://github.com/axllent/mailpit
- https://index.quantumstat.com/
- https://www.edx.org/course/introduction-computer-science-harvardx-cs50x
- https://app.revolt.chat/login
- https://github.com/binpash/try
- https://paimon.apache.org/docs/master/engines/flink/
- https://github.com/Kanaries/pygwalker
© 2020-2023 Tim Spann
FLaNK Stack Weekly for 20 June 2023
20-June-2023
FLiPN-FLaNK Stack Weekly
Tim Spann @PaaSDev
We are publishing late due to Father's Day and Juneteenth (https://en.wikipedia.org/wiki/Juneteenth).
NiFi Updates
Apache NiFi 1.22 is now available for download! Release Date: June 11, 2023
- MiNiFi agents can now talk to C2 servers through reverse proxies and load balancers.
- New processor for modifying the compression algorithm of content. It still incurs the CPU hit but substantially improves IO by avoiding writing and reading the intermediary decompressed form.
- New processor for Azure Queue Storage using Azure SDK 12.
- PutDatabaseRecord now allows for upserts.
- Deprecated additional components and features in preparation for Apache NiFi 2.0.
- Upgraded numerous dependencies due to potential vulnerabilities and to move to the latest stable lines.
- Numerous bug fixes to processors and to core NiFi framework behavior.
A full list of issues that were resolved can be found at: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12353069
https://cwiki.apache.org/confluence/display/NIFI/Deprecated+Components+and+Features
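For anyone curious why changing compression in one step helps IO, here is a minimal Python sketch of the general idea. It is not NiFi's implementation. The function name `recompress_gzip_to_bz2` and the gzip-to-bz2 pairing are assumptions made only for illustration. The CPU still decompresses and recompresses every byte, but each chunk goes straight from one codec to the other, so the fully decompressed content is never written out and read back.

```python
import bz2
import gzip
import io


def recompress_gzip_to_bz2(src, dst, chunk_size=64 * 1024):
    """Hypothetical helper: stream gzip data from src into dst as bz2.

    src and dst are binary file-like objects. Only one chunk of
    decompressed data is held in memory at a time, so no intermediate
    decompressed copy is ever stored.
    Returns the number of decompressed bytes processed.
    """
    compressor = bz2.BZ2Compressor()
    total = 0
    with gzip.GzipFile(fileobj=src, mode="rb") as reader:
        while True:
            chunk = reader.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            # compress() may return b"" while it buffers internally
            dst.write(compressor.compress(chunk))
    # flush() emits whatever the compressor is still holding
    dst.write(compressor.flush())
    return total


if __name__ == "__main__":
    # Made-up sample payload, only for demonstration
    payload = b"sensor,value\n" + b"pm25,12.5\n" * 1000
    source = io.BytesIO(gzip.compress(payload))
    target = io.BytesIO()
    processed = recompress_gzip_to_bz2(source, target)
    assert processed == len(payload)
    assert bz2.decompress(target.getvalue()) == payload
```

The same pattern works with real files opened in binary mode in place of the `io.BytesIO` buffers.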
CODE + COMMUNITY
Please join my meetup group NJ/NYC/Philly/Virtual.
http://www.meetup.com/futureofdata-princeton/
https://www.meetup.com/futureofdata-newyork/
https://www.meetup.com/futureofdata-philadelphia/
**This is Issue #90**
You may notice a jump in issue numbers. LinkedIn says we had already published 89, so I am assuming two other articles got assimilated. I will go with this, since 90 is a better number.
https://github.com/tspannhw/FLiPStackWeekly
https://www.linkedin.com/pulse/schedule-2023-tim-spann-/
Videos
https://www.youtube.com/watch?v=_uYp8s6_6GA&t=1s&pp=ygUQIlRpbSBzcGFubiIgbmlmaQ%3D%3D
https://www.cloudera.com/about/events/cloudera-now-cdp.html
Articles
https://dzone.com/articles/harnessing-the-power-of-nifi-building-a-seamless-f
https://medium.com/@tspann/apache-nifi-1-22-updates-e658eaff3308
https://blog.cloudera.com/one-big-cluster-stuck-platform-health/
https://medium.com/@ayushtkn/apache-hive-esri-geospatial-support-5ca815daaa7e
https://medium.com/@kiranprabhu/kafka-topic-naming-conventions-best-practices-6b6b332769a3
https://www.datanami.com/2023/01/30/five-drivers-behind-the-rapid-rise-of-apache-flink/
https://webtechie.be/post/2023-06-15-crac-on-raspberry-pi/
https://www.cdc.gov/nbs/modernization/stories/202304.html
https://www.smarthomebeginner.com/podman-vs-docker/
https://speakerdeck.com/stevenz3wu/streaming-from-apache-iceberg-qcon-ny-2023?slide=8
https://thenewstack.io/kelsey-hightower-predicts-how-the-kubernetes-community-will-evolve/
Recent Talks
https://www.youtube.com/watch?v=Ws7YmAHE1O8
https://www.cloudera.com/about/events/evolve.html
https://speakerdeck.com/stevenz3wu/streaming-from-apache-iceberg-qcon-ny-2023
Events
June 26-28, 2023: NLIT Summit. Milwaukee.
https://www.fbcinc.com/e/nlit/default.aspx
June 28, 2023: NiFi Meetup. Milwaukee and Hybrid. https://www.meetup.com/futureofdata-princeton/events/292976004/
July 19, 2023: 2-Hours to Data Innovation: Data Flow https://www.cloudera.com/about/events/hands-on-lab-series-2-hours-to-data-innovation.html
October 18, 2023: 2-Hours to Data Innovation: Data Flow https://www.cloudera.com/about/events/hands-on-lab-series-2-hours-to-data-innovation.html
Cloudera Events https://www.cloudera.com/about/events.html
More Events: https://www.linkedin.com/pulse/schedule-2023-tim-spann-/
LLM Generated Code
Code
- https://github.com/tspannhw/FLaNK-ParticulateMatterSensor
- https://hub.docker.com/r/apache/hive
- https://impala.apache.org/docs/build/html/topics/impala_iceberg.html
NiFi Code
- https://github.com/tspannhw/FLaNK-CDC/blob/main/flinkcdc.MD
- https://github.com/tspannhw/FLaNK-CDC/blob/main/kafkacdc.md
- https://mvnrepository.com/artifact/org.apache.nifi/nifi-media-nar/1.22.0
- https://github.com/tspannhw/FLaNK-NARs
- https://github.com/tspannhw/FLaNK-Edge
- https://github.com/tspannhw/FLaNK-ParticulateMatterSensor
- https://github.com/tspannhw/FLaNK-LLM
Tools
- https://github.com/stanfordroboticsclub/StanfordQuadruped
- https://www.newsletter.swirlai.com/p/sai-notes-03-apache-flink-architecture
- https://github.com/AI4Finance-Foundation/FinGPT
- https://github.com/jedisct1/libsodium
- https://waabi.ai/oyster/
- https://www.baeldung.com/gatling-load-testing-rest-endpoint
- https://github.com/alibaba/fastjson2
- https://federatedscope.io/docs/algozoo/
- https://designable-antd.formilyjs.org/
- ChatGPT Mac/Windows App for SQL https://github.com/alibaba/Chat2DB
- https://github.com/alibaba/arthas
- Recommendation Framework https://github.com/alibaba/EasyRec
- All in One Computer Vision https://github.com/alibaba/EasyCV
- https://github.com/alibaba/EasyCV/blob/master/docs/source/tutorials/cls.md
- Model as a Service https://github.com/modelscope/modelscope
- CDC from MySQL https://github.com/alibaba/canal/wiki/Docker-QuickStart
- Feature Hub for Flink and Spark https://github.com/alibaba/feathub
- https://jsoup.org/
- https://github.com/dask-contrib/dask-sql
- https://github.com/leondz/garak/
- https://github.com/zilliztech/VectorDBBench
- M1! https://github.com/apple/ml-stable-diffusion
- Fun game https://pixelastic.github.io/pokemonorbigdata/
- https://github.com/tloen/alpaca-lora
- https://shishirpatil.github.io/gorilla/
- https://github.com/boyter/cs
- https://github.com/algolia/autocomplete
- https://github.com/lmorg/murex
- https://github.com/keephq/keep
- https://github.com/Safiullah-Rahu/CSV-AI
- https://github.com/naushadh/hive-metastore
- https://github.com/bentoml/OpenLLM
- https://bucket4j.com/
- https://github.com/dainiusjocas/lucene-grep
- https://gpt4all.io/index.html
- https://github.com/AntonOsika/gpt-engineer
- https://github.com/linux-china/chatgpt-spring-boot-starter
- https://jsoncrack.com/
- https://github.com/slidevjs/slidev
- https://docs.flowman.io/en/latest/index.html
- https://github.com/brexhq/prompt-engineering
- https://replicate.com/andreasjansson/musicgen-looper
- https://omnimotion.github.io/
© 2020-2023 Tim Spann