Pulsar Summit Europe 2021 is taking place virtually on October 6. Sessions feature industry experts from the Apache Pulsar PMC, CleverCloud, and Databricks. You’ll learn about the latest Pulsar project updates and technology. Register today and save your seat:
Building Bad Titles For Talks
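gentitles.py
```python
from textgenrnn import textgenrnn

textgen = textgenrnn()  # start from textgenrnn's bundled pretrained weights
# Fine-tune on the talk titles in tim.txt for one epoch (prints temperature samples at the end)
textgen.train_from_file('tim.txt', num_epochs=1)
textgen.generate()  # print one newly generated title
```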
Example Run
```
tspann@Timothys-MBP code % python3.7 gentitles.py
/Users/tspann/Library/Python/3.7/lib/python/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:375: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
"The `lr` argument is deprecated, use `learning_rate` instead.")
2021-08-02 10:40:28.146481: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
69 texts collected.
Training on 2,506 character sequences.
2021-08-02 10:40:28.710370: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
19/19 [==============================] - 6s 143ms/step - loss: 1.8994
####################
Temperature: 0.2
####################
Apache Streaming Streaming Station Stack
First Anti-Tatto Stack (A File State Stack Pack And Pussions
A Stack of Apache Stack
####################
Temperature: 0.5
####################
Cloud Dead Folk Streaming And Analance Art Past Flink
Into Apache Space Trades Channel Stack
Push Lake Station
####################
Temperature: 1.0
####################
Batt-Indunes Means Stgut
Sometimes time page
I real-posts, UIP Puming this reaction
Real-Timobitman with Apache and Flire
```
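The temperature headings in the run explain why the 0.2 samples keep repeating "Stack" while the 1.0 samples drift into nonsense. Below is a minimal sketch of temperature sampling, assuming the standard recipe (divide the log-probabilities by the temperature, re-normalize, then sample), which is what textgenrnn appears to do internally. The function names are my own for illustration, not part of textgenrnn's API.

```python
import math
import random


def apply_temperature(probs, temperature):
    """Reshape a next-character distribution: divide log-probs by T, then re-normalize."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Clamp to avoid log(0) for characters the model considers impossible.
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    peak = max(logits)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(probs, temperature, rng=random):
    """Pick the index of the next character from the temperature-adjusted distribution."""
    scaled = apply_temperature(probs, temperature)
    return rng.choices(range(len(scaled)), weights=scaled, k=1)[0]
```

With `[0.6, 0.3, 0.1]`, a temperature of 0.2 pushes the top choice to roughly 97%, while 2.0 flattens it to about 47%. That is why low temperatures give safe, repetitive titles and high temperatures give odd ones.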
Note on installing on a Mac:
```shell
pip3 install git+https://github.com/minimaxir/textgenrnn.git
```
tim.txt
```
Apache NiFi 101: Introduction and Best Practices
Cracking the Nut, Solving Edge AI with Apache Tools and Frameworks
FLANK Stack for Cloud Data Lakes
FLIP Stack for Cloud Data Lakes
Lightning Introduction to FLaNK
Pack Your Bags, We’re Going on a Data Journey!
Real-Time Streaming in Azure
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using the Mm FLaNK Stack for Edge AI (Flink, NiFi, Kafka, Kudu)
Utilizing Apache Kafka, Apache NiFi and MiNiFi for EdgeAI IoT at Scale
Real-Time Streaming in Any and All Clouds, Hybrid and Beyond
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Hail Hydrate! From Stream to Lake with Pulsar and Friends
Continuous SQL with Kafka and Flink
FLiP Stack for Cloud Data Lakes
BUILDING EVENT STREAMING MICROSERVICES WITH NiFi Stateless AND APACHE PULSAR
CLOUD NATIVE STREAMING
USING REAL-TIME DATA FEEDS
IOT STREAMING WITH MQTT, MINIFI AND PULSAR
BUILDING REAL-TIME WEB APPLICATIONS WITH WEBSOCKETS AND PULSAR
KAFKA STREAM PROCESSING WITH SQL
CODELESS PIPELINES WITH KAFKA AND PULSAR
BUILD A REAL-TIME PIPELINE NOW WITH PULSAR FUNCTIONS
Cloud Enterprise Data Platforms
Hybrid Cloud
Streaming with Flink, Kafka, NiFi
AI at the Edge with Microcontrollers and Small Devices
Voice Data In Queries
Event Handler as a Service (Automatic Kafka Message Reading)
More Powerful Parameter Based Modular Streaming
Cloud First For Big Data
Log Handling Moves to MiNiFi
Full AI At The Edge with Deployable Models
More Powerful Edge TPU/GPU/VPU
Kafka is everywhere
Open Source UI Driven Event Engines
FLaNK Stack gains popularity
FLINK Everywhere
Real-Time Stock Processing
Edge to AI: Analytics from the Edge
Utilizing Apache NiFi for IoT
Let's Build A Simple Ingest To Cloud Datawarehouse with Low Code
Learning the Basics of Apache NiFi for IoT
Introduction to Flank Stack
Introduction to Flip Stack
Introduction to Pulsar
Apache Deep Learning 101
Big Data DevOps
Automating Social Media
Accessing Feeds from Etherdelta on Trades
Vision Thing
Deep Dive into Apache NiFi
Apache NiFi : Ingesting Enterprise Data at Scale
Continuous SQL with Pulsar and Flink
Apache NiFi Deep Dive 300
Smart Transit: Real-time Transit Information with FliP
Build in the Cloud
Streaming SQL and Data Flow
Real-Time Streaming Pipelines with FLaNK
Real-Time Streaming Pipelines with FLiP
Apache NiFi DevOps
Flink SQL for Continuous SQL & ETL
Next-Gen Apache NiFi
Ask the Experts
Hello, NiFi
Using Apache MXNet in Production Deep Learning Streaming Pipelines
From Stream to Lake
```
Upcoming Apache Pulsar and Apache Flink Talks - ApacheCon Asia and ApacheCon 2021
ApacheCon Asia 2021
#messaging
- 2021-08-06 16:10 GMT+8 - A PULSAR USE CASE IN FEDERATED LEARNING by JIAHAO CHEN (CHINESE SESSION)
- 2021-08-06 14:50 GMT+8 - THE PRACTICE OF APACHE PULSAR IN BIGO by HANG CHE (CHINESE SESSION)
- 2021-08-06 13:30 GMT+8 - APACHE BOOKKEEPER (AS A KEY VALUE STORE) AND ITS USE CASE by SHIVJI KUMAR JHA (ENGLISH SESSION)
- 2021-08-06 15:30 GMT+8 - FROM APACHE KAFKA TO APACHE PULSAR - SYSTEM MIGRATION GUIDE by Yabin Meng (CHINESE SESSION)
- 2021-08-08 13:30 GMT+8 - APACHE PULSAR BEST PRACTICES FOR LOGGING by BIN WEI (CHINESE SESSION)
- 2021-08-08 14:10 GMT+8 - APACHE PULSAR -- CLOUD NATIVE MESSAGE QUEUES IN PRACTICE AT TENCENT CLOUD by LIN LIN (CHINESE SESSION)
- 2021-08-08 14:50 GMT+8 - APACHE PULSAR APPLICATION AND PRACTICE UNDER TENCENT MILLION TOPICS by XIALONG RAN (CHINESE SESSION)
- 2021-08-08 15:30 GMT+8 - RBAC AUTHORIZATION IN PULSAR by ZIKE YANG (CHINESE SESSION)
- 2021-08-08 16:10 GMT+8 - THE JOURNEY OF APACHE PULSAR IN HUAWEI CLOUD INTERNET OF THINGS PLATFORM by HEZHANGJIAN (CHINESE SESSION)
#streaming
- 2021-08-07 16:10 GMT+8 - EVOLUTION AND TYPICAL SCENES OF FLINK-BASED REAL TIME COMPUTING PLATFORM IN QIHOO 360 by FAN XINPU (CHINESE SESSION)
- 2021-08-07 13:30 GMT+8 - FLINK'S GREATEST AND LATEST AT ALIBABA by YUAN MEI (CHINESE SESSION)
StreamNative - David Kjerrumgaard's Talk
In this talk I will present a technique for deploying machine learning models to provide real-time predictions using Apache Pulsar Functions. To provide a prediction in real time, the model usually receives a single data point from the caller and is expected to return an accurate prediction within a few milliseconds.
Throughout this talk, I will demonstrate the steps required to productionize a fully trained ML model that predicts the delivery time for a food delivery service based on real-time traffic information, the customer's location, and the restaurant that will be fulfilling the order.
Speaker:
David Kjerrumgaard: David is the author of “Pulsar in Action”
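The single-data-point pattern described in this abstract can be sketched as a Python Pulsar Function. This is a minimal, hedged sketch, not the talk's code: it assumes Pulsar's language-native Python interface (a plain `process(input)` function that receives one message and returns one result), and the "model" is a hypothetical linear formula with made-up coefficients standing in for a fully trained one. The JSON field names (`order_id`, `distance_km`, `traffic_delay_minutes`, `restaurant_id`) are also assumptions, with distance standing in for the customer's location.

```python
import json

# Hypothetical coefficients standing in for a fully trained model. A real
# deployment would load serialized model weights once, when the function starts.
BASE_MINUTES = 5.0
MINUTES_PER_KM = 2.5
PREP_MINUTES_BY_RESTAURANT = {"pizza-place": 15.0, "sushi-bar": 20.0}
DEFAULT_PREP_MINUTES = 12.0


def predict_delivery_minutes(distance_km, traffic_delay_minutes, restaurant_id):
    """Score a single data point and return the predicted delivery time in minutes."""
    prep = PREP_MINUTES_BY_RESTAURANT.get(restaurant_id, DEFAULT_PREP_MINUTES)
    return BASE_MINUTES + prep + MINUTES_PER_KM * distance_km + traffic_delay_minutes


def process(input):
    """Language-native Pulsar Function entry point: one JSON order in, one JSON prediction out."""
    order = json.loads(input)
    eta = predict_delivery_minutes(
        float(order["distance_km"]),
        float(order["traffic_delay_minutes"]),
        order["restaurant_id"],
    )
    return json.dumps({"order_id": order["order_id"], "eta_minutes": round(eta, 1)})
```

Because the entry point is plain Python, it can be tried locally before deployment: with these made-up coefficients, an order 4 km from `pizza-place` with 3 minutes of traffic delay gets an ETA of 33.0 minutes. Keeping the model loaded at module level, rather than per message, is one way to help stay within the few-millisecond budget the abstract mentions.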
StreamNative - Tim Spann's Talk
Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at the edge before we start our real-time streaming flows. Fortunately, using the FLiP and FLaNK stacks, we can do this with ease! Streaming AI-powered analytics from the edge to the data center is now a simple use case. With MiNiFi we can ingest the data, run data checks and cleansing, execute machine learning and deep learning models, and route our data in real time to Apache NiFi and Apache Pulsar for further transformation and processing. Apache Flink will provide our advanced streaming capabilities, fed in real time from Apache Pulsar topics. Apache MXNet models will run both at the edge and in our data centers via Apache NiFi and MiNiFi.
Tools: Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet
Speaker:
Timothy Spann: Tim Spann is a Developer Advocate at StreamNative where he works with Apache NiFi, MiNiFi, Kafka, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, DataWorks Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
https://www.datainmotion.dev/p/about-me.html
https://dzone.com/users/297029/bunkertor.html
https://dev.to/tspannhw
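The edge steps in Tim's abstract (ingest, data checks, cleansing, model scoring, routing) can be illustrated with a small Python sketch. It is not MiNiFi or NiFi code: it is plain Python standing in for what an edge flow would do before publishing to Pulsar. The topic names, JSON fields, valid temperature range and the fixed-threshold "model" are all hypothetical placeholders.

```python
import json

# Hypothetical Pulsar topic names, for illustration only.
CLEAN_TOPIC = "persistent://public/default/sensor-clean"
REJECTED_TOPIC = "persistent://public/default/sensor-rejected"

# Hypothetical plausible range for a temperature sensor, in degrees Celsius.
MIN_TEMP_C, MAX_TEMP_C = -40.0, 125.0


def check_and_cleanse(raw):
    """Data checks and cleansing: parse, validate and normalize one reading.

    Returns a cleaned dict, or None if the reading should be rejected.
    """
    try:
        reading = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(reading, dict):
        return None
    if "device_id" not in reading or "temperature_c" not in reading:
        return None
    try:
        temp = float(reading["temperature_c"])
    except (TypeError, ValueError):
        return None
    if not MIN_TEMP_C <= temp <= MAX_TEMP_C:  # also rejects NaN
        return None
    return {"device_id": str(reading["device_id"]).strip(), "temperature_c": round(temp, 2)}


def score(reading):
    """Stand-in for an edge model: flag readings above a fixed, hypothetical threshold."""
    reading["anomaly"] = reading["temperature_c"] > 80.0
    return reading


def route(raw):
    """Return (topic, payload) for one raw reading, as an edge flow would before publishing."""
    reading = check_and_cleanse(raw)
    if reading is None:
        return REJECTED_TOPIC, raw
    return CLEAN_TOPIC, json.dumps(score(reading))
```

In a real deployment, the `(topic, payload)` pair would be handed to a Pulsar producer. Keeping the checks as pure functions makes them easy to test before they run on a device.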
ApacheCon Global 2021
StreamNative Talks
Replicated Subscriptions: Taking Apache Pulsar Geo-Replication to the Next Level
Matteo Merli
Tuesday 18:00 UTC - Apache Deep Learning 302 - Tim Spann
Wednesday 15:00 UTC - Smart Transit: Real-Time Transit Information with FLaNK - Tim Spann
Thursday 14:10 UTC - Apache NiFi 101: Introduction and Best Practices - Tim Spann
Apache Flink and Apache Pulsar
- Tuesday 15:00 UTC
Building resilient and scalable API backends with Apache Pulsar and Spring Reactive
Lari Hotari - Wednesday 14:10 UTC
Apache Flink StateFun: A Platform-Independent Stateful Serverless Stack
Tzu-Li (Gordon) Tai - Wednesday 17:10 UTC
Pulsar Beam, HTTP streaming over Apache Pulsar
Ming Luo - Tuesday 15:00 UTC
Getting Started with Event Stream Processing via Apache Flink on Microsoft Azure
Israel Ekpo - Tuesday 15:50 UTC
Great Expectations: Data Lake as a Source to Apache Flink to Better Support Machine Learning Use Cases
Sofya Irwin, Charles Tan - Wednesday 14:10 UTC
Architectures and Trends for Event Stream Processing on Azure with Open Source Software
Israel Ekpo
A New FLiP!
As some have noticed, I have left Cloudera. It has been an incredible journey. I joined Hortonworks in April 2016, and then we merged with Cloudera in 2019. This was my first article on Apache NiFi: https://lnkd.in/e4pxg43. I got to grow with Apache NiFi as it grew from 1.0 to 1.14 during my time! A lot changed and evolved, and the tech grew so much.
I got to do my first major conference talks at DataWorks Summit, which will always be one of my favorite event series ever. I am excited to be involved with Pulsar Summit (https://pulsar-summit.org/) and many other conferences now.
My Final Tallies at Hortonworks/Cloudera:
11 videos on my Youtube channel https://lnkd.in/eeRRCJv
Grew the Future of Data Meetup Princeton from 0 to 1,719 members
Over 48 Meetup events around the world
Over 230K Blog Views
Over 192 Blog Articles
344 DZone Articles for 3 Million Views https://lnkd.in/ejdbXte
Over 41 Conferences Spoken at.
Hosted One Mardi Gras at a Client; it was awesome
60 Slideshares https://lnkd.in/eUgtpxY
266 Github Repos https://lnkd.in/eM9JGks
I got to work with some of the best tech people in the world and also the best people. I really enjoyed the community and the teamwork.
Reports from 2017, 2018, 2019, 2020
https://lnkd.in/exxZVJc
https://lnkd.in/ehX6RE6
https://lnkd.in/enhJgQs
https://lnkd.in/eFGzHYV
I am really excited about what we are doing at StreamNative with Apache Pulsar. I still get to work with the amazing ASF open source community and all my great streaming friends in Apache Flink and Apache NiFi. I am working on a FLiP Stack to demonstrate some cool apps you can build with Flink, Pulsar and friends. Stay tuned. I will remain involved in the Apache NiFi community, and I have a talk on Apache NiFi at ApacheCon later this year.
Reference: https://www.linkedin.com/feed/update/urn:li:activity:6825792846759563264/
Upcoming Events 2021
Scenic City Summit - 24-September-2021
ApacheCon 2021 - 21-September-2021 to 23-September-2021
- https://www.apachecon.com/acah2021/tracks/bigdatastream.html
- https://www.apachecon.com/acah2021/tracks/bigdata.html
- https://www.apachecon.com/acah2021/tracks/iot.html
Tuesday 17:10 UTC - Apache NiFi Deep Dive 300
Tuesday 18:00 UTC - Apache Deep Learning 302
Wednesday 15:00 UTC - Smart Transit: Real-Time Transit Information with FLaNK
Wednesday 17:10 UTC - Cracking the Nut, Solving Edge AI with Apache Tools and Frameworks
Thursday 14:10 UTC - Apache NiFi 101: Introduction and Best Practices
Big Data Conference EU - 28-September-2021 to 29-September-2021
https://bigdataconference.eu/Timothy-J-Spann/
API World - 26-October-2021 to 28-October-2021
https://apiworld.co/conference/
Autoscaling Apache NiFi with Data Flow Experience on Kubernetes (K8s) on AWS
https://www.clouddataops.dev/data-flow-experience
NiFi on Cloudera Data Platform Upgrade - April 2021
CFM 2.1.1 on CDP 7.1.6
https://docs.cloudera.com/cfm/2.1.1/release-notes/topics/cfm-whats-new.html
https://docs.cloudera.com/cfm/2.1.1/upgrade-paths/topics/cfm-upgrade-paths.html
For changes: https://www.datainmotion.dev/2021/02/new-features-of-apache-nifi-1130.html
Get your download on: https://docs.cloudera.com/cfm/2.1.1/download/topics/cfm-download-locations.html
To start researching for the future, take a look at the technical preview features around the Easy Rules engine and handlers.
https://docs.cloudera.com/cfm/2.1.1/release-notes/topics/cfm-technical-preview.html
Make sure you use the latest possible JDK 8, as older builds have bugs: use 8u282 or newer.
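One quick way to check the 8u282 floor on each node is to parse the `java -version` banner. This is a hedged sketch: it assumes the usual JDK 8 banner format (`version "1.8.0_NNN"`), and the helper names are hypothetical.

```python
import re
import subprocess

MIN_JDK8_UPDATE = 282  # the "8u282 or newer" floor


def installed_java_version_text():
    """Run `java -version`; the JVM prints its version banner to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr


def jdk8_update(version_text):
    """Return the JDK 8 update number from a version banner, or None if it is not JDK 8."""
    match = re.search(r'version "1\.8\.0_(\d+)', version_text)
    return int(match.group(1)) if match else None


def jdk8_is_recent_enough(version_text):
    """True only for JDK 8 at or above the minimum update."""
    update = jdk8_update(version_text)
    return update is not None and update >= MIN_JDK8_UPDATE
```

On a node, `jdk8_is_recent_enough(installed_java_version_text())` returns True only for JDK 8 update 282 or later. A JDK 11 banner returns False because this check only covers JDK 8.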
Size your cluster correctly! https://docs.cloudera.com/cfm/2.1.1/nifi-sizing/topics/cfm-sizing-recommendations.html. Make sure you have at least 3 nodes.
References
- https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance
- https://docs.cloudera.com/cfm/2.1.1/site-to-site/topics/cdf-datahub-site-to-site.html
- https://docs.cloudera.com/cfm/2.1.1/release-notes/topics/cfm-whats-new.html
- https://blog.cloudera.com/no-data-loss-and-no-service-interruption-hdf-to-cfm-rolling-migration/
- https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.13.2
Populating Your Secure Cloud Data Estates
Hydrating Your Clean Cloud Data Lake
I am hard pressed to keep up with the Data Store + Query terminology du jour. Was it Data Lake House? All these giant bodies of water, mostly stored in buckets (S3)? I agree there are lots of nuances and many different query engines on top of the various means of storing that data. But I don't think every time we add a twist we need to add an increasingly silly term on top. Is it to confuse users? Developers? Data engineers? Companies? Executives? Perhaps if we rename our data warehouse again, we can get them to buy the same thing again.
Surely it can't be one size fits all for all these different things? I know a lot of companies of various types and sizes, and most don't approach the data volumes that companies like Netflix and LinkedIn have. I really like their innovation, but those projects often get released and then wither in obscurity.
A few projects look really good:
- Apache Iceberg - I have a good feeling on this one. https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/69503.html https://thenewstack.io/apache-iceberg-a-different-table-design-for-big-data/
- Apache Hudi https://hudi.apache.org/
For me, if I can do the basic CRUD operations that applications, reports, dashboards and queries require, then it works for me. If Apache NiFi, Apache Kafka, Apache Spark and Apache Flink support a data store, it should be good. The one thing I have to be wary of: data stores like Apache Kudu, Apache HBase and HDFS have been around for a long time, have had many of their production-killing bugs flushed out, have multi-company support, and have robust open source Apache communities around them. If a new project doesn't have that, it won't get traction or survive; it will just sit out there orphaned. Let's build on what we have and try not to create a million half-supported projects that are often abandoned or of unknown status. Apache Parquet and Apache ORC have proven themselves really solid, and having engines like Apache Hive and Apache Impala to query them is really important. Apache Ozone looks very interesting for when object stores are not available. http://ozone.apache.org/