FLiP Stack Weekly for 15-Jan-2023

 

15-Jan-2023

FLiP Stack Weekly

Welcome to the second newsletter of 2023. I was on vacation, so this issue is a little light. Next week's will be super heavy.

Tim Spann @PaaSDev


PODCAST

Take a look at recent podcasts in audio or video format.

https://www.buzzsprout.com/2062659/11463086-messaging-streaming-and-events-101-episode-1-of-crossing-the-streams

https://www.youtube.com/watch?v=U8aPBhlvDHU&feature=embimpwoyt

CODE + COMMUNITY

Join my meetup group NJ/NYC/Philly/Virtual. We will have a hybrid event on December 8th.

https://www.meetup.com/new-york-city-apache-pulsar-meetup/

This is Issue #66!!

https://github.com/tspannhw/FLiPStackWeekly

https://www.linkedin.com/pulse/2022-schedule-tim-spann

News

Apache Pulsar 2.11 Released!

https://pulsar.apache.org/download/

https://pulsar.apache.org/release-notes/versioned/pulsar-2.11.0/

https://pulsar.apache.org/docs/2.11.x/administration-pulsar-shell/#install-pulsar-shell

  • Pulsar Shell (https://github.com/apache/pulsar/issues/16250)
  • Multi Cloud Sync
  • Pulsar Server JDK 17
  • Python 2 support removed
  • Performance Improvements
  • HTTP Sink Function Added
  • and hundreds more...

For Pulsar Node.js client release details and downloads, visit: https://www.npmjs.com/package/pulsar-client/v/1.8.0

Release Notes are at: https://pulsar.apache.org/release-notes/versioned/pulsar-client-node-1.8.0/

Videos

https://www.youtube.com/watch?v=RWasN8h3528

Articles

https://pulsar.apache.org/blog/2023/01/10/pulsar-2022-year-in-review/

Events

DevOps

Jan 26, 2023: DevOps 2023

https://www.conf42.com/DevOps2023TimSpannmoderndatastreaming_apps

Feb 15, 2023: Scylla Summit. Virtual

https://www.scylladb.com/scylladb-summit-2023/

Feb 28, 2023: Spring One: Virtual

https://tanzu.vmware.com/developer/tv/

March 3, 2023: Spring One: Virtual

https://tanzu.vmware.com/developer/tv/

April 4-6, 2023: DevNexus: Atlanta, GA

https://devnexus.com/

https://www.linkedin.com/pulse/schedule-2023-tim-spann-/

Tools

FLiP Stack Weekly for 06-Jan-2023

 

06-Jan-2023

FLiP Stack Weekly

A light week this week; stay tuned for talks on Spring, Apache Pinot, ScyllaDB, Apache Flink, Apache NiFi, and more.

PODCAST

Take a look at recent podcasts in audio or video format.

https://www.buzzsprout.com/2062659/11463086-messaging-streaming-and-events-101-episode-1-of-crossing-the-streams

https://www.youtube.com/watch?v=U8aPBhlvDHU&feature=embimpwoyt

CODE + COMMUNITY

Join my meetup group NJ/NYC/Philly/Virtual. We will have a hybrid event on December 8th.

https://www.meetup.com/new-york-city-apache-pulsar-meetup/

This is Issue #65!!

https://github.com/tspannhw/FLiPStackWeekly

https://www.linkedin.com/pulse/2022-schedule-tim-spann

New Stuff

HTAP Virtual Summit

https://www.pingcap.com/htap-summit/auth/login/?next=/htap-summit/auth/watch/super-charging-real-time-analytics-at-scale

I am on PTO this week, but here are a few little tidbits.

https://medium.com/@tspann/2022-wrap-up-for-streaming-247cd21fd483?source=user_profile---------0----------------------------

https://medium.com/@tspann/building-real-time-schema-pipelines-from-messaging-topics-291a8d569130?source=user_profile---------2----------------------------

https://medium.com/@tspann/lets-check-our-stocks-from-finnhub-and-do-some-real-time-analytics-1b7963008e19?source=user_profile---------3----------------------------

https://medium.com/@tspann/predictions-for-streaming-in-2023-ad4d7395d714

A new patch release, Pulsar 2.10.3, is available.

https://pulsar.apache.org/release-notes/

ARTICLES

CODE

VIDEOS

https://www.youtube.com/watch?v=MnErwxQ0q_k

TOOLS

Consuming Streaming Stocks Data with Python, Websockets and Pulsar

 https://medium.com/@tspann/lets-check-our-stocks-from-finnhub-and-do-some-real-time-analytics-1b7963008e19


Let’s Check Our Stocks From FinnHub and Do Some Real-Time Analytics

Code: https://github.com/tspannhw/FLiPN-Py-Stocks

The easiest application to build is a simple Python application, since Finnhub includes the basics in its documentation. We are going to use their free WebSocket interface for trades so we can get real-time events as they happen. We will get JSON data for each trade as it is triggered.

Python App

The Python application receives a WebSocket stream of JSON arrays and sends individual JSON messages to Pulsar with a JSON schema.

Architecture diagram
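
Here is a minimal sketch of that receive loop, assuming the websocket-client package and a Finnhub API token supplied through a FINNHUB_TOKEN environment variable; the function names are illustrative and not taken from the original project.

# Minimal sketch: receive Finnhub trade arrays over a WebSocket and hand each
# individual trade to a callback. Assumes the websocket-client package
# (pip install websocket-client) and a FINNHUB_TOKEN environment variable.
import json
import os
import websocket

FINNHUB_TOKEN = os.environ["FINNHUB_TOKEN"]

def on_message(ws, message):
    event = json.loads(message)
    if event.get("type") != "trade":
        return
    # Each message carries a "data" array; fan out one record per trade.
    for trade in event.get("data", []):
        handle_trade(trade)

def on_open(ws):
    # Subscribe to the symbols we care about once the socket is open.
    for symbol in ("AAPL", "TSLA"):
        ws.send(json.dumps({"type": "subscribe", "symbol": symbol}))

def handle_trade(trade):
    print(trade)  # placeholder; the real application publishes to Pulsar (see below)

if __name__ == "__main__":
    ws = websocket.WebSocketApp(
        "wss://ws.finnhub.io?token=" + FINNHUB_TOKEN,
        on_open=on_open,
        on_message=on_message,
    )
    ws.run_forever()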

Raw Data

{"data":[{"c":["1","8","24","12"],"p":122.1,"s":"TSLA","t":1672348887195,"v":1},{"c":["1","8","24","12"],"p":122.09,"s":"TSLA","t":1672348887196,"v":4},{"c":["1","8","24","12"],"p":122.09,"s":"TSLA","t":1672348887196,"v":10},{"c":["1","8","24","12"],"p":122.1,"s":"TSLA","t":1672348887196,"v":1},{"c":["1","8","24","12"],"p":122.1,"s":"TSLA","t":1672348887196,"v":2},{"c":["1","8","24","12"],"p":122.1,"s":"TSLA","t":1672348887196,"v":10},{"c":["1","8","24","12"],"p":122.1,"s":"TSLA","t":1672348887198,"v":79},{"c":["1","24","12"],"p":129.58,"s":"AAPL","t":1672348887666,"v":1},{"c":["1","24","12"],"p":129.575,"s":"AAPL","t":1672348887785,"v":1}],"type":"trade"}
{"c":["1","8","24","12"],"p":122.1,"s":"TSLA","t":1672348887195,"v":1}

Data Description

data: List of trades or price updates.
s: Symbol.
p: Last price.
t: UNIX timestamp in milliseconds.
v: Volume.
c: List of trade conditions. A comprehensive list of trade condition codes can be found in the Finnhub documentation.
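
For example, pulling those fields out of the single trade shown above is just dictionary access; a quick sketch, with illustrative variable names:

# A quick sketch of reading the documented fields from one raw trade dict.
trade = {"c": ["1", "8", "24", "12"], "p": 122.1, "s": "TSLA", "t": 1672348887195, "v": 1}

symbol = trade["s"]                # symbol, e.g. TSLA
price = trade["p"]                 # last price
ts = trade["t"]                    # UNIX timestamp in milliseconds
volume = trade["v"]                # volume
conditions = " ".join(trade["c"])  # trade condition codes joined into one string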

Let's build a schema for our JSON data. Once we have a class defined for it in Python, we can send that to an Apache Pulsar cluster, and it will generate the first version of our schema for us. Having a schema lets us treat the data as a table in Trino, Spark SQL, and Flink SQL, which is awesome.

Defining our data and giving it a schema makes it fully structured, even though the payload is still semi-structured JSON, and that makes it very easy to work with: we know exactly what we are getting. It also makes it easier to stream into Apache Pinot, Apache Iceberg, Delta Lake, or another analytics system.

from pulsar.schema import Record, String, Float

class Stock(Record):
    symbol = String()            # ticker symbol, e.g. TSLA
    ts = Float()                 # trade timestamp (UNIX milliseconds)
    currentts = Float()          # ingest time of the record
    volume = Float()             # trade volume
    price = Float()              # last price
    tradeconditions = String()   # space-separated trade condition codes
    uuid = String()              # unique id for the record

We then connect to our Pulsar cluster, which is very easy in Python.

import pulsar
from pulsar.schema import JsonSchema
client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer(topic='persistent://public/default/stocks', schema=JsonSchema(Stock), properties={"producer-name": "py-stocks", "producer-id": "pystocks1"})
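
Here is a sketch of how one raw trade could then be mapped onto the Stock record and published with that producer; it assumes the Stock class and producer defined above, and the uuid and currentts formatting shown is illustrative.

import time
import uuid as uuidlib

def publish_trade(trade):
    # Map one raw Finnhub trade dict onto the Stock record defined above and send it.
    now = time.strftime("%Y%m%d%H%M%S")
    record = Stock(symbol=trade["s"],
                   ts=float(trade["t"]),
                   currentts=float(now),
                   volume=float(trade["v"]),
                   price=float(trade["p"]),
                   tradeconditions=" ".join(trade.get("c") or []),
                   uuid=now + "_" + str(uuidlib.uuid4()))
    # Using the uuid as the partition key produces keys like the ones in the
    # pulsar-client consumer output shown below.
    producer.send(record, partition_key=record.uuid)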

If we have never used this topic before, Pulsar will create it for us. As a best practice, though, create your tenant, namespace, and topic before deploying your application, while you are defining schemas and data contracts.

For more information on the Python interface for Pulsar, check out the Pulsar Python client documentation.

NEWBIE HINT:

For a free cluster and training, check out this training academy.

Example Python Projects

If you are a real newbie, here is how to get started.

Consume Pulsar Data

bin/pulsar-client consume "persistent://public/default/stocks" -s stocks-reader -n 0
----- got message -----
key:[20221230191756_42a4752d-5f66-4245-8153-a5ec8478f738], properties:[], content:{
"symbol": "AAPL",
"ts": 1672427874976.0,
"currentts": 20221230191756.0,
"volume": 10.0,
"price": 128.055,
"tradeconditions": "1 12",
"uuid": "20221230191756_42a4752d-5f66-4245-8153-a5ec8478f738"
}
----- got message -----
key:[20221230191756_a560a594-7c12-42e7-a76d-6650a48533e0], properties:[], content:{
"symbol": "TSLA",
"ts": 1672427874974.0,
"currentts": 20221230191756.0,
"volume": 100.0,
"price": 120.94,
"tradeconditions": "",
"uuid": "20221230191756_a560a594-7c12-42e7-a76d-6650a48533e0"
}
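
The topic can also be read back from Python with the same schema; here is a minimal consumer sketch, where the subscription name is illustrative:

import pulsar
from pulsar.schema import JsonSchema, Record, String, Float

class Stock(Record):      # same record class as on the producer side
    symbol = String()
    ts = Float()
    currentts = Float()
    volume = Float()
    price = Float()
    tradeconditions = String()
    uuid = String()

client = pulsar.Client('pulsar://localhost:6650')
consumer = client.subscribe('persistent://public/default/stocks',
                            subscription_name='py-stocks-reader',  # illustrative name
                            schema=JsonSchema(Stock))

while True:
    msg = consumer.receive()
    stock = msg.value()   # deserialized into a Stock record
    print(stock.symbol, stock.price, stock.volume)
    consumer.acknowledge(msg)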

References