
Utilizing Apache Pulsar to Populate Apache Iceberg and Apache Parquet based Lakehouses

 

FLiP-Pi-Iceberg-Thermal

Apache Iceberg + Apache Pulsar + Thermal Sensor Data from a Raspberry Pi

Steps

  • Run Apache Pulsar 2.10.2 (standalone, Docker, bare-metal cluster, VM cluster, Kubernetes cluster, AWS Marketplace Pulsar, or StreamNative Cloud)
  • Run Apache Iceberg 1.1.0 (Docker, ...)
  • Run Apache Spark 3.2
  • Deploy the Pulsar sink connector
  • Send data to a Pulsar topic
  • Query the Iceberg table in Spark

Sensor Python App Sending messages


{'uuid': 'thrml_wse_20221216202136', 'ipaddress': '192.168.1.179', 'cputempf': 122, 'runtime': 0, 'host': 'thermal', 'hostname': 'thermal', 'macaddress': 'e4:5f:01:7c:3f:34', 'endtime': '1671222096.350368', 'te': '0.0005612373352050781', 'cpu': 5.5, 'diskusage': '101858.0 MB', 'memory': 9.9, 'rowid': '20221216202136_a0e9eae8-3b4f-4222-95c6-7657ba0e12e2', 'systemtime': '12/16/2022 15:21:41', 'ts': 1671222101, 'starttime': '12/16/2022 15:21:36', 'datetimestamp': '2022-12-16 20:21:40.012859+00:00', 'temperature': 30.5959, 'humidity': 26.07, 'co2': 767.0, 'totalvocppb': 0.0, 'equivalentco2ppm': 400.0, 'pressure': 99773.53, 'temperatureicp': 86.0}
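
As a rough illustration of how a record shaped like the sample above could be assembled in Python, here is a minimal sketch. The field names are taken from the sample message; `build_record` is a hypothetical helper, only a subset of the fields is filled in, and the three-letter tag inside `uuid` is inferred from the sample values rather than from the original app.

```python
import json
import random
import string
import time
import uuid
from datetime import datetime, timezone


def build_record(host: str, temperature: float, humidity: float, co2: float) -> dict:
    """Assemble a subset of the fields seen in the sample message (hypothetical helper)."""
    now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d%H%M%S")
    # Random three-letter tag, mirroring values such as 'thrml_wse_...' (an assumption)
    tag = "".join(random.choices(string.ascii_lowercase, k=3))
    return {
        "uuid": f"thrml_{tag}_{stamp}",
        "rowid": f"{stamp}_{uuid.uuid4()}",
        "host": host,
        "hostname": host,
        "ts": int(time.time()),
        "datetimestamp": str(now),
        "temperature": temperature,
        "humidity": humidity,
        "co2": co2,
    }


record = build_record("thermal", 30.5959, 26.07, 767.0)
payload = json.dumps(record)  # JSON text ready to hand to a producer
print(payload)
```

Publishing the payload is left out on purpose, since the topic name and schema settings live in the app and in conf/iceberg.json, neither of which is shown here.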

Pulsar Sink Deploy

bin/pulsar-admin sinks stop --tenant public --namespace default --name iceberg_sink

bin/pulsar-admin sinks delete --tenant public --namespace default --name iceberg_sink

bin/pulsar-admin sinks create --sink-config-file conf/iceberg.json

Pulsar Sink Status

bin/pulsar-admin sinks status --tenant public --namespace default --name iceberg_sink

{
  "numInstances" : 1,
  "numRunning" : 1,
  "instances" : [ {
    "instanceId" : 0,
    "status" : {
      "running" : true,
      "error" : "",
      "numRestarts" : 0,
      "numReadFromPulsar" : 10,
      "numSystemExceptions" : 0,
      "latestSystemExceptions" : [ ],
      "numSinkExceptions" : 0,
      "latestSinkExceptions" : [ ],
      "numWrittenToSink" : 10,
      "lastReceivedTime" : 1671220772536,
      "workerId" : "c-standalone-fw-127.0.0.1-8080"
    }
  } ]
}
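
The status document above can also be checked from a script instead of by eye. The sketch below works on a trimmed copy of that JSON; `sink_is_healthy` and `pending_messages` are hypothetical helper names, and what counts as "healthy" is an assumption you may want to adjust.

```python
import json

# Trimmed copy of the status output shown above
STATUS_JSON = """
{
  "numInstances": 1,
  "numRunning": 1,
  "instances": [{
    "instanceId": 0,
    "status": {
      "running": true,
      "error": "",
      "numRestarts": 0,
      "numReadFromPulsar": 10,
      "numSystemExceptions": 0,
      "numSinkExceptions": 0,
      "numWrittenToSink": 10
    }
  }]
}
"""


def sink_is_healthy(status: dict) -> bool:
    """True when every instance is running and has reported no errors."""
    if status["numRunning"] != status["numInstances"]:
        return False
    for instance in status["instances"]:
        s = instance["status"]
        if not s["running"] or s["error"]:
            return False
        if s["numSystemExceptions"] or s["numSinkExceptions"]:
            return False
    return True


def pending_messages(status: dict) -> int:
    """Messages read from Pulsar but not yet written to the sink."""
    return sum(
        i["status"]["numReadFromPulsar"] - i["status"]["numWrittenToSink"]
        for i in status["instances"]
    )


status = json.loads(STATUS_JSON)
print(sink_is_healthy(status), pending_messages(status))
```

A non-zero pending count is not necessarily a problem; it may simply mean a commit has not happened yet.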

Iceberg data written via Pulsar Lakehouse Cloud Sink

ls -lt /Users/tspann/Downloads/iceberg/iceberg_sink_test/ice_sink_thermal
total 0
drwxr-xr-x  94 tspann  staff  3008 Dec 16 15:45 metadata
drwxr-xr-x  34 tspann  staff  1088 Dec 16 15:45 data

ice_sink_thermal/metadata
total 856
-rw-r--r--  1 tspann  staff      2 Dec 16 15:45 version-hint.text
-rw-r--r--  1 tspann  staff  19283 Dec 16 15:45 v15.metadata.json
-rw-r--r--  1 tspann  staff   4352 Dec 16 15:45 snap-8802315029762513718-1-78627844-0d69-4c2e-87db-016b9fdac119.avro
-rw-r--r--  1 tspann  staff   7536 Dec 16 15:45 78627844-0d69-4c2e-87db-016b9fdac119-m0.avro
-rw-r--r--  1 tspann  staff  18303 Dec 16 15:43 v14.metadata.json
-rw-r--r--  1 tspann  staff   4315 Dec 16 15:43 snap-1218246990201737819-1-1253a40d-fae5-4919-9d71-be51af402899.avro

iceberg_sink_test/ice_sink_thermal/data
total 360
-rw-r--r--  1 tspann  staff  9771 Dec 16 15:45 00000-1-951a37fc-5069-4201-94fa-4ef9975f6293-00014.parquet
-rw-r--r--  1 tspann  staff  9782 Dec 16 15:43 00000-1-951a37fc-5069-4201-94fa-4ef9975f6293-00013.parquet
-rw-r--r--  1 tspann  staff  9733 Dec 16 15:41 00000-1-951a37fc-5069-4201-94fa-4ef9975f6293-00012.parquet
-rw-r--r--  1 tspann  staff  9637 Dec 16 15:39 00000-1-951a37fc-5069-4201-94fa-4ef9975f6293-00011.parquet
-rw-r--r--  1 tspann  staff  9722 Dec 16 15:37 00000-1-951a37fc-5069-4201-94fa-4ef9975f6293-00010.parquet
-rw-r--r--  1 tspann  staff  9663 Dec 16 15:35 00000-1-951a37fc-5069-4201-94fa-4ef9975f6293-00009.parquet
-rw-r--r--  1 tspann  staff  9671 Dec 16 15:33 00000-1-951a37fc-5069-4201-94fa-4ef9975f6293-00008.parquet
-rw-r--r--  1 tspann  staff  9652 Dec 16 15:31 00000-1-951a37fc-5069-4201-94fa-4ef9975f6293-00007.parquet
-rw-r--r--  1 tspann  staff  9716 Dec 16 15:29 00000-1-951a37fc-5069-4201-94fa-4ef9975f6293-00006.parquet
-rw-r--r--  1 tspann  staff  9731 Dec 16 15:27 00000-1-951a37fc-5069-4201-94fa-4ef9975f6293-00005.parquet
-rw-r--r--  1 tspann  staff  9639 Dec 16 15:25 00000-1-951a37fc-5069-4201-94fa-4ef9975f6293-00004.parquet
-rw-r--r--  1 tspann  staff  9721 Dec 16 15:23 00000-1-951a37fc-5069-4201-94fa-4ef9975f6293-00003.parquet
-rw-r--r--  1 tspann  staff  7414 Dec 16 15:21 00000-1-951a37fc-5069-4201-94fa-4ef9975f6293-00002.parquet
-rw-r--r--  1 tspann  staff  8492 Dec 16 14:59 00000-1-951a37fc-5069-4201-94fa-4ef9975f6293-00001.parquet
-rw-r--r--  1 tspann  staff  7978 Dec 16 14:56 00000-1-2acfc6ba-4f49-44c7-8f17-1f3491484fd1-00001.parquet
-rw-r--r--  1 tspann  staff  7886 Dec 16 14:54 00000-1-7e714ed7-0ba5-41a4-b8e1-1e1d261e3b83-00001.parquet
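
The listing suggests a Hadoop-style table layout, where `version-hint.text` holds the current metadata version and `vN.metadata.json` describes the table. Assuming that layout, a small sketch for finding the current snapshot could look like this. `current_metadata` and `current_snapshot` are hypothetical helpers, and the demo writes made-up files to a temporary directory so it runs without the real table.

```python
import json
import tempfile
from pathlib import Path


def current_metadata(table_dir) -> dict:
    """Load the metadata file that version-hint.text points at."""
    meta_dir = Path(table_dir) / "metadata"
    version = int((meta_dir / "version-hint.text").read_text().strip())
    return json.loads((meta_dir / f"v{version}.metadata.json").read_text())


def current_snapshot(metadata: dict):
    """Return the snapshot entry matching current-snapshot-id, or None."""
    wanted = metadata.get("current-snapshot-id")
    for snap in metadata.get("snapshots", []):
        if snap["snapshot-id"] == wanted:
            return snap
    return None


# Demo against a throwaway directory with made-up contents
with tempfile.TemporaryDirectory() as tmp:
    meta_dir = Path(tmp) / "metadata"
    meta_dir.mkdir()
    (meta_dir / "version-hint.text").write_text("15")
    fake = {
        "current-snapshot-id": 42,
        "snapshots": [{"snapshot-id": 42, "manifest-list": "metadata/snap-42.avro"}],
    }
    (meta_dir / "v15.metadata.json").write_text(json.dumps(fake))
    demo_snapshot = current_snapshot(current_metadata(tmp))

print(demo_snapshot["manifest-list"])
```

Pointing `current_metadata` at the `ice_sink_thermal` directory from the listing should work the same way, provided the files are uncompressed JSON.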

Schema Embedded in Parquet File

{"type":"struct","schema-id":0,"fields":[{"id":1,"name":"uuid","required":true,"type":"string"},{"id":2,"name":"ipaddress","required":true,"type":"string"},{"id":3,"name":"cputempf","required":true,"type":"int"},{"id":4,"name":"runtime","required":true,"type":"int"},{"id":5,"name":"host","required":true,"type":"string"},{"id":6,"name":"hostname","required":true,"type":"string"},{"id":7,"name":"macaddress","required":true,"type":"string"},{"id":8,"name":"endtime","required":true,"type":"string"},{"id":9,"name":"te","required":true,"type":"string"},{"id":10,"name":"cpu","required":true,"type":"float"},{"id":11,"name":"diskusage","required":true,"type":"string"},{"id":12,"name":"memory","required":true,"type":"float"},{"id":13,"name":"rowid","required":true,"type":"string"},{"id":14,"name":"systemtime","required":true,"type":"string"},{"id":15,"name":"ts","required":true,"type":"int"},{"id":16,"name":"starttime","required":true,"type":"string"},{"id":17,"name":"datetimestamp","required":true,"type":"string"},{"id":18,"name":"temperature","required":true,"type":"float"},{"id":19,"name":"humidity","required":true,"type":"float"},{"id":20,"name":"co2","required":true,"type":"float"},{"id":21,"name":"totalvocppb","required":true,"type":"float"},{"id":22,"name":"equivalentco2ppm","required":true,"type":"float"},{"id":23,"name":"pressure","required":true,"type":"float"},{"id":24,"name":"temperatureicp","required":true,"type":"float"}]}

parquet-mr version 1.12.0 (build db75a6815f2ba1d1ee89d1a90aeb296f1f3a8f20)
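
Because every field in this schema is marked `required`, a record that is missing a field or carries the wrong type is likely to be rejected somewhere along the way. A minimal pre-flight check, assuming a trimmed copy of the schema (4 of the 24 fields) and a hypothetical `schema_errors` helper, might look like this:

```python
import json

# Trimmed copy of the schema above (4 of the 24 fields) to keep the sketch short
SCHEMA = json.loads("""
{"type":"struct","schema-id":0,"fields":[
 {"id":1,"name":"uuid","required":true,"type":"string"},
 {"id":3,"name":"cputempf","required":true,"type":"int"},
 {"id":10,"name":"cpu","required":true,"type":"float"},
 {"id":15,"name":"ts","required":true,"type":"int"}]}
""")

# Only the three type names that appear in the schema are mapped
PY_TYPES = {"string": str, "int": int, "float": float}


def schema_errors(record: dict, schema: dict) -> list:
    """List missing required fields and type mismatches for one record."""
    errors = []
    for field in schema["fields"]:
        name, expected = field["name"], PY_TYPES[field["type"]]
        if name not in record:
            if field["required"]:
                errors.append(f"missing required field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{name}: expected {field['type']}, got {type(value).__name__}")
    return errors


good = {"uuid": "thrml_wse_20221216202136", "cputempf": 122, "cpu": 5.5, "ts": 1671222101}
bad = {"uuid": "thrml_wse_20221216202136", "cputempf": "122", "cpu": 5.5}
print(schema_errors(good, SCHEMA))
print(schema_errors(bad, SCHEMA))
```

The check is deliberately strict: an integer in a `float` field is reported as a mismatch, which may or may not match how the sink behaves.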
 

Validate our Parquet Files

pip3.9 install parquet-tools -U
 
parquet-tools inspect ice_sink_thermal/data/00000-1-7e714ed7-0ba5-41a4-b8e1-1e1d261e3b83-00001.parquet

############ file meta data ############
created_by: parquet-mr version 1.12.0 (build db75a6815f2ba1d1ee89d1a90aeb296f1f3a8f20)
num_columns: 24
num_rows: 4
num_row_groups: 1
format_version: 1.0
serialized_size: 4577


############ Columns ############
uuid
ipaddress
cputempf
runtime
host
hostname
macaddress
endtime
te
cpu
diskusage
memory
rowid
systemtime
ts
starttime
datetimestamp
temperature
humidity
co2
totalvocppb
equivalentco2ppm
pressure
temperatureicp

############ Column(uuid) ############
name: uuid
path: uuid
max_definition_level: 0
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: GZIP (space_saved: 28%)

############ Column(ipaddress) ############
name: ipaddress
path: ipaddress
max_definition_level: 0
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: GZIP (space_saved: -67%)

############ Column(cputempf) ############
name: cputempf
path: cputempf
max_definition_level: 0
max_repetition_level: 0
physical_type: INT32
logical_type: None
converted_type (legacy): NONE
compression: GZIP (space_saved: -38%)

############ Column(runtime) ############
name: runtime
path: runtime
max_definition_level: 0
max_repetition_level: 0
physical_type: INT32
logical_type: None
converted_type (legacy): NONE
compression: GZIP (space_saved: -85%)

############ Column(host) ############
name: host
path: host
max_definition_level: 0
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: GZIP (space_saved: -74%)

############ Column(hostname) ############
name: hostname
path: hostname
max_definition_level: 0
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: GZIP (space_saved: -74%)

############ Column(macaddress) ############
name: macaddress
path: macaddress
max_definition_level: 0
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: GZIP (space_saved: -62%)

############ Column(endtime) ############
name: endtime
path: endtime
max_definition_level: 0
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: GZIP (space_saved: 17%)

############ Column(te) ############
name: te
path: te
max_definition_level: 0
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: GZIP (space_saved: 16%)

############ Column(cpu) ############
name: cpu
path: cpu
max_definition_level: 0
max_repetition_level: 0
physical_type: FLOAT
logical_type: None
converted_type (legacy): NONE
compression: GZIP (space_saved: -75%)

############ Column(diskusage) ############
name: diskusage
path: diskusage
max_definition_level: 0
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: GZIP (space_saved: -69%)

############ Column(memory) ############
name: memory
path: memory
max_definition_level: 0
max_repetition_level: 0
physical_type: FLOAT
logical_type: None
converted_type (legacy): NONE
compression: GZIP (space_saved: -85%)

############ Column(rowid) ############
name: rowid
path: rowid
max_definition_level: 0
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: GZIP (space_saved: 33%)

############ Column(systemtime) ############
name: systemtime
path: systemtime
max_definition_level: 0
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: GZIP (space_saved: 33%)

############ Column(ts) ############
name: ts
path: ts
max_definition_level: 0
max_repetition_level: 0
physical_type: INT32
logical_type: None
converted_type (legacy): NONE
compression: GZIP (space_saved: -41%)

############ Column(starttime) ############
name: starttime
path: starttime
max_definition_level: 0
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: GZIP (space_saved: 33%)

############ Column(datetimestamp) ############
name: datetimestamp
path: datetimestamp
max_definition_level: 0
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: GZIP (space_saved: 37%)

############ Column(temperature) ############
name: temperature
path: temperature
max_definition_level: 0
max_repetition_level: 0
physical_type: FLOAT
logical_type: None
converted_type (legacy): NONE
compression: GZIP (space_saved: -51%)

############ Column(humidity) ############
name: humidity
path: humidity
max_definition_level: 0
max_repetition_level: 0
physical_type: FLOAT
logical_type: None
converted_type (legacy): NONE
compression: GZIP (space_saved: -49%)

############ Column(co2) ############
name: co2
path: co2
max_definition_level: 0
max_repetition_level: 0
physical_type: FLOAT
logical_type: None
converted_type (legacy): NONE
compression: GZIP (space_saved: -51%)

############ Column(totalvocppb) ############
name: totalvocppb
path: totalvocppb
max_definition_level: 0
max_repetition_level: 0
physical_type: FLOAT
logical_type: None
converted_type (legacy): NONE
compression: GZIP (space_saved: -85%)

############ Column(equivalentco2ppm) ############
name: equivalentco2ppm
path: equivalentco2ppm
max_definition_level: 0
max_repetition_level: 0
physical_type: FLOAT
logical_type: None
converted_type (legacy): NONE
compression: GZIP (space_saved: -75%)

############ Column(pressure) ############
name: pressure
path: pressure
max_definition_level: 0
max_repetition_level: 0
physical_type: FLOAT
logical_type: None
converted_type (legacy): NONE
compression: GZIP (space_saved: -49%)

############ Column(temperatureicp) ############
name: temperatureicp
path: temperatureicp
max_definition_level: 0
max_repetition_level: 0
physical_type: FLOAT
logical_type: None
converted_type (legacy): NONE
compression: GZIP (space_saved: -85%)
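
Many columns above report a negative `space_saved`. With only 4 rows in the file that is plausible: gzip adds a fixed header and trailer, so a very small input can grow when compressed. The sketch below illustrates the effect with Python's `gzip` module, assuming the figure is computed as 1 minus compressed size over uncompressed size. It is not the Parquet writer itself, and `space_saved` is a hypothetical helper.

```python
import gzip
import struct


def space_saved(raw: bytes) -> float:
    """Fraction of bytes saved by gzip; negative when the output is larger."""
    return 1 - len(gzip.compress(raw)) / len(raw)


# Four INT32 values, similar in size to a 4-row integer column chunk
tiny = struct.pack("<4i", 122, 122, 121, 123)

# The same values repeated 1000 times compress well
large = struct.pack("<4000i", *([122, 122, 121, 123] * 1000))

print(f"tiny:  {space_saved(tiny):.0%}")
print(f"large: {space_saved(large):.0%}")
```

Larger commit batches produce larger column chunks, where the fixed overhead matters far less.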

Setup

  • Download Spark 3.2 (built for Scala 2.12)
  • Download iceberg-spark-runtime-3.2_2.12:1.1.0

Run Spark SQL Shell

bin/spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.1.0 \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
    --conf spark.sql.catalog.spark_catalog.type=hive \
    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=hadoop \
    --conf spark.sql.catalog.local.warehouse=/Users/tspann/Downloads/iceberg/iceberg_sink_test
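
For readers who prefer PySpark over the `spark-sql` shell, the same settings could be passed through `SparkSession.builder`. This is a sketch that assumes `pyspark` 3.2 is installed; `build_session` and the app name are hypothetical, and `spark.jars.packages` stands in for the `--packages` flag.

```python
# Settings copied from the spark-sql command above
ICEBERG_CONF = {
    "spark.jars.packages": "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.1.0",
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.spark_catalog": "org.apache.iceberg.spark.SparkSessionCatalog",
    "spark.sql.catalog.spark_catalog.type": "hive",
    "spark.sql.catalog.local": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.local.type": "hadoop",
    "spark.sql.catalog.local.warehouse": "/Users/tspann/Downloads/iceberg/iceberg_sink_test",
}


def build_session(app_name: str = "iceberg-thermal"):
    """Create a SparkSession with the Iceberg catalogs configured (needs pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily; not needed to inspect the config

    builder = SparkSession.builder.appName(app_name)
    for key, value in ICEBERG_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Example use, left commented out because it starts a local Spark session:
# spark = build_session()
# spark.sql("select * from local.ice_sink_thermal limit 10").show()
```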

Spark Shell

desc local.ice_sink_thermal;
uuid                    string
ipaddress               string
cputempf                int
runtime                 int
host                    string
hostname                string
macaddress              string
endtime                 string
te                      string
cpu                     float
diskusage               string
memory                  float
rowid                   string
systemtime              string
ts                      int
starttime               string
datetimestamp           string
temperature             float
humidity                float
co2                     float
totalvocppb             float
equivalentco2ppm        float
pressure                float
temperatureicp          float

# Partitioning
Not partitioned

select * from local.ice_sink_thermal limit 10;

thrml_zlq_20221216202731    192.168.1.179   122 0   thermal thermal e4:5f:01:7c:3f:34   1671222451.063736   0.0005979537963867188   11.1    101858.0 MB 10.0    20221216202731_e414e311-7928-4408-91de-b44666cd14db 12/16/2022 15:27:35 1671222455  12/16/2022 15:27:31 2022-12-16 20:27:34.827283+00:00    26.8868 31.76   771.0   14.0    405.0   99783.77    86.0
thrml_fpa_20221216202735    192.168.1.179   122 0   thermal thermal e4:5f:01:7c:3f:34   1671222455.8950574  0.00046181678771972656  5.5 101858.0 MB 10.0    20221216202735_92a2af14-ebcb-42fd-a935-5a01a47ff95e 12/16/2022 15:27:40 1671222460  12/16/2022 15:27:35 2022-12-16 20:27:39.659337+00:00    26.8761 31.74   771.0   6.0 400.0   99784.8 86.0
thrml_gpv_20221216202740    192.168.1.179   122 0   thermal thermal e4:5f:01:7c:3f:34   1671222460.7061012  0.0006053447723388672   5.5 101858.0 MB 10.0    20221216202740_68f88218-1b40-4f0a-85e3-3fb8f447d65b 12/16/2022 15:27:45 1671222465  12/16/2022 15:27:40 2022-12-16 20:27:44.369295+00:00    26.8761 31.72   770.0   7.0 65535.0 99782.18    85.0
thrml_gxu_20221216202745    192.168.1.179   121 0   thermal thermal e4:5f:01:7c:3f:34   1671222465.500708   0.0006241798400878906   6.0 101858.0 MB 10.0    20221216202745_28685c57-88f5-422b-be23-b32ea12a0d75 12/16/2022 15:27:50 1671222470  12/16/2022 15:27:45 2022-12-16 20:27:49.161722+00:00    26.8681 31.83   771.0   12.0    65535.0 99778.97    85.0
thrml_nyn_20221216202750    192.168.1.179   122 0   thermal thermal e4:5f:01:7c:3f:34   1671222470.212329   0.0006175041198730469   5.5 101858.0 MB 10.0    20221216202750_bbbb7fa7-cebf-4414-b538-30ce84828cea 12/16/2022 15:27:55 1671222475  12/16/2022 15:27:50 2022-12-16 20:27:53.976872+00:00    26.8601 31.78   771.0   6.0 65535.0 99783.55    86.0
thrml_oxl_20221216202755    192.168.1.179   123 0   thermal thermal e4:5f:01:7c:3f:34   1671222475.1313043  0.0006723403930664062   6.6 101858.0 MB 10.0    20221216202755_bda36e1c-246f-4d99-8b39-b558111a1d9e 12/16/2022 15:27:59 1671222479  12/16/2022 15:27:55 2022-12-16 20:27:58.794626+00:00    26.8307 31.73   771.0   7.0 65535.0 99785.26    86.0
thrml_rvg_20221216202759    192.168.1.179   121 0   thermal thermal e4:5f:01:7c:3f:34   1671222479.8391178  0.00048804283142089844  5.5 101858.0 MB 10.4    20221216202759_25b8ee3b-6b59-42e6-8c4c-a8419b76ea40 12/16/2022 15:28:04 1671222484  12/16/2022 15:27:59 2022-12-16 20:28:03.601685+00:00    26.8441 31.76   771.0   6.0 406.0   99786.36    86.0
thrml_wbl_20221216202804    192.168.1.179   123 0   thermal thermal e4:5f:01:7c:3f:34   1671222484.6672618  0.0006029605865478516   5.5 101858.0 MB 10.0    20221216202804_7bd3de16-dbcd-4107-8ab0-b184a2eaf523 12/16/2022 15:28:09 1671222489  12/16/2022 15:28:04 2022-12-16 20:28:08.431971+00:00    26.8842 31.79   770.0   9.0 65535.0 99787.27    86.0
thrml_vwj_20221216202809    192.168.1.179   122 0   thermal thermal e4:5f:01:7c:3f:34   1671222489.4752936  0.0006010532379150391   9.9 101858.0 MB 10.0    20221216202809_fed14b6d-b211-48ad-b31f-0a486b8d0f0d 12/16/2022 15:28:14 1671222494  12/16/2022 15:28:09 2022-12-16 20:28:13.137929+00:00    26.9002 31.78   770.0   5.0 400.0   99782.4 86.0
thrml_oox_20221216202814    192.168.1.179   123 0   thermal thermal e4:5f:01:7c:3f:34   1671222494.2805836  0.0005502700805664062   4.8 101858.0 MB 10.0    20221216202814_89526e7c-980b-4dbe-8257-c1c6944cdbd3 12/16/2022 15:28:18 1671222498  12/16/2022 15:28:14 2022-12-16 20:28:17.943010+00:00    26.9162 31.72   769.0   1.0 65535.0 99789.35    86.0
Time taken: 0.835 seconds, Fetched 10 row(s)

select uuid, ts, datetimestamp, co2, humidity, pressure, temperature 
from local.ice_sink_thermal limit 10;

thrml_rwa_20221216203730    1671223055  2022-12-16 20:37:34.204439+00:00    783.0   32.22   99792.16    26.7613
thrml_qqs_20221216203735    1671223060  2022-12-16 20:37:39.013362+00:00    783.0   32.23   99791.39    26.78
thrml_szi_20221216203740    1671223064  2022-12-16 20:37:43.755430+00:00    784.0   32.14   99794.28    26.7934
thrml_rvb_20221216203744    1671223069  2022-12-16 20:37:48.563791+00:00    784.0   32.16   99796.71    26.8147
thrml_rto_20221216203749    1671223074  2022-12-16 20:37:53.373391+00:00    783.0   32.11   99796.0 26.8575
thrml_svv_20221216203754    1671223079  2022-12-16 20:37:58.184190+00:00    783.0   32.07   99791.21    26.8842
thrml_aov_20221216203759    1671223084  2022-12-16 20:38:02.991664+00:00    782.0   31.95   99794.04    26.9082
thrml_tzs_20221216203804    1671223088  2022-12-16 20:38:07.802992+00:00    782.0   32.02   99794.33    26.9322
thrml_fso_20221216203808    1671223093  2022-12-16 20:38:12.613666+00:00    783.0   32.05   99792.42    26.9589
thrml_czp_20221216203813    1671223098  2022-12-16 20:38:17.321001+00:00    783.0   31.98   99790.26    26.9803
Time taken: 0.898 seconds, Fetched 10 row(s)
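
Note that `ts` is stored as an integer epoch and `datetimestamp` as a string, so neither is a native timestamp in the table. A small sketch of turning the first row's values into comparable Python datetimes (the variable names are only for illustration):

```python
from datetime import datetime, timezone

# Values from the first row of the query result above
ts = 1671223055
datetimestamp = "2022-12-16 20:37:34.204439+00:00"

ts_utc = datetime.fromtimestamp(ts, tz=timezone.utc)
stamp_utc = datetime.fromisoformat(datetimestamp)

# Gap between the two clocks readings taken while building the record
gap_seconds = (ts_utc - stamp_utc).total_seconds()

print(ts_utc.isoformat())
print(round(gap_seconds, 3))
```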

References

  • https://github.com/tspannhw/FLiP-Pi-DeltaLake-Thermal
  • https://iceberg.apache.org/docs/latest/getting-started/
  • https://github.com/streamnative/pulsar-io-lakehouse/blob/master/docs/lakehouse-sink.md
  • https://streamnative.io/blog/release/2022-12-14-announcing-the-iceberg-sink-connector-for-apache-pulsar/
  • https://hub.streamnative.io/connectors/lakehouse-sink/v2.10.1.12/
  • https://github.com/tspannhw/FLiP-Pi-Thermal
  • https://dzone.com/articles/pulsar-in-python-on-pi
  • https://github.com/tabular-io/docker-spark-iceberg
  • https://stackoverflow.com/questions/73791829/delta-lake-sink-connector-for-apache-pulsar-with-minio-throws-java-lang-illegal
  • https://thenewstack.io/apache-iceberg-a-different-table-design-for-big-data/

2022/2023 - Tim Spann - @PaaSDev

December 5, 2022: FLiP Stack Weekly

 


A few good talks and some cool stuff for the rest of the year.

Check out our channel:

https://www.youtube.com/@streamnativecommunity8124/featured

New Stuff

https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.19.0

A quick preview of Apache Pulsar + Apache Pinot.


https://youtu.be/KMbTlmoDXXA

https://github.com/tspannhw/pulsar-thermal-pinot

PODCAST

Take a look at recent podcasts in audio or video format.

https://www.buzzsprout.com/2062659/11463086-messaging-streaming-and-events-101-episode-1-of-crossing-the-streams

https://www.youtube.com/watch?v=U8aPBhlvDHU&feature=emb_imp_woyt

CODE + COMMUNITY

Join my meetup group NJ/NYC/Philly/Virtual. We will have a hybrid event on December 8th.

https://www.meetup.com/new-york-city-apache-pulsar-meetup/

This is Issue #61!!

https://github.com/tspannhw/FLiPStackWeekly

https://www.linkedin.com/pulse/2022-schedule-tim-spann

VIDEOS

https://youtu.be/7Yih40Gcr-w

https://youtu.be/y6kbRZae4TI




https://www.youtube.com/embed/yU3UVhLz1Io




https://www.youtube.com/watch?v=346PAVtrJNE




https://www.youtube.com/watch?v=C684XAivnqg




https://www.youtube.com/watch?v=mAKAP71JfQs




ARTICLES

Cool award winners and friends: Apache Pulsar, Apache Iceberg, Starburst, Cloudera Data Platform, Apache Flink, Nvidia Jetson AGX Orin, Delta Lake, Databricks, Redis.

https://www.hpcwire.com/off-the-wire/bigdatawire-formerly-datanami-reveals-winners-of-2022-readers-and-editors-choice-awards/

https://streamnative.io/blog/community/2022-12-01-pulsar-summit-asia-2022-recap/

https://betterprogramming.pub/going-native-with-spring-boot-3-ga-4e8d91ab21d3

https://github.com/riferrei/devrel-mastodon

https://streamnative.io/blog/engineering/2022-11-29-spring-into-pulsar-part-2-spring-based-microservices-for-multiple-protocols-with-apache-pulsar/

https://www.ververica.com/blog/the-release-of-flink-cdc-2.3

https://inlong.apache.org/docs/quick_start/pulsar_example

TALKS

https://www.slideshare.net/bunkertor/machine-intelligence-guild-build-ml-enhanced-event-streaming-applications-with-java-and-python-microservices

https://www.meetup.com/real-time-analytics-meetup-ny/events/290037437/

TRAINING

Dates are Dec 5 - Dec 8, 2022

Link to register: https://www.eventbrite.com/e/463731161387

Dates are January 17 - 19, 2023 from 2pm - 5pm CET / 8am - 12pm EST
Link to register: https://www.eventbrite.com/e/465055021087

https://streamnative.io/training/

CODE

https://github.com/timeplus-io/pulsar-io-sink

https://github.com/spring-projects/spring-aot-smoke-tests/tree/main/integration/spring-pulsar

https://github.com/streamnative/psat_exercise_code

EVENTS

Dec 8, 2022: Open Source Summit Finance NYC

https://events.linuxfoundation.org/open-source-finance-forum-new-york/

Dec 14, 2022: Manhattan, NYC: Pulsar + Pinot Meetup

https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/289817171/

Dec 15, 2022: TigerLabs, Princeton, NJ: Pulsar + NiFi + Flink Meetup

https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/289674210/

Data Science Camp Online

https://dscamp.org/

Machine Intelligence Guild

HTAP Summit

Coming soon.

Pinot / Pulsar Meetup

Coming soon.

CockroachDB NYC Meetup

Hazelcast Event

https://docs.hazelcast.com/hazelcast/5.1/integrate/pulsar-connector

TOOLS

TIPS


pip3.9 install 'pulsar-client[all]'

Google Sheets Hack! Import Any RSS Feed to a Sheet

=ImportFeed("http://example.com/feed")



JOBS

https://jobs.lever.co/stream-native/74d75fc1-1ad7-40a0-b907-66d8ac86009a

November 15-19, 2022 FLiP Stack Weekly




#### More talks coming this month! Join Tim around the virtual world at various conferences.



![MerryChristmas](https://github.com/tspannhw/FLiPStackWeekly/raw/main/images/merrychristmasflipstack.jpg)



### Podcast


Take a look at recent podcasts in audio or video format.


[https://www.buzzsprout.com/2062659/11463086-messaging-streaming-and-events-101-episode-1-of-crossing-the-streams](https://www.buzzsprout.com/2062659/11463086-messaging-streaming-and-events-101-episode-1-of-crossing-the-streams)


[https://www.youtube.com/watch?v=U8aPBhlvDHU&feature=emb_imp_woyt](https://www.youtube.com/watch?v=U8aPBhlvDHU&feature=emb_imp_woyt)


### CODE + COMMUNITY



Join my meetup group NJ/NYC/Philly/Virtual. We will have a hybrid event on December 8th.


[https://www.meetup.com/new-york-city-apache-pulsar-meetup/](https://www.meetup.com/new-york-city-apache-pulsar-meetup/)


**This is Issue #58!!**


[https://github.com/tspannhw/FLiPStackWeekly](https://github.com/tspannhw/FLiPStackWeekly)


[https://www.linkedin.com/pulse/2022-schedule-tim-spann](https://www.linkedin.com/pulse/2022-schedule-tim-spann)



#### Upcoming Events


I’ll be speaking at #PulsarSummit Asia 2022! Join the #ApachePulsar community and RSVP for the webinar: https://streamnative.zoom.us/webinar/register/8516668631400/WN_qKibcbEFTxKv6-MszyFeAg.


![Pulsar 101](https://github.com/tspannhw/FLiPStackWeekly/raw/main/images/Timothy%20Spann%20101.jpg)




![ModernApps](https://github.com/tspannhw/FLiPStackWeekly/raw/main/images/Timothy%20Spann.jpg)




#### Videos


[https://www.youtube.com/watch?v=ehlK_d-ItG0](https://www.youtube.com/watch?v=ehlK_d-ItG0)


<iframe width="560" height="315" src="https://www.youtube.com/embed/ehlK_d-ItG0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>



[https://www.youtube.com/watch?v=XH0nsTttSY0](https://www.youtube.com/watch?v=XH0nsTttSY0)



[https://www.youtube.com/watch?v=K-I2DJYIkTg&t=6s](https://www.youtube.com/watch?v=K-I2DJYIkTg&t=6s)


<iframe width="560" height="315" src="https://www.youtube.com/embed/K-I2DJYIkTg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>



#### Spring for Pulsar on Secure Cloud


[https://github.com/spring-projects-experimental/spring-pulsar/wiki/Stream-Native-Cloud](https://github.com/spring-projects-experimental/spring-pulsar/wiki/Stream-Native-Cloud)


[https://spring.io/blog/2022/11/16/spring-framework-6-0-goes-ga](https://spring.io/blog/2022/11/16/spring-framework-6-0-goes-ga)



### CODE


[https://pulsar.apache.org/release-notes/versioned/client-cpp-3.0.0/](https://pulsar.apache.org/release-notes/versioned/client-cpp-3.0.0/)


[https://github.com/tspannhw/spring-pulsar-gtfsrealtime](https://github.com/tspannhw/spring-pulsar-gtfsrealtime)


[https://github.com/tspannhw/FLiPN-FLaNK-KafkaConnectToMoP](https://github.com/tspannhw/FLiPN-FLaNK-KafkaConnectToMoP)


[https://github.com/tspannhw/pulsar-adsb-cockroachdb](https://github.com/tspannhw/pulsar-adsb-cockroachdb)


[https://github.com/tspannhw/pulsar-adsb-function](https://github.com/tspannhw/pulsar-adsb-function)



### ARTICLES


[https://www.mparticle.com/blog/apache-pulsar-migration/](https://www.mparticle.com/blog/apache-pulsar-migration/)


[https://streamnative.io/blog/case/2022-11-15-how-proxima-beta-implemented-cqrs-and-event-sourcing-on-top-of-apache-pulsar-and-scylladb/](https://streamnative.io/blog/case/2022-11-15-how-proxima-beta-implemented-cqrs-and-event-sourcing-on-top-of-apache-pulsar-and-scylladb/)


[https://medium.com/@tspann/gtfs-real-time-feed-ingest-with-java-67e0d324cfc4](https://medium.com/@tspann/gtfs-real-time-feed-ingest-with-java-67e0d324cfc4)


[https://www.immerok.io/blog/apache-flink-newsletter-nov-2022](https://www.immerok.io/blog/apache-flink-newsletter-nov-2022)


[https://nvidianews.nvidia.com/news/nvidia-microsoft-accelerate-cloud-enterprise-ai](https://nvidianews.nvidia.com/news/nvidia-microsoft-accelerate-cloud-enterprise-ai)


[https://verraes.net/2019/05/patterns-for-decoupling-distsys-passage-of-time-event/](https://verraes.net/2019/05/patterns-for-decoupling-distsys-passage-of-time-event/)


[https://streamnative.io/blog/community/2022-11-04-announcing-pulsar-summit-asia-2022-conference-schedule/](https://streamnative.io/blog/community/2022-11-04-announcing-pulsar-summit-asia-2022-conference-schedule/)


[https://dl.acm.org/doi/fullHtml/10.1145/3531146.3533231](https://dl.acm.org/doi/fullHtml/10.1145/3531146.3533231)



### TRAINING


Dates are Dec 5 - Dec 8, 2022


Link to register: [https://www.eventbrite.com/e/463731161387](https://www.eventbrite.com/e/463731161387)



Dates are January 17 - 19, 2023 from 2pm - 5pm CET / 8am - 12pm EST

Link to register: [https://www.eventbrite.com/e/465055021087](https://www.eventbrite.com/e/465055021087)


[https://streamnative.io/training/](https://streamnative.io/training/)




### EVENTS



Nov 20, 2022 9:00 pm. Pulsar Summit Asia. Virtual.


[https://pulsar-summit.org/event/asia-2022/schedule](https://pulsar-summit.org/event/asia-2022/schedule)



Nov 23, 2022: Big Data EU. Virtual.


[https://bigdataconference.eu/timothy-spann/](https://bigdataconference.eu/timothy-spann/)



Nov 29, 2022: Machine Intelligence Guild Speaker of the Month. Virtual.


Coming soon.


Dec 14, 2022: Manhattan, NYC:  Pulsar + Pinot Meetup



Dec 15, 2022: TigerLabs, Princeton, NJ: Pulsar + NiFi + Flink Meetup


[https://www.meetup.com/technologysolutionshub/events/289756167/](https://www.meetup.com/technologysolutionshub/events/289756167/)


**HTAP Summit**


Coming soon.


**Pinot / Pulsar Meetup**


Coming soon.


**CockroachDB NYC Meetup**



**Hazelcast Event**


[https://docs.hazelcast.com/hazelcast/5.1/integrate/pulsar-connector](https://docs.hazelcast.com/hazelcast/5.1/integrate/pulsar-connector)



### TOOLS


* [https://github-business-card.vercel.app/](https://github-business-card.vercel.app/)


* [https://github.com/Eugeny/tabby/releases/tag/v1.0.186](https://github.com/Eugeny/tabby/releases/tag/v1.0.186)

 

* [https://tabby.sh/](https://tabby.sh/)

 

* [https://github.com/aleixrodriala/wa-tunnel](https://github.com/aleixrodriala/wa-tunnel)

 

* [https://github.com/awslabs/deequ](https://github.com/awslabs/deequ)

 

* [https://github.com/MonitorControl/MonitorControl](https://github.com/MonitorControl/MonitorControl)


* [https://github.com/sibprogrammer/xq](https://github.com/sibprogrammer/xq)


* [https://github.com/danielgatis/rembg](https://github.com/danielgatis/rembg)


* [https://pomochat.com/](https://pomochat.com/)


* [https://github.com/enso-org/enso](https://github.com/enso-org/enso)


* [https://github.com/streamnative/pulsar-io-cloud-storage](https://github.com/streamnative/pulsar-io-cloud-storage)


* [https://github.com/FusionAuth/java-http](https://github.com/FusionAuth/java-http)


* [https://github.com/containers/podman-desktop](https://github.com/containers/podman-desktop)


* [https://github.com/cachix/devenv](https://github.com/cachix/devenv)


* [https://devenv.sh/](https://devenv.sh/)


* [https://teropa.info/musicmouse/](https://teropa.info/musicmouse/)



#### Blast from the Past


[https://www.youtube.com/watch?v=tnWq8opMI6s](https://www.youtube.com/watch?v=tnWq8opMI6s)




#### QUICK SCRIPTS



curl http://localhost:8080/admin/v2/clusters/standalone
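
The same admin REST call can be made from Python with only the standard library. This is a sketch that assumes a standalone broker on localhost:8080, as in the curl command; `cluster_url` and `get_cluster` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def cluster_url(base: str, cluster: str) -> str:
    """Build the admin REST URL for one cluster."""
    return f"{base.rstrip('/')}/admin/v2/clusters/{cluster}"


def get_cluster(base: str = "http://localhost:8080", cluster: str = "standalone",
                timeout: float = 5.0) -> dict:
    """Fetch the cluster configuration as a dict (needs a running broker)."""
    with urlopen(cluster_url(base, cluster), timeout=timeout) as resp:
        return json.load(resp)


print(cluster_url("http://localhost:8080", "standalone"))
# get_cluster() is not called here because it needs a live Pulsar instance.
```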




FLiP-Py-Pi-GasThermal: Building an IoT Edge Application with Apache Pulsar and Python for TVOC and CO2 Ingest

 FLiP-Py-Pi-GasThermal


tags:  Apache Pulsar, Python, Raspberry Pi, Gas Sensor + Thermal Camera Sensors, Apache Flink, Trino/Presto SQL


Sensors

  • Pimoroni BreakoutGarden: SGP30
    • Sensirion SGP30 TVOC and eCO2 sensor (datasheet)
    • TVOC sensing from 0-60,000 ppb (parts per billion)
    • CO2 sensing from 400 to 60,000 ppm (parts per million)
  • Pimoroni BreakoutGarden: MLX90640 Thermal Camera
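
The sensing ranges listed above can double as a sanity check on incoming readings. A minimal sketch, with hypothetical names and the bounds copied from the list:

```python
# Bounds taken from the SGP30 ranges listed above
TVOC_PPB_RANGE = (0, 60_000)
ECO2_PPM_RANGE = (400, 60_000)


def in_range(value: float, bounds: tuple) -> bool:
    """True when value lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


print(in_range(413, ECO2_PPM_RANGE))    # a typical eCO2 reading
print(in_range(65535, ECO2_PPM_RANGE))  # outside the listed range
print(in_range(5, TVOC_PPB_RANGE))
```

Readings that fall outside these bounds are probably better flagged than silently stored.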

Hardware

Architecture

Build

bin/pulsar-admin topics create persistent://public/default/garden3

bin/pulsar-client consume "persistent://public/default/garden3" -s "garden3reader" -n 0

from pulsar.schema import Record, String, Float, Integer

class Garden(Record):
    cpu = Float()
    diskusage = String()
    endtime = String()
    equivalentco2ppm = String()
    host = String()
    hostname = String()
    ipaddress = String()
    macaddress = String()
    memory = Float()
    rowid = String()
    runtime = Integer()
    starttime = String()
    systemtime = String()
    totalvocppb = String()
    ts = Integer()
    uuid = String()
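
To show how a record of this shape might be filled and published, here is a hedged sketch. `build_reading` and `send_reading` are hypothetical helpers, the `system` dict stands in for host metrics gathered elsewhere, and the `rowid` and `uuid` formats are only inferred from the sample message further down. `send_reading` assumes the `pulsar-client` package, a broker at `pulsar://localhost:6650`, and the `Garden` class passed in as `garden_cls`.

```python
import time
import uuid
from datetime import datetime, timezone

# Field names copied from the Garden record above.
GARDEN_FIELDS = (
    "cpu", "diskusage", "endtime", "equivalentco2ppm", "host", "hostname",
    "ipaddress", "macaddress", "memory", "rowid", "runtime", "starttime",
    "systemtime", "totalvocppb", "ts", "uuid",
)


def build_reading(eco2_ppm, tvoc_ppb, system):
    """Build a dict shaped like Garden. `system` carries host-level values
    (host, cpu, memory, diskusage, ipaddress, macaddress) gathered elsewhere."""
    start = time.time()
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    local_fmt = "%m/%d/%Y %H:%M:%S"
    host = system["host"]
    reading = {
        "cpu": float(system["cpu"]),
        "diskusage": str(system["diskusage"]),
        "equivalentco2ppm": str(eco2_ppm),  # String in the schema
        "host": host,
        "hostname": host,
        "ipaddress": system["ipaddress"],
        "macaddress": system["macaddress"],
        "memory": float(system["memory"]),
        "rowid": f"{stamp}_{uuid.uuid4()}",
        "starttime": datetime.fromtimestamp(start).strftime(local_fmt),
        "totalvocppb": str(tvoc_ppb),  # String in the schema
        "uuid": f"{host}_uuid_{stamp}",  # simplified; the sample key has an extra part
    }
    end = time.time()
    reading.update(
        endtime=str(end),
        runtime=int(end - start),
        systemtime=datetime.fromtimestamp(end).strftime(local_fmt),
        ts=int(end),
    )
    return reading


def send_reading(reading, garden_cls,
                 service_url="pulsar://localhost:6650",
                 topic="persistent://public/default/garden3"):
    """Publish one reading, keyed by its uuid. Needs pulsar-client and a broker."""
    import pulsar  # imported lazily so build_reading works without the client
    from pulsar.schema import JsonSchema

    client = pulsar.Client(service_url)
    producer = client.create_producer(topic, schema=JsonSchema(garden_cls))
    producer.send(garden_cls(**reading), partition_key=reading["uuid"])
    client.close()
```

Using the uuid as the partition key mirrors the `key:[...]` value visible in the consumed message, which is what later surfaces as the message key in Presto.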

----- got message -----
key:[garden3_uuid_yvs_20220306191528], properties:[], content:{
 "cpu": 0.0,
 "diskusage": "103496.6 MB",
 "endtime": "1646594128.2460103",
 "equivalentco2ppm": "  413",
 "host": "garden3",
 "hostname": "garden3",
 "ipaddress": "192.168.1.198",
 "macaddress": "dc:a6:32:32:98:20",
 "memory": 9.2,
 "rowid": "20220306191528_707b34d4-7299-4233-a495-d2d97393e834",
 "runtime": 0,
 "starttime": "03/06/2022 14:15:28",
 "systemtime": "03/06/2022 14:15:29",
 "totalvocppb": "    5",
 "ts": 1646594129,
 "uuid": "garden3_uuid_yvs_20220306191528"
}
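
Note that the eCO2 and TVOC values arrive as space-padded strings and the disk usage carries a unit suffix. A consumer that wants numbers could normalize the content along these lines. This is a sketch; `normalize` and the added `diskusage_mb` and `ts_utc` keys are illustrative names, not part of the original schema.

```python
import json
from datetime import datetime, timezone

# A subset of the message content shown above.
SAMPLE = """{
 "diskusage": "103496.6 MB",
 "endtime": "1646594128.2460103",
 "equivalentco2ppm": "  413",
 "totalvocppb": "    5",
 "ts": 1646594129
}"""


def normalize(content: str) -> dict:
    """Parse one message body and convert string-typed readings to numbers."""
    row = json.loads(content)
    row["equivalentco2ppm"] = int(row["equivalentco2ppm"].strip())
    row["totalvocppb"] = int(row["totalvocppb"].strip())
    row["diskusage_mb"] = float(row["diskusage"].split()[0])  # drop the " MB" suffix
    row["endtime"] = float(row["endtime"])
    row["ts_utc"] = datetime.fromtimestamp(row["ts"], tz=timezone.utc).isoformat()
    return row


print(normalize(SAMPLE))
```

Numeric types matter downstream: with the readings stored as strings, SQL aggregates such as `max` compare them as text.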

presto> select * from pulsar."public/default"."garden3";
 cpu |  diskusage  |      endtime       | equivalentco2ppm |  host   | hostname |   ipaddress   |    macaddress     | memory |                        rowid                        | runtime |      
-----+-------------+--------------------+------------------+---------+----------+---------------+-------------------+--------+-----------------------------------------------------+---------+------
 6.5 | 103496.5 MB | 1646594650.7116666 |   418            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192410_197b6b0c-b86c-4191-9e9c-11777767825e |       0 | 03/06
 6.7 | 103496.5 MB | 1646594651.7441382 |   418            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192411_4dab3212-423b-46a9-ae39-b10eb363336d |       0 | 03/06
 1.3 | 103496.5 MB | 1646594652.7764313 |   421            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192412_d24a819b-4ca1-489a-9683-da48bc37185c |       0 | 03/06
 0.2 | 103496.5 MB | 1646594653.810233  |   421            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192413_f4e43177-2486-4a36-b27a-903028d6aacf |       0 | 03/06
 0.0 | 103496.5 MB | 1646594654.8467774 |   416            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192414_558d3db8-725a-46ec-ad1f-89f8067142f8 |       0 | 03/06
 0.0 | 103496.5 MB | 1646594655.880628  |   420            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192415_1eeea533-7f7e-4945-a089-d4b2f8681e14 |       0 | 03/06
 0.0 | 103496.5 MB | 1646594656.9145741 |   418            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192416_d7d8c6ea-2adf-4704-8f49-3108a7328f26 |       0 | 03/06
 0.0 | 103496.5 MB | 1646594657.9489982 |   425            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192417_c71cbb6e-c59f-4855-84ae-b796fd3d7a76 |       0 | 03/06
 0.0 | 103496.5 MB | 1646594658.9828157 |   420            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192418_c14204dd-4c1c-4e76-a272-8ddecd11a97c |       0 | 03/06
 0.0 | 103496.5 MB | 1646594660.0187812 |   420            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192419_c7583063-06e0-4601-806a-96619b6bb136 |       0 | 03/06
 0.0 | 103496.5 MB | 1646594661.0531507 |   428            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192421_57e4a8ed-af6f-4dd0-b143-f29f1a211fdb |       0 | 03/06
 0.0 | 103496.5 MB | 1646594662.087301  |   421            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192422_cf2ce36e-c996-41ac-baec-db292cffdd37 |       0 | 03/06
 6.2 | 103496.5 MB | 1646594663.1214898 |   415            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192423_30b13428-ac2e-4929-b9f5-c4c3fbc3312a |       0 | 03/06
 6.5 | 103496.5 MB | 1646594664.1541135 |   424            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192424_d1517315-85fa-4923-b634-9b673c5b20ba |       0 | 03/06
 3.6 | 103496.5 MB | 1646594665.1890867 |   417            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192425_3984b0bc-a92e-4458-834c-e65259bb7a4d |       0 | 03/06
 0.0 | 103496.5 MB | 1646594666.221572  |   422            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192426_e0debd83-47e8-43f7-8ec5-cf0dbc27b87e |       0 | 03/06
 0.0 | 103496.5 MB | 1646594667.2555919 |   424            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192427_6ec920e7-56d2-4730-bddc-e18592cf1210 |       0 | 03/06
 0.0 | 103496.5 MB | 1646594668.2893167 |   428            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192428_434b4458-6f89-4161-b967-f29796d8bf5d |       0 | 03/06
 0.0 | 103496.5 MB | 1646594669.3234618 |   426            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192429_5bb29c2e-d078-405d-b8f6-967ea4753b3f |       0 | 03/06
 0.0 | 103496.5 MB | 1646594670.359024  |   421            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192430_05b86f6b-4d28-497c-acd6-d14f7ce27157 |       0 | 03/06
 0.0 | 103496.5 MB | 1646594671.392967  |   432            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192431_b7a2955a-6f39-4439-8eb9-27d7a2633836 |       0 | 03/06
 0.0 | 103496.5 MB | 1646594672.4271743 |   426            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192432_29861384-fc30-4deb-bc1d-b7f2d27ef89d |       0 | 03/06
 0.0 | 103496.5 MB | 1646594673.4611707 |   424            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192433_25435335-39fe-4294-b913-95c63537a743 |       0 | 03/06
 0.0 | 103496.5 MB | 1646594674.4951062 |   424            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192434_a52a90e8-b11c-467b-b87f-936432bc8988 |       0 | 03/06
 3.3 | 103496.5 MB | 1646594675.531778  |   436            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192435_48bab6a9-9f2e-443b-9fed-f096f6c24ebd |       0 | 03/06
 6.5 | 103496.5 MB | 1646594676.5642908 |   433            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192436_761f79dc-06f6-4ed3-8062-5227b6842b77 |       0 | 03/06
 6.2 | 103496.5 MB | 1646594677.5965276 |   421            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192437_6d8d49c9-0519-4825-b6b7-ed435e9fe747 |       0 | 03/06
 0.0 | 103496.5 MB | 1646594678.6290672 |   423            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192438_a920872d-f152-4b18-94a7-b4dfe5641482 |       0 | 03/06
 0.0 | 103496.5 MB | 1646594679.6629703 |   418            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192439_032ce574-10f6-4b50-b5e8-102b466af65a |       0 | 03/06
 0.0 | 103496.5 MB | 1646594680.699419  |   420            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192440_cde6f78a-d028-4f87-a95b-156bac0ee0c2 |       0 | 03/06
 0.0 | 103496.5 MB | 1646594681.7338123 |   424            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192441_7400689a-6974-4ff2-a688-96c317d45d03 |       0 | 03/06
 0.0 | 103496.5 MB | 1646594682.767776  |   417            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192442_46696fa7-295c-4fe6-bf49-8d8405e21cf5 |       0 | 03/06
 0.0 | 103496.5 MB | 1646594683.8017883 |   420            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192443_4de9f79c-0169-49d6-b337-fb57dcb5cf7e |       0 | 03/06
 0.0 | 103496.5 MB | 1646594684.8360054 |   422            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192444_63081ac4-2c87-462d-bc16-ab506e9c6db2 |       0 | 03/06
 0.0 | 103496.5 MB | 1646594685.8720167 |   420            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192445_fee13e36-26c6-4f06-894c-8b96cb0f3bae |       0 | 03/06
 0.0 | 103496.5 MB | 1646594686.90594   |   431            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192446_8aaa423d-bfe3-43e8-8653-d554c2afb8d6 |       0 | 03/06
 0.8 | 103496.4 MB | 1646594687.939747  |   416            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192447_bfe339cd-3fb4-4d2a-b612-3a2363e1b83b |       0 | 03/06
 6.5 | 103496.4 MB | 1646594688.9728699 |   424            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192448_8809fc3e-75f8-4aca-a141-6e5d4872b0de |       0 | 03/06
 6.5 | 103496.4 MB | 1646594690.0051649 |   412            | garden3 | garden3  | 192.168.1.198 | dc:a6:32:32:98:20 |    9.2 | 20220306192449_0d0b4b62-0321-4bcd-952b-14607333d9f5 |       0 | 03/06

presto> desc pulsar."public/default"."garden3";
      Column       |   Type    | Extra |                                   Comment                                   
-------------------+-----------+-------+-----------------------------------------------------------------------------
 cpu               | real      |       | ["null","float"]                                                            
 diskusage         | varchar   |       | ["null","string"]                                                           
 endtime           | varchar   |       | ["null","string"]                                                           
 equivalentco2ppm  | varchar   |       | ["null","string"]                                                           
 host              | varchar   |       | ["null","string"]                                                           
 hostname          | varchar   |       | ["null","string"]                                                           
 ipaddress         | varchar   |       | ["null","string"]                                                           
 macaddress        | varchar   |       | ["null","string"]                                                           
 memory            | real      |       | ["null","float"]                                                            
 rowid             | varchar   |       | ["null","string"]                                                           
 runtime           | integer   |       | ["null","int"]                                                              
 starttime         | varchar   |       | ["null","string"]                                                           
 systemtime        | varchar   |       | ["null","string"]                                                           
 totalvocppb       | varchar   |       | ["null","string"]                                                           
 ts                | integer   |       | ["null","int"]                                                              
 uuid              | varchar   |       | ["null","string"]                                                           
 __partition__     | integer   |       | The partition number which the message belongs to                           
 __event_time__    | timestamp |       | Application defined timestamp in milliseconds of when the event occurred    
 __publish_time__  | timestamp |       | The timestamp in milliseconds of when event as published                    
 __message_id__    | varchar   |       | The message ID of the message used to generate this row                     
 __sequence_id__   | bigint    |       | The sequence ID of the message used to generate this row                    
 __producer_name__ | varchar   |       | The name of the producer that publish the message used to generate this row 
 __key__           | varchar   |       | The partition key for the topic                                             
 __properties__    | varchar   |       | User defined properties    
 

Presto/Trino gives us access to all that tasty metadata for each message (or event, row, record, whatever you like to call it). The metadata columns have special double-underscore names to help prevent collisions with payload fields.

Metadata

  • partition - if we have a partitioned topic, which partition does this message belong to.
  • event_time - you know what time it is! Application-defined timestamp in ms of when the event occurred.
  • publish_time - when was this message published to the topic?
  • message_id - unique id for this message
  • sequence_id - ordering information for this message
  • producer_name - who sent me this?
  • key - did you assign a key like I asked you to?
  • properties - all those extra fields you added around the payload
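
Most of the same metadata is available to a regular consumer as well. The sketch below maps the Python client's message accessors (`partition_key()`, `publish_timestamp()`, `event_timestamp()`, `message_id()`, `properties()`) onto the Presto column names. `message_metadata` is a hypothetical helper, and `FakeMessage` is a stand-in with made-up values so the sketch runs without a broker. The sequence ID and producer name are left out because this sketch does not assume accessors for them.

```python
from datetime import datetime, timezone


def _ms_to_utc(ms: int):
    """Convert epoch milliseconds to ISO-8601 UTC; 0 means the value was not set."""
    if not ms:
        return None
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()


def message_metadata(msg) -> dict:
    """Collect per-message metadata under the column names Presto uses."""
    return {
        "__key__": msg.partition_key(),
        "__publish_time__": _ms_to_utc(msg.publish_timestamp()),
        "__event_time__": _ms_to_utc(msg.event_timestamp()),
        "__message_id__": str(msg.message_id()),
        "__properties__": dict(msg.properties()),
    }


class FakeMessage:
    """Stand-in for a received pulsar message; every value here is invented."""

    def partition_key(self):
        return "garden3_uuid_yvs_20220306191528"

    def publish_timestamp(self):
        return 1646594129000

    def event_timestamp(self):
        return 0  # the producer did not set an event time

    def message_id(self):
        return "example-message-id"

    def properties(self):
        return {}


print(message_metadata(FakeMessage()))
```

With a real consumer, the object returned by `consumer.receive()` would be passed in place of `FakeMessage()`.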

Spark Structured Streaming

val dfPulsar = spark.readStream.format("pulsar").option("service.url", "pulsar://localhost:6650").option("admin.url", "http://localhost:8080").option("topic", "persistent://public/default/garden3").load()
dfPulsar.printSchema()
// The output format can also be "orc", "json", "csv", etc.
val pQuery = dfPulsar.selectExpr("*").writeStream.format("parquet").option("checkpointLocation", "/tmp/checkpoint").option("path", "/opt/demo/gasthermal").start()

pQuery.explain()
pQuery.awaitTermination() // blocks until the query stops or fails
pQuery.stop() // call this instead of awaitTermination() to end the query manually

Spark

Show Me The Data

We can visualize data from Apache Pulsar by consuming it through the WebSocket interface in a simple jQuery single-page web application.
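
The web page itself is not reproduced here, but the WebSocket exchange it relies on can be sketched in Python. A consumer connects to a `/ws/v2/consumer/...` URL, receives JSON frames whose `payload` is base64 encoded, and acknowledges each one by sending back its `messageId`. The helper names below are hypothetical, the port assumes the default standalone setup, and opening the socket (for example with a third-party WebSocket client) is left out.

```python
import base64
import json
from urllib.parse import quote


def consumer_url(host, tenant, namespace, topic, subscription, port=8080):
    """Build the WebSocket consumer endpoint for a persistent topic."""
    return (f"ws://{host}:{port}/ws/v2/consumer/persistent/"
            f"{tenant}/{namespace}/{topic}/{quote(subscription)}")


def decode_frame(frame: str):
    """Return (message_id, payload_dict) from one frame pushed by the broker."""
    envelope = json.loads(frame)
    payload = json.loads(base64.b64decode(envelope["payload"]))
    return envelope["messageId"], payload


def ack_frame(message_id: str) -> str:
    """Build the acknowledgement the consumer sends back for a message."""
    return json.dumps({"messageId": message_id})
```

A browser client does the same three steps with `new WebSocket(url)`, `atob` on the payload, and `send` for the acknowledgement.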

JavaScript

Example Parquet Files

pip3 install parquet-tools -U

parquet-tools inspect part-00000-b7e1f8dc-956d-4130-bc59-7b1435e41391-c000.snappy.parquet

############ file meta data ############
created_by: parquet-mr version 1.12.1 (build 2a5c06c58fa987f85aa22170be14d927d5ff6e7d)
num_columns: 23
num_rows: 1
num_row_groups: 1
format_version: 1.0
serialized_size: 5071


############ Columns ############
cpu
diskusage
endtime
equivalentco2ppm
host
hostname
ipaddress
macaddress
memory
rowid
runtime
starttime
systemtime
totalvocppb
ts
uuid
__key
__topic
__messageId
__publishTime
__eventTime
key
value

############ Column(cpu) ############
name: cpu
path: cpu
max_definition_level: 1
max_repetition_level: 0
physical_type: FLOAT
logical_type: None
converted_type (legacy): NONE

############ Column(diskusage) ############
name: diskusage
path: diskusage
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8

############ Column(endtime) ############
name: endtime
path: endtime
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8

############ Column(equivalentco2ppm) ############
name: equivalentco2ppm
path: equivalentco2ppm
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8

############ Column(host) ############
name: host
path: host
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8

############ Column(hostname) ############
name: hostname
path: hostname
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8

############ Column(ipaddress) ############
name: ipaddress
path: ipaddress
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8

############ Column(macaddress) ############
name: macaddress
path: macaddress
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8

############ Column(memory) ############
name: memory
path: memory
max_definition_level: 1
max_repetition_level: 0
physical_type: FLOAT
logical_type: None
converted_type (legacy): NONE

############ Column(rowid) ############
name: rowid
path: rowid
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8

############ Column(runtime) ############
name: runtime
path: runtime
max_definition_level: 1
max_repetition_level: 0
physical_type: INT32
logical_type: None
converted_type (legacy): NONE

############ Column(starttime) ############
name: starttime
path: starttime
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8

############ Column(systemtime) ############
name: systemtime
path: systemtime
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8

############ Column(totalvocppb) ############
name: totalvocppb
path: totalvocppb
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8

############ Column(ts) ############
name: ts
path: ts
max_definition_level: 1
max_repetition_level: 0
physical_type: INT32
logical_type: None
converted_type (legacy): NONE

############ Column(uuid) ############
name: uuid
path: uuid
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8

############ Column(__key) ############
name: __key
path: __key
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: None
converted_type (legacy): NONE

############ Column(__topic) ############
name: __topic
path: __topic
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8

############ Column(__messageId) ############
name: __messageId
path: __messageId
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: None
converted_type (legacy): NONE

############ Column(__publishTime) ############
name: __publishTime
path: __publishTime
max_definition_level: 1
max_repetition_level: 0
physical_type: INT96
logical_type: None
converted_type (legacy): NONE

############ Column(__eventTime) ############
name: __eventTime
path: __eventTime
max_definition_level: 1
max_repetition_level: 0
physical_type: INT96
logical_type: None
converted_type (legacy): NONE

############ Column(key) ############
name: key
path: __messageProperties.key_value.key
max_definition_level: 2
max_repetition_level: 1
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8

############ Column(value) ############
name: value
path: __messageProperties.key_value.value
max_definition_level: 3
max_repetition_level: 1
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
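
The inspect output above reports `__publishTime` and `__eventTime` as INT96, the legacy 12-byte timestamp layout: 8 little-endian bytes of nanoseconds within the day followed by a 4-byte Julian day number. Parquet readers normally decode this for you, so the sketch below is only meant to show what the raw bytes contain. `int96_to_datetime` is a hypothetical helper, and it truncates nanoseconds to microseconds.

```python
import struct
from datetime import datetime, timedelta, timezone

JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01


def int96_to_datetime(raw: bytes) -> datetime:
    """Decode one 12-byte INT96 Parquet timestamp into a UTC datetime."""
    if len(raw) != 12:
        raise ValueError("INT96 values are exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(days=days, microseconds=nanos_of_day // 1000)
```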

Flink

CREATE CATALOG pulsar WITH (
   'type' = 'pulsar',
   'service-url' = 'pulsar://pulsar1:6650',
   'admin-url' = 'http://pulsar1:8080',
   'format' = 'json'
);

USE CATALOG pulsar;

SHOW TABLES;

Flink SQL> describe garden3;
+------------------+--------+------+-----+--------+-----------+
|             name |   type | null | key | extras | watermark |
+------------------+--------+------+-----+--------+-----------+
|              cpu |  FLOAT | true |     |        |           |
|        diskusage | STRING | true |     |        |           |
|          endtime | STRING | true |     |        |           |
| equivalentco2ppm | STRING | true |     |        |           |
|             host | STRING | true |     |        |           |
|         hostname | STRING | true |     |        |           |
|        ipaddress | STRING | true |     |        |           |
|       macaddress | STRING | true |     |        |           |
|           memory |  FLOAT | true |     |        |           |
|            rowid | STRING | true |     |        |           |
|          runtime |    INT | true |     |        |           |
|        starttime | STRING | true |     |        |           |
|       systemtime | STRING | true |     |        |           |
|      totalvocppb | STRING | true |     |        |           |
|               ts |    INT | true |     |        |           |
|             uuid | STRING | true |     |        |           |
+------------------+--------+------+-----+--------+-----------+
16 rows in set

select equivalentco2ppm, totalvocppb, cpu, starttime, systemtime, ts, diskusage, endtime, memory, uuid from garden3;

select max(equivalentco2ppm) as MaxCO2, max(totalvocppb) as MaxVocPPB from garden3;


// TODO

Add your own table

  publishTime TIMESTAMP(3) METADATA,
  WATERMARK FOR publishTime AS publishTime - INTERVAL '5' SECOND
  

Apache NiFi (FLiPN)

  • Choose a processor from the palette

ShowNiFi

  • Consume messages from Apache Pulsar

ConsumePulsar

  • Full Apache NiFi Data Flow with Apache Pulsar and MongoDB

Flow

MongoDB

mongo -u username1 -p password1 --authenticationDatabase admin pulsar1:27017/inventory

show databases

db.createCollection("garden3")

show collections

db.garden3.find().pretty()

{
        "_id" : ObjectId("622f7315f99b9a338d60592f"),
        "cpu" : 0,
        "diskusage" : "101615.9 MB",
        "endtime" : "1647276083.2033532",
        "equivalentco2ppm" : "  407",
        "host" : "garden3",
        "hostname" : "garden3",
        "ipaddress" : "192.168.1.199",
        "macaddress" : "dc:a6:32:32:98:20",
        "memory" : 8.8,
        "rowid" : "20220314164123_0e19c5e6-45f5-405e-bd93-9aed05b37630",
        "runtime" : 0,
        "starttime" : "03/14/2022 12:41:23",
        "systemtime" : "03/14/2022 12:41:24",
        "totalvocppb" : "  5",
        "ts" : 1647276084,
        "uuid" : "garden3_uuid_xrl_20220314164123"
}

References

Additional Heat Images

hot hot2 more3 doesanyonereadthis4