

Showing posts from June, 2019

Dataworks Summit DC and NoSQL DC 2019 Videos

This quick article collects the videos from some of my talks (the lightning talks and the BoF session were not recorded). Also included is a great talk by my friends from American Water. Check them out.

Tracking Crime as It Happens with Apache Phoenix, Apache HBase and Apache NiFi, by Henry Sowell and Timothy Spann
All the talks at NoSQL 2019

DataWorks Summit 2019 DC
Internet of Fleet Management Things, by American Water
Cold Chain Logistics with NiFi, by Timothy Spann and Mehul Shah
Edge to AI: Analytics From Edge to Cloud, by Timothy Spann and John Kuchmek

Performance Testing Apache NiFi - Part 1 - Loading Directories of CSV

I am running a lot of different flows on different Apache NiFi configurations to get performance numbers in a variety of situations. One situation I thought of was accessing directories of CSV files over HTTP. Fortunately, there's some really nice data available from NOAA.

Example Flow: NOAA

In this example performance-testing flow, I use my LinkProcessor to grab all of the links to CSV files on the HTTP download site. I then split this JSON list into individual records and pull out the URL. If it's a valid URL ending in .CSV, I call InvokeHTTP to download the CSV. I then query the CSV for all the records (SELECT *) and for a count (SELECT COUNT(*)). As part of this, the records are written to JSON.

In this example we grab a specific CSV file and get 739 records. This CSVReader uses Jackson to parse th
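The filtering and query steps of this flow can be approximated outside NiFi to sanity-check what each stage should produce. The following is a minimal Python sketch, not the flow itself. It assumes, hypothetically, that each record in the LinkProcessor's JSON list carries its URL under a `link` key (the real output format may differ). It also skips the InvokeHTTP download by taking the CSV text directly and uses Python's standard `csv` module rather than the Jackson-based CSVReader. The names `extract_csv_urls` and `query_csv` are illustrative only.

```python
import csv
import io
import json
from urllib.parse import urlparse


def extract_csv_urls(links_json: str) -> list[str]:
    """Split a JSON list of link records and keep valid URLs ending in .csv.

    Hypothetical assumption: each record looks like {"link": "<url>"}.
    """
    urls = []
    for record in json.loads(links_json):  # one record at a time, like a JSON split
        url = record.get("link", "")
        parsed = urlparse(url)
        # "Valid" here means an http(s) scheme plus a host name.
        is_valid = parsed.scheme in ("http", "https") and bool(parsed.netloc)
        # Case-insensitive extension check, so .CSV and .csv both match.
        if is_valid and parsed.path.lower().endswith(".csv"):
            urls.append(url)
    return urls


def query_csv(csv_text: str) -> tuple[str, int]:
    """Rough equivalent of SELECT * (written as JSON) and SELECT COUNT(*)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))  # header row becomes the keys
    return json.dumps(rows), len(rows)


if __name__ == "__main__":
    links = json.dumps([
        {"link": "https://example.com/data/station1.csv"},
        {"link": "https://example.com/data/readme.txt"},
        {"link": "not a url.csv"},
    ])
    print(extract_csv_urls(links))  # ['https://example.com/data/station1.csv']

    sample = "STATION,DATE,TMAX\nA,2019-06-01,30\nB,2019-06-01,28\n"
    records_json, count = query_csv(sample)
    print(count)  # 2
```

In the real flow, each of these functions corresponds to one or more NiFi processors working on FlowFiles, so the NiFi timings also include queueing and content-repository I/O, which this sketch does not model.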