Performance Testing Apache NiFi - Part 1 - Loading Directories of CSV
I am running a variety of flows on different Apache NiFi configurations to collect performance numbers under different conditions.
One situation I thought of was accessing directories of CSV files over HTTP. Fortunately, there's some really nice data available from NOAA (https://www.ncei.noaa.gov/data/global-hourly/access/2019/).
Example Flow: NOAA
In this example performance testing flow, I use my LinkProcessor to grab all of the links to CSV files on the HTTP download site. I then split the resulting JSON list into individual records and pull out the URL from each one. If it's a valid URL with a .CSV ending, I call InvokeHTTP to download the CSV. I then query the CSV for all the records (SELECT *) and for a count (SELECT COUNT(*)). As part of this, the records are written out as JSON.
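Outside of NiFi, the link-filtering step can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the LinkProcessor's actual code: the `LinkCollector` class, the `csv_links` function, and the sample HTML listing (including the file name) are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def csv_links(html, base_url):
    """Return absolute http(s) URLs found in `html` whose path ends in .csv."""
    collector = LinkCollector()
    collector.feed(html)
    urls = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(url)
        # "Valid URL with a .CSV ending": http(s) scheme, case-insensitive suffix.
        if parsed.scheme in ("http", "https") and parsed.path.lower().endswith(".csv"):
            urls.append(url)
    return urls


BASE_URL = "https://www.ncei.noaa.gov/data/global-hourly/access/2019/"
# Hypothetical directory listing, not the real page contents.
SAMPLE_HTML = (
    '<a href="../">Parent</a>'
    '<a href="16541099999.csv">16541099999.csv</a>'
    '<a href="readme.txt">readme</a>'
)

print(csv_links(SAMPLE_HTML, BASE_URL))
```

Each URL returned here corresponds to one download that the flow hands to InvokeHTTP.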
In this example we grab a specific CSV file and get 739 records.
The CSVReader uses Jackson to parse the CSV files and infers the field names from the header.
I pull out the URL returned from the Link Processor.
This is my JSON Record Set Writer. It doesn't include a schema since I never built one.
I am looking at some performance stats for my NiFi instance, which has a 31 GB JVM heap. I stay below 32 GB because at that size the JVM can no longer use compressed 32-bit object pointers.
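To make the reader and writer pairing concrete, here is a rough Python equivalent of reading a headered CSV with no schema and writing the records as JSON. It is a sketch under assumptions, not what NiFi does internally: `csv_to_json_records` is a hypothetical helper, every value is kept as a string, and empty cells are turned into null to match the translated sample at the end of this post.

```python
import csv
import io
import json


def csv_to_json_records(csv_text):
    """Read CSV text with a header row and return a JSON array string.

    Field names come from the header row; every value stays a string,
    and empty cells become null (an assumption for this sketch).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = [
        {name: (value if value != "" else None) for name, value in row.items()}
        for row in reader
    ]
    return json.dumps(records)


# Small sample shaped like the NOAA data; only a few of the columns are shown.
SAMPLE_CSV = (
    '"STATION","DATE","NAME","TMP","REM"\n'
    '"16541099999","2019-01-07T05:55:00","PERDASDEFOGU, IT","+0030,1",""\n'
)

print(csv_to_json_records(SAMPLE_CSV))
```

Because no schema is supplied, nothing here knows that LATITUDE is a number or DATE is a timestamp, which is why every value in the output is a quoted string.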
In this flow I generate large quantities of unique JSON files of about 250 bytes each, merge them together, compress them, and then push them to a file system. This is to see how many records I can push through.
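For a sense of what that generate, merge, and compress pipeline does to the data, here is a minimal Python sketch. The record layout (a UUID plus padding), the gzip format, and the names `make_record` and `merge_and_compress` are assumptions for illustration; the NiFi flow uses its own processors and settings.

```python
import gzip
import json
import os
import tempfile
import uuid


def make_record(target_bytes=250):
    """Build one unique JSON document padded to about target_bytes."""
    record = {"id": str(uuid.uuid4()), "padding": ""}
    base_size = len(json.dumps(record).encode("utf-8"))
    # Fill the remaining space with ASCII padding (one byte per character).
    record["padding"] = "x" * max(0, target_bytes - base_size)
    return json.dumps(record)


def merge_and_compress(count, path):
    """Merge `count` records, one per line, into a single gzip file."""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for _ in range(count):
            out.write(make_record() + "\n")


if __name__ == "__main__":
    out_path = os.path.join(tempfile.mkdtemp(), "merged.json.gz")
    merge_and_compress(1000, out_path)
    print(out_path, os.path.getsize(out_path), "bytes")
```

Note that padding made of a single repeated character compresses far better than real data would, so the file size printed here says little about compression ratios in a real flow.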
QueryRecord is easy on CSV files even with no known schema.
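The two queries can be tried outside NiFi with a small Python stand-in. QueryRecord exposes the incoming records as a table named FLOWFILE; the sketch below borrows that table name but uses SQLite from the Python standard library, so it only illustrates the shape of the queries. The `query_csv` helper and the sample rows are made up for the example.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql):
    """Load CSV text into an in-memory table named FLOWFILE and run `sql`."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    try:
        # No known schema: every column is TEXT, named after the header.
        columns = ", ".join(
            '"{}" TEXT'.format(name.replace('"', '""')) for name in header
        )
        conn.execute("CREATE TABLE FLOWFILE ({})".format(columns))
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(
            "INSERT INTO FLOWFILE VALUES ({})".format(placeholders), data
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


SAMPLE_CSV = (
    "STATION,DATE,TMP\n"
    '16541099999,2019-01-07T05:55:00,"+0030,1"\n'
    '16541099999,2019-01-07T06:55:00,"+0030,1"\n'
)

all_rows = query_csv(SAMPLE_CSV, "SELECT * FROM FLOWFILE")
record_count = query_csv(SAMPLE_CSV, "SELECT COUNT(*) FROM FLOWFILE")[0][0]
print(len(all_rows), record_count)
```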
The results of the recordCount query:
I can also test with really fast multithreaded calls to the popular btc.com Bitcoin exchange REST API.
Even encrypting and compressing won't slow me down.
Example Translated Data Segment
[{"STATION":"16541099999","DATE":"2019-01-07T05:55:00","SOURCE":"4","LATITUDE":"39.6666667","LONGITUDE":"9.4333333","ELEVATION":"645.0","NAME":"PERDASDEFOGU, IT","REPORT_TYPE":"FM-15","CALL_SIGN":"99999","QUALITY_CONTROL":"V020","WND":"330,1,N,0010,1","CIG":"99999,9,9,Y","VIS":"999999,9,9,9","TMP":"+0030,1","DEW":"+0020,1","SLP":"99999,9","MA1":null,"MD1":null,"REM":null},{"STATION":"16541099999","DATE":"2019-01-07T06:55:00","SOURCE":"4","LATITUDE":"39.6666667","LONGITUDE":"9.4333333","ELEVATION":"645.0","NAME":"PERDASDEFOGU, IT","REPORT_TYPE":"FM-15","CALL_SIGN":"99999","QUALITY_CONTROL":"V020","WND":"330,1,N,0010,1","CIG":"99999,9,9,Y","VIS":"999999,9,9,9","TMP":"+0030,1","DEW":"+0030,1","SLP":"99999,9","MA1":null,"MD1":null,"REM":null},{"STATION":"16541099999","DATE":"2019-01-07T07:55:00","SOURCE":"4","LATITUDE":"39.6666667","LONGITUDE":"9.4333333","ELEVATION":"645.0","NAME":"PERDASDEFOGU, IT","REPORT_TYPE":"FM-15","CALL_SIGN":"99999","QUALITY_CONTROL":"V020","WND":"300,1,N,0010,1","CIG":"99999,9,9,Y","VIS":"999999,9,9,9","TMP":"+0030,1","DEW":"+0020,1","SLP":"99999,9","MA1":null,"MD1":null,"REM":null},{"STATION":"16541099999","DATE":"2019-01-07T09:55:00","SOURCE":"4","LATITUDE":"39.6666667","LONGITUDE":"9.4333333","ELEVATION":"645.0","NAME":"PERDASDEFOGU, IT","REPORT_TYPE":"FM-15","CALL_SIGN":"99999","QUALITY_CONTROL":"V020","WND":"280,1,N,0026,1","CIG":"99999,9,9,Y","VIS":"999999,9,9,9","TMP":"+0070,1","DEW":"+0050,1","SLP":"99999,9","MA1":null,"MD1":null,"REM":null},{"STATION":"16541099999","DATE":"2019-01-07T10:55:00","SOURCE":"4","LATITUDE":"39.6666667","LONGITUDE":"9.4333333","ELEVATION":"645.0","NAME":"PERDASDEFOGU, IT","REPORT_TYPE":"FM-15","CALL_SIGN":"99999","QUALITY_CONTROL":"V020","WND":"260,1,N,0046,1","CIG":"99999,9,9,Y","VIS":"999999,9,9,9","TMP":"+0080,1