Performance Testing Apache NiFi - Part 1 - Loading Directories of CSV
I am running many different flows on a range of Apache NiFi configurations to get performance numbers in different situations.
One situation I thought of was accessing directories of CSV files over HTTP. Fortunately, there's some really nice data available from NOAA (https://www.ncei.noaa.gov/data/global-hourly/access/2019/).
Example Flow: NOAA
In this example performance testing flow, I use my LinkProcessor to grab all of the links to CSV files on the HTTP download site. I then split this JSON list into individual records and pull out the URL. If it's a valid URL ending in .csv, I call InvokeHTTP to download the CSV. I then query the CSV for all the records (SELECT *) and for a count (SELECT COUNT(*)). As part of this, the records are written out as JSON.
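To make the link-filtering step concrete outside of NiFi, here is a minimal Python sketch of the same idea: collect every link from a directory listing page, resolve it against the page URL, and keep only valid http(s) URLs that end in .csv. This is not the LinkProcessor's code; `LinkCollector` and `csv_links` are hypothetical names, and the sketch assumes the listing is plain HTML with ordinary `<a href>` tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML directory listing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def csv_links(listing_html, base_url):
    """Return absolute http(s) URLs from the listing whose path ends in .csv."""
    collector = LinkCollector()
    collector.feed(listing_html)
    urls = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        parts = urlparse(url)
        if parts.scheme in ("http", "https") and parts.path.lower().endswith(".csv"):
            urls.append(url)
    return urls
```

Each URL returned by `csv_links` would then be handed to the download step, which in the flow above is InvokeHTTP.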
In this example we grab a specific CSV file and get 739 records.
This CSVReader uses Jackson to parse the CSV files and figures out fields from the header.
This is my JSON Record Set Writer. It doesn't include a schema since I never built one.
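The reader and writer pairing can be approximated in a few lines of Python: take field names from the CSV header, treat every value as a string, and write the rows out as a JSON array with no schema. This is only a sketch of the concept, not how NiFi implements it. The function name `csv_to_json` is hypothetical, and mapping empty fields to null is an assumption made to match the sample output at the end of this post.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Parse header-based CSV text; return (JSON array string, record count)."""
    reader = csv.DictReader(io.StringIO(csv_text))  # field names come from line 1
    records = [
        # Assumption: empty fields become null, as in the sample data segment.
        {field: (value if value != "" else None) for field, value in row.items()}
        for row in reader
    ]
    return json.dumps(records), len(records)
```

The second return value plays the role of the record count that the flow gets from its SELECT COUNT(*) query.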
I am looking at some performance stats for my NiFi instance, which has a 31 GB JVM heap. I stay just under 32 GB because at that size the JVM can no longer use compressed 32-bit object pointers, which causes issues.
In this flow I generate unique JSON files in bulk, at about 250 bytes each, merge them together, compress them, then push them to a file system. This is to see how many records I can push through.
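For a sense of what that load looks like, here is a small Python sketch of the generate, merge, and compress steps. It is an illustration under assumptions, not the flow itself: the record layout (an `id` plus a padding `payload`), the newline-delimited merge, and gzip as the compression format are all choices made for this example, and `make_record` and `merge_and_compress` are hypothetical names.

```python
import gzip
import json
import uuid


def make_record():
    """Build one small JSON document with a unique id, padded to 250 bytes."""
    record = {"id": str(uuid.uuid4()), "payload": ""}
    base_size = len(json.dumps(record))
    # Pad with ASCII filler so the serialized record is 250 bytes long.
    record["payload"] = "x" * max(0, 250 - base_size)
    return json.dumps(record)


def merge_and_compress(records):
    """Join records with newlines (the merge step) and gzip the result."""
    merged = "\n".join(records).encode("utf-8")
    return gzip.compress(merged)
```

Writing the bytes returned by `merge_and_compress` to a file would stand in for the final push to the file system.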
QueryRecord is easy on CSV files even with no known schema.
The results of the recordCount query:
I can also test with really fast multithreaded calls to a popular btc.com Bitcoin exchange REST API.
Even encrypting and compressing won't slow me down.
Example Translated Data Segment
[{"STATION":"16541099999","DATE":"2019-01-07T05:55:00","SOURCE":"4","LATITUDE":"39.6666667","LONGITUDE":"9.4333333","ELEVATION":"645.0","NAME":"PERDASDEFOGU, IT","REPORT_TYPE":"FM-15","CALL_SIGN":"99999","QUALITY_CONTROL":"V020","WND":"330,1,N,0010,1","CIG":"99999,9,9,Y","VIS":"999999,9,9,9","TMP":"+0030,1","DEW":"+0020,1","SLP":"99999,9","MA1":null,"MD1":null,"REM":null},{"STATION":"16541099999","DATE":"2019-01-07T06:55:00","SOURCE":"4","LATITUDE":"39.6666667","LONGITUDE":"9.4333333","ELEVATION":"645.0","NAME":"PERDASDEFOGU, IT","REPORT_TYPE":"FM-15","CALL_SIGN":"99999","QUALITY_CONTROL":"V020","WND":"330,1,N,0010,1","CIG":"99999,9,9,Y","VIS":"999999,9,9,9","TMP":"+0030,1","DEW":"+0030,1","SLP":"99999,9","MA1":null,"MD1":null,"REM":null},{"STATION":"16541099999","DATE":"2019-01-07T07:55:00","SOURCE":"4","LATITUDE":"39.6666667","LONGITUDE":"9.4333333","ELEVATION":"645.0","NAME":"PERDASDEFOGU, IT","REPORT_TYPE":"FM-15","CALL_SIGN":"99999","QUALITY_CONTROL":"V020","WND":"300,1,N,0010,1","CIG":"99999,9,9,Y","VIS":"999999,9,9,9","TMP":"+0030,1","DEW":"+0020,1","SLP":"99999,9","MA1":null,"MD1":null,"REM":null},{"STATION":"16541099999","DATE":"2019-01-07T09:55:00","SOURCE":"4","LATITUDE":"39.6666667","LONGITUDE":"9.4333333","ELEVATION":"645.0","NAME":"PERDASDEFOGU, IT","REPORT_TYPE":"FM-15","CALL_SIGN":"99999","QUALITY_CONTROL":"V020","WND":"280,1,N,0026,1","CIG":"99999,9,9,Y","VIS":"999999,9,9,9","TMP":"+0070,1","DEW":"+0050,1","SLP":"99999,9","MA1":null,"MD1":null,"REM":null},{"STATION":"16541099999","DATE":"2019-01-07T10:55:00","SOURCE":"4","LATITUDE":"39.6666667","LONGITUDE":"9.4333333","ELEVATION":"645.0","NAME":"PERDASDEFOGU, IT","REPORT_TYPE":"FM-15","CALL_SIGN":"99999","QUALITY_CONTROL":"V020","WND":"260,1,N,0046,1","CIG":"99999,9,9,Y","VIS":"999999,9,9,9","TMP":"+0080,1