Tracking Air Quality with Apache NiFi, Cloudera Data Science Workbench, PySpark and Parquet

Tracking Air Quality 

Indoor vs Outdoor

Using a few sensors on a MiNiFi node, we can generate air quality sensor readings.
Data:
row['bme680_tempc'] = '{0:.2f}'.format(sensor.data.temperature)
row['bme680_tempf'] = '{0:.2f}'.format((sensor.data.temperature * 1.8) + 32)
row['bme680_pressure'] = '{0:.2f}'.format(sensor.data.pressure)
row['bme680_gas'] = '{0:.2f}'.format(gas)
row['bme680_humidity'] = '{0:.2f}'.format(hum)
row['bme680_air_quality_score'] = '{0:.2f}'.format(air_quality_score)
row['bme680_gas_baseline'] = '{0:.2f}'.format(gas_baseline)
row['bme680_hum_baseline'] = '{0:.2f}'.format(hum_baseline)
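The scoring logic behind bme680_air_quality_score is not shown in this post. As a rough sketch, assuming it follows the Pimoroni BME680 indoor air quality example (humidity weighted at 25% of the score, gas resistance at 75%, and 40% relative humidity as the ideal), the score could be computed as below. The function name and the default weighting are assumptions, not taken from the script. With the inputs from the second sample record in this post, the sketch gives about 83.27, which agrees with that record.

```python
def air_quality_score(gas, hum, gas_baseline, hum_baseline=40.0, hum_weighting=0.25):
    """Combine gas resistance and humidity into a 0-100 score (higher is better)."""
    gas_offset = gas_baseline - gas
    hum_offset = hum - hum_baseline

    # Humidity part: full marks at the baseline, fewer the further away.
    if hum_offset > 0:
        hum_score = (100 - hum_baseline - hum_offset) / (100 - hum_baseline)
    else:
        hum_score = (hum_baseline + hum_offset) / hum_baseline
    hum_score *= hum_weighting * 100

    # Gas part: full marks when resistance is at or above the baseline.
    if gas_offset > 0:
        gas_score = (gas / gas_baseline) * (100 - hum_weighting * 100)
    else:
        gas_score = 100 - hum_weighting * 100

    return hum_score + gas_score


# Inputs taken from the second sample record in this post.
score = air_quality_score(gas=336547.19, hum=13.23, gas_baseline=334942.60)
print('{0:.2f}'.format(score))
```

Because the gas reading in that record is above its baseline, the gas part contributes its full 75 points and the remainder comes from humidity.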
See Part 1:
Newark / NYC Hazecam
Example
{"bme680_air_quality_score": "82.45", "uuid": "20190131191921_59c5441c-47b4-4f6f-a6d6-b3943bc9cf2b", "ipaddress": "192.168.1.166", "bme680_gas_baseline": 367283.28, "bme680_pressure": "1024.51", "bme680_hum_baseline": 40.0, "memory": 11.7, "end": "1548962361.4146328", "cputemp": 47, "host": "piups", "diskusage": "9992.7", "bme680_tempf": "87.53", "te": "761.2184100151062", "starttime": "01/31/2019 14:06:40", "systemtime": "01/31/2019 14:19:21", "bme680_humidity": "13.22", "bme680_tempc": "30.85", "bme680_gas": "363274.92"}
{
"end" : "1548967753.7064438",
"host" : "piups",
"diskusage" : "9990.4",
"cputemp" : 47,
"starttime" : "01/31/2019 15:44:11",
"bme680_hum_baseline" : "40.00",
"bme680_humidity" : "13.23",
"ipaddress" : "192.168.1.166",
"bme680_tempc" : "30.93",
"te" : "301.96490716934204",
"bme680_air_quality_score" : "83.27",
"systemtime" : "01/31/2019 15:49:13",
"bme680_tempf" : "87.67",
"bme680_gas_baseline" : "334942.60",
"uuid" : "20190131204913_4984a635-8dcd-408a-ba23-c0d225ba2d86",
"bme680_pressure" : "1024.69",
"memory" : 12.6,
"bme680_gas" : "336547.19"
}
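Most readings arrive as strings, and a field such as bme680_gas_baseline shows up as a number in one record and as a string in the other. Before writing to a typed store such as Parquet, it may help to coerce them to floats. A minimal sketch using only the standard library follows; the list of numeric fields and the function name are assumptions based on the two sample records above.

```python
import json

# Fields that look numeric in the sample records (an assumption, adjust as needed).
NUMERIC_FIELDS = (
    "bme680_tempc", "bme680_tempf", "bme680_pressure", "bme680_gas",
    "bme680_humidity", "bme680_air_quality_score",
    "bme680_gas_baseline", "bme680_hum_baseline",
    "cputemp", "memory", "diskusage", "te", "end",
)


def normalize_record(raw_json):
    """Parse one JSON reading and cast the known numeric fields to float."""
    record = json.loads(raw_json)
    for field in NUMERIC_FIELDS:
        if record.get(field) is not None:
            # float() accepts both "30.93" and 47, so mixed types are handled.
            record[field] = float(record[field])
    return record


sample = ('{"bme680_tempc": "30.93", "bme680_gas_baseline": 334942.60, '
          '"cputemp": 47, "host": "piups"}')
clean = normalize_record(sample)
print(clean)
```

Non-numeric fields such as host, uuid, and the timestamp strings are left untouched.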
Outdoor air quality
https://community.cloudera.com/t5/Community-Articles/Tracking-Air-Quality-with-HDP-and-HDF-Part-1-Apache-NiFi/ta-p/248265

https://openweathermap.org/api/pollution/co

https://airquality.weather.gov/probe_aq_data.php?city=hightstown&state=NJ&Submit=Get+Guidance

http://feeds.enviroflash.info/rss/realtime/445.xml

http://feeds.enviroflash.info/cap/aggregate.xml

http://www.airnowapi.org/aq/forecast/zipCode/?format=application/json&zipCode=08520&date=2019-09-05&distance=25&API_KEY=code

https://docs.airnowapi.org/webservices

http://www.airnowapi.org/aq/observation/zipCode/current/?format=application/json&zipCode=08520&distance=50&API_KEY=code

https://api.openaq.org/v1/measurements?country=US&date_from=2018-05-04

https://api.openaq.org/v1/latest?country=US

http://www.airnowapi.org/aq/observation/zipCode/current/?format=application/json&zipCode=08520&distance=25&API_KEY=code
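The AirNow observation URLs above differ only in their query parameters, so they can be assembled from a small helper rather than edited by hand. This is a sketch that only builds the URL and makes no request. The function name is hypothetical, and "code" stands in for a real API key, as it does in the links above.

```python
from urllib.parse import urlencode

AIRNOW_CURRENT_URL = "http://www.airnowapi.org/aq/observation/zipCode/current/"


def airnow_current_url(zip_code, distance, api_key):
    """Build the current-observation URL for a ZIP code and search radius."""
    params = {
        "format": "application/json",
        "zipCode": zip_code,   # keep as a string so a leading zero survives
        "distance": distance,
        "API_KEY": api_key,
    }
    # safe="/" leaves the slash in "application/json" unescaped.
    return AIRNOW_CURRENT_URL + "?" + urlencode(params, safe="/")


url = airnow_current_url("08520", 25, "code")
print(url)
```

In a NiFi flow the same string could be supplied to InvokeHTTP, with the key kept in a parameter rather than in the URL text.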

Flight Data

https://community.cloudera.com/t5/Community-Articles/Ingesting-Flight-Data-ADS-B-USB-Receiver-with-Apache-NiFi-1/ta-p/247940

Air Traffic Overhead

https://opensky-network.org/api/states/all?lamin=40.270599&lomin=-74.522430&lamax=40.270599&lomax=-74.522430
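In the OpenSky URL above, lamin equals lamax and lomin equals lomax, so the bounding box describes a single point. Assuming the intent is to see traffic in an area overhead, one option is to pad the point by a margin on each side. The sketch below only builds the URL; the function name and the margin_deg parameter are hypothetical, and 0.5 degrees is an arbitrary illustrative value.

```python
from urllib.parse import urlencode

OPENSKY_STATES_URL = "https://opensky-network.org/api/states/all"


def opensky_bbox_url(lat, lon, margin_deg=0.5):
    """Build a states/all URL for a box of +/- margin_deg around (lat, lon)."""
    params = {
        "lamin": round(lat - margin_deg, 6),
        "lomin": round(lon - margin_deg, 6),
        "lamax": round(lat + margin_deg, 6),
        "lomax": round(lon + margin_deg, 6),
    }
    return OPENSKY_STATES_URL + "?" + urlencode(params)


# Coordinates taken from the URL above.
bbox_url = opensky_bbox_url(40.270599, -74.522430)
print(bbox_url)
```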

http://scorecard.goodguide.com/about/txt/data.html

https://www.epa.gov/visibility

https://www.airnow.gov/

https://www.state.nj.us/dep/daq/

http://www.nynjpollen.com/

http://www.njaqinow.net/

https://www.fsvisimages.com/descriptions.aspx

https://www.datainmotion.dev/2019/03/iot-series-sensors-utilizing-breakout_74.html

https://github.com/tspannhw/minifi-breakoutgarden/blob/master/aqminifi.py