
Building Search Indexes with Apache NiFi Streams



I can reach my Apache Solr dashboard, hosted on Cloudera Data Platform - Data Center, on port 8983 at :8983/solr/#/

For those who have done the CFM or CDF hands-on labs, this post extends that lab to store data in Solr.

Log In to Solr




Create a Collection for Your Data (sensors)




NiFi Flow (Data in, Output Records to Solr)



We need to configure the processor. Set Solr Type to Cloud, set Solr Location to the ZooKeeper address (URL:2181/solr), and then pick your record reader; ours is JSON. You also need to select the Fields to Index in the data. From our JSON file, I picked the timestamp and the ID.
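To make the configuration above concrete, here is a minimal Python sketch of the same idea outside NiFi: keep only the selected fields from each JSON record and prepare a request for Solr's JSON update handler. It is an illustration, not what the processor runs internally. The field names `sensor_id` and `sensor_ts` are taken from the collection data in this post, while `sensor_0`, the client-side UUID for `id`, and talking to Solr over plain HTTP on port 8983 (rather than through ZooKeeper) are assumptions made for the example.

```python
import json
import urllib.request
import uuid

# Fields to index, mirroring the two fields picked in the processor.
FIELDS_TO_INDEX = ("sensor_id", "sensor_ts")


def to_solr_doc(record, fields=FIELDS_TO_INDEX):
    """Keep only the selected fields and attach a unique document id."""
    doc = {name: record[name] for name in fields if name in record}
    # Assumption: the id is generated client-side for this sketch.
    doc["id"] = str(uuid.uuid4())
    return doc


def build_update_request(base_url, collection, records):
    """Build (but do not send) a POST to Solr's JSON update handler."""
    docs = [to_solr_doc(r) for r in records]
    url = f"{base_url}/solr/{collection}/update?commit=true"
    return urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # sensor_0 is a hypothetical extra field that is not indexed.
    sample = [{"sensor_id": 73, "sensor_ts": 1587566202642550, "sensor_0": 2}]
    req = build_update_request("http://YOURURL:8983", "sensors", sample)
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To actually index, send the request to a reachable Solr instance:
    # urllib.request.urlopen(req)
```

The request is built separately from sending it, so you can inspect the payload before anything is written to the collection.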



View The Collection To See Data Sent From NiFi





To Query Via REST API

http://YOURURL:8983/solr/sensors/select?q=*%3A*
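If you would rather call this endpoint from a script, the following Python sketch builds the same URL and reads the matching documents. The helper names `build_select_url` and `fetch_docs` and the optional `rows` argument are my own for illustration, and the sketch assumes the server returns JSON by default, as in the example data in this post.

```python
import json
import urllib.parse
import urllib.request


def build_select_url(base_url, collection, query="*:*", rows=None):
    """Build a /select URL for a Solr collection."""
    params = {"q": query}
    if rows is not None:
        params["rows"] = rows
    # safe="*" leaves the asterisks unescaped, so the default query
    # is encoded as q=*%3A* like the URL shown above.
    query_string = urllib.parse.urlencode(params, safe="*")
    return f"{base_url}/solr/{collection}/select?{query_string}"


def fetch_docs(url, timeout=10):
    """Run the query and return the list of matching documents."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]["docs"]


if __name__ == "__main__":
    url = build_select_url("http://YOURURL:8983", "sensors")
    print(url)
    # Needs a reachable Solr instance:
    # for doc in fetch_docs(url):
    #     print(doc)
```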


Example Collection Data

{
  "responseHeader": {
    "zkConnected": true,
    "status": 0,
    "QTime": 2,
    "params": {"q": "*:*", "_": "1587566383627"}
  },
  "response": {
    "numFound": 153,
    "start": 0,
    "docs": [
      {"sensor_id": [73], "sensor_ts": [1587566202642550], "id": "e8652da3-dddf-4735-a187-799f95d2e362", "_version_": 1664683988343586816},
      {"sensor_id": [22], "sensor_ts": [1587566203806855], "id": "9bae2dff-e5ec-4751-b20e-f4d3a9c6e894", "_version_": 1664683988399161344},
      {"sensor_id": [10], "sensor_ts": [1587566204972747], "id": "4cc9dd26-e559-4368-b0c2-47095a0f78ab", "_version_": 1664683988407549952},
      {"sensor_id": [48], "sensor_ts": [1587566206142079], "id": "ba73b5d7-4c06-4202-8145-0f3437f37759", "_version_": 1664683988415938560},
      {"sensor_id": [9], "sensor_ts": [1587566207304793], "id": "be85793a-1c3c-46ab-96d8-14057e6a2e27", "_version_": 1664683988422230016},
      {"sensor_id": [77], "sensor_ts": [1587566208471159], "id": "94a1a150-2836-4b60-aafd-9b09fe2ac62a", "_version_": 1664683988428521472},
      {"sensor_id": [75], "sensor_ts": [1587566209635840], "id": "729b488c-7a75-49a8-9121-3e192bd84a81", "_version_": 1664683988434812928},
      {"sensor_id": [64], "sensor_ts": [1587566210807450], "id": "e6f76296-f70a-430d-a532-8040fbf4139e", "_version_": 1664683988440055808},
      {"sensor_id": [1], "sensor_ts": [1587566211973630], "id": "a3cb1028-76d1-4dba-a2ef-03b267d6714b", "_version_": 1664683988447395840},
      {"sensor_id": [33], "sensor_ts": [1587566213135150], "id": "dea6ea44-28ab-4d6e-9fad-b0ffaef69dcb", "_version_": 1664683988454735872}
    ]
  }
}
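Note that `sensor_id` and `sensor_ts` come back as single-element lists, because these fields are multi-valued in the collection. A small Python sketch for flattening a response like the one above is shown here. The `summarize` helper is hypothetical, and reading `sensor_ts` as microseconds since the Unix epoch is my assumption based on the size of the values.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarize(response_text):
    """Return (numFound, rows) with the multi-valued fields flattened."""
    result = json.loads(response_text)["response"]
    rows = []
    for doc in result["docs"]:
        # Each field holds a one-element list; take the first value.
        ts_micros = doc["sensor_ts"][0]
        rows.append({
            "id": doc["id"],
            "sensor_id": doc["sensor_id"][0],
            # Assumption: sensor_ts is microseconds since the epoch (UTC).
            "sensor_time": EPOCH + timedelta(microseconds=ts_micros),
        })
    return result["numFound"], rows


if __name__ == "__main__":
    sample = json.dumps({"response": {"numFound": 153, "start": 0, "docs": [
        {"sensor_id": [73], "sensor_ts": [1587566202642550],
         "id": "e8652da3-dddf-4735-a187-799f95d2e362",
         "_version_": 1664683988343586816}]}})
    found, rows = summarize(sample)
    print(found, rows[0]["sensor_id"], rows[0]["sensor_time"].isoformat())
```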

Example Code

https://github.com/tspannhw/nifi-solr-example

Reading From Apache Solr


The read flow uses two processors: QuerySolr and PutKudu.


Solr results in NiFi data provenance.

Number of results: 427


QuerySolr parameters: Solr Type = Cloud, Solr Location = ZooKeeper URL:2181/solr, Collection = sensors, Return Type = XML, Solr Query = *:*, Request Handler = /select


The XML results from Solr are converted to JSON.
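In the flow this conversion happens inside NiFi. To show what it involves, here is a rough Python sketch that turns a Solr XML response into JSON. It assumes the standard Solr XML layout: a `<response>` root containing a `<result>` element with `<doc>` children whose fields are typed tags such as `<str>`, `<long>`, and `<arr>`. The function name `solr_xml_to_json` is hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Solr XML tags that carry numeric values.
_NUMERIC = {"int": int, "long": int, "float": float, "double": float}


def _value(elem):
    """Convert one typed Solr XML element to a Python value."""
    if elem.tag == "arr":
        return [_value(child) for child in elem]
    if elem.tag in _NUMERIC:
        return _NUMERIC[elem.tag](elem.text)
    if elem.tag == "bool":
        return elem.text == "true"
    return elem.text  # str, date, and anything else stay as text


def solr_xml_to_json(xml_text):
    """Return a JSON string with numFound and the list of documents."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    docs = [
        {field.get("name"): _value(field) for field in doc}
        for doc in result.findall("doc")
    ]
    return json.dumps({"numFound": int(result.get("numFound")), "docs": docs})


if __name__ == "__main__":
    sample = (
        '<response><result name="response" numFound="427" start="0">'
        '<doc><arr name="sensor_id"><long>73</long></arr>'
        '<arr name="sensor_ts"><long>1587566202642550</long></arr>'
        '<str name="id">e8652da3-dddf-4735-a187-799f95d2e362</str></doc>'
        '</result></response>'
    )
    print(solr_xml_to_json(sample))
```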




References

https://lucene.apache.org/solr/

https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/search_introducing.html

https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cm_mc_solr_service.html

https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/search_tutorial.html

https://www.cloudera.com/campaign/how-tos-for-gurus/chapters/chapter-8/cloudera-search--solr--overview.html