Processing Fixed Width and Complex Files


Pointers

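The first decision you will have to make is whether the file is structured at all. If it is a known type such as CSV, JSON, Avro, XML or Parquet, just use the matching record reader with a record-oriented processor (for example ConvertRecord or QueryRecord).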

If it's semi-structured, like a log file, GrokReader or ExtractGrok may work.
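
Grok patterns are essentially named regular expressions, so the idea behind GrokReader and ExtractGrok can be sketched outside NiFi. The Python below is an illustration only: the log layout, the field names, and the parse_log_line helper are assumptions made for the example, not anything NiFi ships.

```python
import re

# Hypothetical log layout: "<ISO timestamp> <LEVEL> <free-text message>".
# A Grok expression such as
#   %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}
# expresses roughly the same idea using predefined pattern names.
LOG_LINE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<message>.*)"
)


def parse_log_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_log_line("2021-03-04 10:15:30,123 INFO Flow started"))
```

Lines that do not match come back as None, which mirrors the need to decide what to do with unmatched lines when configuring a Grok-based reader.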

If it's close to CSV, you may be able to tweak the CSVReader settings to make it work (header or no header, for example) or try the other of the two CSV parsers NiFi offers (Jackson or Apache Commons).
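
The header or no-header choice can be illustrated with Python's standard csv module. This is a minimal sketch of the concept, not NiFi's CSVReader; the read_csv helper, its parameters, and the column names are hypothetical.

```python
import csv
import io


def read_csv(text, has_header=True, column_names=None, delimiter=","):
    """Parse CSV text into a list of dicts.

    When the file has no header line, column_names (hypothetical here)
    must be supplied by the caller, much like providing a schema.
    """
    stream = io.StringIO(text)
    if has_header:
        # The first line supplies the field names.
        reader = csv.DictReader(stream, delimiter=delimiter)
    else:
        if column_names is None:
            raise ValueError("column_names is required when has_header is False")
        # Every line is data; names come from the caller.
        reader = csv.DictReader(stream, fieldnames=column_names, delimiter=delimiter)
    return [dict(row) for row in reader]


if __name__ == "__main__":
    print(read_csv("id,name\n1,Ada\n"))
    print(read_csv("1|Ada\n", has_header=False,
                   column_names=["id", "name"], delimiter="|"))
```

The same two knobs, whether the first line is a header and which delimiter separates values, are usually the first things worth adjusting on a CSV-like file.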

If it's a document format such as PDF, Word, Excel or RTF, I have a custom processor that uses Apache Tika and should be able to parse it into text. Once it is text, you can probably work with it.
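
For plain text that turns out to be fixed width, the usual approach is to slice each line by column offsets. The sketch below shows that idea in Python; the LAYOUT table, the field names, and both helper functions are assumptions made for illustration. Similar logic could live in a scripted component such as ScriptedReader or ExecuteScript, but that wiring is not shown here.

```python
# Hypothetical layout: (field name, start offset, end offset), end exclusive.
LAYOUT = [
    ("id", 0, 5),
    ("name", 5, 15),
    ("amount", 15, 23),
]


def parse_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of stripped string fields."""
    return {name: line[start:end].strip() for name, start, end in layout}


def parse_fixed_width_text(text, layout=LAYOUT):
    """Parse every non-blank line of a fixed-width text block."""
    return [parse_fixed_width(line, layout)
            for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Build a sample line that matches the hypothetical layout exactly.
    sample = "00042" + "Ada".ljust(10) + "1234.50".rjust(8)
    print(parse_fixed_width(sample))
```

Keeping the layout in a table separate from the slicing code means a different file format only needs a new list of offsets.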


Examples



Documentation




Processors To Use For File Manipulation

  • AttributesToCSV
  • AttributesToJSON
  • ConvertExcelToCSVProcessor 
  • ConvertRecord
  • ConvertText
  • CSVReader
  • EvaluateJSONPath
  • EvaluateXPath
  • EvaluateXQuery
  • ExecuteScript
  • ExecuteStreamCommand
  • ExtractGrok
  • ExtractText
  • FlattenJson
  • ForkRecord
  • GrokReader
  • JsonPathReader
  • JsonTreeReader
  • JoltTransformJSON
  • JoltTransformRecord
  • LookupAttribute
  • LookupRecord
  • MergeContent
  • MergeRecord
  • ModifyBytes
  • ParseSyslog*
  • PartitionRecord
  • QueryRecord
  • ReaderLookup
  • ReplaceText
  • ReplaceTextWithMapping
  • ScriptedReader
  • ScriptedRecordSink
  • ScriptedTransformRecord
  • SegmentContent
  • SplitContent
  • SplitJson
  • SplitRecord
  • SplitText
  • SplitXml
  • SyslogReader
  • TransformXml
  • UnpackContent
  • UpdateAttribute
  • UpdateRecord
  • ValidateCsv
  • ValidateRecord
  • ValidateXml

Custom Processors

Helper Projects, SDK, Libraries and Services