Best in Flow Competition Tutorials Part 3 - Tutorial 3

 

  1. Resize image flow deployed as a serverless function

DataFlow Functions provides a new, efficient way to run your event-driven Apache NiFi data flows. You can have your flow executed within AWS Lambda, Azure Functions or Google Cloud Functions and define the trigger that should start its execution.



DataFlow Functions is perfect for use cases such as:

  • Processing files as soon as they land in the cloud provider's object store

  • Creating microservices over HTTPS

  • CRON driven use cases

  • etc


In this use case, we will deploy a NiFi flow that is triggered by HTTPS requests to resize images. Once the flow is deployed, the cloud provider will give you an HTTPS endpoint that you can call to send an image. Each call triggers the NiFi flow, which returns a resized image based on your parameters.


The deployment of the flow as a function will have to be done within your cloud provider.


The tutorial below uses AWS as the cloud provider. If you’re using Azure or Google Cloud, you can still refer to this documentation to deploy the flow as a function.


3.1 Designing the flow for AWS Lambda


  1. Go into Cloudera DataFlow / Flow Design and create a new draft with a name of your choice.

  2. Drag and drop an Input Port named input onto the canvas. When triggered, AWS Lambda is going to inject into that input port a FlowFile containing the information about the HTTPS call that has been made.


Example of payload that will be injected by AWS Lambda as a FlowFile:
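As a minimal sketch, limited to the two fields this flow actually reads (headers and body), such a payload could be modelled in Python as below. The header values and the image bytes are hypothetical placeholders, and a real event carries additional fields that are not shown here.

```python
import base64
import json

# Hypothetical stand-in for the bytes of an uploaded image (not a real PNG).
image_bytes = b"placeholder image bytes"

# Sketch of the event that ends up as FlowFile content. Only "headers" and
# "body" are used by this flow; a real event contains more fields.
event = {
    "headers": {
        "resize-height": "200",
        "resize-width": "400",
    },
    # The image travels as Base64 text inside the JSON document.
    "body": base64.b64encode(image_bytes).decode("ascii"),
}

flowfile_content = json.dumps(event, indent=2)
print(flowfile_content)
```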



  1. Drag and drop an EvaluateJsonPath processor and name it ExtractHTTPHeaders. We’re going to use it to extract the HTTP headers that we want to keep in our flow. Add the two properties below. They save the resize-height and resize-width HTTP headers as FlowFile attributes; we will set these headers when calling the endpoint with our image to specify the dimensions of the resized image.


resizeHeight => $.headers.resize-height
resizeWidth => $.headers.resize-width


Note: don’t forget to set Destination to “flowfile-attribute” and click Apply.
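To make the effect of these two properties concrete, here is a rough Python equivalent of the extraction. It only illustrates the mapping; it is not how NiFi evaluates JsonPath internally, and the function name and sample values are made up for this sketch.

```python
import json


def extract_resize_headers(flowfile_content: str) -> dict:
    """Mimic $.headers.resize-height and $.headers.resize-width."""
    event = json.loads(flowfile_content)
    headers = event["headers"]
    # The results become FlowFile attributes; the content is left untouched.
    return {
        "resizeHeight": headers["resize-height"],
        "resizeWidth": headers["resize-width"],
    }


# Hypothetical sample payload with only the fields needed here.
sample = json.dumps(
    {"headers": {"resize-height": "200", "resize-width": "400"}, "body": ""}
)
attributes = extract_resize_headers(sample)
print(attributes)
```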



  1. Drag and drop another EvaluateJsonPath processor and give it a unique name. This one retrieves the content of the body field from the payload we received and uses it as the new content of the FlowFile. This field contains the Base64-encoded representation of the image we sent over HTTP.


body => $.body




  1. Drag and drop a Base64EncodeContent processor and change the mode to Decode. This will Base64 decode the content of the FlowFile to retrieve its binary format.

  2. Drag and drop a ResizeImage processor. Use the previously created FlowFile attributes to specify the new dimensions of the image. Also, specify true for maintaining the ratio.
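The body extraction and Base64 decoding steps above can be sketched in Python as follows. The resize itself is left out because it needs an imaging library. The helper fit_within shows one common way to preserve the aspect ratio (scaling the image to fit inside the requested box); that this matches the ResizeImage processor's exact behaviour is an assumption, and both function names are made up for this sketch.

```python
import base64
import json


def decode_body(flowfile_content: str) -> bytes:
    """Take the 'body' field ($.body) and Base64-decode it to raw bytes."""
    event = json.loads(flowfile_content)
    return base64.b64decode(event["body"])


def fit_within(width: int, height: int, max_width: int, max_height: int):
    """Largest size with the original aspect ratio that fits in the box.

    Assumption: this is a typical ratio-preserving rule, not necessarily
    the exact computation performed by NiFi.
    """
    scale = min(max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical payload: the bytes are placeholders, not a real image.
payload = json.dumps({"body": base64.b64encode(b"raw image bytes").decode("ascii")})
print(decode_body(payload))
print(fit_within(1200, 800, 400, 200))
```

With resize-width 400 and resize-height 200, a 1200x800 image is limited by its height under this rule, so the sketch yields 300x200 rather than a distorted 400x200.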



  1. Drag and drop a Base64EncodeContent processor. To return the resized image to the user, AWS Lambda expects a specific JSON payload containing the Base64 encoding of the image.

  2. Drag and drop a ReplaceText processor. We use it to take the Base64 representation of the resized image and insert it into the expected JSON payload. Add the JSON below in “Replacement Value” and change “Evaluation Mode” to “Entire text”.


{
  "statusCode": 200,
  "headers": { "Content-Type": "image/png" },
  "isBase64Encoded": true,
  "body": "$1"
}
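For comparison, the combined effect of the Base64EncodeContent and ReplaceText steps can be sketched in Python as below. The function name build_response and the sample bytes are hypothetical; the JSON fields are the ones shown above.

```python
import base64
import json


def build_response(resized_image: bytes) -> str:
    """Wrap image bytes in the JSON payload returned to the caller."""
    # Equivalent of Base64EncodeContent in Encode mode.
    encoded = base64.b64encode(resized_image).decode("ascii")
    # Equivalent of ReplaceText substituting the encoded content for "$1".
    response = {
        "statusCode": 200,
        "headers": {"Content-Type": "image/png"},
        "isBase64Encoded": True,
        "body": encoded,
    }
    return json.dumps(response)


# Placeholder bytes stand in for the resized image.
print(build_response(b"abc"))
```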


  1. Drag and drop an output port.

  2. Connect all the components together; you can auto-terminate the unused relationships. The flow should look like this:



You can now publish the flow to the DataFlow Catalog from the Flow Options menu:



Make sure to give it a name that is unique (you can prefix it with your name):



Once the flow is published, make sure to copy the CRN of the published version (it will end with /v.1):



3.2 Deploying the flow as a function in AWS Lambda


First things first, go into DataFlow Functions and download the binary for running DataFlow Functions in AWS Lambda:




This should download a binary with a name similar to:

naaf-aws-lambda-1.0.0.2.3.7.0-100-bin.zip


Once you have the binary, make sure you also have:

  • The CRN of the flow you published in the DataFlow Catalog

  • The Access Key that has been provided with these instructions in the “Competition Resources” section

  • The Private Key that has been provided with these instructions in the “Competition Resources” section


To speed up the deployment, we’re going to use some scripts that automate it. They assume that your AWS CLI is properly configured locally on your laptop and that you can use the jq command for reading JSON payloads. You can now follow the instructions from this page here.


However, if you wish to deploy the flow in AWS Lambda manually through the AWS UI, you can follow the steps described here.
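Once the function is deployed, the HTTPS call described at the start of this tutorial could look like the Python sketch below. The endpoint URL, file names, and function names are hypothetical. The sketch also assumes that the endpoint accepts the raw image bytes in the request body (with the platform delivering them Base64-encoded to the flow) and returns the resized image as binary content; check this against your own deployment.

```python
import urllib.request

# Hypothetical endpoint; replace it with the URL given by your cloud provider.
FUNCTION_URL = "https://example.invalid/resize"


def build_resize_request(image_bytes, width, height, url=FUNCTION_URL):
    """Build a POST request carrying the image and the two resize headers."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        headers={
            "Content-Type": "image/png",
            # Headers read by the ExtractHTTPHeaders processor in the flow.
            "resize-width": str(width),
            "resize-height": str(height),
        },
    )


def resize_remote(image_path, output_path, width, height, url=FUNCTION_URL):
    """Send a local image to the endpoint and save the returned image."""
    with open(image_path, "rb") as source:
        request = build_resize_request(source.read(), width, height, url)
    with urllib.request.urlopen(request) as response:
        resized = response.read()
    with open(output_path, "wb") as target:
        target.write(resized)
```

A call such as resize_remote("input.png", "resized.png", 400, 200) would then write the resized image to resized.png.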