Google Gemma for Real-Time Lightweight Open LLM Inference
Apache NiFi, Google Gemma, LLM, Open, HuggingFace, Generative AI, Gemma 7B-IT
When I saw the new model on HuggingFace, I had to try it with Apache NiFi in some Slack pipelines and compare it to ChatGPT and WatsonX.AI.
This seems like a fast, interesting new open large language model, so I am going to give it a try. Let’s go. Since I am short on disk space, I will call it via the HuggingFace REST Inference API. There are many other ways to use the models, including HuggingFace Transformers, PyTorch, Keras-NLP/Keras/TensorFlow, and more. We will try both 2B-IT and 7B-IT.
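Outside of NiFi, the same REST inference call is a few lines of Python. A minimal sketch using only the standard library, assuming the hosted Inference API endpoint for `google/gemma-7b-it` and an `HF_TOKEN` environment variable holding your own HuggingFace API token:

```python
import json
import os
import urllib.request

# Hosted Inference API endpoint for the instruction-tuned 7B model.
API_URL = "https://api-inference.huggingface.co/models/google/gemma-7b-it"


def build_request(prompt: str) -> dict:
    """Build the JSON payload that NiFi's InvokeHTTP would POST."""
    return {"inputs": prompt}


def ask_gemma(prompt: str, token: str) -> str:
    """Call the hosted model and return the generated text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        # The API returns a list like [{"generated_text": "..."}]
        return json.loads(resp.read())[0]["generated_text"]


if __name__ == "__main__" and os.environ.get("HF_TOKEN"):
    print(ask_gemma("What is Apache NiFi?", os.environ["HF_TOKEN"]))
```

In the NiFi flow below, InvokeHTTP does exactly this POST; the token lives in a sensitive processor property instead of an environment variable.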
Google Gemma on HuggingFace
![](https://cdn-images-1.medium.com/max/2000/0*7dMujK5_kvpUDvqd.png)
![](https://cdn-images-1.medium.com/max/2000/1*HyC1xGDuIwXhGU11Srk5XA.png)
This is really easy to start using. We can test on the website before we build out the NiFi flow.
![](https://cdn-images-1.medium.com/max/2000/1*CqHm5_h8RtHZmAgNZFZNUw.png)
Real-Time DataFlow With Google Gemma 7B-IT
Source Code:
FLaNK for using the HuggingFace-hosted open Google Gemma model: tspannhw/FLaNK-Gemma (github.com)
![](https://cdn-images-1.medium.com/max/2000/1*Ro8Mq8bGTRAwGcJPcpS_SA.png)
![](https://cdn-images-1.medium.com/max/2000/1*erDacEsO0ZodsJmuCzW9qA.png)
1. ListenSlack — We connect via the new Slack Sockets and receive chat messages
2. EvaluateJsonPath — We parse out the fields we want (we send the raw copy elsewhere in step 6)
3. RouteOnAttribute — We only want messages in the “general” channel
4. RouteOnAttribute — We only want real messages
5. ReplaceText — We build a new file to send
6. ProcessGroup — We process the raw JSON message from Slack in a sub process group
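The parse-and-route steps above amount to a couple of attribute checks. A plain-Python sketch of the same logic, where the field names (`channel_name`, `subtype`, `bot_id`) are illustrative stand-ins for the attributes EvaluateJsonPath pulls out of the Slack event:

```python
import json


def should_process(event: dict, wanted_channel: str = "general") -> bool:
    """Mimic the two RouteOnAttribute checks: right channel, real message."""
    if event.get("type") != "message":
        return False
    # Skip bot posts, joins, edits, and other subtyped events.
    if event.get("subtype") or event.get("bot_id"):
        return False
    return event.get("channel_name") == wanted_channel


# A trimmed-down example event (field names are illustrative).
raw = '{"type": "message", "channel_name": "general", "user": "U123", "text": "What is NiFi?"}'
event = json.loads(raw)
print(should_process(event))  # a real message in "general" passes
print(should_process({"type": "message", "subtype": "bot_message"}))  # filtered out
```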
![](https://cdn-images-1.medium.com/max/2000/1*XnX_M0eeX7xYZSfVYUd5Ng.png)
8. InvokeHTTP — We call HuggingFace against the Google Gemma Model
9. QueryRecord — We clean up the JSON and return 1 row
10. UpdateRecord — We add fields to the JSON file
11. UpdateAttribute — We set headers
12. PublishKafkaRecord_2.6 — We send the data via Kafka
13. RetryFlowFile — If the call fails, we retry three times and then fail
14. ProcessGroup — In this sub process group we will clean up and enrich the Google Gemma results and send to Slack.
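RetryFlowFile in step 13 is a bounded retry: route back to the call up to three times, then give up. A minimal sketch of the same idea in Python (the delay and exception handling here are my simplification, not NiFi's exact semantics):

```python
import time


def with_retries(fn, attempts: int = 3, delay_seconds: float = 0.0):
    """Call fn(); on failure retry up to `attempts` total tries, then re-raise."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as err:  # in NiFi this is the failure relationship
            last_error = err
            time.sleep(delay_seconds)
    raise last_error


# Demo: a call that fails twice, then succeeds on the third attempt.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("HuggingFace call failed")
    return "ok"

print(with_retries(flaky))  # "ok" after two retries
```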
We call HuggingFace for the Google Gemma 7B-IT model.
![](https://cdn-images-1.medium.com/max/2000/1*0IK5Lpdl4VBvfa_-PRdxyQ.png)
![](https://cdn-images-1.medium.com/max/2000/1*eyT2gyclZtfWSLBsuLfEeg.png)
Merlin, my cat manager, asks if I am done with this. It has taken over 3 hours to build.
![](https://cdn-images-1.medium.com/max/2000/1*awj8Av3Y9k1Mbb87c183pg.png)
We now parse the results from HuggingFace and send them to our Slack channel.
We add a footer to tell us what LLM we used.
![](https://cdn-images-1.medium.com/max/2000/1*ab_xULuvWSw-OdSuii8sTA.png)
That’s it: three different LLM systems and models, plus output to Slack, PostgreSQL, and Kafka. Easy.
![](https://cdn-images-1.medium.com/max/2000/1*nD9TB40m3ZQpGa7xXth3Pw.png)
We start off with a Slack message question in the “general” channel to parse.
```json
{
  "inputs" : "How did Tim Spann use Apache NiFi to work with HuggingFace hosted Google Gemma models?"
}
```
The result of the inference from the Google Gemma model is:
```json
[ {
  "generated_text" : "How did Tim Spann use Apache NiFi to work with HuggingFace hosted Google Gemma models?\n\nTim Spann used Apache NiFi to work with HuggingFace hosted Google Gemma models by setting up a NiFi flow that interacted with the HuggingFace API. Here are the main steps involved:\n\n**1. Set up NiFi flow:**\n- Create a new NiFi flow and name it appropriately.\n- Add a processor to the flow.\n\n**2. Configure processor:**\n- Use an HTTP processor to make requests to the HuggingFace API.\n- Set the URL"
} ]
```
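Notice that the hosted endpoint echoes the prompt back at the start of `generated_text`. The cleanup our QueryRecord/UpdateRecord steps perform, plus the LLM footer mentioned above, can be sketched like this (the footer wording and the trimming of the echoed prompt are illustrative, not the exact record path logic):

```python
import json

# A trimmed-down version of the response body shown above.
response_body = '''[ {
  "generated_text": "How did Tim Spann use Apache NiFi?\\n\\nTim Spann used Apache NiFi to call the HuggingFace API."
} ]'''


def clean_answer(body: str, prompt: str, model: str = "google/gemma-7b-it") -> str:
    """Take the first row, drop the echoed prompt, and append an LLM footer."""
    text = json.loads(body)[0]["generated_text"]
    if text.startswith(prompt):
        text = text[len(prompt):].lstrip()
    return f"{text}\n\n-- answered by {model}"


print(clean_answer(response_body, "How did Tim Spann use Apache NiFi?"))
```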
Example of Provenance Events
![](https://cdn-images-1.medium.com/max/2000/1*X-8eRAhNr5bCHQLKTKXSOg.png)
Input to Slack
![](https://cdn-images-1.medium.com/max/2000/1*Bq2IbIqTuhZQXWTiVli_Eg.png)
HuggingFace REST API Formatted Input to Gemma
![](https://cdn-images-1.medium.com/max/2000/1*DxS1HAWn7OLVHSduliv4IQ.png)
Output to Slack
![](https://cdn-images-1.medium.com/max/2000/1*7BDE0UqXk3AJ6mUlMtRQow.png)
![](https://cdn-images-1.medium.com/max/2000/1*YDNm8pZdKzE-AAE78UoEjw.png)
![](https://cdn-images-1.medium.com/max/2000/1*zi6dZPqlM3cyOdLDmOITdQ.png)
![](https://cdn-images-1.medium.com/max/2000/1*L7sHWhWgP-eAbXEDhUYJbg.png)
Output to Apache Kafka
![](https://cdn-images-1.medium.com/max/2000/1*KARIixmdQ_5RjVBMgUQ6-w.png)
![](https://cdn-images-1.medium.com/max/2000/1*GeG7sRb084Ue_RVWVDbqOA.png)
Let’s Also Run Against OpenAI ChatGPT and WatsonX.AI LLAMA 2 70B Chat
![](https://cdn-images-1.medium.com/max/2000/1*6e2pLDWQTrjqCoXlWod9mA.png)
![](https://cdn-images-1.medium.com/max/2000/1*nHArZlpLJYXwgLj7LlcIAw.png)
![](https://cdn-images-1.medium.com/max/2000/1*IHAjB8rNRi1aV29ErtX2UQ.png)
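Across the three services, the flow is the same and only the request shape changes. A hedged sketch of the payload each endpoint expects, where the field names and model identifiers follow my reading of each API and should be checked against the current documentation:

```python
def hf_payload(prompt: str) -> dict:
    # HuggingFace hosted Inference API: a bare "inputs" field.
    return {"inputs": prompt}


def openai_payload(prompt: str) -> dict:
    # OpenAI chat completions: a messages list with roles.
    return {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
    }


def watsonx_payload(prompt: str) -> dict:
    # watsonx.ai text generation: model_id plus a plain "input" string
    # (model_id per the LLAMA 2 70B chat listing; illustrative).
    return {"model_id": "meta-llama/llama-2-70b-chat", "input": prompt}


question = "What is Apache NiFi?"
for build in (hf_payload, openai_payload, watsonx_payload):
    print(build(question))
```

In NiFi this is just three ReplaceText/InvokeHTTP pairs fed from the same Slack message.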
The New Slack Processing
![](https://cdn-images-1.medium.com/max/2000/1*HbB5UT_694Z7ZBF4XlNIjg.png)
Send all Slack JSON Events to PostgreSQL
![](https://cdn-images-1.medium.com/max/2000/1*uI2kLQHzUsE9OLw9Rbqt9A.png)
![](https://cdn-images-1.medium.com/max/2000/1*0dLQZdSEY3ulTTx-ajKCGw.png)
How to Connect NiFi to Slack
![](https://cdn-images-1.medium.com/max/2600/1*4G5_ij9vBBSr005sKAMftw.jpeg)
Make sure to Enable Socket Mode!
![](https://cdn-images-1.medium.com/max/2600/1*8Uik-D9gQseYt6CW_wDyxg.jpeg)
You need the User and Bot User OAuth Tokens.
![](https://cdn-images-1.medium.com/max/2000/1*PYzBLB-0cdoSyBhEUQ3oKg.jpeg)
![](https://cdn-images-1.medium.com/max/2600/1*-wECU4my32Aq8eH1ikLAgQ.jpeg)
![](https://cdn-images-1.medium.com/max/2600/1*Dgr54vrYV1V94hbBLq9VgQ.jpeg)
![](https://cdn-images-1.medium.com/max/2000/1*IFh3-F-6MRDBY3phMZ1euw.jpeg)
![](https://cdn-images-1.medium.com/max/2600/1*d3Ovs0IK2b4ujekoRhdcwA.jpeg)
![](https://cdn-images-1.medium.com/max/2000/1*RX5W1HQQy3vcg7rbj9Ygjw.jpeg)
This is the configuration:
```yaml
display_information:
  name: timchat
  description: Apache NiFi Bot For LLM
  background_color: "#18254D"
  long_description: "chat testing"
features:
  app_home:
    home_tab_enabled: true
    messages_tab_enabled: false
    messages_tab_read_only_enabled: false
  bot_user:
    display_name: nifichat
    always_online: true
  slash_commands:
    - command: /timchat
      description: starts command
      usage_hint: ask question
      should_escape: false
    - command: /weather
      description: get the weather
      usage_hint: /weather 08520
      should_escape: false
    - command: /stocks
      description: stocks
      usage_hint: /stocks IBM
      should_escape: false
    - command: /nifi
      description: NiFi Questions
      usage_hint: Questions on NiFi
      should_escape: false
    - command: /flink
      description: Flink Commands
      usage_hint: Questions on Flink
      should_escape: false
    - command: /kafka
      description: Questions on Kafka
      usage_hint: Ask questions about Apache Kafka
      should_escape: false
    - command: /cml
      description: CML
      usage_hint: Cloudera Machine Learning
      should_escape: false
    - command: /cdf
      description: Cloudera Data Flow
      should_escape: false
    - command: /csp
      description: Cloudera Stream Processing
      should_escape: false
    - command: /cde
      description: Cloudera Data Engineering
      should_escape: false
    - command: /cdw
      description: Cloudera Data Warehouse
      should_escape: false
    - command: /cod
      description: Cloudera Operational Database
      should_escape: false
    - command: /sdx
      description: Cloudera Shared Data Experience
      should_escape: false
    - command: /cdp
      description: Cloudera Data Platform
      should_escape: false
    - command: /cdh
      description: Cloudera Data Hub
      should_escape: false
    - command: /rtdm
      description: Cloudera Real-Time Data Mart
      should_escape: false
    - command: /csa
      description: Cloudera Streaming Analytics
      should_escape: false
    - command: /smm
      description: Cloudera Streams Messaging Manager
      should_escape: false
    - command: /ssb
      description: Cloudera SQL Streams Builder
      should_escape: false
oauth_config:
  scopes:
    user:
      - channels:history
      - channels:read
      - chat:write
      - files:read
      - files:write
      - groups:history
      - im:history
      - im:read
      - links:read
      - mpim:history
      - mpim:read
      - users:read
      - im:write
      - mpim:write
    bot:
      - app_mentions:read
      - channels:history
      - channels:read
      - chat:write
      - commands
      - files:read
      - groups:history
      - im:history
      - im:read
      - incoming-webhook
      - links:read
      - metadata.message:read
      - mpim:history
      - mpim:read
      - users:read
      - im:write
      - mpim:write
settings:
  event_subscriptions:
    request_url:
    user_events:
      - channel_created
      - channel_deleted
      - file_created
      - file_public
      - file_shared
      - im_created
      - link_shared
      - message.channels
      - message.groups
      - message.im
      - message.mpim
    bot_events:
      - app_mention
      - channel_created
      - channel_deleted
      - channel_rename
      - group_history_changed
      - member_joined_channel
      - message.channels
      - message.groups
      - message.im
      - message.mpim
  interactivity:
    is_enabled: true
  org_deploy_enabled: false
  socket_mode_enabled: true
  token_rotation_enabled: false
```
ListenSlack (nifi.apache.org): Retrieves real-time messages or Slack commands from one or more Slack conversations. The messages are written out in…
RESOURCES
- huggingface.co — We're on a journey to advance and democratize artificial intelligence through open source and open science.
- ai.google.dev — Introducing Gemma, a family of open-source, lightweight language models. Discover quickstart guides, benchmarks, train…
- kaggle.com — Gemma is a family of lightweight, open models built from the research and technology that Google used to create the…
- kaggle.com — Explore and run machine learning code with Kaggle Notebooks | Using data from Gemma
- theverge.com — Google's new family of models, Gemma, will be available to developers for language-based tasks. Unlike Gemini, it will…
- github.com — lightweight, standalone C++ inference engine for Google's Gemma models. (google/gemma.cpp)