All Data and AI Weekly #207: 15 Sept 2025
https://bsky.app/profile/paasdev.bsky.social
NiFi + AI + AI Data Cloud + Iceberg. 
https://www.reddit.com/r/DataEngineeringForAI/hot/
Monthly NYC and YouTube Events

AWS New York Summit https://github.com/tspannhw/conferences/tree/main/2025/awsny
Hex + Snowflake Hackathon https://github.com/tspannhw/hackathons/tree/main/2025-07-15
Apache NiFi + AI Agents + Cortex AI + Snowflake AISQL
https://github.com/tspannhw/TrafficAI/tree/main/Agents
https://github.com/tspannhw/transit-ridership
https://github.com/tspannhw/conferences
This edition is a special one, as we're highlighting the fantastic Community Over Code 2025 event where I had the opportunity to present three talks. It was a great chance to connect with the open-source community and share insights on a variety of topics, from Apache NiFi to real-time data optimization.
Below, you'll find a recap of the talks, along with other key updates from the world of data engineering and AI.

I was thrilled to present three talks at COC25. For anyone who missed them or wants to revisit the material, you can find the slide decks and related resources below:
- NiFi Man: We're Here, But Should We Have Come?
 This talk explored the practical considerations and real-world implications of deploying Apache NiFi. We went beyond the "how-to" and delved into the "why" and "when" of using this powerful dataflow tool.
- Utilizing Real-Time Transit Data for Travel Optimization
 This session demonstrated how to leverage real-time transit data streams to build intelligent travel optimization solutions. We discussed the architecture, data processing, and benefits of such a system.
- Enhancing Apache NiFi 2.x with Python Processors
 For the more technical audience, this talk showcased how to extend Apache NiFi's functionality using custom Python processors. It's a great way to integrate specialized logic and libraries directly into your dataflows.
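To make the Python processor idea concrete, here is a minimal sketch of a FlowFileTransform-style processor for NiFi 2.x. The processor name `AddWordCount`, the `text` and `word_count` fields, and the `word.count` attribute are all hypothetical and not taken from the talk. The fallback classes exist only so the sketch can be imported and tried outside a NiFi runtime.

```python
import json

try:
    # Provided by the NiFi 2.x Python runtime.
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
except ImportError:
    # Minimal stand-ins (assumption) so the sketch runs outside NiFi.
    class FlowFileTransform:
        pass

    class FlowFileTransformResult:
        def __init__(self, relationship, attributes=None, contents=None):
            self.relationship = relationship
            self.attributes = attributes
            self.contents = contents


class AddWordCount(FlowFileTransform):  # hypothetical processor name
    class Java:
        implements = ["org.apache.nifi.python.processor.FlowFileTransform"]

    class ProcessorDetails:
        version = "0.0.1-SNAPSHOT"
        description = "Adds a word_count field to a JSON FlowFile with a 'text' field."
        tags = ["example", "python"]

    def __init__(self, **kwargs):
        pass

    def transform(self, context, flowfile):
        try:
            # Parse the incoming FlowFile content as a JSON object.
            record = json.loads(flowfile.getContentsAsBytes().decode("utf-8"))
            record["word_count"] = len(str(record.get("text", "")).split())
        except (ValueError, AttributeError):
            # Not valid UTF-8 JSON, or not a JSON object: route to failure.
            return FlowFileTransformResult(relationship="failure")
        return FlowFileTransformResult(
            relationship="success",
            contents=json.dumps(record),
            attributes={"word.count": str(record["word_count"])},
        )
```

Inside NiFi, a file like this would typically be placed in the Python extensions directory so the processor shows up on the canvas. Any third-party libraries it needs can be declared in `ProcessorDetails`.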
All the code and materials for these presentations can be found in my public GitHub repository for the conference:
Here is a quick look at other noteworthy developments and releases from the past week:
- Apache Iceberg: A new article from The New Stack aims to dispel common myths about the complexity of open-source frameworks like Apache Iceberg. Additionally, the Snowflake Engineering blog released a post detailing the new features and fixes in Apache Iceberg 1.10.0.
- Snowflake: Snowflake has announced the general availability of Workspaces, a feature designed to enhance collaboration and organization. We also saw some great articles on using Snowflake Cortex Agents via a REST API and the upgrade of the open-source MCP Server for Snowflake.
Project Link: https://medium.com/@gabriel.mullen/ca-open-data-ai-agent-d09b10d09e32
Summary: This week, we're taking a look at the California Open Data AI Agent. Built in just 60 minutes using Snowflake, this agent demonstrates how to create a real-time Retrieval-Augmented Generation (RAG) workflow over live government data without setting up new servers. It showcases the power of agentic AI in synthesizing answers from thousands of datasets with clear citations.
Key Takeaway: The project highlights the practicality and speed of deploying production-ready, serverless agent solutions for real-world data challenges.
GitHub Link: https://github.com/agentscope-ai/agentscope
Summary: AgentScope is an agent-oriented programming library that makes it easier to build LLM applications. It's designed to be "developer-centric" with features like asynchronous execution, parallel tool calls, and real-time steering. It offers a transparent approach where prompt engineering and API invocation are fully visible and controllable.
Why it's important: AgentScope, along with its related libraries like agentscope-runtime and agentscope-studio, provides a comprehensive toolkit for not only developing but also deploying and visualizing agent-based applications.
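As a rough illustration of what "parallel tool calls" means in general, the sketch below uses only Python's standard `asyncio` module. It does not use the AgentScope API. The two tool functions are hypothetical placeholders that simulate slow I/O.

```python
import asyncio
from typing import Dict


async def fetch_weather(city: str) -> str:  # hypothetical tool
    await asyncio.sleep(0.01)  # simulate a slow network call
    return f"weather for {city}"


async def fetch_transit(city: str) -> str:  # hypothetical tool
    await asyncio.sleep(0.01)  # simulate a slow network call
    return f"transit status for {city}"


async def call_tools_in_parallel(city: str) -> Dict[str, str]:
    # gather() runs both coroutines concurrently and preserves argument order.
    weather, transit = await asyncio.gather(fetch_weather(city), fetch_transit(city))
    return {"weather": weather, "transit": transit}


def run(city: str) -> Dict[str, str]:
    return asyncio.run(call_tools_in_parallel(city))
```

Because both calls wait at the same time, the total latency is close to that of the slowest tool rather than the sum of both.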
Article Link: https://medium.com/@masato.takada/%EF%B8%8F-snowflake-cortex-agents-a-rest-api-guide-49b3a754ef92
Summary: The Snowflake Cortex Agent is a powerful AI data assistant that automates complex data workflows. This guide explains how to use its REST API to build applications that can orchestrate across both structured (using Cortex Analyst) and unstructured (using Cortex Search) data. It's designed to be secure, with existing Snowflake security controls applying automatically.
Key Concepts:
- Planning: The agent analyzes a request and creates a comprehensive plan.
- Tool Use: It selects the right tools (Cortex Analyst for SQL, Cortex Search for text).
- Reflection: It evaluates results and refines its approach.
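The three concepts above can be sketched as a small loop. This is a toy illustration of the plan, tool use, and reflect pattern, not the Cortex Agents REST API. Every function name here is hypothetical, and the two tools are stand-ins that return placeholder strings instead of calling Cortex Analyst or Cortex Search.

```python
from typing import Callable, Dict, List


def analyst_tool(question: str) -> str:  # stand-in for a SQL-style tool
    return f"[structured answer for: {question}]"


def search_tool(question: str) -> str:  # stand-in for a text search tool
    return f"[document snippets for: {question}]"


TOOLS: Dict[str, Callable[[str], str]] = {"analyst": analyst_tool, "search": search_tool}


def plan(request: str) -> List[str]:
    """Planning: naively split a request into sub-questions on ' and '."""
    return [part.strip() for part in request.split(" and ") if part.strip()]


def choose_tool(step: str) -> str:
    """Tool use: route numeric-sounding questions to the SQL-style tool."""
    numeric_words = ("how many", "total", "average", "count")
    return "analyst" if any(w in step.lower() for w in numeric_words) else "search"


def reflect(result: str) -> bool:
    """Reflection: accept any non-empty result; a real agent would judge quality."""
    return bool(result.strip())


def run_agent(request: str, max_retries: int = 1) -> List[dict]:
    trace = []
    for step in plan(request):
        tool = choose_tool(step)
        result = TOOLS[tool](step)
        retries = 0
        while not reflect(result) and retries < max_retries:
            # Refine the approach by switching to the other tool.
            tool = "search" if tool == "analyst" else "analyst"
            result = TOOLS[tool](step)
            retries += 1
        trace.append({"step": step, "tool": tool, "result": result})
    return trace
```

For example, `run_agent("How many riders used the subway and what does the policy say about delays")` produces two steps, the first routed to the analyst stand-in and the second to the search stand-in.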
Hugging Face Link: https://huggingface.co/google/vaultgemma-1b
Summary: VaultGemma is a variant of the Gemma family of open models from Google, but with a key difference: it's pre-trained from the ground up using Differential Privacy (DP). This provides strong, mathematically-backed privacy guarantees for its training data, making it a great choice for applications where data privacy is a critical concern.
Note: While it may have a utility trade-off compared to non-private models, its primary benefit is providing privacy by design, making it a significant step forward in private AI.
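For readers new to differentially private training, the sketch below shows the core step of DP-SGD, a common recipe: clip each example's gradient to a fixed norm, sum, add Gaussian noise, then average. It is a general illustration and not VaultGemma's training code. The function names and parameter values are assumptions chosen for the example.

```python
import math
import random
from typing import List


def clip(grad: List[float], max_norm: float) -> List[float]:
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(
    per_example_grads: List[List[float]],
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Clip each example's gradient, add Gaussian noise to the sum, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    sigma = noise_multiplier * max_norm  # noise scale tied to the clipping bound
    noisy_sum = [
        sum(g[i] for g in clipped) + rng.gauss(0.0, sigma)
        for i in range(dim)
    ]
    return [v / len(clipped) for v in noisy_sum]
```

Clipping bounds how much any single example can move the model, and the noise hides what remains of that influence. The noise is also the source of the utility trade-off mentioned above.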
- Building Cortex Agents On Snowflake: Why It Matters And Best Practices
https://github.com/timothyspann
© 2020-2025 Tim Spann https://www.youtube.com/@FLaNK-Stack

