Architecting a Modern Data Stack for AI Agents
CO-LOCATE COMPUTE AND STORAGE
If you look at modern centralized data architectures, you mostly see a data lakehouse, not a data warehouse or data lake, because it combines advantages of both. Often these architectures use technologies such as Apache Spark, Apache Hudi and the Databricks stack, Snowflake, and/or Apache Iceberg. The lakehouse is a brilliant architecture. I especially love that Iceberg is a storage format that multiple query and data processing engines can share.
That’s powerful.
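To make that concrete, here is a minimal sketch of engine-agnostic storage, assuming a configured Iceberg catalog named lake and a table sales.orders (both hypothetical names). The same table could just as easily be read by Spark, Trino, or Flink without copying the data.

```python
# Read an Iceberg table directly with pyiceberg; Spark, Trino, or Flink
# could query the very same files. The catalog name "lake" and the table
# "sales.orders" are illustrative assumptions.
from pyiceberg.catalog import load_catalog
from pyiceberg.expressions import GreaterThanOrEqual

catalog = load_catalog("lake")                # connection details come from pyiceberg config
orders = catalog.load_table("sales.orders")

# Push the filter and column projection down into the scan, then
# materialize the result locally for analysis.
recent = orders.scan(
    row_filter=GreaterThanOrEqual("order_date", "2025-01-01"),
    selected_fields=("order_id", "customer_id", "total"),
).to_pandas()

print(recent.head())
```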
The other advantage of the lakehouse architecture is that it keeps data storage and data computation separate, so each can scale independently, depending on the need of the moment. That same separation, though, is a downfall when it comes to being a good foundation for AI: data has to travel between the storage and compute layers before an agent can act on it. A lakehouse is not a bad starting point for pooling static data from across an enterprise, but, in the end, it becomes just another data source for your AI agents, not the data foundation.
For real-time response, co-locating compute and storage is just flat-out faster than having them separate. This can sometimes mean “traditional” databases, which have come a long way in the last decade or so. A lot of modern databases handle vector and unstructured data, scale out in clusters or run serverless in the cloud, do isolated data transformation and cleaning without interfering with query response speed, handle analytical and transactional workloads simultaneously, and respond to queries in the sub-100-millisecond range.
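As a rough illustration of what handling transactional and analytical work in one place buys you, here is a hedged sketch against a co-located database that speaks the MySQL wire protocol (SingleStore does, for example). The host, credentials, and schema are invented for illustration.

```python
# One round trip that mixes a transactional lookup (a single pending
# transaction) with an analytical aggregate (the customer's 90-day
# spending baseline). With compute and storage co-located, a query like
# this is the kind of thing that can return in the sub-100-millisecond
# range. All names below are hypothetical.
import pymysql

conn = pymysql.connect(host="db.internal", user="agent",
                       password="...", database="payments")

SQL = """
SELECT t.txn_id,
       t.amount,
       b.avg_amount,
       t.amount > 5 * b.avg_amount AS looks_anomalous
FROM transactions AS t
JOIN (SELECT customer_id, AVG(amount) AS avg_amount
      FROM transactions
      WHERE txn_time >= NOW() - INTERVAL 90 DAY
      GROUP BY customer_id) AS b
  ON b.customer_id = t.customer_id
WHERE t.txn_id = %s
"""

with conn.cursor() as cur:
    cur.execute(SQL, ("txn-12345",))
    print(cur.fetchone())
```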
STORE AND PROCESS DATA IN MEMORY
A high-scale in-memory database, such as SingleStore or Aerospike; a data grid combined with in-memory computation, such as Hazelcast or Apache Ignite; at the very least, a caching system, such as Redis, memcached, or Oracle Coherence; or some combination of all three needs to be the heart of an AI agent’s data architecture for several reasons:
- Low-latency query response
In-memory processing has inherent response speed advantages, and high response latency can kill an otherwise useful agent. Agents are generally used to automate a task or set of tasks, respond to human input directly, or both. Humans are impatient, and machines are even more so. If an agent is detecting fraud, for instance, it needs to do it in milliseconds, before the transaction clears. If it’s talking to a customer, it needs to answer the customer’s questions, even if that involves large-scale data searches, before the customer gets bored or frustrated and takes their business elsewhere.
- Streaming and at-rest data processing together
Combining streaming data with its at-rest context is essential to understanding most situations and responding intelligently. Many streaming data processors, such as Apache Flink, operate independently of at-rest data stores; others, such as ksqlDB, treat everything as a stream, even historical data, which is inefficient. Data streams, such as logs, sensor readings, incoming transactions, or change data capture from source systems, tell you what is happening right now, but the key to making sense of that is often knowing what came before (see the first sketch after this list). Answering questions such as, “Is X stock doing better now than it has in the last 6 months?” “Is the sensor’s current temperature typical for this time of day and year?” and “Is this login attempt fraudulent?” requires both current and historical data.
- Caching of repetitive query responses; recent, hot data; and relevant subject data
Repetitive queries are a huge waste. Intelligent caching reduces API calls, which can be a major expense with production LLMs; by caching responses, you can often cut AI operating costs by half or more (see the second sketch after this list). By holding the most recent, most often used, or most likely to be needed data in a memory cache, you reduce response latency for the majority of queries. This cache can also be a place to store prompt templates and other AI-specific data.
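The first sketch below illustrates combining a stream with its at-rest context: each incoming sensor reading is compared against a historical baseline held in an in-memory store. The Kafka topic, Redis key layout, and anomaly threshold are assumptions, and the baselines are presumed to have been precomputed from historical data.

```python
# Enrich a live stream with at-rest context: compare each incoming
# sensor reading to a historical per-hour baseline kept in Redis.
# Topic name, key layout, and the z-score threshold are illustrative.
import json
import redis
from kafka import KafkaConsumer  # kafka-python

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
consumer = KafkaConsumer("sensor-readings",
                         bootstrap_servers="localhost:9092",
                         value_deserializer=lambda v: json.loads(v))

for msg in consumer:
    reading = msg.value                           # e.g. {"sensor": "s-17", "temp": 84.2, "hour": 14}
    key = f"baseline:{reading['sensor']}:{reading['hour']}"
    baseline = cache.hgetall(key)                 # {"mean": "61.5", "stddev": "4.2"}, precomputed from history
    if not baseline:
        continue                                  # no history yet for this sensor and hour
    zscore = (reading["temp"] - float(baseline["mean"])) / float(baseline["stddev"])
    if abs(zscore) > 3:                           # "is this typical for this time of day?"
        print(f"Anomaly on {reading['sensor']}: temp={reading['temp']}, z={zscore:.1f}")
```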
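The second sketch shows the response-caching idea: hash the prompt, serve repeat questions from Redis, and only call the model on a miss. The call_llm placeholder, the one-hour TTL, and the key scheme are choices made for illustration, not a prescription.

```python
# Cache LLM responses keyed by a hash of the prompt so that repeated
# questions never trigger a second, billable API call. call_llm() stands
# in for whatever model client you actually use.
import hashlib
import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)


def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your model provider")


def cached_completion(prompt: str, ttl_seconds: int = 3600) -> str:
    key = "llm:" + hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return hit                        # served from memory, no API cost
    answer = call_llm(prompt)             # cache miss: one real, billable call
    cache.set(key, answer, ex=ttl_seconds)
    return answer
```

Exact-match hashing only catches identical prompts; semantic caching, which matches on embedding similarity, extends the same idea to near-duplicate questions.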
BUILDING AGENTIC WORKFLOWS
Now that you have the basic data management foundation, you can build agentic workflows on solid ground. Everything up to this point has been good advice for building any high-scale, fast-data, or fast-response data architecture. But for an organization diving into AI agents, that’s just the starting point.
You may also need a vector database for storage and search of embeddings. Vector embedding and vector search functionality are part of most in-memory data storage and processing technologies, although the vector capabilities are often immature. Embedding company data and storing it as searchable vectors are essential to retrieval-augmented generation (RAG) workflows that add important, purpose-specific data to generalized language models. If the vector capabilities in your central data grid or in-memory database aren’t mature enough, you may have to add a dedicated vector database such as Pinecone, Milvus, or Weaviate.
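As a rough sketch of that RAG flow, the example below embeds a handful of company documents, retrieves the closest ones to a question by cosine similarity, and splices them into the prompt. The embed function is a toy character-frequency stand-in for a real embedding model, the documents are invented, and in practice the vectors would live in your in-memory database or a dedicated vector store rather than a NumPy array.

```python
# Minimal retrieval step of a RAG workflow: embed documents once, find
# the nearest ones to a question, and build the prompt that would be
# sent to the language model. embed() is a toy stand-in for a real
# embedding model, and the documents are invented examples.
import numpy as np


def embed(text: str) -> np.ndarray:
    """Toy embedding: normalized character-frequency vector."""
    v = np.zeros(256)
    for ch in text.lower():
        v[ord(ch) % 256] += 1.0
    return v / (np.linalg.norm(v) or 1.0)


documents = [
    "Refunds are processed within 5 business days.",
    "Enterprise customers have a dedicated support channel.",
    "Order history is retained for 7 years.",
]
doc_vectors = np.stack([embed(d) for d in documents])  # embedded once, stored for reuse


def build_prompt(question: str, top_k: int = 2) -> str:
    q = embed(question)
    sims = doc_vectors @ q        # vectors are unit length, so dot product = cosine similarity
    best = np.argsort(sims)[::-1][:top_k]
    context = "\n".join(documents[i] for i in best)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("How long do refunds take?"))
```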
You will need a semantic database or graph knowledgebase. Having all the data is great, but it’s especially great when your AI agent can find it. Locating the correct data quickly, and trusting that it matches the right definition of, say, “customer” for the question being asked, makes a huge difference. That’s old-school Data Governance 101.
A solid semantic layer, or knowledgebase, is even more important. GraphRAG using a knowledge graph can provide a better response in many cases than vector search-based RAG workflows. GraphRAG particularly helps language models locate and relate semantically similar information across large or disparate sources.
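To illustrate the GraphRAG idea at toy scale, the sketch below stores facts as labeled edges in a small knowledge graph and gathers context by walking outward from the entities a question mentions, rather than relying on vector similarity alone. The entities, relations, and the naive substring entity matching are all illustrative assumptions.

```python
# Toy GraphRAG retrieval: facts live as labeled edges in a knowledge
# graph, and context is collected by walking the neighborhood of every
# entity mentioned in the question. Entities and relations are invented.
import networkx as nx

kg = nx.DiGraph()
kg.add_edge("Acme Corp", "Premium Plan", relation="subscribes to")
kg.add_edge("Premium Plan", "24/7 Support", relation="includes")
kg.add_edge("Acme Corp", "EU Region", relation="hosted in")
kg.add_edge("EU Region", "GDPR", relation="governed by")


def graph_context(question: str, hops: int = 2) -> list[str]:
    """Collect facts within `hops` edges of any entity named in the question."""
    mentioned = [n for n in kg.nodes if n.lower() in question.lower()]
    facts = set()
    for start in mentioned:
        nearby = nx.ego_graph(kg, start, radius=hops)   # subgraph around the entity
        for u, v, data in nearby.edges(data=True):
            facts.add(f"{u} {data['relation']} {v}")
    return sorted(facts)


print(graph_context("Which regulations apply to Acme Corp's hosting?"))
```

Because the graph makes relationships explicit, the model gets "Acme Corp hosted in EU Region" and "EU Region governed by GDPR" as connected facts, a chain that a pure vector search over isolated documents might miss.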