Data Architecture for AI Agents: What to Build Before You Deploy
The layer-by-layer data architecture AI agents actually need — ingestion, storage, transformation, semantic, and serving layer — with real implementation decisions (Snowflake vs Databricks, dbt vs raw SQL) and the failure modes that kill agent deployments at each layer.
Eight in ten companies cite data limitations as their primary roadblock to scaling agentic AI. Fewer than 10 percent have scaled agents to deliver tangible value. The gap between those two numbers is not a model problem. It is a data architecture problem.
We see this pattern every time a client comes to us after a failed agent deployment. The agents were fine. The LLM was capable. The problem was the floor they were standing on. Pipelines with no freshness monitoring. Transformations with no tests. Business metrics defined five different ways in five different tables. Agents querying raw source data because nobody restricted access to governed views.
This guide is the one I wish existed when clients first started asking us to help them deploy AI agents. Most of what you read about data architecture for AI agents is either too abstract to act on or too vendor-specific to trust. This is neither. It is the layer-by-layer architecture your agents actually need, the real implementation decisions at each layer, and the failure modes that destroy agent deployments when each layer is built wrong.
Why AI agents change your data requirements entirely
A dashboard with a wrong number is a conversation starter. A finance team sees it, questions it, investigates. An agent with a wrong number is an automated decision executed before a human catches the mistake. It might adjust pricing, flag a customer for churn, trigger a workflow, or generate a board report. All before anyone notices the input was wrong.
Dashboards were built for human consumption. Humans resolve ambiguity. An analyst who sees "revenue" in a query result will pause and ask which revenue definition is being used. An agent will not. It picks one and acts.
This changes the data architecture requirements in ways most teams have not fully internalized. The data infrastructure that was good enough for self-service analytics is not good enough for autonomous agents. The gap between those two standards is where most enterprise AI deployments fail.
McKinsey's research is clear: nearly two-thirds of enterprises have experimented with agents, but fewer than 10 percent have scaled them to deliver tangible value. Shaky data is almost always the reason. Build bottom-up. The most common mistake is starting at the agent layer and discovering the foundation was never built.
Layer 1: The ingestion layer
Ingestion is where data enters your architecture. Sources include CRM systems, transactional databases, event streams, third-party APIs, marketing platforms, and ERP systems. The decisions you make here determine freshness, completeness, and reliability for everything downstream, including your agents.
The core decision: batch vs. streaming
Most companies default to batch ingestion using tools like Fivetran or Airbyte. Scheduled syncs pull data every hour, every four hours, or nightly. This is sufficient for analytics dashboards where a few hours of latency is acceptable.
For AI agents making operational decisions, it may not be. An agent managing customer escalations needs near-real-time CRM data. An agent optimizing ad spend needs fresh conversion signals. If your ingestion layer runs on four-hour batch cycles, your agents are making decisions on a four-hour-old reality.
Streaming ingestion, via Apache Kafka, Confluent, or Amazon Kinesis, enables continuous data flow. It is more complex to operate and more expensive to run. The question is not whether streaming is better in principle, but whether your agent use cases actually require it. Many do not. The mistake is building complex streaming infrastructure for use cases that would be perfectly served by reliable hourly batch ingestion.
The ingestion failure mode for agents
Pipeline gaps and schedule drift. A morning ETL job that fails silently at 6am leaves agents querying incomplete data at 9am. Without observability tooling (Monte Carlo, Bigeye, or dbt's built-in source freshness tests), nobody knows the ingestion layer failed until an agent produces a suspicious output. By then, the decision has already been made.
Build source freshness monitoring from day one. Your agents should never query a table that has not been refreshed within its expected window. That is not a nice-to-have. It is a prerequisite for production-grade agent reliability.
Layer 2: The storage layer
Once data is ingested, it needs somewhere to live. The storage layer decision has compounding downstream consequences for every AI agent you deploy. Teams that choose the wrong platform for their workload face friction at every layer above it.
The core decision: Snowflake vs. Databricks vs. BigQuery
This is the most debated question in the modern data stack and often the most oversimplified. Here is how we actually think about it with clients:
Snowflake is the right choice when your workloads are SQL-heavy, your team is primarily analytics-focused, and you want strong native governance capabilities. Snowflake's separation of storage and compute gives you predictable performance and cost control. Snowflake Cortex brings AI capabilities native to the platform, which matters when you want agents to operate directly on governed warehouse data without exporting to external services. For analytics engineering teams using dbt, Snowflake is the most mature and well-supported target.
Databricks is the right choice when your workloads span structured data, unstructured data, and ML pipelines simultaneously. When your team has strong Python skills and you need to build feature stores, train models, and serve predictions alongside analytical queries. Unity Catalog provides solid governance across the full lakehouse, including unstructured assets that Snowflake handles less naturally. If your agents need to work across structured tables and unstructured content in the same pipeline, Databricks' lakehouse architecture handles that more natively.
BigQuery is the right choice when you are committed to Google Cloud, want to leverage Vertex AI tightly, and have large-scale analytical workloads with variable query patterns. Its serverless model eliminates compute management overhead, which matters at high agent query volumes. The cost model can create surprises if agents generate unpredictable query spikes.
There is no universally correct answer. There is a correct answer for your team's skills, your workload profile, and your governance requirements. The storage layer is hard to migrate. Choose based on actual workload analysis, not vendor relationships or which platform your last employer used.
The storage failure mode for agents
Schema inconsistency and ungoverned raw table access. When agents can query any table in your warehouse, they will reach raw tables directly. Raw tables have column names like "Cust_ID_123" or "rev_adj_2023_Q2_FINAL." An agent querying those tables without semantic context will produce outputs nobody can validate or trust.
The storage layer must be organized so agents query governed views and modeled tables, never raw source data. Access controls enforced at the warehouse level, not assumed at the application level.
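The real enforcement belongs in warehouse grants, but an application-side guard catches drift early. The sketch below assumes a simple convention (governed objects live in `marts` and `semantic` schemas) and uses a deliberately naive regex; both the schema names and the parsing approach are illustrative, not a substitute for warehouse-level access control.

```python
import re

# Illustrative allow-list of governed schemas.
GOVERNED_SCHEMAS = {"marts", "semantic"}

# Naive pattern for schema-qualified tables after FROM/JOIN.
_TABLE_RE = re.compile(
    r"(?:from|join)\s+([a-z_][a-z0-9_]*)\.[a-z_][a-z0-9_]*", re.I
)

def referenced_schemas(sql: str) -> set[str]:
    """Extract the schemas a query reads from (best-effort)."""
    return {m.lower() for m in _TABLE_RE.findall(sql)}

def assert_governed(sql: str) -> None:
    """Raise if the query touches any schema outside the governed set."""
    offenders = referenced_schemas(sql) - GOVERNED_SCHEMAS
    if offenders:
        raise PermissionError(
            f"query reaches ungoverned schemas: {sorted(offenders)}"
        )
```

Treat this as a tripwire, not a boundary: the warehouse grant is what actually stops the query; the guard just surfaces the violation in application logs before it becomes an incident.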
Layer 3: The transformation layer
Raw data is not agent-ready data. The transformation layer is where raw source data gets cleaned, modeled, and shaped into structures your agents can actually use. This layer matters more than most teams realize, and the wrong choice creates technical debt that directly degrades agent reliability.
The core decision: dbt vs. raw SQL vs. Spark transformations
Raw SQL transformations written directly in the warehouse are fast to build and nearly impossible to govern at scale. There is no version control, no automated testing, no documentation. A transformation that was correct in 2023 might be wrong in 2026 because a source schema changed and nobody updated the SQL. Agents querying the output of that broken transformation will produce wrong answers. With no lineage to trace, nobody will know why.
Teams that built their transformation layer in raw SQL before dbt was mainstream are now running agents on top of undocumented, untested, unversioned logic that nobody fully understands. The transformations work. The business logic they encode may be partially wrong, partially outdated, or inconsistent across teams. Nobody knows for certain because there is no test suite and no documentation.
dbt is the standard for governed transformation pipelines. Transformations are version-controlled in Git, tested against data quality expectations, documented inline, and compiled to SQL that runs in your warehouse. When you run dbt test on a model that an agent depends on, you know before deployment whether the data meets quality thresholds. When an agent produces a suspicious output, you can trace it through the dbt lineage graph from the agent's query to the source system in minutes, not hours.
Spark transformations in Databricks make sense for transformations involving large-scale unstructured data or complex ML feature engineering. For most analytics and agent use cases, dbt running against Snowflake, BigQuery, or Databricks SQL is the right choice. Do not add Spark complexity unless your workload genuinely requires it.
The practical question for teams already running raw SQL is not whether to switch to dbt, but how to migrate incrementally. Start with the models your agents will query first. Build tests. Document business logic. Move from ungoverned SQL to governed dbt models one use case at a time. This is slower than a full rewrite and far more likely to succeed.
The transformation failure mode for agents
Undefined or inconsistently applied business logic. When the definition of "active customer" is different in the marketing transformation, the finance transformation, and the product transformation, agents querying those metrics will contradict each other. An agent answering "how many active customers do we have?" might return a different number depending on which model it hits.
This is not a model hallucination problem. It is a transformation governance problem. Business leaders blame the AI. The real problem is the transformation layer had no single source of truth for business definitions. Which flows directly into the next layer.
Layer 4: The semantic layer
The semantic layer is where business logic becomes governed, machine-readable vocabulary. It is the layer that lets your agents query "revenue" and get one consistent, tested, version-controlled answer, regardless of the complexity of the underlying joins and filters that calculate it.
As TechTarget's analysts put it: accessing data without a semantic model is like trying to drive to your destination without a roadmap. The semantic model is the roadmap. For a complete breakdown of what the semantic layer is and why it has become critical infrastructure, read our full semantic layer guide. For agent architecture specifically, here is what you need to know.
The core decision: dbt Semantic Layer vs. Looker vs. Cube vs. Omni
All four define business metrics in a governed layer that sits between your transformation layer and the tools or agents that consume data. The differences matter for agent architectures specifically.
The dbt Semantic Layer (built on MetricFlow) integrates directly with your dbt transformation pipeline. Metric definitions live alongside your dbt models in version control. Any tool or agent that connects via the Semantic Layer API gets the same metric definition. This is the right choice if you are already on dbt and want a single source of truth maintained by your data engineering team. The principle here is "author once, reuse everywhere" -- definitions are platform-native assets, not embedded in individual charts or agent prompts.
Looker LookML gives you rich, expressive metric definitions with tight BI integration. For agent architectures, Looker's API makes metrics queryable programmatically, which means agents can query governed metrics rather than raw tables. The constraint is that LookML governance is tied to the Looker platform, which creates some coupling if you want to serve multiple downstream consumers.
Cube is designed specifically as a headless semantic layer that serves multiple consumers simultaneously. If you have multiple BI tools, multiple agents, and multiple downstream systems all needing the same metric definitions, Cube's API-first architecture handles that cleanly. It works across both Snowflake and Databricks, which matters for teams with mixed storage infrastructure.
Omni is newer but worth watching for teams that want semantic definitions tightly coupled to ad-hoc analytics. Its MCP integration means agents can query governed metrics directly through Claude and other AI assistants, which reduces the integration work required to connect governed data to AI tooling.
The semantic layer is not optional for agent architectures. It converts your data warehouse from a collection of tables into a governed vocabulary that agents can reason over reliably. Without it, you are asking agents to interpret raw data the same way a senior analyst would, and they cannot.
The semantic failure mode for agents
Definition hallucination. When agents have no semantic layer to query, they infer metric definitions from table and column names. "Total_Rev" might mean gross revenue, net revenue, recognized revenue, or deferred revenue depending on which table it comes from. Agents will hallucinate a definition and apply it consistently enough that outputs look plausible. Nobody catches the error until it shows up in a board deck or a financial model.
This is one of the most common and most preventable causes of AI agent failure on business data. The fix is not a better model. The fix is a governed semantic layer.
Layer 5: The serving and agent layer
The serving layer is how agents access data and take action. This layer includes the query interfaces agents use, the retrieval mechanisms for unstructured content, memory and context management, and the governance controls that determine what agents can see and do.
The core decision: semantic API vs. MCP vs. direct SQL vs. RAG
Semantic layer APIs (via dbt Semantic Layer, Looker, or Cube) give agents access to governed metric definitions without exposing raw tables. This is the recommended approach for agents making business decisions. The agent queries a metric, not a table. The semantic layer handles the underlying complexity, governance, and access controls.
MCP (Model Context Protocol) is an emerging standard that allows AI assistants like Claude to connect to data sources via standardized interfaces. Several semantic layer tools including Omni and Cube now offer MCP servers, which means agents can query governed metrics through a protocol that handles authentication, permissions, and data scoping automatically. This reduces the custom integration work required to put governed data in front of agents.
Direct SQL gives agents the most flexibility but the least governance. An agent with SQL access to your warehouse can query any table, including raw ungoverned ones. This is appropriate only when the query space is tightly constrained and the layers below have already governed what the agent can reach. Do not give agents direct SQL access to your warehouse and assume they will stay on the governed path. They will not.
RAG (Retrieval-Augmented Generation) is used for unstructured content: policy documents, customer emails, support tickets, product documentation. RAG retrieves relevant context from a vector database and provides it to the agent at query time. It is complementary to the semantic layer, not a replacement. Structured data needs a semantic layer. Unstructured data needs RAG. Most production agent deployments need both.
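The split between the two paths can be sketched as a router in front of the agent's tools. The keyword match below is a toy stand-in: production deployments typically use an LLM classifier or the model's own tool-selection step, and the vocabulary list is entirely illustrative.

```python
# Hypothetical governed-metric vocabulary; a real router would load
# this from the semantic layer rather than hard-code it.
METRIC_KEYWORDS = {"revenue", "churn", "active customers", "arr"}

def route(question: str) -> str:
    """Return which data path should answer the question.

    Metric questions go through the semantic layer; document
    questions go through RAG retrieval. Most production agents
    need both paths wired up.
    """
    q = question.lower()
    if any(keyword in q for keyword in METRIC_KEYWORDS):
        return "semantic_layer"
    return "rag"
```

The point of the sketch is the architecture, not the matching logic: structured questions should never reach the vector store, and document questions should never be answered by guessing at warehouse tables.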
For everything involved in safely deploying this layer at scale, read our guide to deploying AI agents in production.
The serving failure mode for agents
Governance bypass. When agents have multiple data access paths, they will use the path of least resistance. If that path bypasses the semantic layer and goes directly to raw tables, you lose all the governance you built into Layers 1 through 4. Access controls must be enforced technically at the serving layer, not assumed from agent behavior. Good intentions are not a governance strategy.
How each layer connects to specific agent failure modes
The five layers above are not abstract. Each one, when built wrong, creates a predictable category of failure. Understanding the mapping helps you prioritize where to invest first.
Ingestion failures create stale-data decisions. When pipeline monitoring is absent and agents query data that has not refreshed, they make decisions on an outdated reality. These failures are silent. The agent operates normally. The output looks plausible. The underlying data is hours or days old. In operational contexts, stale data is wrong data.
Storage failures create schema trust gaps. When agents can reach ungoverned raw tables, they operate without business context. Column names, data types, and null conventions differ between source systems. Agents will query what they can find, not what is governed. The output may be mathematically correct given the inputs and semantically meaningless in a business context.
Transformation failures create metric inconsistency. When the same business concept is calculated differently across models, agents answering the same question at different entry points will produce different answers. This is not hallucination. It is a governance failure presenting as a model failure. Executives blame the AI. The real problem is in the transformation layer.
Semantic layer failures create definition hallucination. Without a governed semantic layer, agents infer what metrics mean from the data they find. They will be internally consistent and factually wrong. This is the failure mode that destroys trust in AI agents most quickly, because the outputs look authoritative until someone who knows the business digs in.
Serving layer failures create governance bypass. Agents with unrestricted data access will find pathways to raw data over time. Whether through tool use, SQL generation, or API exploration, the governed path is not automatically the only path. Serving layer governance must be enforced technically, not assumed.
All five of these failure modes are preventable. They require governance discipline applied at the right layer. For the complete governance framework, read our guide to data governance for AI.
The Intelligence Allocation Stack: why you must build from the bottom
At Unwind Data, we use a framework called the Intelligence Allocation Stack to explain the correct sequence for building AI capabilities. The five layers above map directly to it. The framework has four levels:
Layer 1: Data Foundation. Data governance, data quality, ingestion pipelines, warehousing, and single source of truth. This corresponds to the ingestion and storage layers above. It is the floor everything else stands on.
Layer 2: Semantic Layer. Business logic translated for machines. Metric definitions, governed vocabulary, and machine-readable business rules. Implemented via dbt Semantic Layer, Looker, Cube, or Omni. This is where raw data becomes agent-interpretable vocabulary.
Layer 3: Orchestration Layer. Data pipelines, CRM syncs, reverse ETL, workflow automation, API integrations, and real-time event processing. This is the transformation and serving infrastructure that keeps data fresh and makes it accessible to agents when they need it.
Layer 4: AI Layer. AI agents, conversational AI, autonomous systems, and predictive models. This is where agents live. It is the most visible layer. It is the one executives get excited about. It is entirely dependent on the three layers below it.
The pattern we see in failed deployments is organizations building Layer 4 first. They connect an LLM to the warehouse, deploy an agent, and discover three weeks later that the agent is inconsistent, unauditable, and producing outputs nobody can explain. They blame the model. They upgrade the prompt. The problem does not go away because the problem is in Layer 1.
Start at Layer 1, not Layer 4. This is the single most important architectural principle for AI agent deployments. For every dollar spent on AI agents, six should go to the data architecture underneath them. That ratio reflects the actual investment distribution of organizations that have successfully scaled agents past pilot.
For a deeper look at what building Layer 1 correctly involves, read our guide to data foundations for AI.
Snowflake vs. Databricks: the storage decision that cascades everywhere
The storage layer decision deserves its own section because it shapes every decision above it. Teams that choose the wrong platform for their workload face compounding friction at the transformation, semantic, and serving layers.
Here is how we frame the decision with clients at Unwind Data:
Choose Snowflake when your team is primarily SQL-native, your workloads are structured analytics, you want strong native data sharing and governance, and you are planning to use Cortex AI for in-warehouse AI capabilities. Snowflake's architecture is well-suited to analytics engineering teams using dbt. The governance tooling, particularly for data contracts and access controls, is mature.
Choose Databricks when your workloads span structured data, unstructured data, and ML pipelines. When your team has strong Python skills and you need to build feature stores and train models in addition to running analytical queries. Unity Catalog handles governance across the full lakehouse, including file-based and unstructured assets. If your agents need to work across multiple data modalities, Databricks' architecture handles that more naturally.
Choose BigQuery when you are committed to Google Cloud, want Vertex AI integration, and have large analytical workloads with variable query patterns. The serverless model eliminates compute management overhead but requires careful cost governance at agent query volumes.
The most expensive storage layer decision is the one made based on vendor pressure or convenience. The storage layer is hard to migrate. A wrong call here means friction at every layer above it for the lifetime of your data architecture.
dbt vs. raw SQL: why the transformation decision determines agent quality
This is where we see the most preventable technical debt. Teams that built their transformation layer in raw SQL before dbt was mainstream are now running agents on top of logic that nobody fully understands. The transformations work, mostly. The business logic they encode is partially outdated and partially inconsistent. There is no test suite and no documentation to confirm otherwise.
Deploying agents on top of that foundation is not a data architecture decision. It is a liability decision.
dbt solves this by making transformations first-class software artifacts. Every model is version-controlled. Every model has tests. Every model is documented. The dbt lineage graph maps every dependency from source to output. When an agent produces a suspicious result, you trace it from output to source in minutes.
The incremental migration path matters here. Do not attempt a full rewrite of your transformation layer before deploying agents. Start with the models your agents will query first. Build tests for those models specifically. Document the business logic they encode. Move from ungoverned SQL to governed dbt models one use case at a time. This is slower than a full rewrite and far more likely to succeed in an organization with existing production workloads.
The practical agent architecture readiness checklist
Before deploying any AI agent in production, your data architecture should pass every item below. These are not aspirational goals. They are the minimum requirements for agents that produce outputs you can trust and defend.
Ingestion layer
- Source freshness monitoring is active and alerting on failures before agents query
- Pipeline SLAs are defined: which tables must be fresh within what time window
- Agents are blocked from querying tables that have missed their freshness SLA
- Batch vs. streaming decision is justified by agent latency requirements, not defaults
Storage layer
- Raw source tables are not directly accessible to agents
- Schema conventions are documented and enforced across models
- Data contracts exist between source system owners and the data team
- Row-level security is configured for sensitive data domains
- The platform choice (Snowflake, Databricks, BigQuery) matches your actual workload profile
Transformation layer
- All business-critical transformations that agents will query are in dbt or equivalent governed tooling
- dbt tests cover nullability, uniqueness, and referential integrity for agent-facing models
- Transformation lineage is documented and traceable from output to source
- Business logic changes go through review and testing before deployment
Semantic layer
- Every metric an agent will query is defined in the semantic layer
- Metric definitions are version-controlled and tested
- There is exactly one definition for each business metric, no synonyms and no alternatives
- Agents query metrics via the semantic API, not directly against underlying tables
Serving layer
- Agent data access is scoped to governed views and semantic layer endpoints
- Authentication and authorization are enforced at the infrastructure level
- Agent query logs are captured and auditable
- Escalation paths exist when agents encounter data they cannot interpret confidently
Four gut-check tests for any agent deployment
Run these before signing off on production deployment:
The three-person test. Can three different people in your organization run the same query and get the same answer? If not, your definitions are not governed and your agents will produce inconsistent outputs.
The vacation test. If your most knowledgeable data engineer takes two weeks off, can your agents still operate correctly? If not, your architecture depends on tribal knowledge that never made it into documentation or code. Stanford research shows AI has already cut entry-level technical roles significantly. Teams are smaller. The tribal knowledge held by those roles does not migrate to documentation automatically. It disappears.
The audit test. If the CFO questions a number an agent produced in a board presentation, can you trace it from output to source system in under 30 minutes? If not, your lineage is incomplete and your agents are not production-ready for high-stakes use cases.
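Passing the audit test is mechanical when lineage is machine-readable. The sketch below walks a lineage graph from an agent's output back to its source tables; the node names are hypothetical, and in a dbt project this parent mapping would come from the compiled manifest rather than a hard-coded dict.

```python
# Hypothetical lineage graph: each node maps to its upstream parents.
LINEAGE = {
    "board_report.revenue": ["fct_revenue"],
    "fct_revenue": ["stg_invoices", "stg_refunds"],
    "stg_invoices": ["raw.invoices"],
    "stg_refunds": ["raw.refunds"],
}

def trace_to_sources(node: str) -> set[str]:
    """Walk upstream from an output node to its source tables
    (nodes with no recorded parents)."""
    stack, sources, seen = [node], set(), set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = LINEAGE.get(current, [])
        if not parents:
            sources.add(current)
        else:
            stack.extend(parents)
    return sources
```

If a traversal like this cannot be run against your stack, the 30-minute answer to the CFO does not exist, and the agent output it would defend is not production-ready.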
The swap test. If your AI model provider doubled their prices tomorrow, could you switch providers without rebuilding your data infrastructure? If not, you have vendor concentration risk at the agent layer that your data architecture should protect against. The data foundation, semantic layer, and orchestration should be provider-agnostic by design.
What this actually costs to get right
The question we hear most often from data leaders: what does it cost to build this, and how long does it take?
The honest answer depends entirely on where you start. Teams on a modern stack -- Snowflake or Databricks, dbt, Fivetran -- with reasonable data quality may only need to add a semantic layer and tighten governance. That is weeks of work, not months.
Teams running on legacy infrastructure with undocumented SQL transformations, no data quality monitoring, and no semantic layer are looking at a more significant investment. The agents they want to deploy in six months require a foundation that takes three to six months to build correctly. The alternative is deploying agents on broken foundations and discovering the problems one production incident at a time.
The calculation that changes every data leader's perspective on this investment is the cost of a wrong agent decision at scale. An agent making incorrect pricing recommendations across 50,000 SKUs. An agent flagging healthy accounts as churn risks and triggering retention campaigns. An agent generating financial summaries with the wrong revenue definition for the board.
These are not edge cases. They are the predictable outputs of agents deployed on ungoverned data architecture. For every dollar spent on AI agents, six should go to the data architecture underneath them. That ratio feels counterintuitive until you calculate what a single large-scale agent error costs in recovery, trust, and rework.
Building the architecture agents actually need
At Unwind Data, we have implemented agent-ready data architecture across fintech, e-commerce, SaaS, and sustainability verticals. From co-founding DataBright in 2018 to the work we do now, the lesson is consistent: the organizations that scale agents successfully are the ones that built the foundation before the agents arrived, not after.
We build the ingestion layer, storage layer, transformation layer, semantic layer, and serving governance that agents require. Provider-agnostic by design, built for auditability, with agents consuming a single source of truth from day one.
The architecture described in this post is not theoretical. It is the architecture running underneath agent deployments that are generating reliable outputs in production today. Fix the floor before you let the agents run.