
Making Your Data Platform Agent-Ready: A 5 Dimension Framework and Scorecard

A practical framework for data platform AI-readiness - 5 dimensions, before-and-after examples, quick wins, and a scorecard you can run with your team.

Omri Lifshitz
April 23rd, 2026

In our previous post, we explained why data platforms are fundamentally different from application codebases for AI agents - the metadata problem, the multi-tool problem, the blast radius problem, and the verification gap. We showed six predictable ways agents fail when platforms aren't agent-ready.

This post is the fix: a five-dimension framework tailored to data platforms, with concrete examples and a scored rubric you can use to assess where you stand today.

Each dimension is scored 0-3, for a maximum of 15. For each dimension, we'll explain the core idea, walk through a before-and-after example, and give you the single highest-leverage quick win.

The 5 dimensions of data platform AI-readiness

Dimension 1: Agent Configuration & Context

Does the agent know how your data platform works?

This is the closest to the original framework's "Rules File & Agent Config" dimension, but the data platform version demands more specificity. An application agent mostly needs to know the language, the test command, and the project structure. A data agent needs to know the warehouse dialect (Snowflake SQL is not BigQuery SQL), the transformation tool and how to run it, the naming conventions for models, and - critically - what operations are forbidden.

Before: Your team asks an agent to add a new metric for customer churn. The agent writes BigQuery-flavored SQL against your Snowflake warehouse. It doesn't know you use dbt, so it writes raw DDL. It can't figure out how to run tests. When it encounters an error, it issues a DROP TABLE to clean up and start fresh - against your production schema.

After: Your CLAUDE.md specifies that the warehouse dialect is Snowflake, includes dbt build --select state:modified+ as the test command, provides a one-paragraph map of the repo structure (models/staging/, models/intermediate/, models/marts/), and lists naming conventions (stg_, int_, fct_, dim_). A PreToolUse hook blocks any command containing DROP or TRUNCATE targeting production schemas. Domain-specific rules in models/finance/.claude.md specify finance-specific conventions for revenue recognition timing.
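That PreToolUse guardrail doesn't need to be elaborate. Here's a minimal sketch assuming Claude Code's hooks mechanism, where a command hook registered under PreToolUse in .claude/settings.json receives the pending tool call as JSON on stdin and blocks it by exiting with code 2. The script path and regex are illustrative:

```python
# .claude/hooks/block_destructive.py
# Register under "PreToolUse" (matcher: "Bash") in .claude/settings.json.
import json
import re
import sys

payload = json.load(sys.stdin)  # the pending tool call, as JSON
command = payload.get("tool_input", {}).get("command", "")

# Illustrative pattern: DROP/TRUNCATE aimed at anything that looks like prod.
if re.search(r"\b(drop|truncate)\b", command, re.I) and re.search(r"\bprod", command, re.I):
    print("Blocked: destructive statement against a production schema.", file=sys.stderr)
    sys.exit(2)  # exit code 2 blocks the tool call and surfaces the message
```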

Quick win: Write a 30-line CLAUDE.md with four things: your warehouse dialect, the dbt build command, a one-paragraph repo map, and a "do not" list covering destructive operations. This takes about 20 minutes and immediately prevents the most common agent failures.
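For reference, a condensed sketch of what those 30 lines might contain. The conventions and commands are illustrative; adapt them to your stack:

```markdown
# CLAUDE.md

## Warehouse
- Dialect: Snowflake SQL. Do not write BigQuery-flavored SQL.

## Commands
- Build and test changed models: `dbt build --select state:modified+`
- Lint: `sqlfluff lint models/`

## Repo map
- `models/staging/`: one model per source table (`stg_<source>__<entity>`)
- `models/intermediate/`: joins and reusable business logic (`int_`)
- `models/marts/`: facts and dimensions analysts query (`fct_`, `dim_`)

## Do not
- Never run DROP, TRUNCATE, or DELETE against production schemas.
- Never modify or remove existing tests; add new ones instead.
- Never hardcode database or schema names; use `ref()` and `source()`.
```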

Dimension 2: Schema & Metadata Readiness

Can the agent understand your data without asking a human?

This dimension has no direct equivalent in the codebase framework, and it might be the single most important one for data platforms. Column descriptions are the highest-leverage investment you can make for agent readiness. Without them, the agent is pattern-matching on column names and hoping for the best. With them, the agent has the context it needs to write semantically correct queries.

Before: An agent is asked to build a report on revenue trends. It finds users.xf_rev_3 and orders.gmv_net. It guesses that xf_rev_3 is some kind of revenue field and gmv_net is gross merchandise value. It joins them together. It produces a query that "looks right". But it double-counts refunded orders because gmv_net is gross of refunds and xf_rev_3 is net, and nobody documented the difference.

After: Every column in your dbt project has a description in schema.yml. The agent reads that gmv_net is "Gross merchandise value minus refunds and cancellations, in USD" and that xf_rev_3 has been deprecated in favor of net_revenue_usd. It writes the correct calculation on the first try. A semantic layer defines "churn rate" as a company-standard metric, so the agent uses the same formula everyone else does.

Quick win: Add column descriptions to your top 10 most-queried models. dbt ls --output json gives you the full model list to triage; your BI tool's usage stats or your warehouse's query history will tell you which models are actually queried most. If you can only do one thing from this entire post, do this. It takes one to two hours and has an outsized impact on agent accuracy.
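In dbt, those descriptions live next to the models they describe. A short sketch, with illustrative model and column names:

```yaml
# models/marts/schema.yml
version: 2

models:
  - name: fct_orders
    description: "One row per completed order. Grain: order_id."
    columns:
      - name: order_id
        description: "Primary key; unique per order."
      - name: gmv_net
        description: "Gross merchandise value minus refunds and cancellations, in USD."
      - name: net_revenue_usd
        description: "Recognized net revenue in USD. Replaces the deprecated xf_rev_3."
```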

Dimension 3: Pipeline & Model Organization

Can the agent navigate your DAGs and models confidently?

This adapts the codebase framework's "File Organization" dimension for the specific structure of data pipelines. The core insight is the same - if a human needs to "just know" where things live, an agent will get lost - but the implementation is different. Data pipelines have a natural layered architecture (sources → staging → intermediate → marts) that, when followed consistently, gives agents a powerful navigational framework. When it's not followed, agents flounder.

Before: Your dbt project has 200 SQL files in a flat models/ directory. A file called revenue.sql does extraction, cleaning, three different joins, business logic for revenue recognition, and a final aggregation - all in 400 lines with unnamed CTEs. When an agent is asked to add a new dimension to the revenue model, it doesn't know if it should modify revenue.sql, create a new file, or look for an intermediate model that might already have the join it needs. It creates a new 15-CTE model from scratch, duplicating logic that already exists in three other files.

After: Your project follows a clear layered architecture: stg_stripe__payments.sql → int_payments__joined.sql → fct_orders.sql. The agent infers the layer and domain from the filename alone. Each model does one logical transformation. CTEs are named (import_payments, filter_active, calculate_totals, final). When asked to add a dimension, the agent modifies only fct_orders.sql, joining to the appropriate intermediate model.
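The named-CTE pattern is worth making concrete. A sketch of a mart model in that style (names are illustrative):

```sql
-- models/marts/fct_orders.sql
-- Grain: one row per order.

with import_payments as (
    select * from {{ ref('int_payments__joined') }}
),

filter_completed as (
    select * from import_payments
    where status = 'completed'
),

calculate_totals as (
    select
        order_id,
        min(paid_at) as first_paid_at,
        sum(amount_usd) as order_total_usd
    from filter_completed
    group by 1
),

final as (
    select * from calculate_totals
)

select * from final
```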

Quick win: Rename your top 5 models to follow the layer_source__entity convention (stg_stripe__payments, int_orders__enriched, fct_revenue). This takes about 30 minutes plus a dbt build to verify nothing breaks, and immediately makes those models agent-navigable.

Dimension 4: Data Testing & Quality Gates

Can the agent verify its work without a human checking query results?

This adapts the codebase framework's "Test & Verification" dimension, but the data version is arguably more critical. In application code, a wrong function usually produces a visible error. In data, a wrong query produces a plausible-looking number. Without tests that encode business rules - not just schema checks - the agent has no feedback signal to tell it whether its output is correct.

The codebase framework introduced the concept of "protected baseline tests" - tests the agent can't modify or delete. This matters even more in data, where the temptation for an agent to "fix" a failing test by relaxing the threshold is high and the consequences are invisible.

Before: An agent modifies your revenue model. The model has two tests: not_null on order_id and unique on order_id. The agent's change subtly alters the join logic - it fans out on a many-to-one relationship, inflating row count by 8% and revenue by 12%. Both schema tests pass. The numbers are just plausible enough that nobody questions them until the monthly reconciliation surfaces a discrepancy two weeks later. This is the scenario that should keep you up at night: not the 200% spike that triggers immediate alarms, but the 12% drift that's quiet enough to survive a dashboard glance.

After: Your baseline tests include not_null, unique, and accepted_values on every model, plus custom data tests: "total revenue must be within 5% of yesterday's value," "row count must not change by more than 10% day-over-day," "order_date must always be less than or equal to ship_date." The agent's grain-changing edit triggers the row-count test immediately. The agent sees the failure, diagnoses the problem, and fixes the model before opening a PR. The baseline tests are marked as protected - the agent can add new tests but cannot modify or remove existing ones.

Quick win: Add one custom data test to your most critical revenue or user model - something that encodes a business rule, not just a schema check. A row-count stability test or a revenue-range test takes about 30 minutes to write and immediately gives the agent a meaningful verification signal.
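As a concrete example, the order_date rule from above fits in a dbt singular test: a SELECT in the tests/ directory that fails if it returns any rows. Model and column names are illustrative:

```sql
-- tests/assert_orders_ship_after_order_date.sql
-- Fails if any order ships before it was placed.
select
    order_id,
    order_date,
    ship_date
from {{ ref('fct_orders') }}
where order_date > ship_date
```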

Dimension 5: Observability & Lineage

Can the agent understand what breaks when something changes?

This dimension is new - it doesn't appear in the codebase framework at all. Application code has compilers and type checkers that catch most dependency-breaking changes at build time. Data platforms don't. If an agent renames a column in a staging model, there's no compiler that tells it which downstream marts reference that column. Without queryable lineage, the agent makes changes in isolation and hopes for the best.

Before: An agent is asked to refactor a staging model by renaming a column from user_id to customer_id for consistency. It makes the change. Twelve downstream models reference user_id from that staging model. All twelve break silently - they either error on the next dbt run (best case) or, if they have coalesce fallbacks, start producing nulls (worst case). The data team discovers the breakage when a Slack message lands: "the dashboard looks weird."

After: Before making any changes, the agent queries manifest.json to check downstream dependencies. It sees 12 models depend on the staging model, identifies the 3 that directly reference the user_id column, and updates all of them in the same PR. It runs dbt build on the full downstream graph to verify everything passes.

Quick win: Make sure manifest.json (or your catalog artifact) is accessible in the repo or generated as a CI build artifact. If you're using dbt, add dbt docs generate to your CI pipeline. This takes about 15 minutes and gives the agent access to the full dependency graph.
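manifest.json includes a parent_map and a child_map for every node, so once the artifact exists the downstream check is a one-liner. Model and project names here are illustrative:

```bash
# List everything downstream of the staging model
dbt ls --select stg_stripe__payments+

# Or read the dependency graph straight from the manifest
jq '.child_map["model.my_project.stg_stripe__payments"]' target/manifest.json
```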

Quick wins by persona

Pick your role, do the quick win. None takes more than a couple of hours, and most take under an hour.

The Analytics Engineer with no rules file. Your agent writes wrong-dialect SQL and can't figure out how to run tests. Write a 30-line CLAUDE.md with your warehouse dialect, the dbt build command, your naming conventions, and a "do not" list. Twenty minutes.

The dbt Project Lead with sparse documentation. Your agent guesses at what columns mean and produces metrics that don't match your company's definitions. Add column descriptions to your top 10 models in schema.yml. Focus on the marts that analysts query most often. One to two hours.

The Data Platform Engineer with a flat model structure. Your agent can't find the right model to modify and keeps creating duplicate logic. Implement the stg_ / int_ / fct_ / dim_ naming convention and folder structure. Start with one domain (e.g., models/finance/). Thirty minutes.

The Airflow or Dagster Owner with monolithic DAGs. Your agent can't isolate a single workflow to modify because everything is tangled in one mega-DAG. Split your largest DAG into per-domain DAGs with explicit dependencies. One hour.
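One way to make those dependencies explicit in Airflow is Datasets (Airflow 2.4+), which let per-domain DAGs trigger each other without a shared mega-DAG. A minimal sketch with illustrative names:

```python
# Two per-domain DAGs linked by a Dataset instead of one tangled graph.
import pendulum
from airflow.datasets import Dataset
from airflow.decorators import dag, task

fct_orders = Dataset("warehouse://analytics/fct_orders")

@dag(schedule="@daily", start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def finance_ingest():
    @task(outlets=[fct_orders])  # marks the dataset as updated on success
    def load_orders():
        ...  # extract/load logic goes here

    load_orders()

@dag(schedule=[fct_orders], start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def finance_reporting():  # runs whenever fct_orders is refreshed upstream
    @task
    def build_reports():
        ...

    build_reports()

finance_ingest()
finance_reporting()
```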

The Data Team Lead with no CI on data PRs. Every agent PR requires a full manual SQL review because there's no automated verification. Add a CI pipeline: sqlfluff lint → dbt build → dbt test → dbt docs generate. Two hours for the first setup, but it pays for itself on the first agent PR you can merge without reading every line of SQL.
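A minimal sketch of that pipeline, assuming GitHub Actions and dbt Core (profile setup and warehouse credentials omitted; adapter and versions are illustrative):

```yaml
# .github/workflows/data-ci.yml
name: data-ci
on: pull_request

jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-snowflake sqlfluff
      - run: sqlfluff lint models/
      - run: dbt deps
      - run: dbt build          # run + test in one command
      - run: dbt docs generate  # refreshes manifest.json for lineage
```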

The Scorecard

Below is a full Data Platform AI-Readiness Scorecard that you can use to assess your platform across all five dimensions. Each dimension scores 0–3 based on specific, auditable criteria, with a set of honest-answer audit questions to calibrate your assessment. 

0–5: Your platform is fighting the agent. It's guessing at schemas, can't verify anything, and has no guardrails. Start with a rules file, add column descriptions to your top 10 models, and configure a SQL linter.

6–9: Foundations are there, but gaps remain. The agent can do basic work but still needs hand-holding on most tasks. Add protected baseline tests, document your naming conventions, and make lineage queryable.

10–12: Strong position. The agent can navigate, build, and verify with moderate autonomy. Push on automated impact analysis, data contracts, and CI gates that give you confidence to merge agent PRs without manual SQL review.

13–15: Leading edge. Your data platform is an agent-native development environment. Extend to cross-platform awareness where the agent understands warehouse, orchestrator, and BI layer as one system.


How to use this

Don’t score this alone. Have each person on your data team score independently, then compare. The gaps between answers matter more than the score itself - they reveal where the team’s mental model of the platform diverges from reality. You can also paste the scorecard directly into Claude Code and have it audit your dbt project automatically.

Here’s the thing about the entire framework: every item on it also makes your data platform better for humans.

Column descriptions make onboarding faster. Naming conventions make code reviews easier. Protected tests prevent accidental regressions. Queryable lineage reduces debugging time. CI pipelines catch errors before they reach production. These are best practices the data community has talked about for years. Most teams know they should do them. Most teams haven't done all of them.

AI agents change the economics. The ROI of writing column descriptions was always positive, but it was diffuse and hard to quantify - faster onboarding, fewer Slack questions, slightly more accurate ad-hoc queries. With agents, the ROI is immediate and concrete: the agent either writes the correct query or it doesn't. The agent either finds the right model or it creates a duplicate. The agent either verifies its work or it ships wrong numbers.

Context has become the bottleneck for AI adoption in data engineering. The problem isn’t the agent. It’s that nothing feeds it a trustworthy, living understanding of the environment. The teams getting the most out of AI agents aren’t using better prompts - they’re investing in the boring stuff. Column descriptions. Naming conventions. Business-rule tests. Queryable lineage. The work that makes their platform legible to agents and to the humans who will work alongside them.


Data Platform AI-Readiness Scorecard

Score your data platform 0–3 on each dimension. Max score: 15.

Dimension 1: Agent Configuration & Context

Does the agent know how your data platform works?

Score 0: No CLAUDE.md / rules file in your data repo. No description of warehouse dialect, orchestration tool, or project conventions. Agent must guess everything.

Score 1: Rules file exists but is generic ("write clean SQL") or bloated (>300 lines). Missing: warehouse dialect, schema naming conventions, how to run/test models, environment setup.

Score 2: Concise rules file (<200 lines) with: warehouse dialect and connection context, exact commands to run and test models (dbt build, airflow dags test), repo map of key directories (models/, dags/, macros/), definition of done (tests pass, docs generated, contracts honored).

Score 3: All of 2, plus: folder-level rules for different domains (e.g., models/finance/.claude.md with finance-specific conventions); hooks or guardrails (e.g., auto-run dbt build --select state:modified+ after edits, block DROP TABLE in production); custom slash-commands for common workflows (backfill a model, add a metric, scaffold a new source).

Audit Questions

  • Can an agent run your models and tests in one command without asking you how?
  • Does your rules file specify the warehouse dialect (Snowflake SQL, BigQuery, Databricks SQL)?
  • Is there a repo map showing where staging models vs. marts vs. sources live?
  • Does the agent know your naming conventions (e.g., stg_, int_, fct_, dim_)?
  • Are dangerous operations (DROP, TRUNCATE, production writes) guarded by hooks or at minimum called out in a "do not" list?


Dimension 2: Schema & Metadata Readiness

Can the agent understand your data without asking a human?

Score 0: No column descriptions, no schema docs, no data dictionary. Table and column names are cryptic (tbl_1, col_a, xf_rev_3). Agent is flying blind.

Score 1: Some documentation exists (a wiki page, a stale spreadsheet) but it's not co-located with the code. Column descriptions are sparse or outdated. Business terms aren't defined anywhere an agent can access.

Score 2: Descriptions live in code — dbt schema.yml with descriptions on every model and every column (or equivalent in-warehouse comments). Consistent naming conventions that encode meaning (order_created_at, is_active_customer). Source freshness and accepted values documented.

Score 3: All of 2, plus: a machine-readable business glossary or semantic layer (metrics definitions, entity relationships); dbt contracts or column-level typing enforced on marts; exposures or data products documented so the agent knows downstream consumers; descriptions reference related models/dashboards for cross-navigation.

Audit Questions

  • Pick a random mart table. Can an agent understand what every column means without asking a human?
  • Are your column names self-describing, or do they require tribal knowledge (e.g., rev_3 means "revenue net of refunds, version 3")?
  • Is there a single source of truth for metric definitions that an agent can read (not a Confluence page)?
  • Do your dbt models have schema.yml entries with descriptions for every column?
  • Are dbt contracts enforced on your most critical models?
  • Do you document which dashboards and downstream systems consume each model (exposures)?


Dimension 3: Pipeline & Model Organization

Can the agent navigate your DAGs and models confidently?

Score 0: SQL files dumped in a flat directory. Mixed concerns — a single model does extraction, transformation, business logic, and metric calculation. DAG files are monolithic. No consistent layering.

Score 1: Some structure exists (a models/ folder, separate DAG files) but inconsistent. Models mix staging and business logic. Naming doesn't reflect the DAG layer. Macros are undocumented or duplicated.

Score 2: Clear layered architecture the agent can reason about: staging/ → intermediate/ → marts/ (or equivalent). One model = one logical transformation. File names are greppable and encode layer + domain (stg_stripe__payments.sql, fct_orders.sql). DAGs/workflows are modular (one DAG per domain or cadence, not one mega-DAG). Macros and shared logic are documented with clear interfaces.

Score 3: All of 2, plus: header comments in SQL files explain purpose, grain, and upstream/downstream dependencies. Models use CTEs with named steps (import, filter, transform, final) that an agent can parse. Jinja/macro interfaces are typed or documented with expected inputs. DAG dependencies are explicit (not relying on implicit scheduling order). Folder structure mirrors your business domains.

Audit Questions

  • How many layers does your dbt project have? Can an agent identify the layer from the filename alone?
  • Pick a random model. How many CTEs does it have? Does each CTE have a clear, named purpose?
  • Are your Airflow DAGs modular (per-domain) or monolithic (one mega-DAG)?
  • Can an agent find the right model by name alone, without reading the SQL?
  • Do your macros have doc blocks explaining what they do and what arguments they expect?
  • Is there a single model doing too many things (extraction + cleaning + business logic + aggregation)?


Dimension 4: Data Testing & Quality Gates

Can the agent verify its work without a human checking query results?

Score 0: No data tests. No linting. Agent can write any SQL and ship it with zero verification. "We check the dashboard after deploy."

Score 1: Some dbt tests exist (not_null, unique on primary keys) but coverage is spotty. No SQL linting (sqlfluff/sqlfmt). Agent can modify or delete existing tests freely. Test commands aren't documented in the rules file.

Score 2: Solid baseline test coverage: schema tests on all models (not_null, unique, accepted_values, relationships). Baseline tests are protected — agent cannot modify them. Agent can add NEW tests for new models. SQL linter configured and runnable in one command. Clear test commands in rules file.

Score 3: All of 2, plus: custom data tests for business logic (e.g., "revenue must be non-negative," "order_date <= ship_date"). dbt build (run + test) runs automatically after model changes via hooks. Unit tests for complex transformations (dbt unit tests or equivalent). CI pipeline runs on every PR: lint → build → test → docs. Data contracts on critical marts reject schema-breaking changes before merge.

Audit Questions

  • Can the agent run all data tests in one command?
  • What percentage of your models have schema tests? (Be honest.)
  • Are baseline tests protected from agent modification?
  • Do you have custom tests that encode business rules, or just generic not_null/unique?
  • Does your CI pipeline catch a broken model before it hits production?
  • Is SQL linting configured and part of the development workflow?
  • Do you use dbt unit tests for complex Jinja or multi-CTE transformations?


Dimension 5: Observability & Lineage

Can the agent understand what's happening across the platform — and what breaks when something changes?

Score 0: No lineage visibility. No freshness monitoring. Agent has no way to understand upstream/downstream impact of a change. Debugging means reading Slack threads.

Score 1: Basic lineage exists (dbt docs site or warehouse lineage view) but isn't accessible to the agent programmatically. Some freshness checks, but alerts go to email/Slack, not somewhere an agent can query. Logs exist but aren't structured.

Score 2: Agent can access lineage information (dbt manifest.json, or a queryable catalog). Source freshness is configured and the agent can check it. Structured logging on DAG runs (success/failure, row counts, duration). The agent can answer "what models depend on this source?" without asking a human.

Score 3: All of 2, plus: automated impact analysis — agent can determine blast radius of a schema change before making it. Anomaly detection on key metrics (row count drift, value distribution shifts). Run metadata is queryable (the agent can check "did this model succeed in the last run?"). Data SLAs are codified and the agent can validate against them.

Audit Questions

  • If the agent modifies a staging model, can it determine which downstream marts and dashboards are affected?
  • Can the agent check whether a source table was refreshed today — without asking a human?
  • Are your DAG run logs structured and queryable, or buried in Airflow's UI?
  • Does the agent have access to manifest.json or an equivalent lineage artifact?
  • If a model's row count drops 50%, would anyone (or anything) notice before a stakeholder complains?
  • Are data SLAs written down anywhere an agent can read?


Quick Interpretation

0–5: Your platform is fighting the agent. It's guessing at schemas, can't verify anything, and has no guardrails. Next step: start with a rules file, add column descriptions to your top 10 models, and configure a SQL linter.

6–9: Foundations are there, but gaps remain. Agent can do basic work but still needs hand-holding. Next step: add protected baseline tests, document your naming conventions, and make lineage queryable.

10–12: Strong position. Agent can navigate, build, and verify with moderate autonomy. Next step: push on automated impact analysis, data contracts, and CI gates that give you confidence to merge agent PRs without manual SQL review.

13–15: Leading edge. Your data platform is an agent-native development environment. Next step: extend to cross-platform awareness (agent understands warehouse + orchestrator + BI layer as one system).


Your Score

Score each dimension 0–3, note your evidence, and total it out of 15:

  • Agent Configuration & Context: ___
  • Schema & Metadata Readiness: ___
  • Pipeline & Model Organization: ___
  • Data Testing & Quality Gates: ___
  • Observability & Lineage: ___
  • Total: ___ /15

