Your AI Agent’s SQL Is Correct. The Answer Is Still Wrong.

How NULL distributions silently corrupt agentic database workflows — and why the fix isn’t a better model.

Will Wright · Convergent Methods · March 2026

You ask your AI agent a simple question: “How many cancelled subscriptions do we have?”

The agent connects to your database. It reads the schema. It writes SQL:

SELECT COUNT(*) FROM subscriptions WHERE status = 'cancelled';

The SQL is syntactically perfect. The agent is confident. The query returns 200.

The real answer might be as high as 500. You’ll never know, because 30% of your table has status = NULL — and every one of those rows was silently excluded. The agent didn’t lie to you. It didn’t hallucinate. It generated correct SQL against an incomplete picture of reality, and the database did exactly what it was asked. The rows with no status weren’t counted as cancelled. They also weren’t counted as not cancelled. They simply didn’t exist as far as the query was concerned.

This is the Null Trap. Avoiding it is one of the many pieces of tacit knowledge built into the daily work of data engineers, data scientists, and data analysts. What’s changed is that AI agents now run hundreds of queries a day against your data, autonomously, with no human reviewing the results. A senior engineer gets a number back and thinks “let me sanity check that.” An agent gets a number back and puts it in your dashboard. The Null Trap went from an occasional human mistake to a systematic bias across every filtered result an agent produces.

NULL is the cleanest demonstration of a more general pattern — one we’ll come back to at the end. For now, sit with NULL specifically, because it’s the crispest case of what goes wrong when an agent queries data without looking at it first.

Why NULLs Are Different From Wrong Values

Most data quality conversations focus on wrong data — a misspelled city name, a negative price, a date in the future. Wrong data is visible. It shows up in results. Someone eventually notices.

NULLs are invisible. SQL’s three-valued logic guarantees it.

WHERE status = 'cancelled' — in most programming languages, if status is null, this evaluates to false. In SQL, it evaluates to UNKNOWN. A third truth value that is neither true nor false. Rows where the predicate evaluates to UNKNOWN are excluded from the result set. Not flagged. Not warned about. Excluded.
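A minimal sketch of that behavior, using Python’s built-in `sqlite3` module as an in-memory stand-in for your database. SQLite has no separate boolean type, so it reports UNKNOWN as NULL, which Python receives as `None`. The table `t` is purely illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Python: comparing None to a string is plain False.
python_result = None == "cancelled"

# SQL: NULL = 'cancelled' is UNKNOWN, which SQLite returns as NULL (None in Python).
sql_result = conn.execute("SELECT NULL = 'cancelled'").fetchone()[0]

# Negating UNKNOWN is still UNKNOWN, so NOT does not rescue the row.
sql_negated = conn.execute("SELECT NOT (NULL = 'cancelled')").fetchone()[0]

# In a WHERE clause, UNKNOWN means "exclude": the NULL row matches neither filter.
conn.execute("CREATE TABLE t (status TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("cancelled",), (None,)])
matched = conn.execute("SELECT COUNT(*) FROM t WHERE status = 'cancelled'").fetchone()[0]
unmatched = conn.execute("SELECT COUNT(*) FROM t WHERE NOT (status = 'cancelled')").fetchone()[0]

print(python_result, sql_result, sql_negated)  # False None None
print(matched, unmatched)  # 1 0
```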

Query	What you think it returns	What it actually returns
`WHERE status = 'cancelled'`	All cancelled rows	Only rows where status is literally the string ‘cancelled’
`WHERE status != 'cancelled'`	All non-cancelled rows	Only rows where status is a non-null string that isn’t ‘cancelled’
`WHERE status = 'cancelled' OR status != 'cancelled'`	All rows	Only rows where status is not NULL

That last one. x = 'a' OR x != 'a' should be a tautology. In SQL, it isn’t. The 300 rows where status is NULL (the 30% from the opening example) satisfy neither condition. They fall through every filter you write.
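A sketch that reproduces all three rows of the table, assuming an in-memory SQLite `subscriptions` table with 500 active, 200 cancelled, and 300 NULL rows (the same split as the worked example later on). The `count` helper interpolates its WHERE clause as-is and is for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, status VARCHAR(20))")
rows = [("active",)] * 500 + [("cancelled",)] * 200 + [(None,)] * 300
conn.executemany("INSERT INTO subscriptions (status) VALUES (?)", rows)


def count(where: str) -> int:
    """COUNT(*) under a WHERE clause. Illustration only: the clause is interpolated as-is."""
    return conn.execute(f"SELECT COUNT(*) FROM subscriptions WHERE {where}").fetchone()[0]


eq = count("status = 'cancelled'")
neq = count("status != 'cancelled'")
either = count("status = 'cancelled' OR status != 'cancelled'")  # the "tautology"
total = count("1 = 1")

print(eq, neq, either, total)  # 200 500 700 1000
```

The gap between `either` and `total` is exactly the 300 NULL rows.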

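A senior data engineer knows this. They’ve been burned by it. They habitually make NULL handling explicit: OR status IS NULL when those rows belong in the result, AND status IS NOT NULL when excluding them is deliberate, or COALESCE(status, 'unknown') to give them a visible label. But they know to do this only because they once shipped a wrong number, noticed it a week later, and traced it back to a nullable column.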
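Two of those habits, sketched against the same illustrative SQLite table (500 active, 200 cancelled, 300 NULL). One query states outright that NULL rows belong in the “not cancelled” count; the other gives NULLs a visible `'unknown'` bucket in a breakdown. Here `'unknown'` is only a reporting label, not a claim about what the rows mean.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (status TEXT)")
conn.executemany(
    "INSERT INTO subscriptions VALUES (?)",
    [("active",)] * 500 + [("cancelled",)] * 200 + [(None,)] * 300,
)

# Explicit inclusion: NULL rows are counted as "not cancelled" on purpose.
not_cancelled = conn.execute(
    "SELECT COUNT(*) FROM subscriptions WHERE status != 'cancelled' OR status IS NULL"
).fetchone()[0]

# Explicit labeling: NULLs become their own 'unknown' group instead of vanishing.
breakdown = dict(
    conn.execute(
        "SELECT COALESCE(status, 'unknown') AS s, COUNT(*) FROM subscriptions "
        "GROUP BY s ORDER BY s"
    ).fetchall()
)

print(not_cancelled)  # 800
print(breakdown)  # {'active': 500, 'cancelled': 200, 'unknown': 300}
```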

An AI agent has no such scars.

“This Is Data Quality 101”

A fair objection. For humans, yes — every data engineer who has spent more than a year in production has internalized some version of “check your NULLs.” The correct response to WHERE status = 'cancelled' is, for a human with scars, “before I run that, how many NULLs are in status?”

What’s new is the volume. A senior engineer’s hygiene, pausing to sanity-check a number before shipping it, is a habit formed by shipping wrong numbers and getting paged at 2am. An agent forms no such habit. It generates clean SQL, gets a clean result, and moves on. When a single agent session runs hundreds of filtered queries with no human review, a check that used to catch one wrong number a month now has to run on every query, at machine speed, or it doesn’t run at all. The hygiene has to move from a human habit to a property of the infrastructure.

“Data quality 101” was the right framing when humans were the bottleneck. When agents are the bottleneck, hygiene becomes infrastructure, or it doesn’t exist.

“Why Not Just Tell the Agent to Use COALESCE?”

Another fair objection. COALESCE(status, 'unknown') replaces NULL with a default value, which makes the row visible to the filter. Couldn’t you just prompt the agent to always use COALESCE?

Two problems.

First, COALESCE requires the agent to decide that NULLs matter for this specific column. That decision requires knowing how many NULLs there are. If the agent assumes status is fully populated, because nothing in the schema says otherwise, it won’t reach for COALESCE. The agent doesn’t skip COALESCE out of forgetfulness; it skips it because it has no reason to think the column needs defending.

Second, applying COALESCE blindly is itself a bug. COALESCE(status, 'cancelled') silently reclassifies every incomplete-onboarding user as cancelled: a different wrong answer wearing the costume of a fix. The right default for a NULL depends on what the NULL means, and the NULL’s meaning lives in the data, not the schema.
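A quick way to see that the chosen default, not COALESCE itself, decides the answer: the same query on the illustrative 500/200/300 SQLite table, run with two different NULL defaults.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (status TEXT)")
conn.executemany(
    "INSERT INTO subscriptions VALUES (?)",
    [("active",)] * 500 + [("cancelled",)] * 200 + [(None,)] * 300,
)


def cancelled_count(null_default: str) -> int:
    """Count 'cancelled' rows after replacing NULL status with null_default."""
    return conn.execute(
        "SELECT COUNT(*) FROM subscriptions WHERE COALESCE(status, ?) = 'cancelled'",
        (null_default,),
    ).fetchone()[0]


print(cancelled_count("unknown"))    # 200: same as the naive filter
print(cancelled_count("cancelled"))  # 500: all 300 NULL rows reclassified as cancelled
```

Neither number is right by construction; which one is right depends on what the NULLs actually represent.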

Either way, the agent needs to see the data before deciding what to do about it. COALESCE is the cure; profiling is the diagnosis. You can’t skip the diagnosis.

Why LLMs Make This Worse

An LLM generating SQL works from the schema. The schema says status VARCHAR(20). It doesn’t say “30% of this column is NULL and those rows represent users stuck in an incomplete onboarding flow.” The schema describes structure. It says nothing about the distribution of actual data.
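The gap is easy to see with SQLite as a stand-in. `PRAGMA table_info` is roughly what a schema-reading agent gets: the declared type and a NOT NULL flag. The NULL rate only appears when you query the rows. The 300-of-1,000 NULL split is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, status VARCHAR(20))")
conn.executemany(
    "INSERT INTO subscriptions (status) VALUES (?)",
    [("active",)] * 500 + [("cancelled",)] * 200 + [(None,)] * 300,
)

# Schema view: each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
schema = {
    name: (decl_type, bool(notnull))
    for _cid, name, decl_type, notnull, _default, _pk in conn.execute(
        "PRAGMA table_info(subscriptions)"
    )
}

# Data view: the share of rows whose status is NULL.
null_pct = conn.execute(
    "SELECT 100.0 * (COUNT(*) - COUNT(status)) / COUNT(*) FROM subscriptions"
).fetchone()[0]

print(schema["status"])  # ('VARCHAR(20)', False): nullable, but not how often
print(null_pct)  # 30.0
```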

An experienced human looks at a VARCHAR column and thinks: “Is this nullable? How many NULLs does it actually have? What do those NULLs mean?” An LLM looks at the same column and thinks: “This is a string column. I can filter on it.”

The problem compounds when the LLM is good at SQL. A model that generates syntactically perfect, well-formatted, idiomatically correct SQL inspires confidence. The user sees a clean query, gets a clean result, and moves on. The number looks reasonable. Nobody checks.

A Concrete Example

Here’s a table with 1,000 rows:

status	count	notes
`'active'`	500	Paying customers
`'cancelled'`	200	Explicitly cancelled
`NULL`	300	Onboarding incomplete, data migration gap, or API error

An agent receives the question: “How many cancelled subscriptions do we have?”

Without data profiling, the agent generates:

SELECT COUNT(*) FROM subscriptions WHERE status = 'cancelled';

Returns 200. Clean number. Looks right. Except 300 rows — 30% of the table — were invisible to the query. Some of those NULLs might be cancelled users whose status never got written. Some might be active users stuck in a broken onboarding flow. The agent can’t distinguish and didn’t try. It answered a question about your data while ignoring almost a third of it.

With data profiling — meaning the agent looks at the column before writing the filter — the picture changes:

column:         status
null_count:     300
null_pct:       30.0
distinct_count: 2   ← not 3. NULL isn't a distinct value.
min:            'active'
max:            'cancelled'
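A profile like this needs nothing more exotic than aggregate functions. A minimal sketch of a profiler that produces the same fields, assuming SQLite and a hypothetical `profile_column` helper; it illustrates the idea and is not necessarily how Boyce does it. Table and column names are interpolated as quoted identifiers, so they should come from the schema, never from user input.

```python
import sqlite3


def profile_column(conn: sqlite3.Connection, table: str, column: str) -> dict:
    """Hypothetical profiler: NULL count/percentage, distinct count, min, max for one column."""
    total, non_null, distinct, lo, hi = conn.execute(
        f'SELECT COUNT(*), COUNT("{column}"), COUNT(DISTINCT "{column}"), '
        f'MIN("{column}"), MAX("{column}") FROM "{table}"'
    ).fetchone()
    null_count = total - non_null  # COUNT(col) skips NULLs; COUNT(*) does not
    return {
        "column": column,
        "null_count": null_count,
        "null_pct": round(100.0 * null_count / total, 1) if total else 0.0,
        "distinct_count": distinct,  # COUNT(DISTINCT ...) also skips NULL
        "min": lo,  # MIN/MAX ignore NULLs too
        "max": hi,
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (status VARCHAR(20))")
conn.executemany(
    "INSERT INTO subscriptions VALUES (?)",
    [("active",)] * 500 + [("cancelled",)] * 200 + [(None,)] * 300,
)
profile = profile_column(conn, "subscriptions", "status")
print(profile)
# {'column': 'status', 'null_count': 300, 'null_pct': 30.0,
#  'distinct_count': 2, 'min': 'active', 'max': 'cancelled'}
```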

Now the agent sees the trap. distinct_count: 2 when you expected 3 means something is missing. null_pct: 30.0 tells you what. The agent asks the right follow-up question before writing a single line of SQL: “30% of this table has no status. Should those rows be included in the count?”

Asking this question before the query runs is the difference between a useful tool and a wrong answer that nobody catches.

“Doesn’t dbt’s Semantic Layer Already Solve This?”

dbt’s Semantic Layer and similar tools define the relationships between entities and the formulas for metrics. “Monthly recurring revenue = sum of paid line item amounts, where subscription status is active.” That’s load-bearing infrastructure, and it solves an important problem: it gives agents vocabulary at the concept level rather than the column level.

But the semantic layer sits on top of the data. It doesn’t inspect the data. A dbt model that computes “active subscriptions” using WHERE status = 'active' falls into the same trap — it’s just that the trap is now encoded in a definition file instead of an ad-hoc query. The concept active_subscriptions is defined in terms of a column whose NULL distribution nobody looked at. The dbt user gets a number; the number is wrong; the definition looks canonical.

Semantic layers say what to compute. Profiling says whether the data supports that computation. They compose. They don’t substitute.

The Fix Isn’t a Better Model

The next GPT Codex won’t fix this. The next Claude Opus won’t fix this. Smarter models can reason better about what data they see, but a model can’t see what the infrastructure doesn’t show it. The schema is metadata. The data is reality. No amount of reasoning about metadata reveals the distribution of actual values.

The fix is profiling. Before the agent writes a WHERE clause against a column, something needs to check:

How many NULLs are in this column?
What’s the actual distribution?
Are the values actually what the schema implies they are?

Data engineers have been doing this manually for decades. SELECT COUNT(*), COUNT(column), COUNT(DISTINCT column) is as old as SQL itself. What’s new is that the agent doing the querying doesn’t have the instinct to check — because it learned SQL from syntax, not from production postmortems.
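That classic query answers the first question on the list, but COUNT(DISTINCT column) hides NULL entirely. For the second question, a plain GROUP BY is the usual tool, because SQL puts all NULLs into a single group. A sketch on the same illustrative SQLite table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (status VARCHAR(20))")
conn.executemany(
    "INSERT INTO subscriptions VALUES (?)",
    [("active",)] * 500 + [("cancelled",)] * 200 + [(None,)] * 300,
)

# GROUP BY collects every NULL into one group, so the missing 30% becomes a visible row.
distribution = conn.execute(
    "SELECT status, COUNT(*) AS n FROM subscriptions GROUP BY status ORDER BY n DESC"
).fetchall()

print(distribution)  # [('active', 500), (None, 300), ('cancelled', 200)]
```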

The profiling has to happen in the tooling layer. Not as an optional step. Not as a follow-up query the model might or might not think to run. As infrastructure: before you filter on a column, look at it.

What the Solution Has to Do

Any layer sitting between agents and databases has to close the gap between what the schema says and what the data actually looks like. That closure requires three properties, and they’re independent of who implements them:

Profile before querying. When an agent constructs a filter on status = 'cancelled', something needs to check the NULL rate of that column and surface it. If 30% of the column is NULL, the agent knows before the query runs — not after the dashboard ships.
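One way that check could look, sketched with SQLite and a hypothetical `null_gate` function: it runs before the filtered query and, when the NULL rate crosses a threshold, returns a question for the user instead of a number. The 5% default threshold is an arbitrary assumption; a sensible value depends on the column.

```python
import sqlite3
from typing import Optional


def null_gate(conn: sqlite3.Connection, table: str, column: str,
              max_null_pct: float = 5.0) -> Optional[str]:
    """Hypothetical pre-query check: a question if the column has too many NULLs, else None."""
    total, non_null = conn.execute(
        f'SELECT COUNT(*), COUNT("{column}") FROM "{table}"'
    ).fetchone()
    null_pct = 100.0 * (total - non_null) / total if total else 0.0
    if null_pct > max_null_pct:
        return (f"{null_pct:.1f}% of {table}.{column} is NULL. "
                "Should those rows be included in the count?")
    return None


def count_where_equals(conn, table, column, value):
    """Run the filter only if the gate passes; otherwise surface the question (a str)."""
    question = null_gate(conn, table, column)
    if question is not None:
        return question
    return conn.execute(
        f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" = ?', (value,)
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, status VARCHAR(20))")
conn.executemany(
    "INSERT INTO subscriptions (status) VALUES (?)",
    [("active",)] * 500 + [("cancelled",)] * 200 + [(None,)] * 300,
)
print(count_where_equals(conn, "subscriptions", "status", "cancelled"))
# 30.0% of subscriptions.status is NULL. Should those rows be included in the count?
print(count_where_equals(conn, "subscriptions", "id", 1))  # 1
```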

Validate before executing. Run EXPLAIN on the generated SQL before it touches real data. Catch type mismatches, missing tables, and impossible joins at planning time, not in the error log.
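A minimal sketch of the validation step, assuming SQLite and a hypothetical `validate_sql` helper. `EXPLAIN QUERY PLAN` makes the engine parse and plan the statement without running it, so a misspelled table or column fails here rather than in production. SQLite is loosely typed, so this sketch will not catch the type mismatches a strictly typed engine rejects at planning time.

```python
import sqlite3
from typing import Optional


def validate_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Plan `sql` without executing it. Returns None if it plans cleanly, else the error text."""
    try:
        conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        return None
    except sqlite3.Error as exc:
        return str(exc)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, status VARCHAR(20))")

print(validate_sql(conn, "SELECT COUNT(*) FROM subscriptions WHERE status = 'cancelled'"))  # None
print(validate_sql(conn, "SELECT COUNT(*) FROM subscription"))  # no such table: subscription
print(validate_sql(conn, "SELECT COUNT(*) FROM subscriptions WHERE state = 'x'"))  # no such column: state
```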

Make it deterministic. If the same question produces different SQL on different runs, you can’t audit it, reproduce it, or trust it. The SQL compilation step — from structured intent to query string — should be deterministic. Same inputs, same SQL, every time.
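A sketch of what deterministic compilation can mean, using a hypothetical `CountIntent` structure and `compile_count` function (the `plan` column below is likewise made up). The intent is plain data, filters are sorted before rendering, values travel as bound parameters, and the NULL decision is an explicit field rather than something the model remembers or forgets. The same intent always yields identical SQL.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CountIntent:
    """Hypothetical structured intent: COUNT(*) over `table` with equality filters."""
    table: str
    filters: tuple[tuple[str, str], ...]  # (column, value) pairs
    include_nulls: bool = False  # decided after profiling, never left implicit


def compile_count(intent: CountIntent) -> tuple[str, tuple[str, ...]]:
    """Render the intent as (sql, params). Pure function: same input, same output."""
    clauses: list[str] = []
    params: list[str] = []
    for column, value in sorted(intent.filters):  # input order doesn't affect output
        if intent.include_nulls:
            clauses.append(f'("{column}" = ? OR "{column}" IS NULL)')
        else:
            clauses.append(f'"{column}" = ?')
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f'SELECT COUNT(*) FROM "{intent.table}" WHERE {where}', tuple(params)


a = CountIntent("subscriptions", (("status", "cancelled"), ("plan", "pro")))
b = CountIntent("subscriptions", (("plan", "pro"), ("status", "cancelled")))
print(compile_count(a) == compile_count(b))  # True
print(compile_count(a)[0])
# SELECT COUNT(*) FROM "subscriptions" WHERE "plan" = ? AND "status" = ?
```

Because the compiled string is a pure function of the intent, it can be logged, diffed, and replayed for audits.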

These three properties describe the shape of any correct solution. Specific implementations will differ. The properties don’t.

NULLs Are the Most Common Trap. They’re Not the Only One.

The deeper issue is the gap between what the schema says and what the data actually looks like. NULLs are the most common case, but the same structural problem shows up as cardinality surprises (a country column with 3 distinct values when you expected 195), stale data (a last_updated max of 18 months ago because the pipeline broke), and encoding drift ('active' vs 'Active' vs 'ACTIVE' — three values a case-sensitive filter treats as different).
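Encoding drift yields to the same profiling mindset: compare the raw distinct count with the count after normalizing case and whitespace. A sketch on a small, made-up SQLite table. SQLite’s default BINARY collation is case-sensitive, while some engines default to case-insensitive collations, so the same filter can behave differently across databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (status TEXT)")
# Illustrative data: one logical value spelled three ways.
conn.executemany(
    "INSERT INTO subscriptions VALUES (?)",
    [("active",)] * 3 + [("Active",)] * 2 + [("ACTIVE",)] * 1,
)

raw, normalized = conn.execute(
    "SELECT COUNT(DISTINCT status), COUNT(DISTINCT LOWER(TRIM(status))) FROM subscriptions"
).fetchone()
exact = conn.execute(
    "SELECT COUNT(*) FROM subscriptions WHERE status = 'active'"
).fetchone()[0]

drift_suspected = raw > normalized  # several spellings collapse into one value
print(raw, normalized, exact, drift_suspected)  # 3 1 3 True
```

The exact-match filter finds 3 of 6 rows: correct SQL, wrong answer.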

Same pattern every time: correct SQL, incorrect assumptions, wrong answer. Nobody knew.

The Null Trap is the first instance you hit. Once you start looking for it, you find its siblings everywhere.

Try It

Boyce is an open-source MCP server that sits between your AI agent and your database. When an agent asks to query a table, Boyce profiles the relevant columns first — NULL rates, distinct values, distributions — and hands that context to the agent alongside the schema. The agent sees the trap before it writes the query, not after.

Agent asks “how many cancelled subscriptions?” — Boyce profiles the status column, surfaces the 30% NULL rate, and the agent asks you what to do about it before generating SQL. The trap surfaces before the query runs. No silent exclusion.

pip install boyce

MIT licensed. Works with Claude Desktop, Cursor, Claude Code, and any MCP-compatible host. No API key required.

The Null Trap scenario from this essay is included in the repo as a self-contained Docker setup. Reproduce it in under five minutes.
