Text-to-SQL on Schemas That Don't Look Like Textbooks
The Problem With Demo-Grade Text-to-SQL
Every Text-to-SQL benchmark uses clean, normalized schemas: users, orders, products. Tables have sensible names, foreign keys are explicit, columns follow conventions.
Real databases look like this:
```sql
SELECT * FROM acct_mstr_v2_PROD
JOIN txn_log_20190312_bkp ON acct_mstr_v2_PROD.acct_id = txn_log_20190312_bkp.a_id
WHERE txn_log_20190312_bkp.flg_del != 1
```

`acct_mstr_v2_PROD`. `txn_log_20190312_bkp`. `flg_del`. `a_id`. These names are meaningless to an LLM with no context. Most Text-to-SQL systems fail silently here — they generate plausible-looking SQL that returns wrong results.
Mercer is built to handle this.
The Core Architecture
Mercer front-loads the hard work into three stages, and only the last one asks an LLM to generate SQL:
1. Schema enrichment — parse the schema, infer column semantics from sample data, and build a rich context document that includes column names, types, example values, and inferred purpose.
2. Query decomposition — break the user's question into sub-goals. "Which accounts had the most transactions last quarter?" becomes: find the time boundary, identify the transaction table, group by account, order by count.
3. Constrained generation — use SGLang's structured output to enforce that the generated SQL only references tables and columns that actually exist.
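The decomposition in stage 2 can be sketched as plain data before any model call. The `SubGoal` class and the hard-coded breakdown below are illustrative, not Mercer's actual internals:

```python
from dataclasses import dataclass

@dataclass
class SubGoal:
    description: str       # what this step must establish
    depends_on: list       # indices of sub-goals that must resolve first

def decompose(question: str) -> list:
    """Toy decomposition for the example question from the text.
    A real system would produce this structure with an LLM prompt."""
    if "most transactions last quarter" in question:
        return [
            SubGoal("resolve 'last quarter' to a concrete date range", []),
            SubGoal("identify the transaction table", []),
            SubGoal("group transactions by account", [1]),
            SubGoal("order groups by count, descending", [0, 2]),
        ]
    return [SubGoal(question, [])]

goals = decompose("Which accounts had the most transactions last quarter?")
assert len(goals) == 4
```

The point of the structure is that each sub-goal becomes a checkable unit, so a failure in the final SQL can be traced back to one step rather than one opaque generation.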
```python
@sgl.function
def generate_sql(s, schema_context: str, question: str):
    s += sgl.system("You are a SQL expert. Only use tables and columns from the schema.")
    s += sgl.user(f"Schema:\n{schema_context}\n\nQuestion: {question}")
    s += sgl.assistant(sgl.gen("sql", max_tokens=512, stop=["```"]))
```

Schema Enrichment in Practice
The hardest part is building useful context from a bad schema. Here's how Mercer approaches `flg_del`:
```python
def infer_column_purpose(col_name: str, samples: list) -> str:
    # Heuristic: short names with 0/1 values are likely boolean flags
    if len(col_name) <= 6 and set(samples).issubset({0, 1, None}):
        return f"{col_name}: boolean flag (0=false, 1=true)"
    # Names with 'del', 'rm', 'arch' suggest soft-delete patterns
    if any(kw in col_name.lower() for kw in ["del", "rm", "arch", "dlt"]):
        return f"{col_name}: likely soft-delete marker"
    return f"{col_name}: unknown purpose, sample values: {samples[:5]}"
```

The enriched schema document gives the LLM enough signal to reason about ambiguous columns.
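The per-column hints then get assembled into the context document handed to the generation stage. A minimal sketch, assuming a dict of sampled column values per table; `build_schema_context` and its output format are hypothetical, not Mercer's API:

```python
def build_schema_context(tables: dict) -> str:
    """tables maps table name -> {column name: list of sample values}."""
    lines = []
    for table, columns in tables.items():
        lines.append(f"TABLE {table}")
        for col, samples in columns.items():
            # Same flag heuristic as infer_column_purpose, inlined for brevity
            if set(samples) <= {0, 1, None}:
                hint = "boolean flag (0=false, 1=true)"
            else:
                hint = f"samples: {samples[:3]}"
            lines.append(f"  {col}: {hint}")
    return "\n".join(lines)

ctx = build_schema_context({
    "txn_log_20190312_bkp": {"a_id": [1001, 1002], "flg_del": [0, 1, 0]},
})
```

Even this crude format turns `flg_del` from line noise into "a boolean flag on the transaction table", which is what the model actually needs to write the `!= 1` filter correctly.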
Results
On internal benchmarks against a real legacy ERP schema (87 tables, average column name length: 8 chars), Mercer achieves:
- 73% exact match on single-table queries
- 61% exact match on multi-table joins
- 91% syntactic validity (the SQL runs without errors)
A naive GPT-4 baseline on the same schema scores 44% exact match and 78% syntactic validity. The gap is almost entirely schema comprehension: the baseline writes runnable SQL against the wrong tables and columns.
What's Still Hard
Multi-hop joins across 4+ tables remain a weak point. The model loses track of the join path. The fix is probably a planning step — generate the join graph first, then fill in the SQL. That's next on the roadmap.
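One way that planning step could look (a sketch of the idea, not the roadmap implementation): treat foreign-key relationships as an undirected graph and find the shortest join path with BFS before writing any SQL. The `FK_EDGES` tables below are made up for illustration:

```python
from collections import deque

# Hypothetical foreign-key edges: (child table, parent table)
FK_EDGES = {
    ("orders", "customers"), ("order_items", "orders"),
    ("order_items", "products"), ("shipments", "orders"),
}

def join_path(start: str, goal: str) -> list:
    """BFS over the FK graph: shortest chain of tables joining start to goal."""
    adj = {}
    for a, b in FK_EDGES:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return []  # no join path exists
```

With the path pinned down first, the LLM only has to fill in join conditions along a known chain instead of inventing the chain and the conditions at once.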
Mercer is open-source. If you're dealing with a legacy schema that makes you want to cry, try it.