
Schema Autopilot: AI-assisted schema management for event pipelines

EdgeMQ Team


Defining a schema is a twenty-minute task. Maintaining it over months of production traffic is where the real work lives.

We launched schema-aware Parquet last month - view definitions that turn raw JSON into typed, query-ready columns. The response was immediate: teams loved skipping the ETL pipeline. But we also heard the same follow-up questions, over and over:

"My DLQ rate spiked to 15% overnight. Which field changed?"

"I'm staring at a rejected payload and I can't figure out why it failed the cast."

"A new field started appearing in our events last week. Should I add it to the view?"

Every question boiled down to the same problem: the schema was right when they wrote it, but the data moved.

Sources change their payload structure. A field that was always present starts arriving as null. An integer field suddenly contains strings. A new nested object appears that nobody documented. These aren't bugs - they're the natural lifecycle of event data. But each one requires a human to notice the problem, diagnose it, write a fix, test the fix, and deploy it safely.

Schema Autopilot handles all of that.

What it does

Schema Autopilot is three capabilities, each targeting a different stage of the schema lifecycle:

  CREATE            BREAK              EVOLVE
    |                 |                  |
    v                 v                  v
 ┌──────────┐   ┌──────────────┐   ┌──────────────┐
 │   View   │   │     DLQ      │   │    Drift     │
 │  Copilot │   │   Copilot    │   │    Guard     │
 └──────────┘   └──────────────┘   └──────────────┘
 Generate a      Explain why        Detect when
 view from       rows are           event shapes
 sample JSON     failing            change over
 + intent        + propose fix      time + recommend
                                    safe rollout

View Copilot generates a complete view definition from sample events and a short description of what you need. Paste your JSON, say "I need a purchases table with user, product, and revenue columns," and it produces the YAML with types, JSONPaths, required flags, and casting strategy. You preview it, tweak it, and publish - a few minutes of review instead of rounds of manual trial and error.
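To make that concrete, here is a toy sketch of what a generated purchases definition could look like, written as a Python dict that mirrors the YAML and previews one row against a sample event. The field names, YAML keys, and JSONPaths are invented for illustration, not EdgeMQ's actual format:

```python
import json

# Hypothetical view definition mirroring the YAML View Copilot might emit.
# Keys, types, and paths are illustrative only.
purchases_view = {
    "name": "purchases",
    "casting": "tolerant",  # try_cast-style: bad values become NULL
    "columns": [
        {"name": "user_id", "type": "VARCHAR", "path": "$.user.id",      "required": True},
        {"name": "product", "type": "VARCHAR", "path": "$.product.name", "required": True},
        {"name": "revenue", "type": "DOUBLE",  "path": "$.order.total",  "required": False},
    ],
}

def resolve(path: str, event: dict):
    """Resolve a simple '$.a.b' JSONPath against a parsed event."""
    node = event
    for key in path.lstrip("$.").split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

sample = json.loads('{"user": {"id": "u-42"}, "product": {"name": "widget"}, "order": {"total": 19.99}}')
row = {c["name"]: resolve(c["path"], sample) for c in purchases_view["columns"]}
print(row)  # one typed, query-ready row extracted from raw JSON
```

The point of the preview step is exactly this loop: the definition is applied to your own sample events before anything is published, so a wrong path or type shows up immediately.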

DLQ Copilot reads your Dead Letter Queue files, clusters the failures by root cause, and proposes the minimal fix for each. Instead of downloading DLQ Parquet files and manually inspecting rejected payloads, you click "Explain & Fix" and get a breakdown: "42% of failures are because $.price is arriving as a string instead of a number. Fix: switch the price cast to a tolerant try_cast so string-encoded numbers are converted instead of rejected." Each fix is validated - the system compiles the patched definition and runs it against DLQ samples to verify it actually resolves the failures before you see it.

Drift Guard monitors field distributions over time. It samples your good events, computes per-field statistics (null rates, distinct counts, new field appearances), compares against the last baseline, and flags drift patterns. When it detects a spike - a field that was 2% null is now 45% null - it recommends the minimal schema change and a canary rollout with specific success criteria ("DLQ rate < 0.5% for 30 minutes") and rollback triggers.

Why we built it

The short answer: because EdgeMQ's architecture makes it uniquely possible.

Most data pipelines validate schemas at the input layer - bad events are rejected at collection time with a 4xx error, and the producer has to retry or lose the data. EdgeMQ validates at the output layer. Every event is accepted and durably stored in the WAL. During Parquet materialisation, events that fail schema validation are captured in the DLQ with full error context. Nothing is lost.
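The input-versus-output distinction is easiest to see in miniature. The sketch below (function and field names invented; EdgeMQ's real materialiser is far more involved) accepts every event up front and only applies casting at materialisation time, so a failed cast produces a DLQ record carrying the full payload and error context rather than a lost event:

```python
# Toy output-layer validation: every event is accepted; casting happens
# at materialisation time, and failures keep their payload and error.
def materialise(events, column="price", column_type=float):
    good, dlq = [], []
    for event in events:
        try:
            good.append({column: column_type(event[column])})
        except (KeyError, TypeError, ValueError) as err:
            # The rejected payload travels with its error - nothing is lost.
            dlq.append({"payload": event, "error": f"{type(err).__name__}: {err}"})
    return good, dlq

good, dlq = materialise([{"price": "19.99"}, {"price": None}, {"qty": 3}])
print(len(good), len(dlq))  # 1 good row, 2 DLQ rows with full context
```

In an input-validating pipeline, the second and third events would have bounced back to the producer as 4xx errors; here they remain available for diagnosis.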

This architecture has a consequence that turns out to be powerful: we have both the good data and the bad data, side by side, with the schema definition that produced them. That's exactly what an AI needs to diagnose failures and propose fixes. Systems that reject bad events at the door don't have this - the bad data is gone, returned to the producer as a cryptic error code.

Here's what we can do with it:

  Raw JSON events (all accepted, nothing lost)
       │
       ├──► Good Parquet  ──► Field statistics
       │                       (null rates, distinct counts,
       │                        new field discovery)
       │
       ├──► DLQ Parquet   ──► Error classification
       │                       (cast failures, missing fields,
       │                        type mismatches)
       │
       └──► View Definition ──► Schema context
                                (column specs, types, paths,
                                 required flags)

Schema Autopilot combines all three: the schema, the data that passed, and the data that failed. It uses this context to generate accurate fixes - not generic suggestions, but validated patches that compile, run against your actual DLQ samples, and show you the fix rate before you apply anything.

The three stages, in detail

View Copilot: from JSON to view definition

The fastest way to understand View Copilot is to use it. You provide two things:

  1. Sample events - a few representative JSON payloads from your source
  2. Intent - a short description of what you need ("analytics table for page views" or "extract trade execution data with timestamps")

Optionally, you can add constraints: required fields, casting strategy (strict vs. tolerant), partition preference.

View Copilot returns a complete view definition with:

  • Column definitions (name, type, JSONPath, required flag)
  • Casting strategy recommendations
  • Partition suggestions (low-cardinality + time-based)
  • Warnings about likely problems (unstable fields, high-cardinality partition keys)

The output isn't a black box. It's the same YAML you'd write by hand, shown in the Console editor where you can modify any field before publishing. You can preview it against your sample data to verify the output.

This matters most for onboarding. The gap between "I have events in my system" and "I have a queryable Parquet table in S3" used to involve reading documentation, understanding DuckDB types, writing JSONPath expressions, testing against sample data, iterating on failures, and deploying. View Copilot compresses most of that into a single interaction.

DLQ Copilot: explain failures, propose fixes

When your DLQ rate spikes, the first question is always "why?" The second question is "what's the smallest change that fixes it?"

DLQ Copilot answers both. From the Quality tab of any view, click "Explain & Fix" and the system:

  1. Fetches DLQ samples from your S3 bucket (using the same STS credential flow that wrote them - no new permissions needed)
  2. Classifies errors by re-executing your view's SQL against the failed payloads and comparing the output
  3. Clusters failures into root causes (e.g., "cast failure on price column" or "required field user_id is null")
  4. Proposes a fix for each cluster - the minimal change to the view definition that resolves the issue
  5. Validates the fix by compiling the patched definition and running it against DLQ samples to measure the fix rate

Each cluster shows: severity, affected row count, a diff of what changes, and a "Simulate Fix" button that runs the patched definition against both DLQ samples (to measure fix rate) and good samples (to check for regressions). If the simulation passes - fixes > 0%, regressions = 0 - you can apply it as a new draft version with one click.
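The cluster-then-simulate loop can be sketched in a few lines. Everything here is illustrative - the sample DLQ records, the error strings, and the try_cast helper are invented, and the real DLQ Copilot compiles the full patched view definition rather than a single cast:

```python
from collections import Counter

# Toy DLQ samples: payload plus the error recorded at materialisation.
dlq_samples = [
    {"payload": {"price": "12.50"}, "error": "cast failure: price"},
    {"payload": {"price": "oops"},  "error": "cast failure: price"},
    {"payload": {},                 "error": "required field user_id is null"},
]

# 1. Cluster failures by root cause.
clusters = Counter(s["error"] for s in dlq_samples)

# 2. Simulate a tolerant-cast fix for the price cluster: a try_cast-style
#    conversion that yields NULL instead of failing outright.
def try_cast_float(value):
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

price_rows = [s for s in dlq_samples if "price" in s["error"]]
fixed = sum(try_cast_float(s["payload"].get("price")) is not None for s in price_rows)
fix_rate = fixed / len(price_rows)
print(clusters.most_common(1), f"fix rate: {fix_rate:.0%}")
```

Note that the simulated fix rate is 50%, not 100%: "12.50" casts cleanly but "oops" never will. Surfacing that number before you apply anything is the point of the Simulate Fix step.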

The key design principle: every suggestion is previewable, explainable, and reversible. Autopilot never modifies a published view. It creates a new draft version that you review and deploy through the normal publish flow.

Drift Guard: detect changes before they break things

DLQ Copilot reacts to failures that already happened. Drift Guard watches for the conditions that cause them.

Every six hours (and on-demand via the Quality dashboard), Drift Guard:

  1. Samples good events from recent Parquet files in S3
  2. Computes per-field statistics - null rate, distinct count, sample values
  3. Discovers new fields - JSON keys in your payloads that aren't in the view definition
  4. Compares against the baseline (the last snapshot)
  5. Flags drift signals with severity levels

The detection thresholds are deliberately conservative:

  Signal             Trigger                                         Severity
  Null spike         Was < 5% null, now > 40% null                   High
  Null increase      Was < 10% null, now > 30% null                  Medium
  New field          Not in baseline, appearing in > 50% of events   Info
  Error rate spike   DLQ rate jumped from < 2% to > 10%              High

When drift is detected, Drift Guard recommends the minimal schema change - make a field optional, add a new column, change a type - along with canary rollout parameters: how long to run the canary, what success looks like, and when to roll back.
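The null-rate rules above reduce to a simple baseline comparison. This sketch uses the thresholds from the table; the event data and function names are invented for illustration:

```python
# Toy drift check for the null-spike and null-increase signals.
def null_rate(events, field):
    return sum(1 for e in events if e.get(field) is None) / len(events)

def drift_signal(baseline_rate, current_rate):
    if baseline_rate < 0.05 and current_rate > 0.40:
        return "high"    # null spike
    if baseline_rate < 0.10 and current_rate > 0.30:
        return "medium"  # null increase
    return None

baseline = [{"plan": "pro"}] * 98 + [{"plan": None}] * 2   # 2% null
current  = [{"plan": "pro"}] * 55 + [{"plan": None}] * 45  # 45% null
signal = drift_signal(null_rate(baseline, "plan"), null_rate(current, "plan"))
print(signal)  # "high" - the 2% -> 45% scenario from earlier
```

A 2%-to-45% jump crosses the high-severity threshold, which is what would trigger the recommended schema change and canary parameters.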

The drift monitor appears on the Quality dashboard for every published view. No configuration needed - it starts watching as soon as you have data flowing.

What Autopilot is not

Continuing our tradition of being clear about boundaries:

Autopilot does not modify production views. Every suggestion - from View Copilot, DLQ Copilot, or Drift Guard - creates a draft. You review, preview, and publish.

Autopilot does not run custom transformations. It works within the view definition system: column types, JSONPaths, required flags, filters. It doesn't write arbitrary SQL or call external services.

Autopilot does not replace understanding your data. It accelerates the workflow - generating, diagnosing, evolving - but you still need to know what your events contain and what your analytics needs are. The AI suggests; you approve.

Autopilot is not real-time alerting. Drift Guard runs on a schedule (every 6 hours) and on-demand. If you need sub-minute alerting on data quality, you'll want to combine EdgeMQ's quality metrics API with your existing monitoring stack.

The advantage of validate-at-output

Most managed ingestion services validate schemas at the input - reject bad events at the door. This gives fast feedback (synchronous 4xx errors) but creates a problem: the bad data is gone. When something changes in your source system and events start failing, you have an error count in a dashboard but no copy of the failed payloads to diagnose against.

EdgeMQ's output-layer approach stores everything. Good events become typed Parquet. Bad events become DLQ Parquet with error context. Both are in your S3 bucket, both are queryable, and both are available for AI-assisted analysis.

This turns out to be a significant advantage for automated schema management:

  • View Copilot can preview generated schemas against real data before you deploy
  • DLQ Copilot has the actual failed payloads, not just error counts - it can classify failures precisely and validate fixes against real samples
  • Drift Guard can compare good event distributions over time because the full data is always preserved

The "never lose data" architecture that we built for durability ends up being the foundation for intelligent schema management. We didn't plan it that way, but it works.

Availability

Schema Autopilot is available now on the Pro plan ($49/month). All three capabilities - View Copilot, DLQ Copilot, and Drift Guard - are included.

  • View Copilot: Available in the Console when creating or editing a view definition
  • DLQ Copilot: Available on the Quality tab of any view with DLQ files (click "Explain & Fix")
  • Drift Guard: Available on the Quality tab of any published view (the drift monitor section loads automatically)

If you're already on Pro with schema-aware views, Autopilot is ready to use - no configuration needed. If you're on the Starter plan, you can start a 14-day Pro trial to try Autopilot and schema-aware views with no commitment.

Try it now or read the view definitions docs to set up your first schema-aware view.