Schema Autopilot
Schema Autopilot uses AI to assist with the three hardest parts of schema management: creating view definitions, diagnosing DLQ failures, and detecting when your data shape drifts over time.
Schema Autopilot is available on Pro and Enterprise plans. A 14-day free trial is available for all new accounts.
Overview
Schema Autopilot provides three capabilities, each targeting a different stage of the schema lifecycle:
   CREATE            BREAK              EVOLVE
     |                 |                  |
     v                 v                  v
+----------+    +--------------+    +--------------+
|   View   |    |     DLQ      |    |    Drift     |
|  Copilot |    |   Copilot    |    |    Guard     |
+----------+    +--------------+    +--------------+
Generate a      Explain why         Detect when
view from       rows are            event shapes
sample JSON     failing             change over
+ intent        + propose fix       time + recommend
                                    safe rollout

All three capabilities share the same design principles:
- Every suggestion creates a draft - Autopilot never modifies a published view
- Every fix is validated - Proposed changes are compiled and tested before you see them
- Everything is previewable - You review, tweak, and publish through the normal flow
View Copilot
Generate a complete view definition from sample JSON events and a short description of what you need.
How to use it
- Navigate to Views > Create New View in the Console
- Click Generate from Sample
- Paste one or more sample JSON payloads into the editor
- Describe your intent (e.g., "analytics table for purchase events with user, product, and revenue columns")
- Optionally specify constraints: required fields, casting strategy, partition preferences
- Click Generate
View Copilot returns a complete view definition with:
- Column definitions (name, type, JSONPath, required flag)
- Casting strategy recommendations
- Partition suggestions (low-cardinality + time-based)
- Warnings about likely problems (unstable fields, high-cardinality partition keys)
The generated YAML appears in the Console editor where you can modify any field before publishing.
Refining results
If the generated definition needs adjustments, use the Refine button to provide feedback:
- "Make the `price` column required"
- "Add a filter for `event_type = 'purchase'`"
- "Change the `user_id` type to VARCHAR"
View Copilot applies your feedback and returns an updated definition.
Example
Input: Sample JSON payload
```json
{
  "userId": "usr_12345",
  "eventType": "purchase",
  "amount": 49.99,
  "currency": "USD",
  "productId": "prod_abc",
  "timestamp": "2026-02-20T14:30:00Z"
}
```
Intent: "Purchase analytics table with user, product, and revenue"
Output: Generated view definition
```yaml
name: purchase_analytics
version: 1
format: json
definition:
  columns:
    - name: user_id
      type: VARCHAR
      path: $.userId
      required: true
      description: Customer identifier
    - name: event_type
      type: VARCHAR
      path: $.eventType
      description: Type of event
    - name: amount
      type: DOUBLE
      path: $.amount
      required: true
      description: Purchase amount
    - name: currency
      type: VARCHAR
      path: $.currency
      default: "USD"
      description: Currency code
    - name: product_id
      type: VARCHAR
      path: $.productId
      description: Product identifier
    - name: purchased_at
      type: TIMESTAMP
      path: $.timestamp
      required: true
      description: Event timestamp
```
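To make the mapping concrete, here is a minimal Python sketch of how a definition like this resolves each column's JSONPath against the sample payload. The `extract` function and `CASTS` table are hypothetical names for illustration only, not EdgeMQ's actual implementation:

```python
from datetime import datetime

# Simplified cast table covering the view definition types used above
CASTS = {
    "VARCHAR": str,
    "DOUBLE": float,
    "TIMESTAMP": lambda v: datetime.fromisoformat(v.replace("Z", "+00:00")),
}

def extract(event: dict, column: dict):
    """Resolve a simple '$.field' JSONPath and apply the column's cast."""
    key = column["path"].removeprefix("$.")
    value = event.get(key, column.get("default"))
    if value is None:
        if column.get("required"):
            raise ValueError(f"required field {column['path']} is null")
        return None
    return CASTS[column["type"]](value)

event = {"userId": "usr_12345", "amount": 49.99,
         "timestamp": "2026-02-20T14:30:00Z"}
columns = [
    {"name": "user_id", "type": "VARCHAR", "path": "$.userId", "required": True},
    {"name": "amount", "type": "DOUBLE", "path": "$.amount", "required": True},
    {"name": "purchased_at", "type": "TIMESTAMP", "path": "$.timestamp", "required": True},
]
row = {c["name"]: extract(event, c) for c in columns}
# row["amount"] is a float; row["purchased_at"] is a timezone-aware datetime
```

A required column whose path resolves to null raises instead of producing a row, which is exactly the kind of event that would land in the DLQ.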
Where to find it
- New view creation: Views > Create New View > Generate from Sample
- Existing view editing: Views > [view name] > Edit > Generate from Sample
DLQ Copilot
When your DLQ rate spikes, DLQ Copilot diagnoses the failures and proposes the minimal fix for each root cause.
How to use it
- Navigate to Views > [view name] > Quality in the Console
- Locate the DLQ Copilot section (appears when DLQ files exist)
- Click Explain & Fix
DLQ Copilot then:
- Fetches DLQ samples from your S3 bucket (uses the same STS credentials that wrote the files - no new permissions needed)
- Classifies errors by re-executing your view's SQL against the failed payloads
- Clusters failures into root causes (e.g., "cast failure on `price` column" or "required field `user_id` is null")
- Proposes a fix for each cluster - the minimal change to the view definition that resolves the issue
- Validates the fix by compiling the patched definition and running it against DLQ samples to measure the fix rate
Reading the results
Each failure cluster shows:
| Field | Description |
|---|---|
| Root cause | What went wrong (e.g., "field $.price arriving as string, fails DOUBLE cast") |
| Severity | High, Medium, or Low based on affected row count |
| Affected rows | Count and percentage of DLQ rows in this cluster |
| Proposed fix | The minimal view definition change (shown as a diff) |
| Fix rate | Percentage of DLQ rows this fix would resolve |
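For instance, a proposed fix for a DOUBLE cast failure on a price field might be rendered as a diff like this (illustrative only; the Console's exact diff format may differ):

```diff
 columns:
   - name: amount
-    type: DOUBLE
+    type: VARCHAR
     path: $.price
     required: true
```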
Simulating fixes
Before applying any change, click Simulate Fix to run the patched definition against:
- DLQ samples - Measures how many failed rows the fix resolves (fix rate)
- Good samples - Checks that the fix doesn't break existing successful rows (regression rate)
A simulation passes when: fix rate > 0% and regression rate = 0%.
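The pass criteria can be sketched in a few lines of Python. Here `run_view` is a hypothetical stand-in for EdgeMQ's server-side execution of a view definition against one sample row:

```python
def simulate(patched_definition, dlq_samples, good_samples, run_view):
    """Evaluate a patched view definition against both sample sets.

    run_view(definition, row) -> True if the row materialises successfully.
    (Hypothetical helper; EdgeMQ performs this server-side.)
    """
    fixed = sum(run_view(patched_definition, row) for row in dlq_samples)
    regressed = sum(not run_view(patched_definition, row) for row in good_samples)
    fix_rate = fixed / len(dlq_samples)
    regression_rate = regressed / len(good_samples)
    # A simulation passes only if the fix helps AND breaks nothing
    return fix_rate, regression_rate, fix_rate > 0 and regression_rate == 0
```

The asymmetry is deliberate: any improvement on DLQ rows is worth reviewing, but even one regression on good rows fails the simulation.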
Applying fixes
If the simulation passes, click Apply Fix to create a new draft version of the view with the proposed change. The draft goes through the normal publish flow - you review it, preview it, and deploy it.
Example
A DLQ rate spikes from 2% to 15% overnight. You click Explain & Fix and see:
Cluster 1 (HIGH) — 42% of failures
Root cause: $.price arriving as string ("$49.99") instead of number
Column: amount (type: DOUBLE, path: $.price)
Fix: Change column type from DOUBLE to VARCHAR, add TRY_CAST
Fix rate: 98%
Cluster 2 (MEDIUM) — 31% of failures
Root cause: $.userId is null in events from mobile SDK v3.2
Column: user_id (required: true)
Fix: Remove required flag, add default: "anonymous"
Fix rate: 100%

Where to find it
- View Quality tab: Views > [view name] > Quality > DLQ Copilot section
Drift Guard
Drift Guard monitors field distributions over time and detects when your event shapes change - before those changes cause DLQ spikes.
How it works
Drift Guard runs automatically every 6 hours for all published views with active data. It can also be triggered on-demand from the Console.
Each analysis cycle:
- Samples good events from recent Parquet files in your S3 bucket
- Computes per-field statistics - null rate, distinct count, sample values
- Discovers new fields - JSON keys in your payloads that aren't in the view definition
- Compares against the baseline (the previous snapshot)
- Flags drift signals with severity levels
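The per-field statistics in steps 2-3 amount to a simple aggregation over sampled events. A rough Python sketch (illustrative, not EdgeMQ's implementation):

```python
def field_stats(samples: list[dict]) -> dict:
    """Compute per-field null rate, distinct count, and presence rate
    from a batch of sampled JSON events."""
    n = len(samples)
    keys = {k for s in samples for k in s}
    stats = {}
    for k in keys:
        values = [s.get(k) for s in samples]
        nulls = sum(v is None for v in values)
        stats[k] = {
            "null_rate": nulls / n,
            "distinct": len({v for v in values if v is not None}),
            # presence < 1.0 on a key absent from the view definition
            # is the raw signal behind new-field discovery
            "presence": sum(k in s for s in samples) / n,
        }
    return stats
```

Comparing two such snapshots (baseline vs. current) is what turns raw statistics into drift signals.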
Detection thresholds
| Signal | Trigger | Severity |
|---|---|---|
| Null spike | Was < 5% null, now > 40% null | High |
| Null increase | Was < 10% null, now > 30% null | Medium |
| New field | Not in baseline, appearing in > 50% of events | Info |
| Error rate spike | DLQ rate jumped from < 2% to > 10% | High |
Thresholds are deliberately conservative to minimize noise. You won't get alerts for normal fluctuations.
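As a sketch, the threshold table maps onto baseline-vs-current comparisons like this (the dict structure and names are assumptions for illustration; in practice null rates are per-field while the DLQ rate is per-view):

```python
def classify_drift(baseline: dict, current: dict) -> list[tuple[str, str]]:
    """Map the detection-threshold table onto (signal, severity) pairs.

    baseline/current: snapshots with keys like
    "null_rate", "presence", "dlq_rate" (fractions in [0, 1]).
    """
    signals = []
    # Null spike beats null increase; check the stricter rule first
    if baseline["null_rate"] < 0.05 and current["null_rate"] > 0.40:
        signals.append(("null_spike", "high"))
    elif baseline["null_rate"] < 0.10 and current["null_rate"] > 0.30:
        signals.append(("null_increase", "medium"))
    if baseline.get("presence", 0) == 0 and current.get("presence", 0) > 0.5:
        signals.append(("new_field", "info"))
    if baseline["dlq_rate"] < 0.02 and current["dlq_rate"] > 0.10:
        signals.append(("error_rate_spike", "high"))
    return signals
```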
Drift recommendations
When drift is detected, Drift Guard recommends the minimal schema change along with canary rollout parameters:
Change types:
| Change | When recommended |
|---|---|
| Make field optional | A required field's null rate has increased significantly |
| Add new column | A new field is appearing consistently in your events |
| Change column type | Field values no longer match the declared type |
| Remove column | A field has stopped appearing entirely |
Canary parameters:
Each recommendation includes suggested rollout parameters:
- Duration - How long to run the canary (e.g., 30 minutes)
- Success criteria - What defines a passing canary (e.g., "DLQ rate < 0.5%")
- Rollback trigger - When to automatically roll back (e.g., "DLQ rate > 5%")
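Put together, a recommendation's canary parameters might look like this (the field names are illustrative, not a documented EdgeMQ schema):

```yaml
canary:
  duration: 30m                # how long to run the canary
  success_criteria:
    dlq_rate_below: 0.5%       # canary passes if DLQ rate stays below this
  rollback_trigger:
    dlq_rate_above: 5%         # automatic rollback if DLQ rate exceeds this
```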
Using Drift Guard
- Navigate to Views > [view name] > Quality in the Console
- The Drift Monitor section shows the latest status:
  - Last checked - Timestamp of the most recent analysis
  - Signals - Any detected drift patterns with severity badges
  - Recommendation - Proposed changes (if drift was detected)
- Click Analyze Now to trigger an on-demand analysis
- If a recommendation is available, click Create New Version with Changes to apply the suggested changes as a new draft version
Where to find it
- View Quality tab: Views > [view name] > Quality > Drift Monitor section
- Loads automatically for all published views - no configuration needed
How it works under the hood
Schema Autopilot is powered by EdgeMQ's validate-at-output architecture. Because EdgeMQ accepts all events and validates during Parquet materialisation, the system has access to three data sources that most ingest pipelines don't have together:
Raw JSON events (all accepted, nothing lost)
  |
  +---> Good Parquet ----> Field statistics
  |                        (null rates, distinct counts,
  |                         new field discovery)
  |
  +---> DLQ Parquet -----> Error classification
  |                        (cast failures, missing fields,
  |                         type mismatches)
  |
  +---> View Definition -> Schema context
                           (column specs, types, paths,
                            required flags)

This combination enables accurate fixes rather than generic suggestions:
- View Copilot previews generated schemas against real data before you deploy
- DLQ Copilot has the actual failed payloads - not just error counts - so it can classify failures precisely and validate fixes against real samples
- Drift Guard compares good event distributions over time because the full data is always preserved
Systems that reject bad events at the input layer don't have the failed payloads available for analysis.
Limitations
- Does not modify production views - Every suggestion creates a draft that you review and publish
- Does not run custom transformations - Works within the view definition system (column types, JSONPaths, required flags, filters)
- Does not replace understanding your data - Autopilot accelerates the workflow, but you still need to know what your events contain and what your analytics needs are
- Not real-time alerting - Drift Guard runs every 6 hours and on-demand. For sub-minute alerting, combine EdgeMQ's quality metrics with your existing monitoring stack
FAQ
Do I need to configure anything to use Autopilot?
No. Autopilot is enabled automatically on Pro and Enterprise plans. View Copilot is available when creating or editing views. DLQ Copilot appears on the Quality tab when DLQ files exist. Drift Guard starts monitoring as soon as you have a published view with data flowing.
Does Autopilot need additional S3 permissions?
No. Autopilot uses the same STS credential flow that EdgeMQ already uses for writing Parquet and DLQ files. No additional IAM configuration is needed.
Can Autopilot break my production views?
No. Autopilot never modifies published views. Every suggestion - from View Copilot, DLQ Copilot, or Drift Guard - creates a new draft version. You review, preview, and publish through the normal flow.
How accurate are the DLQ Copilot fixes?
Each fix is validated before you see it. DLQ Copilot compiles the patched definition and runs it against actual DLQ samples to measure the fix rate. You can also simulate fixes against good samples to check for regressions before applying.
Can I change the Drift Guard schedule?
Drift Guard runs every 6 hours automatically. You can trigger an on-demand analysis at any time from the Quality tab by clicking "Analyze Now." Custom schedules are not currently supported.
What data does Autopilot send to the AI model?
Autopilot sends your view definition (YAML), sample payloads (from DLQ or good Parquet files), and field statistics. This data is used to generate recommendations and is not stored by the AI provider beyond the request lifecycle.
Next Steps
- Create your first view definition to get started with schema-aware Parquet
- Read the blog post for the full story behind Schema Autopilot
- Start a 14-day Pro trial if you're on the Starter plan