Schema Autopilot
Schema Autopilot uses AI to assist with the three hardest parts of schema management: creating view definitions, diagnosing DLQ failures, and detecting when your data shape drifts over time.
Schema Autopilot is available on Pro and Enterprise plans. A 14-day free trial is available for all new accounts.
Overview
Schema Autopilot provides three capabilities, each targeting a different stage of the schema lifecycle:
   CREATE            BREAK              EVOLVE
     |                 |                  |
     v                 v                  v
+----------+    +--------------+    +--------------+
|   View   |    |     DLQ      |    |    Drift     |
|  Copilot |    |   Copilot    |    |    Guard     |
+----------+    +--------------+    +--------------+
Generate a      Explain why         Detect when
view from       rows are            event shapes
sample JSON     failing             change over
+ intent        + propose fix       time + recommend
                                    safe rollout

All three capabilities share the same design principles:
- Every suggestion creates a draft - Autopilot never modifies a published view
- Every fix is validated - Proposed changes are compiled and tested before you see them
- Everything is previewable - You review, tweak, and publish through the normal flow
View Copilot
Generate a complete view definition from sample JSON events and a short description of what you need.
How to use it
- Navigate to Views > Create New View in the Console
- Click Generate from Sample
- Paste one or more sample JSON payloads into the editor
- Describe your intent (e.g., "analytics table for purchase events with user, product, and revenue columns")
- Optionally specify constraints: required fields, casting strategy, partition preferences
- Click Generate
View Copilot returns a complete view definition with:
- Column definitions (name, type, JSONPath, required flag)
- Casting strategy recommendations
- Partition suggestions (low-cardinality + time-based)
- Warnings about likely problems (unstable fields, high-cardinality partition keys)
The generated YAML appears in the Console editor where you can modify any field before publishing.
Refining results
If the generated definition needs adjustments, use the Refine button to provide feedback:
- "Make the `price` column required"
- "Add a filter for `event_type = 'purchase'`"
- "Change the `user_id` type to VARCHAR"
View Copilot applies your feedback and returns an updated definition.
Example
Input: Sample JSON payload
```json
{
  "userId": "usr_12345",
  "eventType": "purchase",
  "amount": 49.99,
  "currency": "USD",
  "productId": "prod_abc",
  "timestamp": "2026-02-20T14:30:00Z"
}
```
Intent: "Purchase analytics table with user, product, and revenue"
Output: Generated view definition
```yaml
name: purchase_analytics
version: 1
format: json
definition:
  columns:
    - name: user_id
      type: VARCHAR
      path: $.userId
      required: true
      description: Customer identifier
    - name: event_type
      type: VARCHAR
      path: $.eventType
      description: Type of event
    - name: amount
      type: DOUBLE
      path: $.amount
      required: true
      description: Purchase amount
    - name: currency
      type: VARCHAR
      path: $.currency
      default: "USD"
      description: Currency code
    - name: product_id
      type: VARCHAR
      path: $.productId
      description: Product identifier
    - name: purchased_at
      type: TIMESTAMP
      path: $.timestamp
      required: true
      description: Event timestamp
```
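To make the mapping concrete, here is a minimal Python sketch of how a definition like this resolves each column's JSONPath against the sample payload. The `extract` function and `CASTS` table are hypothetical names for illustration only, not EdgeMQ's actual implementation:

```python
from datetime import datetime

# Simplified cast table covering the view definition types used above
CASTS = {
    "VARCHAR": str,
    "DOUBLE": float,
    "TIMESTAMP": lambda v: datetime.fromisoformat(v.replace("Z", "+00:00")),
}

def extract(event: dict, column: dict):
    """Resolve a simple '$.field' JSONPath and apply the column's cast."""
    key = column["path"].removeprefix("$.")
    value = event.get(key, column.get("default"))
    if value is None:
        if column.get("required"):
            raise ValueError(f"required field {column['path']} is null")
        return None
    return CASTS[column["type"]](value)

event = {"userId": "usr_12345", "amount": 49.99,
         "timestamp": "2026-02-20T14:30:00Z"}
columns = [
    {"name": "user_id", "type": "VARCHAR", "path": "$.userId", "required": True},
    {"name": "amount", "type": "DOUBLE", "path": "$.amount", "required": True},
    {"name": "purchased_at", "type": "TIMESTAMP", "path": "$.timestamp", "required": True},
]
row = {c["name"]: extract(event, c) for c in columns}
# row["amount"] is a float; row["purchased_at"] is a timezone-aware datetime
```

A required column whose path resolves to null raises instead of producing a row, which is exactly the kind of event that would land in the DLQ.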
Where to find it
- New view creation: Views > Create New View > Generate from Sample
- Existing view editing: Views > [view name] > Edit > Generate from Sample
DLQ Copilot
When your DLQ rate spikes, DLQ Copilot diagnoses the failures and proposes the minimal fix for each root cause.
How to use it
- Navigate to Views > [view name] > Quality in the Console
- Locate the DLQ Copilot section (appears when DLQ files exist)
- Click Explain & Fix
DLQ Copilot then:
- Fetches DLQ samples from your S3 bucket (uses the same STS credentials that wrote the files - no new permissions needed)
- Classifies errors by re-executing your view's SQL against the failed payloads
- Clusters failures into root causes (e.g., "cast failure on `price` column" or "required field `user_id` is null")
- Proposes a fix for each cluster - the minimal change to the view definition that resolves the issue
- Validates the fix by compiling the patched definition and running it against DLQ samples to measure the fix rate
Reading the results
Each failure cluster shows:
| Field | Description |
|---|---|
| Root cause | What went wrong (e.g., "field $.price arriving as string, fails DOUBLE cast") |
| Severity | High, Medium, or Low based on affected row count |
| Affected rows | Count and percentage of DLQ rows in this cluster |
| Proposed fix | The minimal view definition change (shown as a diff) |
| Fix rate | Percentage of DLQ rows this fix would resolve |
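For instance, a proposed fix for a DOUBLE cast failure on a price field might be rendered as a diff like this (illustrative only; the Console's exact diff format may differ):

```diff
 columns:
   - name: amount
-    type: DOUBLE
+    type: VARCHAR
     path: $.price
     required: true
```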
Simulating fixes
Before applying any change, click Simulate Fix to run the patched definition against:
- DLQ samples - Measures how many failed rows the fix resolves (fix rate)
- Good samples - Checks that the fix doesn't break existing successful rows (regression rate)
A simulation passes when: fix rate > 0% and regression rate = 0%.
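The pass criteria can be sketched in a few lines of Python. Here `run_view` is a hypothetical stand-in for EdgeMQ's server-side execution of a view definition against one sample row:

```python
def simulate(patched_definition, dlq_samples, good_samples, run_view):
    """Evaluate a patched view definition against both sample sets.

    run_view(definition, row) -> True if the row materialises successfully.
    (Hypothetical helper; EdgeMQ performs this server-side.)
    """
    fixed = sum(run_view(patched_definition, row) for row in dlq_samples)
    regressed = sum(not run_view(patched_definition, row) for row in good_samples)
    fix_rate = fixed / len(dlq_samples)
    regression_rate = regressed / len(good_samples)
    # A simulation passes only if the fix helps AND breaks nothing
    return fix_rate, regression_rate, fix_rate > 0 and regression_rate == 0
```

The asymmetry is deliberate: any improvement on DLQ rows is worth reviewing, but even one regression on good rows fails the simulation.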
Applying fixes
If the simulation passes, click Apply Fix to create a new draft version of the view with the proposed change. The draft goes through the normal publish flow - you review it, preview it, and deploy it.
Example
A DLQ rate spikes from 2% to 15% overnight. You click Explain & Fix and see:
Cluster 1 (HIGH) — 42% of failures
Root cause: $.price arriving as string ("$49.99") instead of number
Column: amount (type: DOUBLE, path: $.price)
Fix: Change column type from DOUBLE to VARCHAR, add TRY_CAST
Fix rate: 98%
Cluster 2 (MEDIUM) — 31% of failures
Root cause: $.userId is null in events from mobile SDK v3.2
Column: user_id (required: true)
Fix: Remove required flag, add default: "anonymous"
Fix rate: 100%

Where to find it
- View Quality tab: Views > [view name] > Quality > DLQ Copilot section
Drift Guard
Drift Guard monitors field distributions over time and detects when your event shapes change - before those changes cause DLQ spikes.
How it works
Drift Guard runs automatically every 6 hours for all published views with active data. It can also be triggered on-demand from the Console.
Each analysis cycle:
- Samples good events from recent Parquet files in your S3 bucket
- Computes per-field statistics - null rate, distinct count, sample values
- Discovers new fields - JSON keys in your payloads that aren't in the view definition
- Compares against the baseline (the previous snapshot)
- Flags drift signals with severity levels
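The per-field statistics in steps 2-3 amount to a simple aggregation over sampled events. A rough Python sketch (illustrative, not EdgeMQ's implementation):

```python
def field_stats(samples: list[dict]) -> dict:
    """Compute per-field null rate, distinct count, and presence rate
    from a batch of sampled JSON events."""
    n = len(samples)
    keys = {k for s in samples for k in s}
    stats = {}
    for k in keys:
        values = [s.get(k) for s in samples]
        nulls = sum(v is None for v in values)
        stats[k] = {
            "null_rate": nulls / n,
            "distinct": len({v for v in values if v is not None}),
            # presence < 1.0 on a key absent from the view definition
            # is the raw signal behind new-field discovery
            "presence": sum(k in s for s in samples) / n,
        }
    return stats
```

Comparing two such snapshots (baseline vs. current) is what turns raw statistics into drift signals.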
Detection thresholds
| Signal | Trigger | Severity |
|---|---|---|
| Null spike | Was < 5% null, now > 40% null | High |
| Null increase | Was < 10% null, now > 30% null | Medium |
| New field | Not in baseline, appearing in > 50% of events | Info |
| Error rate spike | DLQ rate jumped from < 2% to > 10% | High |
Thresholds are deliberately conservative to minimize noise. You won't get alerts for normal fluctuations.
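As a sketch, the threshold table maps onto baseline-vs-current comparisons like this (the dict structure and names are assumptions for illustration; in practice null rates are per-field while the DLQ rate is per-view):

```python
def classify_drift(baseline: dict, current: dict) -> list[tuple[str, str]]:
    """Map the detection-threshold table onto (signal, severity) pairs.

    baseline/current: snapshots with keys like
    "null_rate", "presence", "dlq_rate" (fractions in [0, 1]).
    """
    signals = []
    # Null spike beats null increase; check the stricter rule first
    if baseline["null_rate"] < 0.05 and current["null_rate"] > 0.40:
        signals.append(("null_spike", "high"))
    elif baseline["null_rate"] < 0.10 and current["null_rate"] > 0.30:
        signals.append(("null_increase", "medium"))
    if baseline.get("presence", 0) == 0 and current.get("presence", 0) > 0.5:
        signals.append(("new_field", "info"))
    if baseline["dlq_rate"] < 0.02 and current["dlq_rate"] > 0.10:
        signals.append(("error_rate_spike", "high"))
    return signals
```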
Drift recommendations
When drift is detected, Drift Guard recommends the minimal schema change along with canary rollout parameters:
Change types:
| Change | When recommended |
|---|---|
| Make field optional | A required field's null rate has increased significantly |
| Add new column | A new field is appearing consistently in your events |
| Change column type | Field values no longer match the declared type |
| Remove column | A field has stopped appearing entirely |
Canary parameters:
Each recommendation includes suggested rollout parameters:
- Duration - How long to run the canary (e.g., 30 minutes)
- Success criteria - What defines a passing canary (e.g., "DLQ rate < 0.5%")
- Rollback trigger - When to automatically roll back (e.g., "DLQ rate > 5%")
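Put together, a recommendation's canary parameters might look like this (the field names are illustrative, not a documented EdgeMQ schema):

```yaml
canary:
  duration: 30m                # how long to run the canary
  success_criteria:
    dlq_rate_below: 0.5%       # canary passes if DLQ rate stays below this
  rollback_trigger:
    dlq_rate_above: 5%         # automatic rollback if DLQ rate exceeds this
```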
Using Drift Guard
- Navigate to Views > [view name] > Quality in the Console
- The Drift Monitor section shows the latest status:
  - Last checked - Timestamp of the most recent analysis
  - Signals - Any detected drift patterns with severity badges
  - Recommendation - Proposed changes (if drift was detected)
- Click Analyze Now to trigger an on-demand analysis
- If a recommendation is available, click Create New Version with Changes to apply the suggested changes as a new draft version
Where to find it
- View Quality tab: Views > [view name] > Quality > Drift Monitor section
- Loads automatically for all published views - no configuration needed
How it works under the hood
Schema Autopilot is powered by EdgeMQ's validate-at-output architecture. Because EdgeMQ accepts all events and validates during Parquet materialisation, the system has access to three data sources that most ingest pipelines don't have together:
Raw JSON events (all accepted, nothing lost)
  |
  +---> Good Parquet ----> Field statistics
  |                        (null rates, distinct counts,
  |                         new field discovery)
  |
  +---> DLQ Parquet -----> Error classification
  |                        (cast failures, missing fields,
  |                         type mismatches)
  |
  +---> View Definition -> Schema context
                           (column specs, types, paths,
                            required flags)

This combination enables accurate fixes rather than generic suggestions:
- View Copilot previews generated schemas against real data before you deploy
- DLQ Copilot has the actual failed payloads - not just error counts - so it can classify failures precisely and validate fixes against real samples
- Drift Guard compares good event distributions over time because the full data is always preserved
Systems that reject bad events at the input layer don't have the failed payloads available for analysis.
Limitations
- Does not modify production views - Every suggestion creates a draft that you review and publish
- Does not run custom transformations - Works within the view definition system (column types, JSONPaths, required flags, filters)
- Does not replace understanding your data - Autopilot accelerates the workflow, but you still need to know what your events contain and what your analytics needs are
- Not real-time alerting - Drift Guard runs every 6 hours and on-demand. For sub-minute alerting, combine EdgeMQ's quality metrics with your existing monitoring stack
FAQ
Do I need to configure anything to use Autopilot?
No. Autopilot is enabled automatically on Pro and Enterprise plans. View Copilot is available when creating or editing views. DLQ Copilot appears on the Quality tab when DLQ files exist. Drift Guard starts monitoring as soon as you have a published view with data flowing.
Does Autopilot need additional S3 permissions?
No. Autopilot uses the same STS credential flow that EdgeMQ already uses for writing Parquet and DLQ files. No additional IAM configuration is needed.
Can Autopilot break my production views?
No. Autopilot never modifies published views. Every suggestion - from View Copilot, DLQ Copilot, or Drift Guard - creates a new draft version. You review, preview, and publish through the normal flow.
How accurate are the DLQ Copilot fixes?
Each fix is validated before you see it. DLQ Copilot compiles the patched definition and runs it against actual DLQ samples to measure the fix rate. You can also simulate fixes against good samples to check for regressions before applying.
Can I change the Drift Guard schedule?
Drift Guard runs every 6 hours automatically. You can trigger an on-demand analysis at any time from the Quality tab by clicking "Analyze Now." Custom schedules are not currently supported.
What data does Autopilot send to the AI model?
Autopilot sends your view definition (YAML), sample payloads (from DLQ or good Parquet files), and field statistics. This data is used to generate recommendations and is not stored by the AI provider beyond the request lifecycle.
Next Steps
- Create your first view definition to get started with schema-aware Parquet
- Read the blog post for the full story behind Schema Autopilot
- Start a 14-day Pro trial if you're on the Starter plan