Output formats

EdgeMQ writes incoming events durably to your S3 bucket. For each sealed WAL segment, EdgeMQ can produce one or more artifacts under your configured prefix.

There are currently three output formats:

  • Segments: compressed WAL segment files (.wal.zst) when segment output is enabled
  • Parquet (raw / opaque payload): Parquet files that preserve the original payload without schema extraction
  • Parquet (schema-aware materialized views): typed Parquet generated from a View Definition (no user-submitted SQL)

At a glance

| Output | What it is | Best for |
|---|---|---|
| Segments | Zstd-compressed WAL segments with framed payloads | Raw replay, exact payload retention, flexible downstream parsing |
| Parquet (raw / opaque payload) | “Opaque” Parquet that stores the payload as-is plus ingest metadata | Direct querying/loads while preserving payload bytes |
| Parquet (schema-aware materialized views) | Typed Parquet produced from validated view definitions | Table-like datasets ready for warehouses/lakehouses without a separate parse job |

1) Segments (compressed WAL)

What lands in S3

  • When segment output is enabled: .../segments/seg-000000NN.wal.zst (zstd-compressed)
  • A commit marker/manifest written after all required artifacts for the segment are uploaded
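
From a consumer's side, the marker-then-read handshake might look like the sketch below. This is a minimal illustration with boto3; the bucket, prefix, and marker key naming (commit-<seg_id>.json) are assumptions for this example, not a documented contract.

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "my-bucket"            # hypothetical bucket
PREFIX = "edgemq/segments/"     # hypothetical prefix

def segment_is_committed(seg_id: str) -> bool:
    """Treat a segment as readable only once its commit marker exists.
    The marker key name below is illustrative, not a documented contract."""
    marker_key = f"{PREFIX}commit-{seg_id}.json"  # assumed naming
    try:
        s3.head_object(Bucket=BUCKET, Key=marker_key)
        return True
    except ClientError:
        return False
```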

Properties

  • Payload preserved: the payload is stored as opaque bytes (often NDJSON or a record stream)
  • Streaming-friendly: you can decompress and parse frames incrementally
  • Best durability boundary: consumers should use commit markers/manifests as the “safe to read” signal
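
To make the streaming point concrete, here is a minimal sketch that decompresses a downloaded segment incrementally with the zstandard package. The frame layout inside a segment is not specified here, so this assumes the decompressed payload is NDJSON; swap in the actual frame parser for framed payloads.

```python
import io
import json
import zstandard as zstd

def iter_records(path: str):
    """Stream records out of a downloaded .wal.zst segment without
    decompressing the whole file into memory. Assumes (for illustration)
    that the decompressed stream is NDJSON; real segments use framed
    payloads, so adapt the parsing to the actual frame format."""
    with open(path, "rb") as fh:
        reader = zstd.ZstdDecompressor().stream_reader(fh)
        for line in io.TextIOWrapper(reader, encoding="utf-8"):
            if line.strip():
                yield json.loads(line)

for record in iter_records("seg-00000001.wal.zst"):
    print(record)
```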

2) Parquet (raw / opaque payload)

Raw Parquet output makes the data easier to query and load directly while keeping the original payload intact.

What it is

  • A Parquet file where each row corresponds to an ingested record.
  • Includes ingest metadata columns (timestamps, format/version) and an **opaque payload column** containing the original bytes/string.

What it’s for

  • Query engines and loaders that prefer Parquet files.
  • Lightweight analytics without writing a separate “expand + parse” job.

Note

Raw/opaque Parquet does not attempt global schema inference. It preserves the payload and keeps the ingest contract stable.
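
Because the payload column is opaque, field extraction happens at query time. A minimal sketch with DuckDB, assuming the payload column holds JSON strings; the column names (ingest_ts, payload) and file path are illustrative, not the documented schema.

```python
import duckdb

# Column names (ingest_ts, payload) and the path are assumptions
# for illustration, not the documented raw-Parquet schema.
con = duckdb.connect()
rows = con.sql("""
    SELECT
        ingest_ts,
        json_extract_string(payload, '$.user_id') AS user_id
    FROM read_parquet('raw-output/*.parquet')
    WHERE json_extract_string(payload, '$.event') = 'signup'
""").fetchall()
```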

3) Parquet (schema-aware materialized views)

Schema-aware Parquet is a higher-level output that materializes typed columns directly at ingest time.

How it works

  • You define a View Definition: a declarative mapping (columns, types, JSONPath-like extraction, filters, partitioning, and resource limits).
  • EdgeMQ validates the definition and compiles it to safe, internal DuckDB execution (no user-submitted SQL).
  • For each sealed segment, EdgeMQ materializes Parquet under deterministic S3 keys (CSV may be offered as a future option), as in the sketch below.
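
The public schema for View Definitions is not shown here, so the following is a purely hypothetical sketch of what such a declarative mapping could contain, expressed as a Python dict: columns with types and JSONPath-like extraction, a row filter, partitioning, and a resource limit.

```python
# Hypothetical View Definition sketch. Field names and structure are
# illustrative only; consult the Materialized Views documentation for
# the actual schema.
view_definition = {
    "name": "signups",
    "columns": [
        {"name": "user_id", "type": "VARCHAR",   "path": "$.user_id"},
        {"name": "plan",    "type": "VARCHAR",   "path": "$.plan"},
        {"name": "ts",      "type": "TIMESTAMP", "path": "$.ts"},
    ],
    "filter": {"path": "$.event", "equals": "signup"},  # row filter
    "partition_by": ["plan"],          # Hive-style output partitioning
    "limits": {"max_memory_mb": 256},  # per-materialization resource cap
}
```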

What it’s for

  • Producing “table-like” datasets continuously in S3 for Snowflake/Databricks/DuckDB/Trino/etc.
  • Avoiding an extra transformation job for basic extraction, casting, and partitioning.
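
Downstream engines can then read the materialized output like any other Parquet dataset. A sketch with DuckDB over an assumed S3 layout (the bucket, prefix, and Hive partitioning are hypothetical; credential setup is omitted):

```python
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs")  # enable s3:// reads
con.sql("LOAD httpfs")
# Bucket, prefix, and partition layout below are assumptions.
rows = con.sql("""
    SELECT plan, count(*) AS signups
    FROM read_parquet('s3://my-bucket/edgemq/views/signups/*/*.parquet',
                      hive_partitioning = true)
    GROUP BY plan
""").fetchall()
```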

Operational model

  • View outputs can be treated as optional (don't block commit if a view fails) or required (gate the commit boundary), depending on endpoint configuration.
  • Commit markers/manifests list the artifacts that are complete and safe to consume for each segment.
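
A consumer honoring that contract reads the manifest first and touches only the artifacts it lists. The manifest's JSON shape below is an assumed example, not a documented format.

```python
import json
import boto3

s3 = boto3.client("s3")

def completed_artifacts(bucket: str, manifest_key: str) -> list[str]:
    """Return the artifact keys a segment's commit manifest declares
    complete. The JSON shape ({"artifacts": [...]}) is an assumed
    example, not a documented format."""
    body = s3.get_object(Bucket=bucket, Key=manifest_key)["Body"].read()
    return json.loads(body).get("artifacts", [])
```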

Learn more

See the complete Materialized Views documentation for in-depth guides, function reference, examples, and best practices.

Choosing an output format

Most teams start with segments (raw, flexible), then add:

  • Raw Parquet when they want direct Parquet reads while still preserving the payload as-is.
  • Schema-aware materialized views when they want typed tables (Parquet) generated directly at ingest time.