Output formats

EdgeMQ writes incoming events durably to your S3 bucket. For each sealed WAL segment, EdgeMQ can produce one or more artifacts under your configured prefix.

There are currently three output formats:

  • Segments: compressed WAL segment files (.wal.zst) when segment output is enabled
  • Parquet (raw / opaque payload): Parquet files that preserve the original payload without schema extraction
  • Parquet (schema-aware materialized views): typed Parquet generated from a View Definition (no user-submitted SQL)

At a glance

| Output | What it is | Best for |
|---|---|---|
| Segments | Zstd-compressed WAL segments with framed payloads | Raw replay, exact payload retention, flexible downstream parsing |
| Parquet (raw / opaque payload) | “Opaque” Parquet that stores the payload as-is plus ingest metadata | Direct querying/loads while preserving payload bytes |
| Parquet (schema-aware materialized views) | Typed Parquet produced from validated view definitions | Table-like datasets ready for warehouses/lakehouses without a separate parse job |

1) Segments (compressed WAL)

What lands in S3

  • When segment output is enabled: .../segments/seg-000000NN.wal.zst (zstd-compressed)
  • A commit marker/manifest written after all required artifacts for the segment are uploaded
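
From a consumer's side, the marker-then-read handshake might look like the sketch below. This is a minimal illustration with boto3; the bucket, prefix, and marker key naming (commit-<seg_id>.json) are assumptions for this example, not a documented contract.

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "my-bucket"            # hypothetical bucket
PREFIX = "edgemq/segments/"     # hypothetical prefix

def segment_is_committed(seg_id: str) -> bool:
    """Treat a segment as readable only once its commit marker exists.
    The marker key name below is illustrative, not a documented contract."""
    marker_key = f"{PREFIX}commit-{seg_id}.json"  # assumed naming
    try:
        s3.head_object(Bucket=BUCKET, Key=marker_key)
        return True
    except ClientError:
        return False
```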

Properties

  • Payload preserved: the payload is stored as opaque bytes (often NDJSON or a record stream)
  • Streaming-friendly: you can decompress and parse frames incrementally
  • Best durability boundary: consumers should use commit markers/manifests as the “safe to read” signal
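
To make the streaming point concrete, here is a minimal sketch that decompresses a downloaded segment incrementally with the zstandard package. The frame layout inside a segment is not specified here, so this assumes the decompressed payload is NDJSON; swap in the actual frame parser for framed payloads.

```python
import io
import json
import zstandard as zstd

def iter_records(path: str):
    """Stream records out of a downloaded .wal.zst segment without
    decompressing the whole file into memory. Assumes (for illustration)
    that the decompressed stream is NDJSON; real segments use framed
    payloads, so adapt the parsing to the actual frame format."""
    with open(path, "rb") as fh:
        reader = zstd.ZstdDecompressor().stream_reader(fh)
        for line in io.TextIOWrapper(reader, encoding="utf-8"):
            if line.strip():
                yield json.loads(line)

for record in iter_records("seg-00000001.wal.zst"):
    print(record)
```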

2) Parquet (raw / opaque payload)

Raw Parquet output makes the data easier to query and load directly while keeping the original payload intact.

What it is

  • A Parquet file where each row corresponds to an ingested record.
  • Includes ingest metadata columns (timestamps, format/version) and an **opaque payload column** containing the original bytes/string.

What it’s for

  • Query engines and loaders that prefer Parquet files.
  • Lightweight analytics without writing a separate “expand + parse” job.

Note

Raw/opaque Parquet does not attempt global schema inference. It preserves the payload and keeps the ingest contract stable.
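
Because the payload column is opaque, field extraction happens at query time. A minimal sketch with DuckDB, assuming the payload column holds JSON strings; the column names (ingest_ts, payload) and file path are illustrative, not the documented schema.

```python
import duckdb

# Column names (ingest_ts, payload) and the path are assumptions
# for illustration, not the documented raw-Parquet schema.
con = duckdb.connect()
rows = con.sql("""
    SELECT
        ingest_ts,
        json_extract_string(payload, '$.user_id') AS user_id
    FROM read_parquet('raw-output/*.parquet')
    WHERE json_extract_string(payload, '$.event') = 'signup'
""").fetchall()
```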

3) Parquet (schema-aware materialized views)

Schema-aware Parquet is a higher-level output that materializes typed columns directly at ingest time.

How it works

  • You define a View Definition: a declarative mapping (columns, types, JSONPath-like extraction, filters, partitioning, and resource limits).
  • EdgeMQ validates the definition and compiles it to safe, internal DuckDB execution (no user-submitted SQL).
  • For each sealed segment, EdgeMQ materializes Parquet under deterministic S3 keys (CSV may be offered as a future option), as in the sketch below.
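
The public schema for View Definitions is not shown here, so the following is a purely hypothetical sketch of what such a declarative mapping could contain, expressed as a Python dict: columns with types and JSONPath-like extraction, a row filter, partitioning, and a resource limit.

```python
# Hypothetical View Definition sketch. Field names and structure are
# illustrative only; consult the Materialized Views documentation for
# the actual schema.
view_definition = {
    "name": "signups",
    "columns": [
        {"name": "user_id", "type": "VARCHAR",   "path": "$.user_id"},
        {"name": "plan",    "type": "VARCHAR",   "path": "$.plan"},
        {"name": "ts",      "type": "TIMESTAMP", "path": "$.ts"},
    ],
    "filter": {"path": "$.event", "equals": "signup"},  # row filter
    "partition_by": ["plan"],          # Hive-style output partitioning
    "limits": {"max_memory_mb": 256},  # per-materialization resource cap
}
```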

What it’s for

  • Producing “table-like” datasets continuously in S3 for Snowflake/Databricks/DuckDB/Trino/etc.
  • Avoiding an extra transformation job for basic extraction, casting, and partitioning.
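
Downstream engines can then read the materialized output like any other Parquet dataset. A sketch with DuckDB over an assumed S3 layout (the bucket, prefix, and Hive partitioning are hypothetical; credential setup is omitted):

```python
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs")  # enable s3:// reads
con.sql("LOAD httpfs")
# Bucket, prefix, and partition layout below are assumptions.
rows = con.sql("""
    SELECT plan, count(*) AS signups
    FROM read_parquet('s3://my-bucket/edgemq/views/signups/*/*.parquet',
                      hive_partitioning = true)
    GROUP BY plan
""").fetchall()
```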

Operational model

  • View outputs can be treated as optional (don't block commit if a view fails) or required (gate the commit boundary), depending on endpoint configuration.
  • Commit markers/manifests list the artifacts that are complete and safe to consume for each segment.
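
A consumer honoring that contract reads the manifest first and touches only the artifacts it lists. The manifest's JSON shape below is an assumed example, not a documented format.

```python
import json
import boto3

s3 = boto3.client("s3")

def completed_artifacts(bucket: str, manifest_key: str) -> list[str]:
    """Return the artifact keys a segment's commit manifest declares
    complete. The JSON shape ({"artifacts": [...]}) is an assumed
    example, not a documented format."""
    body = s3.get_object(Bucket=bucket, Key=manifest_key)["Body"].read()
    return json.loads(body).get("artifacts", [])
```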

Learn more

See the complete Materialized Views documentation for in-depth guides, function reference, examples, and best practices.

Choosing an output format

Most teams start with segments (raw, flexible), then add:

  • Raw Parquet when they want direct Parquet reads while still preserving the payload as-is.
  • Schema-aware materialized views when they want typed tables (Parquet) generated directly at ingest time.