Introduction

EdgeMQ is a managed HTTP → S3 ingest layer. It exposes regional HTTPS endpoints that accept NDJSON (and RecordIO-style streams) and delivers that data durably into your S3 bucket. Internally, EdgeMQ uses a write-ahead log (WAL), segmented uploads, and explicit commit markers so downstream systems can treat S3 as a reliable, continuously updated source of truth.

EdgeMQ is typically used as the ingest layer in data warehouse, lakehouse, and ML/AI pipelines, where S3 serves as the Bronze/raw layer and tools such as Snowflake, Databricks, ClickHouse, DuckDB, and Postgres consume the data from there.

What EdgeMQ provides

  • HTTP ingest at the edge

    Regional ingest endpoints accept NDJSON/RecordIO over HTTPS from services, devices, and batch jobs (see the producer sketch after this list).

  • Delivery into your S3 bucket

    Data is written into an S3 bucket that you own, via an IAM Role you configure. EdgeMQ does not store your data in its own buckets.

  • Durable write-ahead log (WAL)

    Each account/region runs in an isolated microVM with an NVMe-backed WAL. Requests are acknowledged only after the data is safely written to disk.

  • Segmented, compressed uploads

    The WAL is cut into segments, sealed, compressed, and uploaded to S3 using multipart upload.

  • Commit markers in S3

    After a segment is fully uploaded, EdgeMQ writes a small commit marker object. Consumers can use these markers to know which segments are complete and safe to process.

  • At-least-once delivery and backpressure

    EdgeMQ provides at-least-once semantics with per-instance ordering, and uses bounded queues with 503 + Retry-After responses for overload protection rather than silently dropping data.

  • Per-account isolation

    Each account has a dedicated microVM per region (isolated WAL, queues, and network). No WAL or disk is shared between tenants.
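
As a rough sketch of the producer side, a client can POST an NDJSON batch and back off when EdgeMQ signals overload. The endpoint URL, header names, and token below are placeholders; the Quickstart covers the actual values for your account.

    import json
    import time
    import requests

    # Hypothetical endpoint and credentials; substitute the values from your account.
    INGEST_URL = "https://ingest.eu-west-1.example-edgemq.com/ingest"
    TOKEN = "YOUR_API_TOKEN"

    records = [
        {"event": "signup", "user_id": 42},
        {"event": "click", "user_id": 42, "target": "cta"},
    ]
    # NDJSON: one JSON object per line, newline-terminated.
    body = "\n".join(json.dumps(r) for r in records) + "\n"

    while True:
        resp = requests.post(
            INGEST_URL,
            data=body.encode("utf-8"),
            headers={
                "Content-Type": "application/x-ndjson",
                "Authorization": f"Bearer {TOKEN}",  # auth scheme is an assumption
            },
            timeout=10,
        )
        if resp.status_code == 503:
            # Backpressure: honor Retry-After instead of dropping data or hammering the endpoint.
            time.sleep(int(resp.headers.get("Retry-After", "1")))
            continue
        resp.raise_for_status()
        break  # acknowledged: the batch is durably on the WAL

Because delivery is at-least-once, a retried batch may land twice; downstream consumers should tolerate occasional duplicate records.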

How EdgeMQ fits your data stack

EdgeMQ is designed to be the ingest layer, not a warehouse or processing engine. Common patterns include:

Lakehouse / warehouse ingest

  • Producers send events to EdgeMQ over HTTP.
  • EdgeMQ writes compressed segments + commit markers to S3.
  • Tools such as Snowflake (Snowpipe / COPY INTO), Databricks Auto Loader, Spark, or Trino load from those S3 prefixes into tables.

Burst buffer in front of databases

  • Services write into EdgeMQ instead of directly into ClickHouse or Postgres.
  • Batch jobs read from S3 and insert into the database at a controlled rate.
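
A minimal sketch of the loading side, assuming the segments have already been expanded to plain NDJSON objects (see Expand segments → NDJSON) and using hypothetical bucket, prefix, and table names:

    import json
    import time

    import boto3
    import psycopg2

    # Hypothetical names; the real prefix layout is described in "S3 File Format".
    BUCKET = "my-ingest-bucket"
    PREFIX = "edgemq/expanded/"   # NDJSON produced by "Expand segments → NDJSON"
    BATCH_SIZE = 500
    PAUSE_SECONDS = 1.0           # throttle so the database is not overwhelmed

    s3 = boto3.client("s3")
    conn = psycopg2.connect("dbname=analytics user=loader")

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
            rows = [json.loads(line) for line in body.decode("utf-8").splitlines() if line]
            with conn, conn.cursor() as cur:
                for i in range(0, len(rows), BATCH_SIZE):
                    cur.executemany(
                        "INSERT INTO events (payload) VALUES (%s)",
                        [(json.dumps(r),) for r in rows[i : i + BATCH_SIZE]],
                    )
                    time.sleep(PAUSE_SECONDS)  # controlled insert rate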

ML and analytics data feeds

  • Application, telemetry, or model logs are ingested into S3 via EdgeMQ.
  • ML pipelines and notebooks (Databricks, Snowflake, DuckDB, etc.) read from the same S3 prefixes to build training, evaluation, and feature datasets.
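
For example, a notebook can query the expanded NDJSON directly with DuckDB. The prefix below is hypothetical, and S3 credentials are assumed to come from the environment:

    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    # Region is an assumption; credentials via environment variables or SET statements.
    con.execute("SET s3_region = 'eu-west-1'")

    # Hypothetical prefix holding expanded NDJSON objects; adjust to your layout.
    rows = con.sql(
        "SELECT count(*) AS events "
        "FROM read_json_auto('s3://my-ingest-bucket/edgemq/expanded/*.ndjson')"
    ).fetchall()
    print(rows)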

Webhook and partner ingest

  • Third-party webhooks terminate on EdgeMQ.
  • S3 retains a complete log of incoming events; downstream jobs parse and load as needed.

Data formats

Today, EdgeMQ focuses on line-oriented JSON:

  • Input formats
    • NDJSON (application/x-ndjson)
    • RecordIO-style streams carrying JSON records
  • Output format in S3 (current)
    • Compressed WAL segments containing NDJSON/RecordIO frames
    • JSON commit markers referencing each completed segment

The long-term direction is to support additional output formats on S3 (such as Parquet, CSV, and table-friendly layouts) so lakehouse and warehouse engines can read EdgeMQ data directly with minimal transformation.

Details of the current on-disk/on-S3 format are documented in S3 File Format and Expand segments → NDJSON.
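
The exact key layout and marker schema are defined in S3 File Format. Purely to illustrate the pattern, a consumer might list commit markers and derive the set of segments that are safe to process; the prefixes and the marker-to-segment naming convention below are hypothetical.

    import boto3

    # Hypothetical prefixes; the actual layout is defined in "S3 File Format".
    BUCKET = "my-ingest-bucket"
    SEGMENT_PREFIX = "edgemq/segments/"
    MARKER_PREFIX = "edgemq/commits/"

    s3 = boto3.client("s3")

    def committed_segments():
        """Yield segment keys that have a commit marker, i.e. are fully uploaded."""
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=BUCKET, Prefix=MARKER_PREFIX):
            for marker in page.get("Contents", []):
                # Assumed convention: marker key mirrors the segment key it refers to.
                yield marker["Key"].replace(MARKER_PREFIX, SEGMENT_PREFIX, 1)

    for key in committed_segments():
        print("safe to process:", key)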

When to use EdgeMQ

EdgeMQ is a good fit when:

  • You want a simple HTTP endpoint that reliably lands data in S3.
  • You treat S3 as the raw/Bronze layer for your data, analytics, or ML stack.
  • You want at-least-once, durable ingest without operating Kafka/Kinesis or building custom HTTP → S3 collectors.
  • You need to standardize how multiple teams, services, devices, or partners send data into your S3 environment.

If you only need occasional file uploads or direct application writes into a database/warehouse, EdgeMQ may be unnecessary. It is primarily intended for continuous or high-volume event/data streams.

Next steps

  • Quickstart — create an account, connect your S3 bucket via IAM, and send your first NDJSON payload to /ingest.
  • S3 File Format — understand how segments and commit markers are laid out in S3.
  • Expand segments → NDJSON — see how to turn compressed segments back into NDJSON for use with your tools and pipelines.