Introduction
EdgeMQ is a managed ingest layer that accepts data via HTTP and MQTT and delivers it into your S3 bucket. Regional endpoints accept NDJSON over HTTPS, and IoT devices can publish telemetry via standard MQTT client libraries — both protocols write into the same pipeline and produce the same S3 artifacts:
- Segments (compressed WAL segment files)
- Parquet (raw / opaque payload)
- Parquet (schema-aware materialized views) with typed columns
EdgeMQ is typically used as the ingest layer in data warehouse, lakehouse, and ML/AI pipelines where S3 is the Bronze/raw layer and tools like Snowflake, Databricks, ClickHouse, DuckDB, and Postgres consume from there.
See Output formats for a side-by-side view of what each artifact is for and when to use it.
What EdgeMQ provides
- HTTP and MQTT ingest at the edge
Regional endpoints accept NDJSON/RecordIO over HTTPS from services, backends, and batch jobs. The same endpoints also accept MQTT connections from IoT devices, sensors, and gateways. Both protocols share the same WAL, authentication, and S3 destination.
- Delivery into your S3 bucket
Data is written into an S3 bucket that you own, via an IAM Role you configure. EdgeMQ does not store your data in its own buckets.
- Three S3 output formats
Choose one or more artifact types for a given endpoint: segments, Parquet (raw/opaque payload), and Parquet (schema-aware views) for typed, table-like outputs. Schema-aware views can filter by MQTT topic for targeted data extraction.
- At-least-once delivery and backpressure
EdgeMQ provides at-least-once semantics with per-instance ordering, and uses bounded queues with 503 + Retry-After responses for overload protection rather than silently dropping data. MQTT clients receive the same durability guarantees — QoS 1 PUBACK is sent after the message is written to the WAL. A client-side retry sketch follows this list.
- Per-account isolation
Each account has a dedicated microVM per region (isolated WAL, queues, and network). No WAL or disk is shared between tenants.
- AI-assisted schema management
Schema Autopilot generates view definitions from sample JSON, diagnoses DLQ failures with proposed fixes, and detects field drift over time — so your pipeline stays healthy as data evolves.
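Clients should treat a 503 as a signal to slow down rather than as a hard error. Below is a minimal Python sketch of that retry loop; the endpoint URL and API key are placeholders, and it assumes Retry-After carries a delay in seconds.

```python
import time

import requests


def post_ndjson(url: str, body: bytes, api_key: str, max_attempts: int = 5):
    """POST an NDJSON batch, backing off on 503 + Retry-After (sketch)."""
    for attempt in range(max_attempts):
        resp = requests.post(
            url,
            data=body,
            headers={"X-API-Key": api_key, "Content-Type": "application/x-ndjson"},
            timeout=10,
        )
        if resp.status_code != 503:
            resp.raise_for_status()  # surface auth/validation errors
            return resp
        # Honor the server-suggested delay (assumed to be in seconds);
        # fall back to exponential backoff if the header is absent.
        delay = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
    raise RuntimeError("ingest still overloaded after retries")
```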
Ingest protocols
Every EdgeMQ endpoint accepts both HTTP and MQTT. You can use one or both protocols on the same endpoint — data from both flows through the same WAL and lands in the same S3 destination.
- HTTP
POST /v1/ingest with NDJSON or RecordIO payloads. Authenticate with an API key in the X-API-Key header. Best for application backends, webhooks, and batch uploads. A minimal producer example follows this list.
- MQTT
Standard MQTT clients connect via wss:// (WebSocket, recommended) or mqtts:// (native TCP). Authenticate with the same API key in the MQTT password field. MQTT topics are preserved as metadata and available in materialized views. Best for IoT devices, sensors, and gateways. See the MQTT guide for connection setup and client library examples.
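As a concrete starting point, here is a minimal HTTP producer in Python. The endpoint URL and API key are placeholders; see the Quickstart for real values.

```python
import json

import requests

ENDPOINT = "https://ingest.us-east-1.edgemq.example/v1/ingest"  # placeholder
API_KEY = "YOUR_API_KEY"                                        # placeholder

events = [
    {"event": "signup", "user_id": 42},
    {"event": "login", "user_id": 42},
]
# NDJSON: one JSON object per line.
body = "\n".join(json.dumps(e) for e in events).encode("utf-8")

resp = requests.post(
    ENDPOINT,
    data=body,
    headers={"X-API-Key": API_KEY, "Content-Type": "application/x-ndjson"},
    timeout=10,
)
resp.raise_for_status()
```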
How EdgeMQ fits your data stack
EdgeMQ is designed to be the ingest layer, not a warehouse or processing engine. Common patterns include:
Lakehouse / warehouse ingest
- Producers send events to EdgeMQ over HTTP or MQTT.
- EdgeMQ writes segments and/or Parquet artifacts to S3, depending on your output configuration.
- Tools such as Snowflake (Snowpipe / COPY INTO), Databricks Autoloader, Spark, or Trino load from those S3 prefixes into tables.
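As one illustration of the load step, a Snowflake COPY INTO from an EdgeMQ Parquet prefix could be driven like this. The account, table, storage integration, and bucket names are all hypothetical; COPY INTO with a Parquet file format is standard Snowflake, but adapt the identifiers to your environment.

```python
import snowflake.connector  # pip install snowflake-connector-python

# Hypothetical connection parameters.
con = snowflake.connector.connect(
    account="my_account", user="loader", password="...", warehouse="LOAD_WH"
)

# Hypothetical table, integration, and bucket names.
con.cursor().execute("""
    COPY INTO raw_events
    FROM 's3://my-ingest-bucket/edgemq/parquet/'
    STORAGE_INTEGRATION = edgemq_s3
    FILE_FORMAT = (TYPE = PARQUET)
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
""")
```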
IoT telemetry landing
- Devices and sensors publish telemetry to EdgeMQ via MQTT.
- Materialized views filter by MQTT topic and extract typed columns directly at ingest time.
- Query-ready Parquet lands in S3 — no ETL job required between ingest and analytics.
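A minimal device-side publisher, sketched with the paho-mqtt client (2.x API). The broker hostname and topic are hypothetical; as described above, the API key goes in the MQTT password field.

```python
import json
import ssl

import paho.mqtt.client as mqtt

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, transport="websockets")
client.username_pw_set(username="device-1", password="YOUR_API_KEY")
client.tls_set(cert_reqs=ssl.CERT_REQUIRED)  # wss:// requires TLS

client.connect("mqtt.edgemq.example", 443)   # hypothetical broker hostname
client.loop_start()

# QoS 1: the PUBACK arrives after EdgeMQ has written the message to the WAL.
info = client.publish(
    "sensors/room-1/climate",                # hypothetical topic
    json.dumps({"temp_c": 21.5, "humidity": 0.43}),
    qos=1,
)
info.wait_for_publish()

client.loop_stop()
client.disconnect()
```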
Burst buffer in front of databases
- Services write into EdgeMQ instead of directly into ClickHouse or Postgres.
- Batch jobs read from S3 and insert into the database at a controlled rate.
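A sketch of that drain job, assuming Parquet output; the bucket, prefix, and load_batch() loader are hypothetical, and a real job would also track which keys it has already processed.

```python
import io
import time

import boto3
import pyarrow.parquet as pq

BUCKET = "my-ingest-bucket"   # hypothetical
PREFIX = "edgemq/parquet/"    # hypothetical

s3 = boto3.client("s3")
pages = s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX)

for page in pages:
    for obj in page.get("Contents", []):
        buf = io.BytesIO()
        s3.download_fileobj(BUCKET, obj["Key"], buf)
        buf.seek(0)
        batch = pq.read_table(buf)
        # load_batch(batch)  # hypothetical: batched INSERT into ClickHouse/Postgres
        time.sleep(1.0)      # throttle to a rate the database can absorb
```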
ML and analytics data feeds
- Application, telemetry, or model logs are ingested into S3 via EdgeMQ.
- ML pipelines and notebooks (Databricks, Snowflake, DuckDB, etc.) read from the same S3 prefixes to build training, evaluation, and feature datasets.
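For example, a notebook can query the schema-aware Parquet views in place with DuckDB. The bucket, prefix, and column names below are hypothetical; httpfs and SET s3_region are standard DuckDB, with AWS credentials taken from the environment.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")                  # enables s3:// paths
con.execute("SET s3_region = 'us-east-1'")  # credentials come from env/AWS config

# Hypothetical prefix and columns; point this at your materialized-view output.
df = con.execute("""
    SELECT device_id, avg(temp_c) AS avg_temp
    FROM read_parquet('s3://my-ingest-bucket/edgemq/views/telemetry/*.parquet')
    GROUP BY device_id
""").df()  # .df() requires pandas
```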
Webhook and partner ingest
- Third-party webhooks terminate on EdgeMQ.
- S3 retains a complete log of incoming events; downstream jobs parse and load as needed.
Data formats
EdgeMQ focuses on JSON input via HTTP or MQTT, and produces S3 artifacts you can query or transform downstream:
- Input formats
- HTTP: NDJSON (application/x-ndjson) and RecordIO-style streams
- MQTT: JSON payloads published to topics (topic metadata preserved in WAL)
- Outputs in S3
- Compressed WAL segments containing framed payloads (when segment output is enabled)
- Parquet (raw / opaque payload) for direct querying
- Parquet (schema-aware materialized views) with typed columns — can filter by MQTT topic
- Commit markers/manifests that indicate what artifacts are complete and safe to read
Details are documented in Output formats, Segment file format, and Expand segments → NDJSON.
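As a sketch of how a consumer might use commit markers, the loop below reads only artifacts referenced by a manifest. The manifest prefix and JSON field names here are assumptions for illustration; the actual contract is specified in Output formats.

```python
import json

import boto3

BUCKET = "my-ingest-bucket"            # hypothetical
MANIFEST_PREFIX = "edgemq/manifests/"  # assumed layout; see Output formats

s3 = boto3.client("s3")
pages = s3.get_paginator("list_objects_v2").paginate(
    Bucket=BUCKET, Prefix=MANIFEST_PREFIX
)
for page in pages:
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        manifest = json.loads(body)
        for key in manifest.get("artifacts", []):  # assumed field name
            print("safe to read:", f"s3://{BUCKET}/{key}")
```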
When to use EdgeMQ
EdgeMQ is a good fit when:
- You want a simple HTTP or MQTT endpoint that reliably lands data in S3.
- You have IoT devices that speak MQTT and need telemetry in S3 as query-ready Parquet.
- You treat S3 as the raw/Bronze layer for your data, analytics, or ML stack.
- You want at-least-once, durable ingest without operating Kafka/Kinesis or building custom collectors.
- You need to standardize how multiple teams, services, devices, or partners send data into your S3 environment.
If you only need occasional file uploads or direct application writes into a database/warehouse, EdgeMQ may be unnecessary. It is primarily intended for continuous or high-volume event/data streams.
Next steps
- Quickstart — create an account, connect your S3 bucket via IAM, and send your first payload.
- MQTT — connect IoT devices via MQTT with client library examples for Python, Node.js, Go, and Arduino.
- Output formats — understand segments, Parquet (raw), and schema-aware Parquet materialized views.
- Schema Autopilot — generate view definitions, diagnose DLQ failures, and detect schema drift with AI assistance.
- Segment file format — understand how segments are laid out in S3.