Introduction
EdgeMQ is a managed ingest layer that accepts data via HTTP and MQTT and delivers it into your S3 bucket. Regional endpoints accept NDJSON over HTTPS, and IoT devices can publish telemetry via standard MQTT client libraries — both protocols write into the same pipeline and produce the same S3 artifacts:
- Segments (compressed WAL segment files)
- Parquet (raw / opaque payload)
- Parquet (schema-aware materialized views) with typed columns
EdgeMQ is typically used as the ingest layer in data warehouse, lakehouse, and ML/AI pipelines where S3 is the Bronze/raw layer and tools like Snowflake, Databricks, ClickHouse, DuckDB, and Postgres consume from there.
See Output formats for a side-by-side view of what each artifact is for and when to use it.
What EdgeMQ provides
- HTTP and MQTT ingest at the edge
Regional endpoints accept NDJSON/RecordIO over HTTPS from services, backends, and batch jobs. The same endpoints also accept MQTT connections from IoT devices, sensors, and gateways. Both protocols share the same WAL, authentication, and S3 destination.
- Delivery into your S3 bucket
Data is written into an S3 bucket that you own, via an IAM Role you configure. EdgeMQ does not store your data in its own buckets.
- Three S3 output formats
Choose one or more artifact types for a given endpoint: segments, Parquet (raw/opaque payload), and Parquet (schema-aware views) for typed, table-like outputs. Schema-aware views can filter by MQTT topic for targeted data extraction.
- At-least-once delivery and backpressure
EdgeMQ provides at-least-once semantics with per-instance ordering, and uses bounded queues with 503 + Retry-After responses for overload protection rather than silently dropping data. MQTT clients receive the same durability guarantees — QoS 1 PUBACK is sent after the message is written to the WAL. A client-side retry sketch follows this list.
- Per-account isolation
Each account has a dedicated microVM per region (isolated WAL, queues, and network). No WAL or disk is shared between tenants.
- AI-assisted schema management
Schema Autopilot generates view definitions from sample JSON, diagnoses DLQ failures with proposed fixes, and detects field drift over time — so your pipeline stays healthy as data evolves.
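Clients should treat a 503 as a signal to slow down rather than as a hard error. Below is a minimal Python sketch of that retry loop; the endpoint URL and API key are placeholders, and it assumes Retry-After carries a delay in seconds.

```python
import time

import requests


def post_ndjson(url: str, body: bytes, api_key: str, max_attempts: int = 5):
    """POST an NDJSON batch, backing off on 503 + Retry-After (sketch)."""
    for attempt in range(max_attempts):
        resp = requests.post(
            url,
            data=body,
            headers={"X-API-Key": api_key, "Content-Type": "application/x-ndjson"},
            timeout=10,
        )
        if resp.status_code != 503:
            resp.raise_for_status()  # surface auth/validation errors
            return resp
        # Honor the server-suggested delay (assumed to be in seconds);
        # fall back to exponential backoff if the header is absent.
        delay = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
    raise RuntimeError("ingest still overloaded after retries")
```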
Ingest protocols
Every EdgeMQ endpoint accepts both HTTP and MQTT. You can use one or both protocols on the same endpoint — data from both flows through the same WAL and lands in the same S3 destination.
- HTTP
POST /v1/ingest with NDJSON or RecordIO payloads. Authenticate with an API key in the X-API-Key header. Best for application backends, webhooks, and batch uploads. A minimal producer example follows this list.
- MQTT
Standard MQTT clients connect via wss:// (WebSocket, recommended) or mqtts:// (native TCP). Authenticate with the same API key in the MQTT password field. MQTT topics are preserved as metadata and available in materialized views. Best for IoT devices, sensors, and gateways. See the MQTT guide for connection setup and client library examples.
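As a concrete starting point, here is a minimal HTTP producer in Python. The endpoint URL and API key are placeholders; see the Quickstart for real values.

```python
import json

import requests

ENDPOINT = "https://ingest.us-east-1.edgemq.example/v1/ingest"  # placeholder
API_KEY = "YOUR_API_KEY"                                        # placeholder

events = [
    {"event": "signup", "user_id": 42},
    {"event": "login", "user_id": 42},
]
# NDJSON: one JSON object per line.
body = "\n".join(json.dumps(e) for e in events).encode("utf-8")

resp = requests.post(
    ENDPOINT,
    data=body,
    headers={"X-API-Key": API_KEY, "Content-Type": "application/x-ndjson"},
    timeout=10,
)
resp.raise_for_status()
```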
How EdgeMQ fits your data stack
EdgeMQ is designed to be the ingest layer, not a warehouse or processing engine. Common patterns include:
Lakehouse / warehouse ingest
- Producers send events to EdgeMQ over HTTP or MQTT.
- EdgeMQ writes segments and/or Parquet artifacts to S3, depending on your output configuration.
- Tools such as Snowflake (Snowpipe / COPY INTO), Databricks Autoloader, Spark, or Trino load from those S3 prefixes into tables.
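As one illustration of the load step, a Snowflake COPY INTO from an EdgeMQ Parquet prefix could be driven like this. The account, table, storage integration, and bucket names are all hypothetical; COPY INTO with a Parquet file format is standard Snowflake, but adapt the identifiers to your environment.

```python
import snowflake.connector  # pip install snowflake-connector-python

# Hypothetical connection parameters.
con = snowflake.connector.connect(
    account="my_account", user="loader", password="...", warehouse="LOAD_WH"
)

# Hypothetical table, integration, and bucket names.
con.cursor().execute("""
    COPY INTO raw_events
    FROM 's3://my-ingest-bucket/edgemq/parquet/'
    STORAGE_INTEGRATION = edgemq_s3
    FILE_FORMAT = (TYPE = PARQUET)
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
""")
```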
IoT telemetry landing
- Devices and sensors publish telemetry to EdgeMQ via MQTT.
- Materialized views filter by MQTT topic and extract typed columns directly at ingest time.
- Query-ready Parquet lands in S3 — no ETL job required between ingest and analytics.
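A minimal device-side publisher, sketched with the paho-mqtt client (2.x API). The broker hostname and topic are hypothetical; as described above, the API key goes in the MQTT password field.

```python
import json
import ssl

import paho.mqtt.client as mqtt

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, transport="websockets")
client.username_pw_set(username="device-1", password="YOUR_API_KEY")
client.tls_set(cert_reqs=ssl.CERT_REQUIRED)  # wss:// requires TLS

client.connect("mqtt.edgemq.example", 443)   # hypothetical broker hostname
client.loop_start()

# QoS 1: the PUBACK arrives after EdgeMQ has written the message to the WAL.
info = client.publish(
    "sensors/room-1/climate",                # hypothetical topic
    json.dumps({"temp_c": 21.5, "humidity": 0.43}),
    qos=1,
)
info.wait_for_publish()

client.loop_stop()
client.disconnect()
```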
Burst buffer in front of databases
- Services write into EdgeMQ instead of directly into ClickHouse or Postgres.
- Batch jobs read from S3 and insert into the database at a controlled rate.
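A sketch of that drain job, assuming Parquet output; the bucket, prefix, and load_batch() loader are hypothetical, and a real job would also track which keys it has already processed.

```python
import io
import time

import boto3
import pyarrow.parquet as pq

BUCKET = "my-ingest-bucket"   # hypothetical
PREFIX = "edgemq/parquet/"    # hypothetical

s3 = boto3.client("s3")
pages = s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX)

for page in pages:
    for obj in page.get("Contents", []):
        buf = io.BytesIO()
        s3.download_fileobj(BUCKET, obj["Key"], buf)
        buf.seek(0)
        batch = pq.read_table(buf)
        # load_batch(batch)  # hypothetical: batched INSERT into ClickHouse/Postgres
        time.sleep(1.0)      # throttle to a rate the database can absorb
```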
ML and analytics data feeds
- Application, telemetry, or model logs are ingested into S3 via EdgeMQ.
- ML pipelines and notebooks (Databricks, Snowflake, DuckDB, etc.) read from the same S3 prefixes to build training, evaluation, and feature datasets.
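For example, a notebook can query the schema-aware Parquet views in place with DuckDB. The bucket, prefix, and column names below are hypothetical; httpfs and SET s3_region are standard DuckDB, with AWS credentials taken from the environment.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")                  # enables s3:// paths
con.execute("SET s3_region = 'us-east-1'")  # credentials come from env/AWS config

# Hypothetical prefix and columns; point this at your materialized-view output.
df = con.execute("""
    SELECT device_id, avg(temp_c) AS avg_temp
    FROM read_parquet('s3://my-ingest-bucket/edgemq/views/telemetry/*.parquet')
    GROUP BY device_id
""").df()  # .df() requires pandas
```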
Webhook and partner ingest
- Third-party webhooks terminate on EdgeMQ.
- S3 retains a complete log of incoming events; downstream jobs parse and load as needed.
Data formats
EdgeMQ focuses on JSON input via HTTP or MQTT, and produces S3 artifacts you can query or transform downstream:
- Input formats
- HTTP: NDJSON (application/x-ndjson) and RecordIO-style streams
- MQTT: JSON payloads published to topics (topic metadata preserved in WAL)
- Outputs in S3
- Compressed WAL segments containing framed payloads (when segment output is enabled)
- Parquet (raw / opaque payload) for direct querying
- Parquet (schema-aware materialized views) with typed columns — can filter by MQTT topic
- Commit markers/manifests that indicate what artifacts are complete and safe to read
Details are documented in Output formats, Segment file format, and Expand segments → NDJSON.
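As a sketch of how a consumer might use commit markers, the loop below reads only artifacts referenced by a manifest. The manifest prefix and JSON field names here are assumptions for illustration; the actual contract is specified in Output formats.

```python
import json

import boto3

BUCKET = "my-ingest-bucket"            # hypothetical
MANIFEST_PREFIX = "edgemq/manifests/"  # assumed layout; see Output formats

s3 = boto3.client("s3")
pages = s3.get_paginator("list_objects_v2").paginate(
    Bucket=BUCKET, Prefix=MANIFEST_PREFIX
)
for page in pages:
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        manifest = json.loads(body)
        for key in manifest.get("artifacts", []):  # assumed field name
            print("safe to read:", f"s3://{BUCKET}/{key}")
```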
When to use EdgeMQ
EdgeMQ is a good fit when:
- You want a simple HTTP or MQTT endpoint that reliably lands data in S3.
- You have IoT devices that speak MQTT and need telemetry in S3 as query-ready Parquet.
- You treat S3 as the raw/Bronze layer for your data, analytics, or ML stack.
- You want at-least-once, durable ingest without operating Kafka/Kinesis or building custom collectors.
- You need to standardize how multiple teams, services, devices, or partners send data into your S3 environment.
If you only need occasional file uploads or direct application writes into a database/warehouse, EdgeMQ may be unnecessary. It is primarily intended for continuous or high-volume event/data streams.
Next steps
- Quickstart — create an account, connect your S3 bucket via IAM, and send your first payload.
- MQTT — connect IoT devices via MQTT with client library examples for Python, Node.js, Go, and Arduino.
- Output formats — understand segments, Parquet (raw), and schema-aware Parquet materialized views.
- Schema Autopilot — generate view definitions, diagnose DLQ failures, and detect schema drift with AI assistance.
- Segment file format — understand how segments are laid out in S3.