For ML Teams

Feed your models with live, reliable data

EdgeMQ is the AI data hose into your S3. Ingest events from apps, devices, and services over HTTPS and land them reliably in object storage—ready for Databricks, Snowflake, ClickHouse, DuckDB, and your feature pipelines.

Under the hood, EdgeMQ is the same lakehouse ingest layer your data team uses to keep the S3 Bronze layer fresh.

Stop babysitting brittle data feeds.

Start assuming S3 is always fresh.

Snowflake · Databricks · ClickHouse · DuckDB
The ML bottleneck

Getting data into the lake

As an ML engineer, MLOps engineer, or AI platform owner, you're held back by one thing over and over:

Data doesn't show up in S3 reliably.

Instead, you deal with:

  • Training pipelines that depend on homegrown data collectors that break quietly.
  • "Quick scripts" that upload JSON to S3... until someone changes a cron, a path, or a credential.
  • Constant questions like: "Is this dataset actually up to date?" "Did we drop any events during that incident?"
  • Painful back-and-forth with product / data engineering teams just to get a new event stream wired up.
  • Ambitions of near-online and continuous training, blocked by unreliable ingest.

You want to focus on models, features, and evaluation—not HTTP retries and S3 multipart uploads.

Solution

EdgeMQ: the AI data hose into your S3

EdgeMQ is a managed ingest layer for modern data and ML stacks. Producers send NDJSON over HTTPS to a single endpoint. EdgeMQ:

Writes to durable WAL

Writes each request to a durable write-ahead log (WAL) on NVMe at the edge.

Handles bursts

Handles bursts and reconnect storms with bounded queues and backpressure.

Compresses & ships

Compresses and ships segments into your S3 bucket.

Commit markers

Writes a commit marker only when the segment is safely stored.

From your point of view, S3 just keeps filling with fresh, trustworthy data you can build ML pipelines on.
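
For example, a downstream job can use those commit markers to read only finished segments. Here's a minimal Python sketch, assuming a hypothetical layout where each segment name.ndjson.gz is paired with a name.commit marker (your actual prefixes and naming may differ):

import boto3

s3 = boto3.client("s3")

def committed_segments(bucket, prefix):
    # Collect every key under the prefix.
    keys = set()
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            keys.add(obj["Key"])
    # A segment is safe to read only once its commit marker exists.
    return sorted(
        k for k in keys
        if k.endswith(".ndjson.gz")
        and k.replace(".ndjson.gz", ".commit") in keys
    )

for key in committed_segments("your-bucket", "edge-events/ml/prod/"):
    print(key)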

Simple integration

ML-friendly ingest in one call

Your upstream teams can send training and feature data with a simple call:

curl -X POST "https://<region>.edge.mq/ingest" \
    -H "Authorization: Bearer $EDGEMQ_TOKEN" \
    -H "Content-Type: application/x-ndjson" \
    --data-binary @events.ndjson

EdgeMQ guarantees:

  • Events hit disk (WAL) before acknowledging.
  • If the system is overloaded, producers see 503 + Retry-After, not silent drops.
  • Compressed segments land under a prefix you control.
  • Commit markers tell your ML pipelines exactly which segments are safe to read.

You don't build or own any of this ingest plumbing. You just depend on it.
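
On the producer side, honoring that contract takes only a few lines. A minimal Python sketch (the endpoint placeholder mirrors the curl example above; the retry policy is illustrative):

import time
import requests

ENDPOINT = "https://<region>.edge.mq/ingest"  # same placeholder as the curl example
TOKEN = "..."                                  # your EDGEMQ_TOKEN

def send_ndjson(payload: bytes, max_attempts: int = 5) -> None:
    headers = {
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/x-ndjson",
    }
    for attempt in range(max_attempts):
        resp = requests.post(ENDPOINT, headers=headers, data=payload)
        if resp.status_code == 503:
            # Overloaded: honor Retry-After instead of dropping events.
            time.sleep(float(resp.headers.get("Retry-After", 2 ** attempt)))
            continue
        resp.raise_for_status()
        return  # acknowledged: events are on the WAL
    raise RuntimeError("gave up after repeated 503s")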

Training & evaluation

Continuous datasets for training and evaluation

Your best models come from data that is fresh, consistent, and replayable. EdgeMQ makes it realistic to treat S3 as a continuously updated ML data lake:

Training data

Organize your EdgeMQ S3 prefixes by time, cohort, or experiment, and train directly from them with Spark, Databricks, Snowflake, or DuckDB.

Evaluation slices

Pull specific date ranges or cohorts from EdgeMQ-managed prefixes to create consistent validation and test sets.

Experiment logs

Ingest model input/output events via EdgeMQ to analyze drift, failures, or regression behavior later.

Because ingest is handled centrally, you don't have to negotiate a new pipeline every time you want a new signal.
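
Evaluation slices, for example, can be pulled straight from the lake. A minimal Python sketch using DuckDB, assuming a hypothetical dt=YYYY-MM-DD prefix layout and an illustrative cohort field:

import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")  # assumes S3 credentials are configured

# Pull one month of a holdout cohort into a DataFrame for evaluation.
eval_df = con.execute("""
    SELECT *
    FROM read_json_auto('s3://your-bucket/edge-events/ml/prod/dt=2024-06-*/*.json.gz')
    WHERE cohort = 'holdout'
""").df()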

Feature engineering

Feature pipelines powered by S3

Most modern feature stores and custom feature pipelines assume object storage is the raw source of truth. EdgeMQ is built to keep that source of truth healthy:

Raw → feature pipeline

  • Apps / devices / services send NDJSON to EdgeMQ.
  • EdgeMQ lands segments in S3 under structured prefixes.
  • Your feature jobs (Spark, Flink, dbt, custom Python) transform those segments into feature tables or online stores.

Historical replay

  • Rebuild features from historical EdgeMQ segments when you change logic.
  • Reproduce past model behavior by training from the exact same raw data.

Multi-consumer

The same EdgeMQ raw data can feed both:

  • Offline training / evaluation
  • Online feature stores / monitoring pipelines

Once the data is in S3, you're free to wire it into any feature stack you want.
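
A batch feature job over those prefixes can stay small. Here's an illustrative Python sketch using DuckDB (field names like user_id, event_type, and ts are assumptions; match your own schema):

import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")  # assumes S3 credentials are configured

# Aggregate the last 7 days of raw events into per-user features.
con.execute("""
    CREATE OR REPLACE TABLE user_features AS
    SELECT
        user_id,
        count(*) AS events_7d,
        count(*) FILTER (WHERE event_type = 'purchase') AS purchases_7d,
        max(ts) AS last_seen
    FROM read_json_auto('s3://your-bucket/edge-events/ml/prod/*.json.gz')
    WHERE ts >= now() - INTERVAL '7 days'
    GROUP BY user_id
""")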

Integrations

Query with the tools you already use

EdgeMQ doesn't ask you to switch engines. It just keeps them fed.

Databricks / Spark

Treat EdgeMQ prefixes as your streaming input:

  • EdgeMQ continuously drops compressed segments + commit markers into S3.
  • Databricks Auto Loader or Spark jobs monitor those prefixes and load data into Delta tables.
  • You train models on Delta and build feature tables on top.

Looking ahead: as EdgeMQ emits Parquet and table-like layouts, Auto Loader gets even faster and simpler to configure.
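
As a sketch, a Python Auto Loader job watching EdgeMQ prefixes might look like this (paths, schema location, and table names are placeholders):

# On Databricks, `spark` is provided; shown here for completeness.
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

raw = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")  # NDJSON segments today
    .option("cloudFiles.schemaLocation", "s3://your-bucket/_schemas/edge_events")
    .load("s3://your-bucket/edge-events/ml/prod/")
)

(
    raw.writeStream
    .option("checkpointLocation", "s3://your-bucket/_checkpoints/edge_events")
    .toTable("ml_raw.events")  # continuously appends to a Delta table
)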

Snowflake

Use EdgeMQ as the raw staging area for training and analytics tables:

  • Producers → EdgeMQ / ingest → S3.
  • Snowpipe or COPY INTO pulls from those S3 prefixes into Snowflake tables.

COPY INTO ml_raw.events
FROM 's3://your-bucket/edge-events/ml/prod/'
CREDENTIALS=(AWS_ROLE='arn:aws:iam::123:role/edge-snowflake-access')
FILE_FORMAT = (TYPE = JSON)
PATTERN = '.*\.json\.gz';

You keep all the power of Snowflake; you just stop worrying about how data got to S3.

ClickHouse / Postgres

Use EdgeMQ as a buffer in front of online and near-real-time stores:

  • High-volume events (clicks, metrics, logs) flow to EdgeMQ, not directly to your database.
  • EdgeMQ absorbs the spikes and writes to S3.
  • A loader job ingests into ClickHouse or Postgres at the rate your cluster can safely take.

This is perfect for near-real-time feature tables in ClickHouse and monitoring/tracking tables in Postgres with controlled load.
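
That loader can be a short script. A hedged Python sketch that streams one committed segment at a time into ClickHouse over its HTTP interface (host, database, and table names are placeholders):

import gzip
import boto3
import requests

s3 = boto3.client("s3")
CLICKHOUSE_URL = "http://clickhouse:8123/"  # placeholder host

def load_segment(bucket: str, key: str) -> None:
    # Fetch one compressed NDJSON segment and insert it as a single batch.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    resp = requests.post(
        CLICKHOUSE_URL,
        params={"query": "INSERT INTO ml.events FORMAT JSONEachRow"},
        data=gzip.decompress(body),
    )
    resp.raise_for_status()

# Call load_segment() serially, or with a small pool, to cap cluster load.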

DuckDB (and friends)

Give data scientists and ML researchers direct, notebook-friendly access to fresh data:

  • EdgeMQ writes NDJSON (and in future, Parquet) to S3.
  • DuckDB queries those S3 prefixes directly from a laptop or cloud notebook.

SELECT *
FROM read_json_auto('s3://your-bucket/edge-events/ml/prod/*.json.gz')
WHERE event_type = 'prediction'
AND ts >= now() - INTERVAL '7 days';

No duplicate pipelines, no extra infrastructure—just query the lake.

Roadmap

Future-ready formats for ML and analytics

Today, EdgeMQ is optimized for NDJSON → compressed segments + commit markers in S3. The roadmap expands this into a format-aware ingest layer:

Input formats

  • NDJSON and JSON batches.
  • Additional line-delimited formats over time.

Output formats on S3

  • NDJSON segments (today).
  • Parquet for efficient columnar training and feature extraction.
  • CSV where needed.
  • Table-friendly layouts (e.g. Iceberg-style directory structures) that slot into Databricks, Spark, Trino/Presto, Snowflake (via external tables), DuckDB and other engines.

For ML teams, that means faster training jobs (less parsing, more scanning), easier integration with new engines and tools, and less glue code translating "whatever we got from the app" into "what the engine expects."

Use cases

Example ML patterns with EdgeMQ

Real-time product signals → feature store

  • Product backend logs user actions and metadata as NDJSON.
  • EdgeMQ ingests those events from multiple regions into S3.
  • A feature pipeline in Spark/Databricks/Snowflake converts segments into offline feature tables for training and online features for a feature store or a low-latency DB.

Result: your models see fresh behavioral signals without anyone building a bespoke ingest stack.

Telemetry & sensor data → anomaly detection

  • IoT devices POST telemetry to regional EdgeMQ endpoints.
  • EdgeMQ handles flaky networks, reconnect storms, and spikes.
  • Telemetry accumulates in S3 as a clean, continuous stream of records.
  • You train and deploy anomaly detection models (Spark, Snowflake, ClickHouse, or notebooks) using this history.

Result: your detection models ride on top of a robust ingest backbone, not fragile scripts.

Model input/output logging → observability and evaluation

  • When your model serves a prediction, your service logs input features, model version, output, and extra context.
  • Those logs go to EdgeMQ, not local disk.
  • EdgeMQ lands them in S3 under an ml-logs/ prefix.
  • You analyze drift, calibration, failures, and counterfactual "what if?" scenarios.

Because everything is centralized in S3, you can slice, audit, and replay model behavior over time.
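
Emitting those logs is plain NDJSON; there's no SDK to adopt. An illustrative Python sketch (field names are examples, and the actual POST can reuse the retry logic from the ingest section above):

import json
import time

def prediction_event(features: dict, output: float, model_version: str) -> bytes:
    # One JSON object per line; batch many of these before POSTing to EdgeMQ.
    event = {
        "event_type": "prediction",
        "ts": time.time(),
        "model_version": model_version,
        "features": features,
        "output": output,
    }
    return (json.dumps(event) + "\n").encode()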

You don't need to own ingest infrastructure

Most ML teams don't want to:

  • Run Kafka or Kinesis just for ingest.
  • Maintain critical HTTP services that absorb events.
  • Debug partial S3 uploads and edge-case retries.
  • Explain to security why there are random access keys in source trees.

EdgeMQ takes this off your plate:

Managed edge infrastructure

Per-tenant microVMs, WAL on NVMe, S3 shippers, and health checks are operated for you.

Predictable overload behavior

If things get hot, producers see 503 + Retry-After. You don't get silent gaps in datasets.

Security that fits your platform

S3 writes via short-lived IAM roles and scoped prefixes; data teams and platform teams can govern it using the tools they already know.

You get a dependable data hose, platform and infrastructure teams stay in control, and ML teams move faster.

Collaborate cleanly with data and platform teams

EdgeMQ is a shared primitive you can rally around. It's the common lakehouse ingest layer that data engineers, ML teams, and platform engineers all depend on, with S3 as the shared source of truth.

Platform / infra

  • Set up S3 buckets, prefixes, and IAM roles.
  • Provision EdgeMQ endpoints as a "paved road" for ingest.

Data engineers

  • Define schemas, prefixes, and downstream load jobs.
  • Use EdgeMQ as the standard way data enters the lake.

ML teams

  • Consume from the same S3 lake for training, evaluation, and features.
  • Ask for "one more prefix + schema" instead of "a new ingest system."

Everyone aligns on a single, well-understood ingest layer.

Make S3 the live heart of your ML platform

Your models are only as good as the data they see—and how reliably they see it. EdgeMQ makes getting data into S3 something you can take for granted:

Reliable ingest from anywhere

Apps, devices, services, and partners.

ML-ready lake in S3

Constantly updated, structured, and easy to query.

No custom ingest infra

For you to own or debug.

Ready to feed your models with live data instead of brittle pipelines?

Stop babysitting brittle data feeds. Start assuming S3 is always fresh.