Segment sealing

EdgeMQ batches incoming events into segments before writing them to S3. A segment is sealed (closed and uploaded) when it reaches a threshold - but which threshold?

EdgeMQ uses hybrid segment sealing: segments seal when either the size threshold or the time threshold is reached - whichever comes first. This approach automatically balances efficiency and responsiveness based on your traffic volume.

How it works

| Threshold | Default | Purpose |
|---|---|---|
| Size | 512 KB | Efficient batching for high throughput |
| Time (segment_max_age_seconds) | 60 seconds | Bounded latency for low traffic / testing |

When you send data to EdgeMQ:

  1. Events are appended to the current segment
  2. EdgeMQ checks both thresholds
  3. When either threshold is reached, the segment seals and uploads to S3
  4. A new segment opens for subsequent events
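
The steps above can be sketched as a small in-memory model. This is only an illustration, not EdgeMQ's implementation: the `HybridSealer` and `Segment` names are hypothetical, the list of sealed segments stands in for S3 uploads, and the sketch assumes a segment's age is measured from its first event.

```python
from dataclasses import dataclass, field

SIZE_THRESHOLD_BYTES = 512 * 1024  # documented default size threshold
MAX_AGE_SECONDS = 60.0             # documented default for segment_max_age_seconds


@dataclass
class Segment:
    opened_at: float                 # assumption: age starts at the first event
    events: list = field(default_factory=list)
    size_bytes: int = 0


class HybridSealer:
    """Hypothetical model: seal on size OR age, whichever comes first."""

    def __init__(self, size_threshold=SIZE_THRESHOLD_BYTES, max_age=MAX_AGE_SECONDS):
        self.size_threshold = size_threshold
        self.max_age = max_age
        self.current = None
        self.sealed = []  # (reason, segment) pairs standing in for S3 uploads

    def _seal(self, reason):
        self.sealed.append((reason, self.current))
        self.current = None  # the next append opens a fresh segment

    def tick(self, now):
        """Time check: seal the open segment once it reaches max_age."""
        if self.current is not None and now - self.current.opened_at >= self.max_age:
            self._seal("time")

    def append(self, event: bytes, now: float):
        """Append an event, then run the size check."""
        self.tick(now)
        if self.current is None:
            self.current = Segment(opened_at=now)
        self.current.events.append(event)
        self.current.size_bytes += len(event)
        if self.current.size_bytes >= self.size_threshold:
            self._seal("size")
```

In this model, a single small event followed by `tick(60.0)` seals by time, while a fast stream of 1 KB events seals by size as soon as 512 KB has accumulated.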

Self-balancing behavior

The hybrid approach automatically optimizes for your traffic pattern:

| Throughput | What happens | Sealing trigger |
|---|---|---|
| High (1 MB/s) | Segments fill in ~0.5 seconds | Size (efficient large files) |
| Medium (100 KB/s) | Segments fill in ~5 seconds | Size (efficient batching) |
| Low (10 KB/s) | Segments fill in ~50 seconds | Size (still size-based) |
| Testing (1 KB/s) | Segments would take 8+ minutes by size alone | Time (responsive UX) |

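Crossover point: At roughly 8.5 KB/s (512 KB ÷ 60 s), a segment fills in exactly 60 seconds, so size-based and time-based sealing are equivalent. Above this rate, size dominates; below it, time kicks in. Changing segment_max_age_seconds moves the crossover accordingly.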
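
The crossover arithmetic can be checked with a few lines of Python. This is a minimal sketch using the documented defaults; the function names are illustrative, and it assumes a steady, continuous ingest rate.

```python
SIZE_THRESHOLD_KB = 512  # documented default size threshold
MAX_AGE_SECONDS = 60     # documented default for segment_max_age_seconds


def crossover_rate_kb_per_s(size_kb=SIZE_THRESHOLD_KB, max_age_s=MAX_AGE_SECONDS):
    """Rate at which a segment fills in exactly max_age_s seconds."""
    return size_kb / max_age_s  # 512 / 60 ≈ 8.53 KB/s with the defaults


def sealing_trigger(rate_kb_per_s, size_kb=SIZE_THRESHOLD_KB, max_age_s=MAX_AGE_SECONDS):
    """Return (trigger, seconds until seal) for a steady ingest rate."""
    fill_time_s = size_kb / rate_kb_per_s
    if fill_time_s <= max_age_s:
        return ("size", fill_time_s)
    return ("time", float(max_age_s))


if __name__ == "__main__":
    print(f"crossover: {crossover_rate_kb_per_s():.2f} KB/s")
    for rate in (1024, 100, 10, 1):  # KB/s, mirroring the table rows
        trigger, seconds = sealing_trigger(rate)
        print(f"{rate:>5} KB/s -> {trigger} after {seconds:.1f}s")
```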

Why this matters

For production workloads with steady traffic, size-based sealing naturally dominates. This produces efficient, consistently sized files that suit downstream processing (Parquet conversion, warehouse loads, etc.).

For testing and development, you often send small amounts of data and expect to see results quickly. Without time-based sealing, you'd wait until 512 KB accumulates - potentially hours or days. The 60-second default ensures data appears in S3 (and in Data Quality metrics) within a bounded time.

Configuration

You can adjust the time threshold per endpoint in the Console under Configuration → Data Processing → Segment Max Age.

| Setting | Range | Default | Notes |
|---|---|---|---|
| segment_max_age_seconds | 10–3600 | 60 | Lower = faster feedback; higher = larger files |

Recommendations:

  • Testing / Development: Keep the default (60s) or lower it (e.g., 30s) for faster feedback
  • Production (steady traffic): Default (60s) works well - size-based sealing will dominate anyway
  • Production (optimizing for file size): Increase to 300–600s if you want to ensure larger files even during traffic lulls

Note: The size threshold (512 KB) is not currently user-configurable. If you have a use case that requires a different size threshold, contact support.

Effects on commit frequency and file sizes

Adjusting segment_max_age_seconds affects two things:

1. Commit frequency (data availability latency)

Lower values mean segments seal more frequently during low-traffic periods, so data appears in S3 sooner. This is useful when:

  • Testing new endpoints and expecting immediate feedback
  • Running low-volume endpoints where timely data matters more than file efficiency
  • Using Data Quality metrics to debug schema validation issues

2. File sizes during low traffic

Higher values allow more data to accumulate before time-based sealing triggers, producing larger files. This is useful when:

  • Downstream systems prefer fewer, larger files
  • You want to minimize S3 API costs (fewer PUTs)
  • Traffic is bursty and you'd rather wait for the next burst than seal a tiny segment
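
To put rough numbers on this trade-off, the sketch below estimates seal interval, segment size, and segments per day for a steady low-volume stream. It is a back-of-the-envelope model, not a measurement: `low_traffic_profile` is a hypothetical helper, the rate is assumed constant, and one sealed segment is assumed to correspond to one S3 PUT.

```python
SECONDS_PER_DAY = 86_400
SIZE_THRESHOLD_KB = 512  # documented default size threshold


def low_traffic_profile(rate_kb_per_s, max_age_s, size_kb=SIZE_THRESHOLD_KB):
    """Estimate sealing behaviour for a steady stream at rate_kb_per_s."""
    # A segment seals at whichever comes first: max age or time to fill.
    seal_interval_s = min(max_age_s, size_kb / rate_kb_per_s)
    return {
        "seal_interval_s": seal_interval_s,
        "segment_kb": rate_kb_per_s * seal_interval_s,
        "segments_per_day": SECONDS_PER_DAY / seal_interval_s,
    }


if __name__ == "__main__":
    for max_age in (30, 60, 300, 600):
        p = low_traffic_profile(rate_kb_per_s=1, max_age_s=max_age)
        print(
            f"max_age={max_age:>3}s  "
            f"segment={p['segment_kb']:.0f} KB  "
            f"segments/day={p['segments_per_day']:.0f}"
        )
```

At 1 KB/s, raising the setting beyond about 512 seconds no longer changes the result in this model, because the size threshold is reached first.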

Example scenarios

Scenario 1: Testing a new endpoint

You create an endpoint, send 10 test records, and want to verify they appear correctly.

  • With default settings (60s), data appears in S3 within ~60 seconds
  • Data Quality metrics update shortly after
  • You can inspect the segment or Parquet output without waiting

Scenario 2: High-volume production

You're ingesting 5 MB/s of event data continuously.

  • Segments seal every ~0.1 seconds by size (512 KB reached quickly)
  • The time threshold (60s) is never reached
  • File sizes are consistent, optimal for downstream processing

Scenario 3: Sporadic / bursty traffic

Traffic comes in bursts of 100 KB every few minutes.

  • With default settings (60s), each burst seals as a ~100 KB segment after the time threshold
  • If you increase to 300s, multiple bursts might accumulate into a single larger segment
  • Trade-off: larger files vs. longer delay before data is available
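
The bursty case can be simulated with a short script. This is a sketch under stated assumptions: bursts arrive instantaneously every 180 seconds (one reading of "every few minutes"), a segment's age starts at its first event, and `simulate_bursts` is a hypothetical helper rather than anything EdgeMQ provides.

```python
def simulate_bursts(burst_kb, burst_interval_s, max_age_s, n_bursts, size_kb=512):
    """Return the sizes (in KB) of the segments sealed for a burst pattern."""
    segments = []
    open_kb = 0
    opened_at = None
    for i in range(n_bursts):
        now = i * burst_interval_s
        # The time threshold fires before this burst if the segment aged out.
        if opened_at is not None and now - opened_at >= max_age_s:
            segments.append(open_kb)
            open_kb, opened_at = 0, None
        if opened_at is None:
            opened_at = now  # assumption: age starts at the first event
        open_kb += burst_kb
        # The size threshold can also fire if bursts accumulate to 512 KB.
        if open_kb >= size_kb:
            segments.append(open_kb)
            open_kb, opened_at = 0, None
    if opened_at is not None:
        segments.append(open_kb)  # last segment seals once its age expires
    return segments


if __name__ == "__main__":
    print(simulate_bursts(100, 180, max_age_s=60, n_bursts=6))
    print(simulate_bursts(100, 180, max_age_s=300, n_bursts=6))
```

With these inputs, the 60-second setting yields one ~100 KB segment per burst, while the 300-second setting pairs bursts into ~200 KB segments at the cost of a longer wait before the first burst of each pair reaches S3.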
