Segment sealing
EdgeMQ batches incoming events into segments before writing them to S3. A segment is sealed (closed and uploaded) when it reaches a threshold - but which threshold?
EdgeMQ uses hybrid segment sealing: segments seal when either the size threshold or the time threshold is reached - whichever comes first. This approach automatically balances efficiency and responsiveness based on your traffic volume.
How it works
| Threshold | Default | Purpose |
|---|---|---|
| Size | 512 KB | Efficient batching for high throughput |
| Time (segment_max_age_seconds) | 60 seconds | Bounded latency for low traffic / testing |
When you send data to EdgeMQ:
- Events are appended to the current segment
- EdgeMQ checks both thresholds
- When either threshold is reached, the segment seals and uploads to S3
- A new segment opens for subsequent events
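The steps above can be sketched as a small in-memory model. This is an illustrative toy, not EdgeMQ's implementation: the `HybridSealer` class, its method names, and the choice to start the age clock when the first event of a segment arrives are all assumptions made for the example. Only the two default thresholds come from this page.

```python
from dataclasses import dataclass

SIZE_THRESHOLD_BYTES = 512 * 1024  # documented default size threshold
MAX_AGE_SECONDS = 60               # documented default segment_max_age_seconds


@dataclass
class Segment:
    opened_at: float   # assumption: age is measured from the first event
    size_bytes: int = 0


class HybridSealer:
    """Toy model: seal on size OR age, whichever is reached first."""

    def __init__(self, size_threshold=SIZE_THRESHOLD_BYTES, max_age=MAX_AGE_SECONDS):
        self.size_threshold = size_threshold
        self.max_age = max_age
        self.current = None
        self.sealed = []  # (trigger, size_bytes) for each sealed segment

    def _seal(self, trigger):
        # A real system would upload to S3 here; the sketch only records it.
        self.sealed.append((trigger, self.current.size_bytes))
        self.current = None

    def tick(self, now):
        """Check the time threshold; meant to run even when no events arrive."""
        if self.current is not None and now - self.current.opened_at >= self.max_age:
            self._seal("time")

    def append(self, event: bytes, now: float):
        self.tick(now)  # an expired segment seals before new data is added
        if self.current is None:
            self.current = Segment(opened_at=now)
        self.current.size_bytes += len(event)
        if self.current.size_bytes >= self.size_threshold:
            self._seal("size")


sealer = HybridSealer()
sealer.append(b"x" * (600 * 1024), now=0.0)  # one large write crosses 512 KB
sealer.append(b"tiny", now=1.0)              # opens a fresh segment
sealer.tick(now=61.0)                        # 60 s later the small segment seals
```

After these calls, `sealer.sealed` holds one size-triggered segment followed by one time-triggered segment.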
Self-balancing behavior
The hybrid approach automatically optimizes for your traffic pattern:
| Throughput | What happens | Sealing trigger |
|---|---|---|
| High (1 MB/s) | Segments fill in ~0.5 seconds | Size (efficient large files) |
| Medium (100 KB/s) | Segments fill in ~5 seconds | Size (efficient batching) |
| Low (10 KB/s) | Segments fill in ~50 seconds | Size (still size-based) |
| Testing (1 KB/s) | Segments would take 8+ minutes by size alone | Time (responsive UX) |
Crossover point: With the default thresholds, size-based and time-based sealing are equivalent at roughly 8.5 KB/s (512 KB ÷ 60 s). Above this rate, size dominates; below, time kicks in.
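The table and crossover figure follow from simple arithmetic, which can be reproduced with a short sketch. The function name `seal_estimate` is made up for this example, and the calculation assumes perfectly steady throughput with 1 MB treated as 1024 KB.

```python
SIZE_THRESHOLD_KB = 512   # documented default size threshold
MAX_AGE_SECONDS = 60      # documented default segment_max_age_seconds


def seal_estimate(throughput_kb_per_s, max_age_seconds=MAX_AGE_SECONDS):
    """Return (trigger, seconds until seal) for a steady throughput."""
    fill_seconds = SIZE_THRESHOLD_KB / throughput_kb_per_s
    if fill_seconds <= max_age_seconds:
        return "size", fill_seconds
    return "time", float(max_age_seconds)


# Rate at which a segment fills in exactly max-age seconds.
crossover_kb_per_s = SIZE_THRESHOLD_KB / MAX_AGE_SECONDS  # about 8.53 KB/s

for rate in (1024, 100, 10, 1):
    trigger, seconds = seal_estimate(rate)
    print(f"{rate:>5} KB/s -> {trigger} after {seconds:.1f} s")
```

Passing a different `max_age_seconds` shows how the crossover moves: at 300 seconds, for example, size-based sealing already wins at about 1.7 KB/s.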
Why this matters
For production workloads with steady traffic, size-based sealing naturally dominates. This produces efficient, consistently sized files that are optimal for downstream processing (Parquet conversion, warehouse loads, etc.).
For testing and development, you often send small amounts of data and expect to see results quickly. Without time-based sealing, you'd wait until 512 KB accumulates, which at a trickle of test traffic could take hours or days. The 60-second default ensures data appears in S3 (and in Data Quality metrics) within a bounded time.
Configuration
You can adjust the time threshold per endpoint in the Console under Configuration → Data Processing → Segment Max Age.
| Setting | Range | Default | Notes |
|---|---|---|---|
| segment_max_age_seconds | 10–3600 | 60 | Lower = faster feedback, Higher = larger files |
Recommendations:
- Testing / Development: Keep the default (60s) or lower it (e.g., 30s) for faster feedback
- Production (steady traffic): Default (60s) works well - size-based sealing will dominate anyway
- Production (optimizing for file size): Increase to 300–600s if you want to ensure larger files even during traffic lulls
The size threshold (512 KB) is not currently user-configurable. If you have a use case that requires a different size threshold, contact support.
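If you manage endpoint settings from your own tooling, a client-side range check can catch out-of-range values before they reach the Console. The helper below is hypothetical and not part of any EdgeMQ SDK; it only encodes the range and default from the table above.

```python
# Documented range and default for segment_max_age_seconds.
MIN_MAX_AGE_SECONDS = 10
MAX_MAX_AGE_SECONDS = 3600
DEFAULT_MAX_AGE_SECONDS = 60


def validate_segment_max_age(value=DEFAULT_MAX_AGE_SECONDS):
    """Hypothetical helper: return value if it is within the documented range."""
    if not MIN_MAX_AGE_SECONDS <= value <= MAX_MAX_AGE_SECONDS:
        raise ValueError(
            f"segment_max_age_seconds must be between {MIN_MAX_AGE_SECONDS} "
            f"and {MAX_MAX_AGE_SECONDS}, got {value}"
        )
    return value
```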
Effects on commit frequency and file sizes
Adjusting segment_max_age_seconds affects two things:
1. Commit frequency (data availability latency)
Lower values mean segments seal more frequently during low-traffic periods, so data appears in S3 sooner. This is useful when:
- Testing new endpoints and expecting fast feedback
- Running low-volume endpoints where timely data matters more than file efficiency
- Using Data Quality metrics to debug schema validation issues
2. File sizes during low traffic
Higher values allow more data to accumulate before time-based sealing triggers, producing larger files. This is useful when:
- Downstream systems prefer fewer, larger files
- You want to minimize S3 API costs (fewer PUTs)
- Traffic is bursty and you'd rather wait for the next burst than seal a tiny segment
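To get a feel for this trade-off, the sketch below estimates segment size and segments per day for a steady low-traffic endpoint. It is a back-of-the-envelope model under stated assumptions: traffic is perfectly steady, and each sealed segment is counted as one upload. The function name `low_traffic_profile` is invented for the example.

```python
SECONDS_PER_DAY = 86_400


def low_traffic_profile(throughput_kb_per_s, max_age_seconds, size_threshold_kb=512):
    """Return (segment size in KB, segments sealed per day) for steady traffic."""
    fill_seconds = size_threshold_kb / throughput_kb_per_s
    # Whichever threshold is reached first decides how often segments seal.
    seal_interval = min(fill_seconds, max_age_seconds)
    segment_kb = throughput_kb_per_s * seal_interval
    segments_per_day = SECONDS_PER_DAY / seal_interval
    return segment_kb, segments_per_day


for max_age in (30, 60, 300, 600):
    size_kb, per_day = low_traffic_profile(1, max_age)
    print(f"max age {max_age:>3}s: ~{size_kb:.0f} KB per segment, ~{per_day:.0f} segments/day")
```

At 1 KB/s, raising the setting from 60 to 300 seconds cuts the segment count from 1,440 to 288 per day. Going to 600 seconds gives less than expected, because the 512 KB size threshold is reached first, at about 512 seconds.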
Example scenarios
Scenario 1: Testing a new endpoint
You create an endpoint, send 10 test records, and want to verify they appear correctly.
- With default settings (60s), data appears in S3 within ~60 seconds
- Data Quality metrics update shortly after
- You can inspect the segment or Parquet output without waiting for the size threshold
Scenario 2: High-volume production
You're ingesting 5 MB/s of event data continuously.
- Segments seal every ~0.1 seconds by size (512 KB reached quickly)
- The time threshold (60s) is never reached
- File sizes are consistent, optimal for downstream processing
Scenario 3: Sporadic / bursty traffic
Traffic comes in bursts of 100 KB every few minutes.
- With default settings (60s), each burst seals as a ~100 KB segment after the time threshold
- If you increase to 300s, multiple bursts might accumulate into a single larger segment
- Trade-off: larger files vs. longer delay before data is available
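Scenario 3 can be simulated with a few lines of code. The sketch assumes bursts arrive instantly at fixed times, that a segment's age is measured from its first event, and that a burst is never split across segments. None of these details are confirmed by this page, and `simulate_bursts` is a made-up name.

```python
def simulate_bursts(burst_times, burst_kb, max_age_seconds, size_threshold_kb=512):
    """Return a list of (sealed_at_seconds, size_kb, trigger) tuples."""
    segments = []
    opened_at = None
    size_kb = 0
    for t in burst_times:
        # Seal by time if the open segment expired before this burst arrived.
        if opened_at is not None and t - opened_at >= max_age_seconds:
            segments.append((opened_at + max_age_seconds, size_kb, "time"))
            opened_at, size_kb = None, 0
        if opened_at is None:
            opened_at = t
        size_kb += burst_kb
        # Seal by size as soon as the threshold is reached.
        if size_kb >= size_threshold_kb:
            segments.append((t, size_kb, "size"))
            opened_at, size_kb = None, 0
    if opened_at is not None:
        # The last open segment eventually seals by time.
        segments.append((opened_at + max_age_seconds, size_kb, "time"))
    return segments


bursts = list(range(0, 601, 120))  # a 100 KB burst every 2 minutes, 6 bursts
default_run = simulate_bursts(bursts, 100, max_age_seconds=60)
longer_run = simulate_bursts(bursts, 100, max_age_seconds=300)
```

Under these assumptions, the default setting yields six 100 KB segments, while 300 seconds yields two 300 KB segments. The first burst's data then becomes available at 300 seconds instead of 60.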
Related
- Output formats - what EdgeMQ writes to S3
- Segment file format - binary structure of sealed segments