SDK Event Pipeline & Crash Handling
✅ Implemented 🧪 Tested
Current state: Both Android and iOS SDKs implement a shared event pipeline architecture with disk-first persistence, buffered broadcasting, crash/hang detection, breadcrumb trails, and session tracking. See the Status Glossary for chip definitions.
Overview
The AutoMobile SDK event pipeline collects telemetry from instrumented apps and delivers it to the MCP server for real-time observability. Both platforms follow the same conceptual architecture: events flow through a thread-safe buffer, are persisted to disk before broadcast, and are delivered in batches to minimize overhead.
Event Flow
```mermaid
sequenceDiagram
    participant App as App Code
    participant Buffer as SdkEventBuffer
    participant Disk as EventPersistence
    participant Broadcaster as SdkEventBroadcaster
    participant MCP as MCP Server
    App->>Buffer: add(event)
    Note over Buffer: Collect until<br/>maxBufferSize or<br/>flushIntervalMs
    Buffer->>Disk: persist(batch)
    Disk-->>Buffer: batchId
    Buffer->>Broadcaster: onFlush(events)
    Broadcaster->>MCP: Deliver batch
    alt Delivery succeeds
        Broadcaster->>Disk: removeBatch(batchId)
    else Delivery fails
        Note over Disk: Batch stays on disk<br/>for retry on next launch
    end
    Note over Broadcaster: On next app launch
    Broadcaster->>Disk: loadPending()
    Disk-->>Broadcaster: unsent batches (FIFO)
    Broadcaster->>MCP: Replay pending batches
```
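The ordering in the diagram (persist first, deliver second, delete only after success) can be sketched in Kotlin as follows. This is a minimal illustration, not the SDK's code: `PendingStoreSketch`, `DeliveryPipelineSketch`, and the `deliver` callback are hypothetical names, and events are modeled as plain strings.

```kotlin
/** Hypothetical storage seam; the SDK's EventPersistence API may differ. */
interface PendingStoreSketch {
    fun persist(events: List<String>): String            // returns a batchId
    fun removeBatch(batchId: String)
    fun loadPending(): List<Pair<String, List<String>>>  // oldest first
}

class DeliveryPipelineSketch(
    private val store: PendingStoreSketch,
    private val deliver: (List<String>) -> Unit          // assumed to throw on failure
) {
    /** Called on flush: write to disk first, then try to deliver. */
    fun onFlush(events: List<String>) {
        val batchId = store.persist(events)
        deliverAndAck(batchId, events)
    }

    /** Called once at launch to replay whatever earlier sessions left behind. */
    fun replayPending() {
        for ((batchId, events) in store.loadPending()) deliverAndAck(batchId, events)
    }

    private fun deliverAndAck(batchId: String, events: List<String>) {
        try {
            deliver(events)
            store.removeBatch(batchId) // only acknowledged batches leave the disk
        } catch (e: Exception) {
            // Leave the batch in place; it is retried on the next launch.
        }
    }
}
```

Because the batch is on disk before `deliver` runs, a process death between the two steps leaves a file behind for the next launch rather than losing the events.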
Core Components
| Component | Description | Status |
|---|---|---|
| `SdkEventBuffer` | Thread-safe buffer that flushes on capacity or timer | ✅ Implemented |
| `SdkEventBroadcaster` | Serializes and delivers event batches cross-process | ✅ Implemented |
| `EventPersistence` | Disk-first persistence for crash resilience | ✅ Implemented |
| `DropCounter` | Back-pressure metrics tracking dropped events by reason | ✅ Implemented |
| `AutoMobileCrashes` | Unhandled crash detection with thread dumps | ✅ Implemented |
| `AutoMobileFailures` | Handled (non-fatal) exception recording | ✅ Implemented |
| `BreadcrumbTrail` | Ring buffer of recent actions attached to crash reports | ✅ Implemented |
| `SessionTracker` | Foreground/background lifecycle session rotation | ✅ Implemented |
| `SdkContext` | Thread-safe ambient state (session, user, tags) | ✅ Implemented |
| `AutoMobileAnr` / `AutoMobileHangs` | ANR/hang detection (platform-specific) | ✅ Implemented |
Disk-First Persistence
Events are written to disk before broadcast to survive process death. Each batch is stored as a single JSON file named events_{timestamp}_{uuid}.json, providing FIFO ordering by filename sort.
On successful delivery, the batch file is deleted. On failure (broadcast error, app crash during delivery), the file remains on disk and is replayed on the next app launch via loadPending().
Stale batches are cleaned up by cleanup(maxAgeDays:) (default 7 days) to prevent unbounded disk growth.
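A file-per-batch store along these lines could look like the Kotlin sketch below. It is an illustration under assumptions: `BatchStoreSketch` and `read` are hypothetical names, events are passed in as pre-serialized JSON strings, and the timestamp is zero-padded so that a plain filename sort matches chronological order (the source only states the `events_{timestamp}_{uuid}.json` pattern).

```kotlin
import java.io.File
import java.util.UUID

class BatchStoreSketch(
    private val dir: File,
    private val now: () -> Long = { System.currentTimeMillis() } // injectable clock
) {
    init { dir.mkdirs() }

    /** Writes one batch as a JSON array file and returns its id (the file name). */
    fun persist(eventsJson: List<String>): String {
        val stamp = now().toString().padStart(13, '0') // assumption: padded for sortability
        val name = "events_${stamp}_${UUID.randomUUID()}.json"
        File(dir, name).writeText(eventsJson.joinToString(",", "[", "]"))
        return name
    }

    /** Returns unsent batch ids, oldest first (FIFO by filename sort). */
    fun loadPending(): List<String> =
        dir.listFiles().orEmpty()
            .map { it.name }
            .filter { it.startsWith("events_") && it.endsWith(".json") }
            .sorted()

    fun read(batchId: String): String = File(dir, batchId).readText()

    fun removeBatch(batchId: String): Boolean = File(dir, batchId).delete()

    /** Deletes batches older than maxAgeDays and returns how many were removed. */
    fun cleanup(maxAgeDays: Int = 7): Int {
        val cutoff = now() - maxAgeDays * 24L * 60 * 60 * 1000
        return loadPending().count { id ->
            val ts = id.removePrefix("events_").substringBefore('_').toLongOrNull()
            ts != null && ts < cutoff && removeBatch(id)
        }
    }
}
```

Deriving the batch age from the filename rather than from file metadata keeps `cleanup` independent of filesystem timestamp behavior; whether the SDK does the same is not stated in the source.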
Buffer Tuning
SdkEventBuffer controls the trade-off between latency and overhead:
| Parameter | Default | Description |
|---|---|---|
| `maxBufferSize` | 50 | Events collected before a forced flush |
| `flushIntervalMs` | 500 | Periodic flush interval in milliseconds |
When either threshold is reached, the buffer drains into the broadcaster. The buffer is protected by a lock (ReentrantLock on Android, NSLock on iOS) and supports isEnabled toggling at runtime.
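The two flush triggers can be illustrated with a small Kotlin sketch. `EventBufferSketch`, its `onFlush` callback, and `startTimer` are hypothetical; only the parameter names and defaults come from the table above.

```kotlin
import java.util.concurrent.ScheduledExecutorService
import java.util.concurrent.ScheduledFuture
import java.util.concurrent.TimeUnit
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

class EventBufferSketch<T>(
    private val maxBufferSize: Int = 50,
    private val onFlush: (List<T>) -> Unit
) {
    private val lock = ReentrantLock()
    private val pending = ArrayList<T>()

    @Volatile
    var isEnabled: Boolean = true

    /** Returns false if the event was dropped because the buffer is disabled. */
    fun add(event: T): Boolean {
        if (!isEnabled) return false
        val full = lock.withLock {
            pending.add(event)
            pending.size >= maxBufferSize
        }
        if (full) flush() // capacity trigger
        return true
    }

    /** Drains whatever is buffered; a no-op when the buffer is empty. */
    fun flush() {
        val batch = lock.withLock {
            ArrayList(pending).also { pending.clear() }
        }
        // Called outside the lock so slow delivery does not block add().
        if (batch.isNotEmpty()) onFlush(batch)
    }

    /** Timer trigger: flushes every flushIntervalMs on the given scheduler. */
    fun startTimer(scheduler: ScheduledExecutorService, flushIntervalMs: Long = 500): ScheduledFuture<*> =
        scheduler.scheduleAtFixedRate({ flush() }, flushIntervalMs, flushIntervalMs, TimeUnit.MILLISECONDS)
}
```

Lower values for either parameter reduce the delay before an event is delivered, at the cost of more, smaller batches.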
DropCounter
DropCounter tracks events that could not be delivered, categorized by reason:
| Reason | Trigger |
|---|---|
| `DISABLED` | Event added while buffer is disabled |
| `SHUTDOWN` | Event added after buffer shutdown (Android only) |
| `FLUSH_ERROR` | Delivery callback threw an exception |
The counter provides snapshot() for diagnostics and reset() to clear counts. Android uses ConcurrentHashMap<DropReason, AtomicLong> for lock-free increments; iOS uses NSLock with a [DropReason: Int] dictionary.
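A counter in the Android style described above might be sketched like this. The class name `DropCounterSketch` and the `record` method are assumptions; `snapshot()` and `reset()` follow the names in the text.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

enum class DropReason { DISABLED, SHUTDOWN, FLUSH_ERROR }

class DropCounterSketch {
    private val counts = ConcurrentHashMap<DropReason, AtomicLong>()

    /** Adds n dropped events for the given reason without taking a lock. */
    fun record(reason: DropReason, n: Long = 1) {
        counts.computeIfAbsent(reason) { AtomicLong() }.addAndGet(n)
    }

    /** Point-in-time copy of the counts, safe to hand to diagnostics code. */
    fun snapshot(): Map<DropReason, Long> = counts.mapValues { it.value.get() }

    fun reset() = counts.clear()
}
```

In this sketch, reasons that were never recorded are absent from the snapshot rather than reported as zero.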
Crash Detection
Both platforms install an unhandled exception handler that fires before the process terminates.
Crash flow
- Exception handler captures the error (class, message, stack trace)
- All-thread dumps are collected for full crash context (Android only – iOS lacks a public API for cross-thread stacks)
- Breadcrumb trail snapshot is serialized and attached
- Device info and current screen name are collected
- Crash event is broadcast/buffered for delivery
- Original handler is chained to preserve default crash behavior
Android: AutoMobileCrashes
- Installs `Thread.UncaughtExceptionHandler`
- Calls `Thread.getAllStackTraces()` for all-thread dumps, capped at 50KB to stay under Android’s 1MB Binder limit
- Broadcasts crash via scoped `Intent` to the accessibility service package
- Sleeps 200ms after broadcast to allow dispatch before process termination
- Serializes breadcrumbs with binary-search truncation to fit within 50KB
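The handler-chaining and thread-dump steps can be sketched on the plain JVM as follows. `CrashHandlerSketch`, `install`, and `dumpAllThreads` are hypothetical names, the `report` callback stands in for the `Intent` broadcast, and the cap is applied to characters rather than bytes for simplicity.

```kotlin
class CrashHandlerSketch(
    private val report: (thread: Thread, error: Throwable, threadDump: String) -> Unit,
    private val previous: Thread.UncaughtExceptionHandler?
) : Thread.UncaughtExceptionHandler {

    override fun uncaughtException(t: Thread, e: Throwable) {
        try {
            report(t, e, dumpAllThreads(50 * 1024))
        } catch (ignored: Throwable) {
            // Reporting must never mask the original crash.
        } finally {
            // Chain to the handler that was installed before this one.
            previous?.uncaughtException(t, e)
        }
    }

    companion object {
        /** Installs the sketch as the default handler, remembering the old one. */
        fun install(report: (Thread, Throwable, String) -> Unit): CrashHandlerSketch {
            val handler = CrashHandlerSketch(report, Thread.getDefaultUncaughtExceptionHandler())
            Thread.setDefaultUncaughtExceptionHandler(handler)
            return handler
        }

        /** Formats every live thread's stack, truncated to at most maxChars. */
        fun dumpAllThreads(maxChars: Int): String {
            val sb = StringBuilder()
            for ((thread, frames) in Thread.getAllStackTraces()) {
                sb.append('"').append(thread.name).append("\" ").append(thread.state).append('\n')
                for (frame in frames) sb.append("    at ").append(frame).append('\n')
                if (sb.length >= maxChars) break
            }
            return if (sb.length > maxChars) sb.substring(0, maxChars) else sb.toString()
        }
    }
}
```

Calling the previous handler in `finally` means the platform's default crash behavior still runs even if reporting itself throws.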
iOS: AutoMobileCrashes
- Installs `NSSetUncaughtExceptionHandler` for ObjC/Swift exceptions
- Optional signal handlers (`enableSignalHandlers()`) for SIGABRT, SIGSEGV, SIGBUS, SIGFPE, SIGILL (SIGTRAP excluded to avoid debugger interference)
- Signal handler writes signal number to a file using only async-signal-safe POSIX calls (`open`, `write`, `close`)
- On next launch, `checkPreviousSignalCrash()` reads the file and emits a crash event
- Chains to previous exception/signal handlers to preserve other crash reporters
ANR / Hang Detection
Android: AutoMobileAnr
Uses the ApplicationExitInfo API (Android 11+ / API 30) to detect ANRs from previous sessions. On initialization:
- Queries `ActivityManager.getHistoricalProcessExitReasons()` for `REASON_ANR` entries
- Filters out already-reported ANRs using a persisted timestamp in `SharedPreferences`
- Reads ANR trace from `exitInfo.traceInputStream`
- Broadcasts new ANR events to the accessibility service
This is a retrospective approach – ANRs are reported on the next app launch, not in real time.
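The "filter out already-reported ANRs" step amounts to a watermark comparison. The sketch below shows that logic in plain Kotlin so it runs off-device: `ExitRecordSketch` and `ExitReasonSketch` are hypothetical stand-ins for the platform's exit records, and `lastReportedMs` stands in for the timestamp persisted in `SharedPreferences`.

```kotlin
enum class ExitReasonSketch { ANR, CRASH, OTHER }

/** Stand-in for one historical process exit record (reason, time, optional trace). */
data class ExitRecordSketch(
    val reason: ExitReasonSketch,
    val timestampMs: Long,
    val trace: String? = null
)

data class AnrScanResult(val newAnrs: List<ExitRecordSketch>, val watermarkMs: Long)

/**
 * Keeps only ANR records newer than the persisted watermark and returns the
 * new watermark to store for the next launch.
 */
fun selectNewAnrs(records: List<ExitRecordSketch>, lastReportedMs: Long): AnrScanResult {
    val fresh = records
        .filter { it.reason == ExitReasonSketch.ANR && it.timestampMs > lastReportedMs }
        .sortedBy { it.timestampMs }
    val watermark = fresh.lastOrNull()?.timestampMs ?: lastReportedMs
    return AnrScanResult(fresh, watermark)
}
```

Running the same scan twice with the updated watermark yields no new ANRs, which is what prevents duplicate reports across launches.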
iOS: AutoMobileHangs
Uses a watchdog thread to detect main thread hangs in real time:
- A background thread dispatches a block to the main queue via `DispatchQueue.main.async`
- Waits on a `DispatchSemaphore` with a configurable timeout (`hangThresholdMs`, default 2000ms)
- If the semaphore times out, the main thread is considered hung
- Reports a `SdkHangEvent` with the measured duration
- Polls at `pollIntervalMs` intervals (default 500ms)
Note: iOS does not provide a public API to capture another thread’s call stack. The watchdog captures its own stack as a diagnostic marker. For production hang diagnostics, Apple recommends MetricKit’s MXHangDiagnostic (iOS 14+).
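The watchdog pattern itself is platform-neutral. As a rough JVM analog (not the Swift implementation), the Kotlin sketch below uses an `Executor` in place of the main queue and `java.util.concurrent.Semaphore` in place of `DispatchSemaphore`; `probeOnce` and `startWatchdog` are hypothetical names.

```kotlin
import java.util.concurrent.Executor
import java.util.concurrent.Semaphore
import java.util.concurrent.TimeUnit

/**
 * One probe: returns null if the "main" executor ran the block within the
 * threshold, otherwise the total time in ms until it finally responded.
 */
fun probeOnce(mainQueue: Executor, hangThresholdMs: Long = 2000): Long? {
    val responded = Semaphore(0)
    val start = System.nanoTime()
    mainQueue.execute { responded.release() }
    if (responded.tryAcquire(hangThresholdMs, TimeUnit.MILLISECONDS)) return null
    responded.acquire() // keep waiting so the full hang duration can be measured
    return (System.nanoTime() - start) / 1_000_000
}

/** Polls forever on a daemon thread, reporting each detected hang. */
fun startWatchdog(
    mainQueue: Executor,
    hangThresholdMs: Long = 2000,
    pollIntervalMs: Long = 500,
    onHang: (durationMs: Long) -> Unit
): Thread {
    val thread = Thread {
        try {
            while (true) {
                probeOnce(mainQueue, hangThresholdMs)?.let(onHang)
                Thread.sleep(pollIntervalMs)
            }
        } catch (e: InterruptedException) {
            // Interrupting the thread stops the watchdog.
        }
    }
    thread.isDaemon = true
    thread.name = "hang-watchdog"
    thread.start()
    return thread
}
```

In this sketch a hang is reported only once the blocked queue recovers; a queue that never recovers keeps the probe waiting, so a production watchdog would need a separate policy for that case.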
Session Tracking
SessionTracker manages user sessions based on app foreground/background lifecycle:
```mermaid
stateDiagram-v2
    [*] --> ENDED
    ENDED --> ACTIVE: onForeground() [new UUID]
    ACTIVE --> BACKGROUNDED: onBackground()
    BACKGROUNDED --> ACTIVE: onForeground() [same session]
    BACKGROUNDED --> ENDED: timeout expires (30s default)
```
A new session ID (UUID) is generated on the first onForeground() call or after the background timeout expires. The timeout is configurable (default 30 seconds) and cancellable – returning to foreground before expiry resumes the same session.
Both platforms use injectable timer factories and UUID providers for deterministic testing.
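The state machine and its test seams can be sketched as below. `SessionTrackerSketch` and `TimerFactorySketch` are hypothetical: the timer factory is modeled as a function that schedules a task and returns a cancel function, and the ID provider as a plain lambda.

```kotlin
import java.util.UUID

enum class SessionState { ENDED, ACTIVE, BACKGROUNDED }

/** Hypothetical timer seam: schedules task after delayMs, returns a cancel function. */
typealias TimerFactorySketch = (delayMs: Long, task: () -> Unit) -> (() -> Unit)

class SessionTrackerSketch(
    private val timeoutMs: Long = 30_000,
    private val newId: () -> String = { UUID.randomUUID().toString() },
    private val schedule: TimerFactorySketch
) {
    var state: SessionState = SessionState.ENDED
        private set
    var sessionId: String? = null
        private set
    private var cancelTimeout: (() -> Unit)? = null

    @Synchronized
    fun onForeground() {
        cancelTimeout?.invoke() // returning before expiry keeps the session
        cancelTimeout = null
        if (state == SessionState.ENDED) sessionId = newId()
        state = SessionState.ACTIVE
    }

    @Synchronized
    fun onBackground() {
        if (state != SessionState.ACTIVE) return
        state = SessionState.BACKGROUNDED
        cancelTimeout = schedule(timeoutMs, { expire() })
    }

    @Synchronized
    private fun expire() {
        if (state == SessionState.BACKGROUNDED) state = SessionState.ENDED
    }
}
```

With a fake timer factory and a counter-based ID provider, each transition can be driven synchronously in a unit test without waiting 30 seconds.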
Breadcrumb Trail
BreadcrumbTrail is a thread-safe ring buffer that records recent user actions. When full, the oldest breadcrumb is evicted.
| Property | Default |
|---|---|
| `maxSize` | 100 |
Each Breadcrumb contains:
- timestamp – when the action occurred
- category – one of NAVIGATION, TAP, LIFECYCLE, NETWORK, LOG, CUSTOM
- message – human-readable description
- metadata – optional key-value pairs
Breadcrumbs are attached to crash reports to provide context about what the user was doing before the crash. On Android, the serialized JSON is truncated via binary search to fit within 50KB. On iOS, BreadcrumbTrail additionally supports writeToDisk() / loadFromDisk() for crash resilience across sessions.
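Both ideas, the evicting ring buffer and the binary-search truncation, can be sketched in a few lines of Kotlin. All names here are hypothetical, and the truncation assumes the most recent breadcrumbs are the ones worth keeping and that serialized size never shrinks as items are added; the source does not specify either detail.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

enum class BreadcrumbCategory { NAVIGATION, TAP, LIFECYCLE, NETWORK, LOG, CUSTOM }

data class BreadcrumbSketch(
    val timestamp: Long,
    val category: BreadcrumbCategory,
    val message: String,
    val metadata: Map<String, String> = emptyMap()
)

class BreadcrumbTrailSketch(private val maxSize: Int = 100) {
    private val lock = ReentrantLock()
    private val items = ArrayDeque<BreadcrumbSketch>()

    init { require(maxSize > 0) { "maxSize must be positive" } }

    fun add(crumb: BreadcrumbSketch) {
        lock.withLock {
            if (items.size == maxSize) items.removeFirst() // evict the oldest
            items.addLast(crumb)
        }
    }

    fun snapshot(): List<BreadcrumbSketch> = lock.withLock { items.toList() }
}

/** Returns the largest run of newest items whose serialized form fits in maxBytes. */
fun <T> newestThatFit(items: List<T>, maxBytes: Int, serialize: (List<T>) -> String): List<T> {
    var lo = 0          // largest count known to fit
    var hi = items.size // upper bound on the count
    while (lo < hi) {
        val mid = (lo + hi + 1) / 2
        val size = serialize(items.takeLast(mid)).toByteArray(Charsets.UTF_8).size
        if (size <= maxBytes) lo = mid else hi = mid - 1
    }
    return items.takeLast(lo)
}
```

Binary search keeps the number of trial serializations logarithmic in the trail size, which matters because this work runs inside the crash handler.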
SDK Context
SdkContext holds ambient state that can be attached to events:
- `sessionId` – current session identifier (set by `SessionTracker`)
- `userId` – optional user identifier
- `appVersion` – app version string
- `tags` – arbitrary key-value metadata
Thread-safe access is provided via locks. snapshot() returns an immutable copy of the current state.
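One way to get lock-protected writes and immutable snapshots is to keep the state in an immutable data class and swap it under a lock. The sketch below takes that approach; `SdkContextSketch`, `ContextSnapshotSketch`, `update`, and `putTag` are hypothetical names, and the SDK may structure its fields differently.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

/** Immutable view of the ambient state; safe to share across threads. */
data class ContextSnapshotSketch(
    val sessionId: String? = null,
    val userId: String? = null,
    val appVersion: String? = null,
    val tags: Map<String, String> = emptyMap()
)

class SdkContextSketch {
    private val lock = ReentrantLock()
    private var current = ContextSnapshotSketch()

    /** Applies a change atomically by replacing the whole snapshot. */
    fun update(change: (ContextSnapshotSketch) -> ContextSnapshotSketch) {
        lock.withLock { current = change(current) }
    }

    fun putTag(key: String, value: String) =
        update { it.copy(tags = it.tags + (key to value)) }

    fun snapshot(): ContextSnapshotSketch = lock.withLock { current }
}
```

Because every write produces a new object, a snapshot taken before a later update keeps its old values.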
Platform Comparison
| Aspect | Android | iOS |
|---|---|---|
| Language | Kotlin | Swift |
| Thread safety | `ReentrantLock`, `@Volatile`, `ConcurrentHashMap` | `NSLock`, `@unchecked Sendable` |
| Event delivery | Scoped `Intent` broadcast to accessibility service package | `NotificationCenter` (in-process) + HTTP POST to CtrlProxy (debug) |
| Batch size limit | 100KB per Intent (recursive split) | No hard limit (HTTP POST) |
| Crash handler | `Thread.UncaughtExceptionHandler` | `NSSetUncaughtExceptionHandler` + optional signal handlers |
| All-thread dumps | `Thread.getAllStackTraces()` (50KB cap) | Not available (no public API) |
| Signal crash persistence | N/A | Writes signal number to file via POSIX `write()` |
| ANR/Hang detection | `ApplicationExitInfo` API (retrospective, API 30+) | Watchdog thread with semaphore (real-time) |
| Hang threshold | N/A (system-defined ANR timeout ~5s) | Configurable `hangThresholdMs` (default 2000ms) |
| Breadcrumb disk persistence | Via `EventPersistence` (crash events include breadcrumbs) | Dedicated `writeToDisk()` / `loadFromDisk()` on `BreadcrumbTrail` |
| Session timeout | 30s (configurable via constructor) | 30s (configurable via constructor) |
| Timer abstraction | `ScheduledExecutorService` (injectable) | `TimerScheduling` protocol with `GCDTimer` |
| Retry policy | None (keep on disk for next launch) | Exponential backoff for HTTP delivery |
| Handled exceptions | `AutoMobileFailures` (in-memory ring buffer, 100 max) | `AutoMobileFailures` (in-memory array, 100 max) |
| Persistence format | JSON via `org.json` | JSON via `Codable` + `SdkEventEnvelope` |
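The "recursive split" noted for Android's per-Intent size limit can be illustrated with a short Kotlin sketch. `splitToFit` and the `sizeOf` callback are hypothetical, and passing a single oversized event through unchanged is an assumption about how such a case would be handled.

```kotlin
/**
 * Halves a batch recursively until every chunk's measured size fits maxBytes.
 * A single event that is still too large is returned as its own chunk.
 */
fun <T> splitToFit(batch: List<T>, maxBytes: Int, sizeOf: (List<T>) -> Int): List<List<T>> {
    if (batch.isEmpty()) return emptyList()
    if (batch.size == 1 || sizeOf(batch) <= maxBytes) return listOf(batch)
    val mid = batch.size / 2
    return splitToFit(batch.subList(0, mid), maxBytes, sizeOf) +
        splitToFit(batch.subList(mid, batch.size), maxBytes, sizeOf)
}
```

Splitting by halves preserves event order across chunks, so delivering the chunks in sequence keeps the original FIFO ordering.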