Overview
Each observation captures a snapshot of the current state of a device's screen and UI. When executed, it collects multiple data points in parallel to minimize observation latency. These operations are highly platform-specific and may require a different ordering of steps on each platform. All of this drives the interaction loop.
All collected data is assembled into a single result object (fields may be omitted when unavailable):
- `updatedAt`: device timestamp (or server timestamp fallback)
- `screenSize`: current screen dimensions (rotation-aware)
- `systemInsets`: UI insets for all screen edges
- `rotation`: current device rotation value
- `activeWindow`: current app/activity information when resolved
- `viewHierarchy`: complete UI hierarchy (if available)
- `focusedElement`: currently focused UI element (if any)
- `intentChooserDetected`: whether a system intent chooser is visible
- `wakefulness` and `backStack`: Android-specific state
- `perfTiming`, `displayedTimeMetrics` (Android launchApp "Displayed" startup timings), `performanceAudit`, and `accessibilityAudit`: present when the relevant modes are enabled
- `error`: error messages encountered during observation
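The field list above could be modeled as a TypeScript interface along these lines. This is only an illustrative sketch: the field names come from the list, but the nested types (`ScreenSize`, `Insets`, and the `unknown` placeholders) are assumptions, not the project's actual definitions.

```typescript
// Hypothetical shape of an observation result. Nested types are
// simplified assumptions for illustration only.
interface ScreenSize { width: number; height: number }
interface Insets { top: number; bottom: number; left: number; right: number }

interface ObservationResult {
  updatedAt?: number;                 // device timestamp, or server fallback
  screenSize?: ScreenSize;            // rotation-aware dimensions
  systemInsets?: Insets;              // UI insets for all screen edges
  rotation?: number;                  // current device rotation value
  activeWindow?: { package: string; activity?: string };
  viewHierarchy?: unknown;            // complete UI tree, if available
  focusedElement?: unknown;           // currently focused element, if any
  intentChooserDetected?: boolean;    // system intent chooser visible?
  wakefulness?: string;               // Android-specific state
  backStack?: unknown[];              // Android-specific state
  perfTiming?: Record<string, number>;
  displayedTimeMetrics?: Record<string, number>; // launchApp "Displayed" timings
  performanceAudit?: unknown;         // present when audit mode is enabled
  accessibilityAudit?: unknown;       // present when audit mode is enabled
  error?: string[];                   // errors encountered during observation
}
```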
The observation gracefully handles various error conditions:
- Screen off or device locked states
- Missing accessibility service
- Network timeouts or ADB connection issues
- Partial failures (returns available data even if some operations fail)
Each error is captured in the result object without causing the entire observation to fail, ensuring maximum data availability for automation workflows.
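One common way to get this "return what you can" behavior is to run all collectors in parallel and fold any failures into the `error` field instead of throwing. A minimal sketch, assuming hypothetical collector functions (the names and shapes below are illustrations, not the project's actual API):

```typescript
// A collector pairs a result-object key with an async fetch that may
// reject (screen off, missing accessibility service, ADB timeout, ...).
type Collector = [key: string, run: () => Promise<unknown>];

async function observe(collectors: Collector[]): Promise<Record<string, unknown>> {
  const result: Record<string, unknown> = {};
  const errors: string[] = [];

  // Promise.allSettled never rejects, so a single failed collector
  // cannot sink the whole observation.
  const settled = await Promise.allSettled(collectors.map(([, run]) => run()));

  settled.forEach((outcome, i) => {
    const [key] = collectors[i];
    if (outcome.status === "fulfilled") {
      result[key] = outcome.value;                 // keep whatever data we got
    } else {
      errors.push(`${key}: ${outcome.reason}`);    // fold the failure into errors
    }
  });

  if (errors.length > 0) result.error = errors;
  return result;
}
```

With this pattern, a failed `viewHierarchy` fetch still leaves `screenSize`, `rotation`, and the other successful fields populated, which is exactly the partial-failure behavior described above.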
See Also
- Video Recording for setting up screen recording for later analysis.
- Vision Fallback for how we fall back to LLM vision analysis when view hierarchy observation fails.
- Visual Highlighting for how we can draw on top of the observed app.