TalkBack/VoiceOver¶
🚧 Design Only
Current state: This document describes the full 4-phase implementation plan. Phase 1 detection infrastructure (TalkBack state via ADB secure settings) is partially implemented. Phases 2–4 (tool adaptations, focus tracking, advanced features) are not yet implemented. iOS VoiceOver support is planned. See the Status Glossary for chip definitions.
Overview¶
When TalkBack (Android) or VoiceOver (iOS) is enabled, mobile UX fundamentally changes:
- Navigation Model: Linear swipe-based navigation through accessibility nodes instead of visual/spatial navigation
- Interaction Model: Focus-based actions (e.g., double-tap to activate focused element) instead of direct coordinate-based taps
- View Hierarchy: Accessibility tree may differ from visual hierarchy due to content grouping, virtual nodes, hidden decorative elements, and alternative text
- Gestures: System reserves gestures (e.g., two-finger swipe for scrolling, single swipe for next/previous item)
Strategy: Auto-detect and adapt. MCP tools automatically detect when TalkBack/VoiceOver is enabled and adjust behavior accordingly, without requiring explicit mode parameters from agents.
Design Principles¶
- Transparency: Behavior adaptations are invisible to MCP tool consumers (agents)
- Backward Compatibility: All existing tool interfaces remain unchanged
- Graceful Degradation: If detection fails, fall back to standard behavior with appropriate warnings
- Performance: Detection results are cached, keeping detection overhead under 50ms so that tool execution latency is not noticeably affected
- Explicit Override: Force accessibility mode via feature flags when needed
Accessibility Mode Detection¶
Detection Methods¶
Android TalkBack can be detected via multiple approaches:
| Method | Mechanism | Latency | Notes |
|---|---|---|---|
| AccessibilityManager (preferred) | settings get secure enabled_accessibility_services via ADB | ~20-40ms | Fast, reliable, cacheable |
| AccessibilityService query | In-process getEnabledAccessibilityServiceList() | Instant | Requires AutoMobile AccessibilityService context |
| dumpsys accessibility (fallback) | Full accessibility configuration dump | ~100-200ms | Useful for debugging, not production |
iOS VoiceOver is detected via UIAccessibility.isVoiceOverRunning (native iOS API, requires XCTestService integration).
See Android TalkBack for platform-specific ADB commands and simulation details.
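As a rough illustration of the preferred detection path, the sketch below parses the raw output of `settings get secure enabled_accessibility_services`. It assumes the value is a colon-separated list of `package/ServiceClass` components (or `null` when nothing is enabled) and that TalkBack ships under the Google package `com.google.android.marvin.talkback`; OEM builds may use other packages. The function names are hypothetical and not part of AutoMobile.

```typescript
// Assumption: package name of Google's TalkBack; OEM variants may differ.
const TALKBACK_PACKAGES: string[] = ["com.google.android.marvin.talkback"];

/**
 * Splits the raw stdout of
 *   adb shell settings get secure enabled_accessibility_services
 * into individual service component strings.
 */
function parseEnabledServices(raw: string): string[] {
  const value = raw.trim();
  // The setting prints "null" (or nothing) when no service is enabled.
  if (value === "" || value === "null") return [];
  return value
    .split(":")
    .map((component) => component.trim())
    .filter((component) => component.length > 0);
}

/** Returns true when any enabled service belongs to a known TalkBack package. */
function isTalkBackEnabled(
  raw: string,
  packages: string[] = TALKBACK_PACKAGES,
): boolean {
  return parseEnabledServices(raw).some((component) => {
    // Components look like "package/ServiceClass"; compare the package part only.
    const pkg = component.split("/")[0];
    return packages.includes(pkg);
  });
}
```

Matching on the package rather than the full component string tolerates both the long and the abbreviated (`/.TalkBackService`) class forms.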
Caching Strategy¶
- Tool Initialization: Check once when device session starts
- Periodic Refresh: Re-check every 60 seconds (configurable TTL)
- Explicit Invalidation: After setTalkBackEnabled() tool calls
- Feature Flag Override: Allow manual force-enable for testing
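The caching rules above could be sketched as a small TTL cache. This is a minimal illustration, not the planned AccessibilityDetector: the class name, the synchronous `detect` callback, and the injectable clock are all assumptions made to keep the example self-contained (real detection over ADB would be asynchronous).

```typescript
type Clock = () => number; // returns the current time in milliseconds

// Hypothetical cache wrapping some detection function.
class AccessibilityStateCache {
  private readonly detect: () => boolean;
  private readonly ttlMs: number;
  private readonly now: Clock;
  private cached: boolean | undefined = undefined;
  private checkedAt = 0;
  private forced: boolean | undefined = undefined;

  constructor(detect: () => boolean, ttlMs: number = 60_000, now: Clock = () => Date.now()) {
    this.detect = detect;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  /** Forced value wins; otherwise return the cached value, re-detecting when missing or stale. */
  isEnabled(): boolean {
    if (this.forced !== undefined) return this.forced;
    const time = this.now();
    if (this.cached === undefined || time - this.checkedAt >= this.ttlMs) {
      this.cached = this.detect();
      this.checkedAt = time;
    }
    return this.cached;
  }

  /** Explicit invalidation, e.g. after a setTalkBackEnabled() tool call. */
  invalidate(): void {
    this.cached = undefined;
  }

  /** Feature-flag override; pass undefined to return to auto-detection. */
  forceEnabled(value: boolean | undefined): void {
    this.forced = value;
  }
}
```

Injecting the clock keeps the TTL behavior testable without waiting for real time to pass.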
View Hierarchy Differences¶
The accessibility tree exposed by AccessibilityNodeInfo (Android) or AXUIElement (iOS) differs from the visual view hierarchy:
Element Merging¶
TalkBack merges child text into parent for logical reading units:
Before (Visual Hierarchy):
LinearLayout (clickable)
  ImageView (icon)
  TextView "Settings"
  TextView "Manage app preferences"
After (Accessibility Tree):
LinearLayout (clickable, focusable)
  content-desc: "Settings, Manage app preferences"
  [Children marked importantForAccessibility=NO]
Impact: tapOn with text: "Settings" may not find the TextView directly. The search must instead locate the parent with the merged content-desc, using substring matching.
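To make the merging behavior concrete, here is a simplified sketch that computes a merged label for a clickable container and resolves a text query to that container. The `ViewNode` shape and both function names are hypothetical, and the merge rule (own label wins, otherwise join the labels of non-clickable children with ", ") is a deliberate simplification of what TalkBack actually does.

```typescript
// Hypothetical, simplified node shape for illustration only.
interface ViewNode {
  className: string;
  text?: string;
  contentDesc?: string;
  clickable?: boolean;
  children?: ViewNode[];
}

/** Labels a screen reader might speak for this subtree, in document order. */
function collectLabels(node: ViewNode): string[] {
  const own = node.contentDesc ?? node.text;
  if (own) return [own]; // an explicit label replaces child text
  const labels: string[] = [];
  for (const child of node.children ?? []) {
    // Clickable children stay separately focusable, so they are not merged upward.
    if (!child.clickable) labels.push(...collectLabels(child));
  }
  return labels;
}

function mergedLabel(node: ViewNode): string {
  return collectLabels(node).join(", ");
}

/** Pre-order search for the first clickable node whose merged label contains the query. */
function findTapTarget(node: ViewNode, query: string): ViewNode | undefined {
  if (node.clickable && mergedLabel(node).toLowerCase().includes(query.toLowerCase())) {
    return node;
  }
  for (const child of node.children ?? []) {
    const hit = findTapTarget(child, query);
    if (hit) return hit;
  }
  return undefined;
}
```

With the Settings row from the example above, a query for "Settings" resolves to the clickable LinearLayout rather than the inner TextView.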
Virtual Nodes¶
Some accessibility nodes (e.g., slider controls) don’t correspond to actual views. Standard coordinate-based taps can fail on these virtual nodes, so tools must use accessibility actions instead (ACTION_SCROLL_FORWARD, ACTION_SCROLL_BACKWARD).
Hidden Decorative Elements¶
Elements marked importantForAccessibility="no" are excluded from the accessibility tree. observe returns fewer elements, and visual selectors may fail. Use semantic selectors (text, content-desc, role) instead.
Content Description Priority¶
When both text and contentDescription exist, TalkBack prioritizes contentDescription. Search logic must check both fields, with content-desc taking priority.
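One way to express this priority is a two-tier match: check content-desc first, fall back to text, and prefer content-desc hits when several nodes match. The sketch below assumes a flat list of candidate nodes; the `LabeledNode` shape and function names are illustrative, not existing AutoMobile APIs.

```typescript
// Hypothetical minimal node shape for illustration.
interface LabeledNode {
  id: string;
  text?: string;
  contentDesc?: string;
}

type MatchedField = "content-desc" | "text";

/** Reports which field matched the query, checking content-desc before text. */
function matchField(node: LabeledNode, query: string): MatchedField | undefined {
  const needle = query.toLowerCase();
  if (node.contentDesc !== undefined && node.contentDesc.toLowerCase().includes(needle)) {
    return "content-desc";
  }
  if (node.text !== undefined && node.text.toLowerCase().includes(needle)) {
    return "text";
  }
  return undefined;
}

/** Returns the first content-desc match, or else the first text match. */
function findBestMatch(nodes: LabeledNode[], query: string): LabeledNode | undefined {
  let textHit: LabeledNode | undefined;
  for (const node of nodes) {
    const field = matchField(node, query);
    if (field === "content-desc") return node;
    if (field === "text" && textHit === undefined) textHit = node;
  }
  return textHit;
}
```

Returning early on a content-desc hit mirrors what TalkBack would announce, while the text fallback keeps selectors working for nodes that have no description.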
Hierarchy Extraction¶
AutoMobile’s ViewHierarchyExtractor.kt already uses AccessibilityNodeInfo APIs and captures text, contentDescription, isFocusable, and isFocused. No changes needed for basic TalkBack support.
Focus Management¶
Android has two types of focus:
| Aspect | Input Focus | Accessibility Focus |
|---|---|---|
| Purpose | Text input target | Screen reader cursor position |
| Visibility | Cursor/highlight | TalkBack announces, green outline |
| Movement | Via keyboard (Tab) or touch | Via TalkBack swipe gestures |
During scrolling, TalkBack focus may move off-screen, stay on a now-invisible element, or jump to the first visible focusable element. To avoid these focus-follow issues, the swipeOn tool is designed to clear accessibility focus before scrolling.
Gesture Adaptations¶
TalkBack Gesture Conflicts¶
When TalkBack is active, Android reserves certain gestures:
| Standard Gesture | TalkBack Behavior | Impact on Automation |
|---|---|---|
| Single tap | Announces element | Does NOT activate element |
| Double tap (anywhere) | Activates focused element | Alternative to direct tap |
| Single swipe right/left | Next/previous element | Does NOT scroll content |
| Two-finger swipe | Scroll content | Required for scrolling |
| Three-finger swipe | System navigation | Reserved gesture |
Per-Tool Adaptations¶
tapOn: Use ACTION_CLICK on the target element instead of coordinate-based tap. Optionally set accessibility focus first to mimic user behavior and trigger TalkBack announcement. Long press uses ACTION_LONG_CLICK.
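The tapOn adaptation can be thought of as a planning step that turns a tap request into either a coordinate gesture or a sequence of accessibility actions. The sketch below is an assumption about how such a planner might look; `planTap`, the request shape, and the tap durations are illustrative values, while the action names correspond to AccessibilityNodeInfo actions.

```typescript
type NodeAction = "ACTION_ACCESSIBILITY_FOCUS" | "ACTION_CLICK" | "ACTION_LONG_CLICK";

type TapStep =
  | { kind: "accessibilityAction"; action: NodeAction }
  | { kind: "coordinateTap"; x: number; y: number; durationMs: number };

// Hypothetical request shape; not the real tapOn parameters.
interface TapRequest {
  talkBackEnabled: boolean;
  center: { x: number; y: number };
  longPress?: boolean;
  focusFirst?: boolean; // optionally mimic a user moving TalkBack focus first
}

function planTap(req: TapRequest): TapStep[] {
  if (!req.talkBackEnabled) {
    // Standard mode: direct coordinate tap (durations here are illustrative).
    const durationMs = req.longPress ? 1000 : 50;
    return [{ kind: "coordinateTap", x: req.center.x, y: req.center.y, durationMs }];
  }
  const steps: TapStep[] = [];
  if (req.focusFirst) {
    steps.push({ kind: "accessibilityAction", action: "ACTION_ACCESSIBILITY_FOCUS" });
  }
  steps.push({
    kind: "accessibilityAction",
    action: req.longPress ? "ACTION_LONG_CLICK" : "ACTION_CLICK",
  });
  return steps;
}
```

Keeping the plan as data makes the mode switch invisible to callers, which is consistent with the transparency principle stated earlier.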
swipeOn / scroll: Three approaches in priority order:
1. Accessibility scroll actions (preferred for known scrollable containers) - uses ACTION_SCROLL_FORWARD/ACTION_SCROLL_BACKWARD
2. Two-finger swipe (general-purpose scrolling) - dispatches parallel two-finger gesture via GestureDescription
3. Temporarily suspend TalkBack (advanced, avoid) - requires extra permissions
For scroll-until-visible (lookFor), clear accessibility focus before scrolling, use accessibility scroll actions in a loop, and optionally set focus on the target when found.
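A possible shape for the strategy selection, plus the geometry of a parallel two-finger swipe, is sketched below. Everything here is hypothetical: the strategy names, the 25%/75% travel range, and the default finger gap are assumptions chosen for illustration, and the third option (suspending TalkBack) is intentionally left out because the design advises against it.

```typescript
type ScrollStrategy = "accessibilityAction" | "twoFingerSwipe" | "coordinateSwipe";

interface ScrollContext {
  talkBackEnabled: boolean;
  containerIsScrollable: boolean; // true when a known scrollable container was resolved
}

/** Picks a strategy in the priority order described above. */
function chooseScrollStrategy(ctx: ScrollContext): ScrollStrategy {
  if (!ctx.talkBackEnabled) return "coordinateSwipe";
  return ctx.containerIsScrollable ? "accessibilityAction" : "twoFingerSwipe";
}

interface Point { x: number; y: number }
interface Stroke { from: Point; to: Point }
interface Bounds { left: number; top: number; right: number; bottom: number }

/**
 * Computes two parallel vertical strokes inside the given bounds.
 * "up" moves both fingers from the lower part of the bounds to the upper part.
 */
function twoFingerStrokes(
  bounds: Bounds,
  direction: "up" | "down",
  fingerGapPx: number = 100,
): [Stroke, Stroke] {
  const centerX = (bounds.left + bounds.right) / 2;
  const height = bounds.bottom - bounds.top;
  const lowY = bounds.top + height * 0.75;
  const highY = bounds.top + height * 0.25;
  const startY = direction === "up" ? lowY : highY;
  const endY = direction === "up" ? highY : lowY;
  const leftX = centerX - fingerGapPx / 2;
  const rightX = centerX + fingerGapPx / 2;
  return [
    { from: { x: leftX, y: startY }, to: { x: leftX, y: endY } },
    { from: { x: rightX, y: startY }, to: { x: rightX, y: endY } },
  ];
}
```

On the device side, each stroke would become one path in a single multi-stroke gesture so that both fingers move at the same time.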
inputText / clearText: No change needed. Both already use ACTION_SET_TEXT, which TalkBack handles correctly.
pressButton: Hardware keycodes work the same. The Back button may close TalkBack's local context menu instead of navigating back; use GLOBAL_ACTION_BACK to bypass this when needed.
launchApp / terminateApp / installApp / startDevice / killDevice: No change needed. App lifecycle and device management are unaffected by TalkBack state.
Use Cases¶
Login Flow¶
Standard automation script works unchanged with TalkBack enabled:
await tapOn({ text: "Username" }); // Uses ACTION_CLICK (not coordinate tap)
await inputText({ text: "user@example.com" }); // Uses ACTION_SET_TEXT (works in both modes)
await tapOn({ text: "Password" });
await inputText({ text: "password123" });
await tapOn({ text: "Log in" }); // ACTION_CLICK on button
Edge case: If “Username” is a label (not the EditText), the search logic looks for a nearby EditText with a matching content-desc or hint.
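The label edge case could be handled roughly as follows. This sketch assumes a flat list of nodes in traversal order and treats "nearby" as closest by list index, which is a simplification (a real implementation might compare screen bounds instead). The `FieldNode` shape and `resolveInputTarget` are hypothetical names.

```typescript
// Hypothetical flattened node shape, in hierarchy traversal order.
interface FieldNode {
  className: string;
  text?: string;
  contentDesc?: string;
  hint?: string;
  editable?: boolean;
}

/**
 * Resolves a text query to the node that should receive the tap.
 * Prefers an editable node whose hint or content-desc matches, choosing
 * the one closest (by index) to a matching non-editable label.
 */
function resolveInputTarget(nodes: FieldNode[], query: string): FieldNode | undefined {
  const needle = query.toLowerCase();
  const has = (value?: string): boolean =>
    value !== undefined && value.toLowerCase().includes(needle);

  const labelIndex = nodes.findIndex(
    (n) => !n.editable && (has(n.text) || has(n.contentDesc)),
  );

  let best: FieldNode | undefined;
  let bestDistance = Number.POSITIVE_INFINITY;
  nodes.forEach((n, i) => {
    if (!n.editable || !(has(n.hint) || has(n.contentDesc))) return;
    // Without a label to anchor on, the first matching field wins.
    const distance = labelIndex >= 0 ? Math.abs(i - labelIndex) : i;
    if (distance < bestDistance) {
      best = n;
      bestDistance = distance;
    }
  });

  // Fall back to the label itself when no editable field matches.
  return best ?? (labelIndex >= 0 ? nodes[labelIndex] : undefined);
}
```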
List Scrolling¶
await swipeOn({
  container: { elementId: "item_list" },
  direction: "up",
  lookFor: { text: "Item 50" },
  // Internally uses ACTION_SCROLL_FORWARD or two-finger swipe
});
await tapOn({ text: "Item 50" }); // ACTION_CLICK
Scroll-until-visible detects the end of the list by checking whether the hierarchy changes after a scroll. Accessibility focus is cleared before each scroll to prevent focus-follow issues.
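The loop described here might look like the sketch below. The `ScrollableDevice` interface is a hypothetical stand-in for whatever performs the real accessibility calls, and it is synchronous only to keep the example short; the comparison of visible texts is a simplified proxy for "the hierarchy changed".

```typescript
// Hypothetical abstraction over the device; real calls would be asynchronous.
interface ScrollableDevice {
  visibleTexts(): string[];
  clearAccessibilityFocus(): void;
  scrollForward(): void; // e.g. ACTION_SCROLL_FORWARD on the container
}

interface ScrollResult {
  found: boolean;
  scrolls: number;
}

function scrollUntilVisible(
  device: ScrollableDevice,
  target: string,
  maxScrolls: number = 20,
): ScrollResult {
  let scrolls = 0;
  let before = device.visibleTexts();
  while (true) {
    if (before.includes(target)) return { found: true, scrolls };
    if (scrolls >= maxScrolls) return { found: false, scrolls };

    // Clear focus first so TalkBack focus does not drag the viewport around.
    device.clearAccessibilityFocus();
    device.scrollForward();
    scrolls++;

    const after = device.visibleTexts();
    // An unchanged snapshot after scrolling is treated as the end of the list.
    if (after.join("\u0000") === before.join("\u0000")) {
      return { found: false, scrolls };
    }
    before = after;
  }
}
```

The `maxScrolls` cap is an extra safeguard, added here as an assumption, for lists whose content keeps changing and would otherwise never trigger the end-of-list check.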
Implementation Strategy¶
- Phase 1: Detection infrastructure - AccessibilityDetector class with caching, expose TalkBack state in observation results, feature flag override
- Phase 2: Core tool adaptations - tapOn uses ACTION_CLICK, swipeOn uses two-finger swipe or scroll actions, multi-touch gesture support
- Phase 3: Advanced features - accessibility focus tracking in observations, scroll-until-visible with focus management, optional explicit focus control tools
- Phase 4: Documentation and polish - user-facing docs, example scripts, performance benchmarks
iOS VoiceOver¶
iOS VoiceOver follows the same phased approach. Key differences:
| Aspect | Android TalkBack | iOS VoiceOver |
|---|---|---|
| Detection | AccessibilityManager / settings query | UIAccessibility.isVoiceOverRunning |
| Scroll Gesture | Two-finger swipe | Three-finger swipe |
| Focus API | FOCUS_ACCESSIBILITY | UIAccessibilityFocus |
| Rotor | No equivalent | Two-finger rotate for navigation modes |
iOS is a secondary priority; the initial focus is validating Android TalkBack.
Future Enhancement Ideas¶
- Explicit focus control tools: setAccessibilityFocus, getAccessibilityFocus, navigateFocus
- Announcement control: Trigger screen reader announcements for user testing
- Enhanced scroll-until-visible: Smart loop detection, bi-directional search, focus tracking
- Accessibility tree export: Full node hierarchy with actions for debugging
- Complex gesture simulation: TalkBack local/global context menus, rotor navigation
- Accessibility auditing: Combine TalkBack support with WCAG auditing
- iOS VoiceOver parity: Three-finger swipe, rotor, Magic Tap support