Architecture Overview
UXpert uses a Stagehand-based architecture that runs multiple AI agents with different behavioral constraints to detect UX friction through behavioral divergence.
System Components
┌─────────────┐
│ CLI/Web UI  │
│  (run.js)   │
└──────┬──────┘
       │ URL + Goal + Agents
       ▼
┌──────────────────────────────────────────────────────────┐
│                  Agent Loop (per agent)                  │
│  ┌─────────┐   ┌───────────┐   ┌─────────┐   ┌────────┐  │
│  │Snapshot │──▶│DOM Filter │──▶│ Planner │──▶│Executor│  │
│  │(a11y    │   │(allowlist │   │(Claude) │   │(act()) │  │
│  │ tree)   │   │ by agent) │   │         │   │        │  │
│  └─────────┘   └───────────┘   └─────────┘   └────────┘  │
│       ▲                                           │      │
│       └───────────────────────────────────────────┘      │
│              repeat until done/stuck/budget              │
└──────────────────────────────────────────────────────────┘
       │
       ▼
┌──────────────┐     ┌──────────────────┐
│ Trace Output │────▶│ Cross-Agent      │
│ (JSON +      │     │ Comparison       │
│  Screenshots)│     │ (friction detect)│
└──────────────┘     └──────────────────┘
Request Flow
- User Input: User provides URL, goal, and agent selection via CLI or web UI
- Browser Init: Stagehand (Playwright wrapper) launches a browser per agent
- Overlay Dismissal: Cookie banners, consent modals, and popups are dismissed
- Agent Loop (repeated until done/stuck/budget exhausted):
  - Snapshot: Capture accessibility tree via page.snapshot()
  - Filter: Apply agent-type DOM filter (scanner sees fewer elements)
  - Plan: Claude Sonnet chooses next action from filtered element list
  - Execute: Stagehand act() via pre-resolved XPath (deterministic, no DOM leak)
  - Record: Save step to trace with screenshot
- Cross-Agent Comparison: Detect friction by comparing agent outcomes
- Output: JSON traces with screenshots saved to traces/ directory
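The agent loop in the flow above can be sketched roughly as follows. This is an illustration, not the code in src/agent.js: the `runAgentLoop` name, the injected `deps` functions (`snapshot`, `filter`, `plan`, `execute`), and the shapes of the action and trace records are all assumptions. Only the step order and the done/stuck/budget stop conditions come from the description.

```javascript
// Hypothetical sketch of the per-agent loop: snapshot -> filter -> plan -> execute -> record.
// The injected deps stand in for Stagehand and the Claude planner; they are not real APIs.
async function runAgentLoop({ goal, agentType, maxSteps = 15 }, deps) {
  const trace = [];
  for (let step = 1; step <= maxSteps; step++) {
    const elements = await deps.snapshot();                // accessibility-tree elements
    const visible = deps.filter(elements, agentType);      // agent-type allowlist
    const action = await deps.plan(goal, visible, trace);  // planner picks the next action

    // The planner can end the run by declaring the goal reached or progress impossible.
    if (action.type === "done") return { outcome: "goal_achieved", trace };
    if (action.type === "stuck") return { outcome: "stuck", trace };

    const result = await deps.execute(action);             // act() via pre-resolved XPath
    trace.push({
      step,
      action,
      result,
      visibleCount: visible.length,
      totalCount: elements.length,
    });
  }
  // The loop ran out of steps without the planner declaring done or stuck.
  return { outcome: "budget_exhausted", trace };
}
```

Injecting the four steps as functions keeps the sketch runnable without a browser, and makes the three exit paths easy to exercise with stubs.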
Key Files
| File | Purpose |
|---|---|
| stagehand-poc/run.js | CLI runner, orchestrates sequential agent execution |
| stagehand-poc/server.js | Bun API server for web UI |
| stagehand-poc/src/agent.js | Agent execution loop (runAgent) |
| stagehand-poc/src/planner.js | LLM planning with Claude Sonnet |
| stagehand-poc/src/dom-filter.js | Agent-type DOM filtering and microcopy heuristics |
| stagehand-poc/src/overlays.js | Cookie/consent overlay dismissal |
| stagehand-poc/frontend/ | React + Vite trace viewer |
Technology Stack
- Browser Automation: Stagehand (Playwright wrapper with accessibility tree snapshots)
- Planning Model: Claude Sonnet via Anthropic API
- DOM Filtering: Custom middleware layer with research-grounded heuristics
- Frontend: React + Vite
- Runtime: Bun
- API Server: Native Bun HTTP server
Agent Types
UXpert runs multiple agent types that differ in what they can see and how they behave:
| Agent | DOM Visibility | Behavior |
|---|---|---|
| Scanner | Limited: headings, links, buttons, landmarks, microcopy (short text, prices, numbers) | Simulates quick-skimming users who rely on obvious affordances |
| Reader | Full DOM access | Simulates careful users who read everything before acting |
| Explorer | Full DOM access | Simulates curious users who investigate side paths |
All agents share the same step budget (default: 15 steps).
Core Architecture Concepts
Constraint-Based Diversity
Agent diversity comes from mechanical constraints (DOM filtering, step budgets), not persona prompts. The planner LLM receives the same instructions for all agents—only the visible elements differ.
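One way to picture this is a planner-input builder that has no agent-type parameter at all. The sketch below is an assumption about shape only: `buildPlannerInput`, `PLANNER_INSTRUCTIONS`, and the element fields (`id`, `role`, `text`) are hypothetical names, and the instruction text is a placeholder rather than the project's actual prompt.

```javascript
// Hypothetical sketch: the builder never sees the agent type, so the instructions
// cannot vary by agent. Only the element list passed in differs.
const PLANNER_INSTRUCTIONS =
  "Choose the next action toward the goal using only the listed elements.";

function buildPlannerInput(goal, visibleElements) {
  // One line per element the agent's DOM filter let through.
  const elementLines = visibleElements.map(
    (el) => `[${el.id}] ${el.role}: ${el.text}`
  );
  return { instructions: PLANNER_INSTRUCTIONS, goal, elements: elementLines };
}
```

Calling it once with a Scanner's filtered list and once with a Reader's full list yields identical `instructions` and `goal` fields, with a shorter `elements` array for the Scanner.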
Allowlist Enforcement
The planner only sees elements that pass the agent's DOM filter. For Scanner:
- Always visible: interactive elements (links, buttons, inputs)
- Always visible: landmarks (headings, navigation, main)
- Filtered by heuristics: text nodes (microcopy only—short text, numbers, prices, goal-relevant keywords)
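A minimal sketch of a Scanner-style allowlist, assuming each snapshot element is a `{ role, text }` object. The role names, the 40-character cutoff for "short text", the digit check for numbers and prices, and the keyword extraction from the goal are illustrative guesses, not the heuristics in src/dom-filter.js.

```javascript
// Hypothetical element shape: { role: string, text: string }.
// Role lists and thresholds below are assumptions made for this sketch.
const INTERACTIVE_ROLES = new Set(["link", "button", "textbox", "checkbox", "combobox"]);
const LANDMARK_ROLES = new Set(["heading", "navigation", "main"]);
const MAX_MICROCOPY_LENGTH = 40; // assumed cutoff for "short text"

function isMicrocopy(text, goalKeywords) {
  if (text.length <= MAX_MICROCOPY_LENGTH) return true; // short text
  if (/\d/.test(text)) return true;                     // contains a number or price
  const lower = text.toLowerCase();
  return goalKeywords.some((kw) => lower.includes(kw)); // goal-relevant keyword
}

function filterElements(elements, agentType, goal = "") {
  // Only the scanner is restricted; other agent types see the full list.
  if (agentType !== "scanner") return elements;

  // Treat words longer than three characters in the goal as keywords.
  const goalKeywords = goal
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 3);

  return elements.filter(
    (el) =>
      INTERACTIVE_ROLES.has(el.role) ||
      LANDMARK_ROLES.has(el.role) ||
      isMicrocopy(el.text, goalKeywords)
  );
}
```

With this shape, a long paragraph with no digits and no goal keywords is the only kind of element the Scanner loses.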
Pre-Resolved XPath Execution
Actions execute via pre-resolved XPath from the snapshot, not dynamic selectors. This ensures:
- No DOM leak during execution
- Deterministic action targeting
- Reliable failure diagnosis
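The idea can be sketched as a lookup against the snapshot taken before planning, so the executor never re-queries the live DOM to find its target. The helper names (`buildXPathIndex`, `resolveAction`), the element fields (`id`, `xpath`), and the returned object shape are assumptions for illustration; how the result is handed to Stagehand's act() is not shown here.

```javascript
// Hypothetical sketch: map planner-visible element ids to the XPaths captured at snapshot time.
function buildXPathIndex(filteredElements) {
  return new Map(filteredElements.map((el) => [el.id, el.xpath]));
}

// Turn a planned action into an execution target using only the snapshot index.
function resolveAction(plannedAction, xpathIndex) {
  const xpath = xpathIndex.get(plannedAction.elementId);
  if (xpath === undefined) {
    // The planner referenced something outside the agent's filtered view.
    throw new Error(`Element ${plannedAction.elementId} is not in the filtered snapshot`);
  }
  return {
    selector: `xpath=${xpath}`,
    method: plannedAction.method,
    arguments: plannedAction.arguments ?? [],
  };
}
```

Because the index is built from the filtered list, an action on an element the agent was never shown fails at resolution time instead of reaching the page.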
Failure Diagnosis
When actions fail, diagnoseFailure() checks:
- Element existence (DOM shifted?)
- Visibility (hidden, zero-size?)
- Obstruction (blocked by overlay?)
- State (disabled, offscreen?)
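The checks above can be sketched as an ordered series of tests over a probe of the target element. This is not the project's diagnoseFailure(): the function name `diagnoseFromProbe`, the probe fields, and the cause labels are hypothetical, and the sketch assumes the probe has already been collected from the page.

```javascript
// Hypothetical probe shape:
// { exists, visible, width, height, obstructedBy, disabled, inViewport }
function diagnoseFromProbe(probe) {
  // 1. Existence: the DOM may have shifted since the snapshot.
  if (!probe.exists) return { cause: "element_missing" };

  // 2. Visibility: hidden or zero-size elements cannot be acted on.
  if (!probe.visible || probe.width === 0 || probe.height === 0) {
    return { cause: "not_visible" };
  }

  // 3. Obstruction: another element (for example an overlay) covers the target.
  if (probe.obstructedBy) return { cause: "obstructed", detail: probe.obstructedBy };

  // 4. State: disabled or scrolled out of view.
  if (probe.disabled) return { cause: "disabled" };
  if (!probe.inViewport) return { cause: "offscreen" };

  return { cause: "unknown" };
}
```

Checking in this order returns the most basic explanation first: a missing element is reported as missing rather than as invisible.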
Analysis Process
Friction Detection Signals
Divergent Outcomes: Scanner fails where Reader succeeds
- Indicates content/navigation not accessible to skimmers
Behavioral Patterns:
- Backtracking (navigating away then returning)
- Looping (same action repeated)
- Budget exhaustion (never reached goal)
- Getting stuck (agent declares no progress possible)
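These patterns can be read off a finished trace. The sketch below assumes each trace step carries a `url` and an `actionKey` string (for example method plus element id), and that the run ends with an outcome of "goal_achieved", "stuck", or "budget_exhausted". The field names, the outcome labels other than goal_achieved, and the three-repeat threshold are assumptions.

```javascript
// Looping: the same action repeated back to back on the same page.
function detectLooping(trace, minRepeats = 3) {
  let run = 1;
  for (let i = 1; i < trace.length; i++) {
    const same =
      trace[i].actionKey === trace[i - 1].actionKey &&
      trace[i].url === trace[i - 1].url;
    run = same ? run + 1 : 1;
    if (run >= minRepeats) return true;
  }
  return false;
}

// Backtracking: page A -> page B -> page A.
function detectBacktracking(trace) {
  // Collapse consecutive steps on the same URL into one visit.
  const visits = trace
    .map((step) => step.url)
    .filter((url, i, all) => i === 0 || url !== all[i - 1]);
  for (let i = 2; i < visits.length; i++) {
    if (visits[i] === visits[i - 2]) return true;
  }
  return false;
}

function summarizeSignals(trace, outcome) {
  return {
    backtracking: detectBacktracking(trace),
    looping: detectLooping(trace),
    budgetExhausted: outcome === "budget_exhausted",
    stuck: outcome === "stuck",
  };
}
```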
Failure Taxonomy:
- True ambiguity: UI genuinely unclear (multiple agents struggle)
- Execution failure: Overlay blocked click, element shifted
- Agent capability gap: Agent constraints too restrictive
Cross-Agent Comparison
After all agents run, outcomes are compared:
scanner | goal_achieved | 8 steps | 45/120 elements visible
reader | goal_achieved | 6 steps | 120/120 elements visible
Key signals:
- Scanner failed + Reader succeeded = friction for skimmers
- Both succeeded = flow works for different user types
- Both failed = fundamental usability issue
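A small sketch of the comparison, assuming each agent run is summarized as `{ agent, outcome, steps, visible, total }`. The function names and verdict labels are hypothetical. The case where Scanner succeeds and Reader fails is not covered by the signals above, so the sketch leaves it unclassified.

```javascript
// Render one agent's result in the summary format shown above.
function formatSummary(run) {
  return `${run.agent} | ${run.outcome} | ${run.steps} steps | ${run.visible}/${run.total} elements visible`;
}

// Map the scanner/reader outcome pair to one of the key signals.
function compareOutcomes(scanner, reader) {
  const scannerOk = scanner.outcome === "goal_achieved";
  const readerOk = reader.outcome === "goal_achieved";
  if (scannerOk && readerOk) return "flow_works";          // both succeeded
  if (!scannerOk && readerOk) return "skimmer_friction";   // scanner failed, reader succeeded
  if (!scannerOk && !readerOk) return "fundamental_issue"; // both failed
  return "unclassified";                                   // not covered by the listed signals
}
```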