Architecture Overview

UXpert uses a Stagehand-based architecture that runs multiple AI agents with different behavioral constraints to detect UX friction through behavioral divergence.

System Components

┌─────────────┐
│  CLI/Web UI │
│  (run.js)   │
└──────┬──────┘
       │ URL + Goal + Agents
       ▼
┌──────────────────────────────────────────────────────────┐
│                    Agent Loop (per agent)                │
│  ┌─────────┐   ┌───────────┐   ┌─────────┐   ┌────────┐ │
│  │Snapshot │──▶│DOM Filter │──▶│ Planner │──▶│Executor│ │
│  │(a11y    │   │(allowlist │   │(Claude) │   │(act()) │ │
│  │ tree)   │   │ by agent) │   │         │   │        │ │
│  └─────────┘   └───────────┘   └─────────┘   └────────┘ │
│       ▲                                           │     │
│       └───────────────────────────────────────────┘     │
│                      repeat until done/stuck/budget     │
└──────────────────────────────────────────────────────────┘
       │
       ▼
┌──────────────┐     ┌──────────────────┐
│ Trace Output │────▶│ Cross-Agent      │
│ (JSON +      │     │ Comparison       │
│  Screenshots)│     │ (friction detect)│
└──────────────┘     └──────────────────┘

Request Flow

User Input: The user provides a URL, a goal, and the agents to run via the CLI or web UI
Browser Init: Stagehand (a Playwright wrapper) launches one browser per agent
Overlay Dismissal: Cookie banners, consent modals, and popups are dismissed
Agent Loop (repeated until the agent is done, gets stuck, or exhausts its step budget):
- Snapshot: Capture the accessibility tree via page.snapshot()
- Filter: Apply the agent-type DOM filter (Scanner sees fewer elements)
- Plan: Claude Sonnet chooses the next action from the filtered element list
- Execute: Run Stagehand act() against the pre-resolved XPath (deterministic, no DOM leak)
- Record: Save the step and its screenshot to the trace
Cross-Agent Comparison: Detect friction by comparing agent outcomes
Output: JSON traces and screenshots are saved to the traces/ directory
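The loop can be sketched as a single async function. This is a minimal sketch, not the actual `runAgent` from `src/agent.js`: the injected helpers (`snapshot`, `filter`, `plan`, `execute`, `screenshot`), the plan fields (`done`, `stuck`, `action`, `elementId`), and the `stuck` / `budget_exhausted` outcome labels are hypothetical.

```javascript
// Sketch of one agent's loop. All helpers in `deps` are hypothetical
// stand-ins for the real Stagehand snapshot, DOM filter, Claude planner,
// act() executor, and screenshot capture; injecting them keeps the control
// flow testable without a browser.
async function runAgentLoop({ agentType, goal, stepBudget = 15 }, deps) {
  // Assume the budget runs out unless the planner ends the run earlier.
  const trace = { agentType, goal, outcome: "budget_exhausted", steps: [] };

  for (let step = 1; step <= stepBudget; step++) {
    const tree = await deps.snapshot();           // full accessibility tree
    const visible = deps.filter(agentType, tree); // this agent's view only
    const plan = await deps.plan(goal, visible);  // planner never sees `tree`

    if (plan.done) {
      trace.outcome = "goal_achieved";
      break;
    }
    if (plan.stuck) {
      trace.outcome = "stuck";
      break;
    }

    const result = await deps.execute(plan, visible);
    trace.steps.push({
      step,
      action: plan.action,
      elementId: plan.elementId,
      ok: result.ok,
      visible: visible.length, // elements this agent could see
      total: tree.length,      // elements in the unfiltered snapshot
      screenshot: await deps.screenshot(step),
    });
  }
  return trace;
}
```

Injecting the dependencies is a choice made for the sketch: it keeps the three exit conditions (done, stuck, budget) in one place and lets the loop run against fake pages.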

Key Files

File	Purpose
`stagehand-poc/run.js`	CLI runner, orchestrates sequential agent execution
`stagehand-poc/server.js`	Bun API server for web UI
`stagehand-poc/src/agent.js`	Agent execution loop (`runAgent`)
`stagehand-poc/src/planner.js`	LLM planning with Claude Sonnet
`stagehand-poc/src/dom-filter.js`	Agent-type DOM filtering and microcopy heuristics
`stagehand-poc/src/overlays.js`	Cookie/consent overlay dismissal
`stagehand-poc/frontend/`	React + Vite trace viewer

Technology Stack

Browser Automation: Stagehand (Playwright wrapper with accessibility tree snapshots)
Planning Model: Claude Sonnet via Anthropic API
DOM Filtering: Custom middleware layer with research-grounded heuristics
Frontend: React + Vite
Runtime: Bun
API Server: Native Bun HTTP server

Agent Types

UXpert runs multiple agent types that differ in what they can see and how they behave:

Agent	DOM Visibility	Behavior
Scanner	Limited: headings, links, buttons, inputs, landmarks, microcopy (short text, prices, numbers, goal-relevant keywords)	Simulates quick-skimming users who rely on obvious affordances
Reader	Full DOM access	Simulates careful users who read everything before acting
Explorer	Full DOM access	Simulates curious users who investigate side paths

All agents share the same step budget (default: 15 steps).

Core Architecture Concepts

Constraint-Based Diversity

Agent diversity comes from mechanical constraints (DOM filtering, step budgets), not persona prompts. The planner LLM receives the same instructions for all agents—only the visible elements differ.
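To make the idea concrete, here is a sketch of a planner prompt builder that takes no agent-type input. The instruction text and the `buildPlannerPrompt` name are illustrative assumptions, not the contents of `src/planner.js`.

```javascript
// One instruction string shared by every agent type (illustrative text).
const PLANNER_INSTRUCTIONS =
  "Choose the single next action that moves toward the goal. " +
  "Answer with one element id from the list, or DONE, or STUCK.";

// There is deliberately no agentType parameter: the only way two agents
// can receive different prompts is through `visibleElements`.
function buildPlannerPrompt(goal, visibleElements) {
  const elementLines = visibleElements.map(
    (el) => `[${el.id}] ${el.role} "${el.name}"`
  );
  return [PLANNER_INSTRUCTIONS, `Goal: ${goal}`, "Elements:", ...elementLines].join("\n");
}
```

When one agent's element list is a subset of another's, its prompt is a prefix of the other's, so a difference in what the planner chooses can be attributed to visibility rather than to prompt wording.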

Allowlist Enforcement

The planner only sees elements that pass the agent's DOM filter. For Scanner:

Always visible: interactive elements (links, buttons, inputs)
Always visible: landmarks (headings, navigation, main)
Filtered by heuristics: text nodes (microcopy only—short text, numbers, prices, goal-relevant keywords)
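The Scanner rules can be sketched as a filter over snapshot nodes. This is an illustration under assumed inputs, not the heuristics in `src/dom-filter.js`: nodes are assumed to look like `{ role, name }`, and the role sets, the 40-character short-text cutoff, and the price pattern are hypothetical choices.

```javascript
// Hypothetical role sets; the real allowlist may differ.
const INTERACTIVE_ROLES = new Set(["link", "button", "textbox", "searchbox", "checkbox", "radio", "combobox"]);
const LANDMARK_ROLES = new Set(["heading", "navigation", "main", "banner", "contentinfo"]);

const SHORT_TEXT_MAX = 40;           // assumed cutoff for "short text"
const PRICE_PATTERN = /[$€£¥]\s?\d/; // e.g. "$12", "€ 9"

// Microcopy heuristic for text nodes. Standalone numbers are short, so the
// short-text rule keeps them; longer text survives only if it contains a
// price or one of the goal keywords.
function isMicrocopy(text, goalKeywords) {
  const lower = text.toLowerCase();
  return (
    text.length <= SHORT_TEXT_MAX ||
    PRICE_PATTERN.test(text) ||
    goalKeywords.some((kw) => lower.includes(kw.toLowerCase()))
  );
}

function filterForAgent(agentType, nodes, goalKeywords = []) {
  if (agentType !== "scanner") return nodes; // Reader and Explorer: full DOM
  return nodes.filter((node) => {
    if (INTERACTIVE_ROLES.has(node.role) || LANDMARK_ROLES.has(node.role)) {
      return true; // interactive elements and landmarks are always visible
    }
    const text = (node.name || "").trim();
    return text !== "" && isMicrocopy(text, goalKeywords);
  });
}
```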

Pre-Resolved XPath Execution

Actions execute via pre-resolved XPath from the snapshot, not dynamic selectors. This ensures:

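No DOM leak: execution targets the element the planner picked from its filtered view, so hidden elements cannot influence what gets acted on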
Deterministic action targeting
Reliable failure diagnosis
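A sketch of the resolution step, under assumed data shapes: each filtered element is assumed to carry the `xpath` recorded at snapshot time, and the hypothetical `resolveAction` helper turns the planner's choice into a fixed action descriptor before anything reaches `act()`. The descriptor fields (`selector`, `method`, `arguments`, `description`) are illustrative; check the Stagehand version in use for the exact shape `act()` accepts.

```javascript
// Turn a planner decision into a deterministic action descriptor.
// `visible` is the agent's filtered element list; each entry is assumed to
// carry the XPath recorded when the snapshot was taken.
function resolveAction(plan, visible) {
  const target = visible.find((el) => el.id === plan.elementId);
  if (!target) {
    // The planner named an element this agent cannot see (or invented one),
    // so nothing is executed.
    return { ok: false, reason: `element ${plan.elementId} not in agent view` };
  }
  return {
    ok: true,
    action: {
      selector: `xpath=${target.xpath}`, // fixed at snapshot time, not re-derived
      method: plan.action,               // e.g. "click", "fill"
      arguments: plan.value === undefined ? [] : [plan.value],
      description: `${plan.action} ${target.role} "${target.name}"`,
    },
  };
}
```

Rejecting unknown ids is where allowlist enforcement and execution meet: an element the agent could not see never becomes a click target, even if the planner names it.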

Failure Diagnosis

When actions fail, diagnoseFailure() checks:

Element existence (DOM shifted?)
Visibility (hidden, zero-size?)
Obstruction (blocked by overlay?)
State (disabled, offscreen?)
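The checks can be sketched as an ordered classifier. Here the browser-side probing is assumed to have happened already: `probe` is a hypothetical record (`exists`, `hidden`, `width`, `height`, `inViewport`, `topElementIsTarget`, `disabled`) of the kind a Playwright `page.evaluate` call could gather, and the cause labels are illustrative rather than the real return values of `diagnoseFailure()`.

```javascript
// Classify why an action failed, given a hypothetical probe of the target.
// Order: existence, visibility, off-screen, obstruction, disabled.
function classifyFailure(probe) {
  if (!probe.exists) {
    return { cause: "not_found", hint: "element missing; the DOM may have shifted since the snapshot" };
  }
  if (probe.hidden || probe.width === 0 || probe.height === 0) {
    return { cause: "not_visible", hint: "element is hidden or has zero size" };
  }
  if (!probe.inViewport) {
    return { cause: "offscreen", hint: "element is outside the viewport" };
  }
  if (!probe.topElementIsTarget) {
    return { cause: "obstructed", hint: "another element, such as an overlay, covers the target" };
  }
  if (probe.disabled) {
    return { cause: "disabled", hint: "element is disabled" };
  }
  return { cause: "unknown", hint: "element looked actionable; inspect the action itself" };
}
```

The off-screen test runs before the obstruction test because a hit test at coordinates outside the viewport cannot return the target, which would otherwise be misreported as an overlay.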

Analysis Process

Friction Detection Signals

Divergent Outcomes: Scanner fails where Reader succeeds
- Indicates content/navigation not accessible to skimmers
Behavioral Patterns:
- Backtracking (navigating away then returning)
- Looping (same action repeated)
- Budget exhaustion (never reached goal)
- Getting stuck (agent declares no progress possible)
Failure Taxonomy:
- True ambiguity: UI genuinely unclear (multiple agents struggle)
- Execution failure: Overlay blocked click, element shifted
- Agent capability gap: Agent constraints too restrictive
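The behavioral patterns can be read directly off a trace. A minimal sketch, assuming each step records the page `url` after the action plus `action` and `elementId`, and that the trace carries an `outcome`; the three-repeat loop threshold and the pattern names are assumptions, not documented values.

```javascript
// Detect behavioral friction patterns in one agent's trace.
// Assumed shape: { outcome, steps: [{ url, action, elementId }] }.
function detectPatterns(trace, loopThreshold = 3) {
  const patterns = [];
  const steps = trace.steps;

  // Backtracking: landing on a URL visited before the previous step
  // (A -> B -> A), i.e. navigating away and then returning.
  for (let i = 2; i < steps.length; i++) {
    const earlier = steps.slice(0, i - 1).map((s) => s.url);
    if (steps[i].url !== steps[i - 1].url && earlier.includes(steps[i].url)) {
      patterns.push({ type: "backtracking", step: i + 1 });
    }
  }

  // Looping: the same action on the same element repeated consecutively;
  // reported once per run, when the run reaches the threshold.
  let run = 1;
  for (let i = 1; i < steps.length; i++) {
    const same =
      steps[i].action === steps[i - 1].action &&
      steps[i].elementId === steps[i - 1].elementId;
    run = same ? run + 1 : 1;
    if (run === loopThreshold) patterns.push({ type: "looping", step: i + 1 });
  }

  // Terminal outcomes recorded by the agent loop.
  if (trace.outcome === "budget_exhausted") patterns.push({ type: "budget_exhaustion" });
  if (trace.outcome === "stuck") patterns.push({ type: "stuck" });
  return patterns;
}
```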

Cross-Agent Comparison

After all agents run, outcomes are compared:

scanner    | goal_achieved      | 8 steps  | 45/120 elements visible
reader     | goal_achieved      | 6 steps  | 120/120 elements visible

Key signals:

Scanner failed + Reader succeeded = friction for skimmers
Both succeeded = flow works for different user types
Both failed = fundamental usability issue
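The comparison output and the signals above can be sketched as two small functions. The summary fields (`outcome`, `steps`, `visible`, `total`) are assumed names, any outcome other than `goal_achieved` is treated as a failure, and the remaining case (Scanner succeeds, Reader fails) is not among the listed signals, so the sketch labels it unclassified instead of guessing an interpretation.

```javascript
// Render one summary row in the same column layout as the comparison output.
function formatRow(agentName, summary) {
  return [
    agentName.padEnd(10),
    summary.outcome.padEnd(18),
    `${summary.steps} steps`.padEnd(8),
    `${summary.visible}/${summary.total} elements visible`,
  ].join(" | ");
}

// Map the Scanner/Reader outcome pair onto the documented signals.
function compareOutcomes(scanner, reader) {
  const scannerOk = scanner.outcome === "goal_achieved";
  const readerOk = reader.outcome === "goal_achieved";
  if (!scannerOk && readerOk) return "friction for skimmers";
  if (scannerOk && readerOk) return "flow works for different user types";
  if (!scannerOk && !readerOk) return "fundamental usability issue";
  return "unclassified: Scanner succeeded where Reader failed";
}
```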
