Architecture Overview

UXpert uses a Stagehand-based architecture that runs multiple AI agents with different behavioral constraints to detect UX friction through behavioral divergence.

System Components

┌─────────────┐
│  CLI/Web UI │
│  (run.js)   │
└──────┬──────┘
       │ URL + Goal + Agents
       ▼
┌──────────────────────────────────────────────────────────┐
│                    Agent Loop (per agent)                │
│  ┌─────────┐   ┌───────────┐   ┌─────────┐   ┌────────┐ │
│  │Snapshot │──▶│DOM Filter │──▶│ Planner │──▶│Executor│ │
│  │(a11y    │   │(allowlist │   │(Claude) │   │(act()) │ │
│  │ tree)   │   │ by agent) │   │         │   │        │ │
│  └─────────┘   └───────────┘   └─────────┘   └────────┘ │
│       ▲                                           │     │
│       └───────────────────────────────────────────┘     │
│                      repeat until done/stuck/budget     │
└──────────────────────────────────────────────────────────┘
       │
       ▼
┌──────────────┐     ┌──────────────────┐
│ Trace Output │────▶│ Cross-Agent      │
│ (JSON +      │     │ Comparison       │
│  Screenshots)│     │ (friction detect)│
└──────────────┘     └──────────────────┘

Request Flow

  1. User Input: User provides URL, goal, and agent selection via CLI or web UI
  2. Browser Init: Stagehand (Playwright wrapper) launches a browser per agent
  3. Overlay Dismissal: Cookie banners, consent modals, and popups are dismissed
  4. Agent Loop (repeated until done/stuck/budget exhausted):
    • Snapshot: Capture accessibility tree via page.snapshot()
    • Filter: Apply agent-type DOM filter (scanner sees fewer elements)
    • Plan: Claude Sonnet chooses next action from filtered element list
    • Execute: Stagehand act() targets the element through the XPath resolved at snapshot time (deterministic, and no unfiltered DOM is exposed during execution)
    • Record: Save step to trace with screenshot
  5. Cross-Agent Comparison: Detect friction by comparing agent outcomes
  6. Output: JSON traces with screenshots saved to traces/ directory
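
The loop in step 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in src/agent.js: the `deps` object and its `snapshot`, `filter`, `plan`, and `execute` functions are hypothetical stand-ins for the Stagehand snapshot, the DOM filter, the Claude planner, and act(), and the `stuck` and `budget_exhausted` status strings are invented names for the exit conditions listed above.

```javascript
// Hypothetical sketch of the per-agent loop; every dependency name is an assumption.
async function runAgentLoop(deps, { goal, agentType, budget = 15 }) {
  const trace = [];
  for (let step = 1; step <= budget; step++) {
    const elements = await deps.snapshot();                 // accessibility tree as a flat list
    const visible = deps.filter(elements, agentType, goal); // allowlist for this agent type
    const decision = await deps.plan(goal, visible);        // e.g. { type: "click", elementId: "e1" }

    // The planner can end the run by declaring the goal reached or no progress possible.
    if (decision.type === "done" || decision.type === "stuck") {
      trace.push({ step, decision });
      return { status: decision.type === "done" ? "goal_achieved" : "stuck", trace };
    }

    const result = await deps.execute(decision);            // would act via the pre-resolved XPath
    trace.push({
      step,
      decision,
      result,
      visibleCount: visible.length,
      totalCount: elements.length,
    });
  }
  // The step budget ran out before the planner declared done or stuck.
  return { status: "budget_exhausted", trace };
}
```

Passing the four stages in as functions keeps the loop itself free of browser and LLM calls, so it can be exercised with stubs.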

Key Files

File                             | Purpose
stagehand-poc/run.js             | CLI runner, orchestrates sequential agent execution
stagehand-poc/server.js          | Bun API server for web UI
stagehand-poc/src/agent.js       | Agent execution loop (runAgent)
stagehand-poc/src/planner.js     | LLM planning with Claude Sonnet
stagehand-poc/src/dom-filter.js  | Agent-type DOM filtering and microcopy heuristics
stagehand-poc/src/overlays.js    | Cookie/consent overlay dismissal
stagehand-poc/frontend/          | React + Vite trace viewer

Technology Stack

  • Browser Automation: Stagehand (Playwright wrapper with accessibility tree snapshots)
  • Planning Model: Claude Sonnet via Anthropic API
  • DOM Filtering: Custom middleware layer with research-grounded heuristics
  • Frontend: React + Vite
  • Runtime: Bun
  • API Server: Native Bun HTTP server

Agent Types

UXpert runs multiple agent types that differ in what they can see and how they behave:

Agent    | DOM Visibility                                                                         | Behavior
Scanner  | Limited: headings, links, buttons, landmarks, microcopy (short text, prices, numbers) | Simulates quick-skimming users who rely on obvious affordances
Reader   | Full DOM access                                                                        | Simulates careful users who read everything before acting
Explorer | Full DOM access                                                                        | Simulates curious users who investigate side paths

All agents share the same step budget (default: 15 steps).

Core Architecture Concepts

Constraint-Based Diversity

Agent diversity comes from mechanical constraints (DOM filtering, step budgets), not persona prompts. The planner LLM receives the same instructions for all agents; only the visible elements differ.

Allowlist Enforcement

The planner only sees elements that pass the agent's DOM filter. For Scanner:

  • Always visible: interactive elements (links, buttons, inputs)
  • Always visible: landmarks (headings, navigation, main)
  • Filtered by heuristics: text nodes (only microcopy passes: short text, numbers, prices, goal-relevant keywords)
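
A rough sketch of how such an allowlist might look is shown below. It is not the contents of src/dom-filter.js: the element shape (`{ role, text }`), the role names, the 40-character cutoff for "short text", and the keyword extraction from the goal are all assumptions made for illustration.

```javascript
// Illustrative allowlist; role names and the 40-character cutoff are assumptions.
const INTERACTIVE_ROLES = new Set(["link", "button", "textbox", "checkbox", "combobox"]);
const LANDMARK_ROLES = new Set(["heading", "navigation", "main"]);

// Decide whether a text node counts as microcopy for the Scanner.
function isMicrocopy(text, goalKeywords, maxLength = 40) {
  const trimmed = text.trim();
  if (trimmed.length === 0) return false;
  if (trimmed.length <= maxLength) return true;          // short text
  if (/\d/.test(trimmed)) return true;                   // contains a number or price
  const lower = trimmed.toLowerCase();
  return goalKeywords.some((kw) => lower.includes(kw));  // goal-relevant keywords
}

function filterForAgent(elements, agentType, goal) {
  if (agentType !== "scanner") return elements;          // Reader and Explorer see the full DOM
  const goalKeywords = goal.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
  return elements.filter(
    (el) =>
      INTERACTIVE_ROLES.has(el.role) ||
      LANDMARK_ROLES.has(el.role) ||
      (el.role === "text" && isMicrocopy(el.text ?? "", goalKeywords))
  );
}
```

Because the filter runs before planning, the planner prompt can stay identical across agents while the element list it receives changes.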

Pre-Resolved XPath Execution

Actions execute via pre-resolved XPath from the snapshot, not dynamic selectors. This ensures:

  • No unfiltered DOM content is exposed during execution
  • Deterministic action targeting
  • Reliable failure diagnosis
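
One way to picture the resolution step: the planner returns an element ID, and the executor looks that ID up in the filtered snapshot to obtain the stored XPath. The sketch below assumes a hypothetical element shape (`{ id, xpath }`) and a hypothetical decision shape (`{ type, elementId, args }`); the returned action object is also illustrative, not a confirmed Stagehand type.

```javascript
// Hypothetical shapes: elements are { id, role, text, xpath } captured at snapshot time.
function resolveAction(decision, visibleElements) {
  const byId = new Map(visibleElements.map((el) => [el.id, el]));
  const target = byId.get(decision.elementId);
  if (!target) {
    // The planner named an element outside its allowlist, or the ID is stale.
    return { ok: false, reason: "unknown_element", elementId: decision.elementId };
  }
  return {
    ok: true,
    action: {
      method: decision.type,              // e.g. "click" or "fill"
      selector: `xpath=${target.xpath}`,  // fixed at snapshot time, never re-derived
      arguments: decision.args ?? [],
    },
  };
}
```

Looking the ID up only in the filtered list means an agent cannot act on an element it was never shown.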

Failure Diagnosis

When actions fail, diagnoseFailure() checks:

  • Element existence (DOM shifted?)
  • Visibility (hidden, zero-size?)
  • Obstruction (blocked by overlay?)
  • State (disabled, offscreen?)
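
The order of these checks matters, since a missing element is trivially also "not visible". The sketch below shows one plausible ordering over an already-collected state object. It is not the real diagnoseFailure(): the function name `classifyFailure`, the fields of `state`, and the returned labels are hypothetical.

```javascript
// Hypothetical classifier; `state` is assumed to be gathered from the page after a failed action.
function classifyFailure(state) {
  if (!state.exists) return "element_missing";            // DOM shifted since the snapshot
  if (!state.visible || state.width === 0 || state.height === 0) {
    return "not_visible";                                 // hidden or zero-size
  }
  if (state.coveredBy) return "obstructed";               // another element sits on top
  if (state.disabled) return "disabled";
  if (!state.inViewport) return "offscreen";
  return "unknown";                                       // none of the known causes matched
}
```

For example, `classifyFailure({ exists: true, visible: true, width: 80, height: 24, coveredBy: "div.cookie-banner" })` returns `"obstructed"`.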

Analysis Process

Friction Detection Signals

  1. Divergent Outcomes: Scanner fails where Reader succeeds
    • Indicates content/navigation not accessible to skimmers
  2. Behavioral Patterns:
    • Backtracking (navigating away then returning)
    • Looping (same action repeated)
    • Budget exhaustion (never reached goal)
    • Getting stuck (agent declares no progress possible)
  3. Failure Taxonomy:
    • True ambiguity: UI genuinely unclear (multiple agents struggle)
    • Execution failure: Overlay blocked click, element shifted
    • Agent capability gap: Agent constraints too restrictive
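
The backtracking and looping patterns can be detected from a recorded trace with a single pass. The sketch below assumes each trace step has the hypothetical shape `{ url, action: { type, elementId } }` and treats three identical consecutive actions as a loop; both the shape and the threshold are illustrative choices, not values taken from the project.

```javascript
// Hypothetical trace step shape: { url, action: { type, elementId } }.
function detectPatterns(trace, { loopThreshold = 3 } = {}) {
  const signals = { looping: false, backtracking: false };
  let run = 1; // length of the current streak of identical actions
  for (let i = 1; i < trace.length; i++) {
    const prev = trace[i - 1];
    const cur = trace[i];
    const sameAction =
      prev.url === cur.url &&
      prev.action.type === cur.action.type &&
      prev.action.elementId === cur.action.elementId;
    run = sameAction ? run + 1 : 1;
    if (run >= loopThreshold) signals.looping = true;
    // Backtracking: the agent left a page and came straight back (A, then B, then A).
    if (i >= 2 && cur.url === trace[i - 2].url && cur.url !== prev.url) {
      signals.backtracking = true;
    }
  }
  return signals;
}
```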

Cross-Agent Comparison

After all agents run, outcomes are compared:

scanner    | goal_achieved      | 8 steps  | 45/120 elements visible
reader     | goal_achieved      | 6 steps  | 120/120 elements visible

Key signals:

  • Scanner failed + Reader succeeded = friction for skimmers
  • Both succeeded = flow works for different user types
  • Both failed = fundamental usability issue
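
These three signals amount to a small decision table. A minimal sketch, assuming outcomes arrive as a map from agent name to status string (`goal_achieved` as in the sample output above); the function name and the returned labels are hypothetical.

```javascript
// Hypothetical comparison over { scanner: status, reader: status }.
function compareOutcomes(outcomes) {
  const scannerOk = outcomes.scanner === "goal_achieved";
  const readerOk = outcomes.reader === "goal_achieved";
  if (scannerOk && readerOk) return "flow_works";            // both succeeded
  if (!scannerOk && readerOk) return "skimmer_friction";     // Scanner failed, Reader succeeded
  if (!scannerOk && !readerOk) return "fundamental_issue";   // both failed
  return "unclassified"; // Scanner succeeded, Reader failed: not covered by the signals above
}
```

The fourth case is returned as `unclassified` rather than forced into one of the three signals, since the list above does not assign it a meaning.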
