Case Study

I rebuilt the Y Combinator website from a screen recording

An experiment in behavior-driven UI reconstruction: treating video as the source of truth instead of screenshots, text prompts, or design files.

December 2024 · 8 min read

[Video: Y Combinator rebuild demo]

Full screen recording → working frontend in under 60 seconds

The Problem with Existing Approaches

Most existing tools try to generate UI from screenshots or text prompts. The problem is that screenshots capture appearance, but not behavior.

Interfaces exist in time: navigation, state changes, transitions, and interaction patterns are invisible in a single frame. A screenshot of Y Combinator's homepage tells you nothing about:

  • How the navigation works
  • What happens when you click "Startup Jobs"
  • How many pages exist and how they connect
  • The interaction states (hover, active, loading)

The Approach: Video as Source of Truth

So I built a prototype that treats video as the source of truth.

As an experiment, I recorded a short walkthrough of the Y Combinator website and rebuilt the frontend purely from the screen recording. The system analyzes UI behavior over time—layout hierarchy, navigation flow, interaction states—and reconstructs a working, responsive frontend that matches what was actually shown.

Key Constraints

  • If something isn't shown in the video, it's not generated
  • No guessing, no invented screens or logic
  • The output reflects observed behavior, not assumptions

What Worked Better Than Expected

Navigation Structure

Page relationships and navigation flow were captured accurately by watching clicks and the page transitions that followed them.
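
For a sense of what "watching clicks and page transitions" can mean in practice, here is a minimal sketch that pairs each observed click with the page change that follows it; the event format and the one-second pairing window are illustrative assumptions, not details from the prototype:

    def build_navigation_graph(click_events, page_changes, max_gap_s=1.0):
        """Pair each click with the first page change within `max_gap_s` seconds of it."""
        edges = []
        for click in click_events:
            follow = next(
                (p for p in page_changes
                 if 0.0 <= p["time"] - click["time"] <= max_gap_s),
                None,
            )
            if follow is not None:
                edges.append({
                    "source": click["page"],     # page visible when the click happened
                    "target": follow["page"],    # page visible after the transition
                    "trigger": click["target"],  # element clicked, e.g. "Startup Jobs"
                })
        return edges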

State Changes

Interaction patterns such as hover states, active elements, and transitions were recovered through temporal analysis.
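
A crude way to separate those small state changes from full navigations, sketched here with OpenCV (the 5% changed-area cutoff is a guess for illustration, not a value from the project), is to look at how much of the frame actually changed between samples:

    import cv2
    import numpy as np

    def classify_change(prev_frame, frame, hover_max_area=0.05):
        """Split changes into small, localized ones (hover/active styling) and
        large ones (page transitions) based on the fraction of pixels that changed."""
        prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, prev_gray)
        changed_fraction = np.count_nonzero(diff > 25) / diff.size
        if changed_fraction == 0:
            return "no-change"
        return "hover-or-state" if changed_fraction < hover_max_area else "navigation-or-layout"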

Layout Fidelity

The original visual flow was maintained without needing written specifications or design files.

Behavior-Driven vs. Screenshot-Based

Capability            Screenshot Tools      Video Analysis
Visual layout         ✓                     ✓
Multi-page apps       ✗                     ✓
Navigation flow       ✗                     ✓
Interaction states    ✗                     ✓
State transitions     ✗                     ✓
Form validation       ✗                     ✓ (only if shown in the recording)
Backend logic         ✗                     ✗
Hidden screens        ✗                     ✗

What Doesn't Work (Yet)

This approach has clear boundaries:

  • Backend logic

    The system generates frontend code. API integrations, databases, and server logic are not inferred.

  • Hidden states not demonstrated

    If you don't show the error state in the video, it won't be generated.

  • Data relationships that never appear on screen

    The video is the contract. Unseen features don't exist in the output.

The Bigger Picture

This experiment made me think that behavior-driven UI reconstruction might be a more reliable abstraction than screenshot-to-code or prompt-based generation, especially for:

  • Legacy systems without documentation
  • Competitor analysis and research
  • Rapid prototyping from references
  • Rebuilding undocumented products
  • Design system extraction
  • Quality assurance testing

I'm curious whether others have explored similar approaches, or see clear limitations I'm missing.

How It Works (Technical)

1. Video Input

Upload a screen recording of any UI. The system accepts standard video formats (MP4, WebM, MOV).
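
As a rough sketch of what this step can look like (the prototype's actual decoder and sampling rate aren't described here), frames might be sampled from the recording with OpenCV, which reads the common formats when built with FFmpeg support:

    import cv2

    def extract_frames(video_path, sample_fps=5.0):
        """Yield (timestamp_seconds, frame) pairs sampled from a screen recording."""
        cap = cv2.VideoCapture(video_path)            # MP4/WebM/MOV via the FFmpeg backend
        native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
        step = max(int(native_fps // sample_fps), 1)  # e.g. every 6th frame of a 30 fps capture
        index = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % step == 0:
                yield index / native_fps, frame
            index += 1
        cap.release()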

2. Temporal Analysis

The AI processes the video frame-by-frame, building a temporal model of UI changes, click events, and state transitions.
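
One plausible way to build that temporal model, sketched with OpenCV and NumPy (the threshold and event format are illustrative assumptions, not the prototype's actual values), is to diff consecutive sampled frames and record the moments where the UI changed:

    import cv2
    import numpy as np

    def detect_changes(frames, threshold=2.0):
        """Record timestamps where the UI changed noticeably between sampled frames.

        `frames` is an iterable of (timestamp, frame) pairs; a high mean frame
        difference usually marks a navigation or a large state transition.
        """
        events = []
        prev_gray = None
        for t, frame in frames:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev_gray is not None:
                diff_score = float(np.mean(cv2.absdiff(gray, prev_gray)))
                if diff_score > threshold:
                    events.append({"time": t, "score": diff_score})
            prev_gray = gray
        return events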

3. Structure Extraction

Layout hierarchy, component relationships, and navigation patterns are identified from observed behavior.
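
The post doesn't spell out the intermediate representation, but a minimal model of what structure extraction has to produce might look like this; the class and field names are assumptions for illustration only:

    from dataclasses import dataclass, field

    @dataclass
    class Component:
        """A UI element observed in the recording, e.g. a nav link or button."""
        name: str
        bounds: tuple                                   # (x, y, width, height) in the recorded viewport
        states: set = field(default_factory=set)        # e.g. {"default", "hover", "active"}

    @dataclass
    class Page:
        """A distinct screen, identified from visually stable stretches of the video."""
        title: str
        components: list = field(default_factory=list)  # list of Component

    @dataclass
    class UIModel:
        pages: dict = field(default_factory=dict)       # page title -> Page
        navigation: list = field(default_factory=list)  # observed (source, target, trigger) edges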

4. Code Generation

The system generates working HTML/CSS/JS that reproduces the observed behavior, including responsive layout and interactions.
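
The generation step itself isn't shown in the post; as a toy illustration of the "render only what was observed" idea, a static HTML skeleton for one page could be emitted like this (the real output also includes responsive CSS and interaction JS):

    def render_page(page):
        """`page` is e.g. {"title": "Startup Jobs", "components": ["Header nav", "Job listings"]}."""
        body = "\n".join(
            f'    <div class="component" data-name="{name}">{name}</div>'
            for name in page["components"]
        )
        return (
            "<!DOCTYPE html>\n"
            "<html>\n"
            f"  <head><title>{page['title']}</title></head>\n"
            "  <body>\n"
            f"{body}\n"
            "  </body>\n"
            "</html>"
        )

    print(render_page({"title": "Startup Jobs", "components": ["Header nav", "Job listings"]}))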

Try It Yourself

Replay implements behavior-driven UI reconstruction. Upload your own video and see what gets generated.

Try Replay Free