An experiment in behavior-driven UI reconstruction: treating video as the source of truth instead of screenshots, text prompts, or design files.
December 2024 · 8 min read
Full screen recording → working frontend in under 60 seconds
Most existing tools try to generate UI from screenshots or text prompts. The problem is that screenshots capture appearance, but not behavior.
Interfaces exist in time: navigation, state changes, transitions, and interaction patterns are invisible in a single frame. A screenshot of Y Combinator's homepage tells you nothing about how its navigation flows between pages, which elements respond to hover or clicks, or how content changes over time.
So I built a prototype that treats video as the source of truth.
As an experiment, I recorded a short walkthrough of the Y Combinator website and rebuilt the frontend purely from the screen recording. The system analyzes UI behavior over time—layout hierarchy, navigation flow, interaction states—and reconstructs a working, responsive frontend that matches what was actually shown.
- Page relationships and navigation flow were accurately captured from watching clicks and page transitions (see the sketch after this list).
- Interaction patterns (hover states, active elements, transitions) were preserved through temporal analysis.
- The original visual flow was maintained without needing written specifications or design files.
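
To make the navigation-flow point concrete, here is a minimal sketch of how observed clicks and page transitions could be folded into a navigation graph. The types, names, and example data are hypothetical and purely illustrative, not Replay's actual internals.

```ts
// Hypothetical record of one navigation observed in the recording.
interface ObservedTransition {
  fromPage: string;    // page visible before the click
  clickedText: string; // label of the element that was clicked
  toPage: string;      // page visible after the transition
  timestampMs: number; // when it happened in the video
}

// Build a simple adjacency map: page -> pages reachable by observed clicks.
function buildNavigationGraph(transitions: ObservedTransition[]): Map<string, Set<string>> {
  const graph = new Map<string, Set<string>>();
  for (const t of transitions) {
    if (!graph.has(t.fromPage)) graph.set(t.fromPage, new Set());
    graph.get(t.fromPage)!.add(t.toPage);
  }
  return graph;
}

// Example: three clicks observed during a walkthrough.
const observed: ObservedTransition[] = [
  { fromPage: "/", clickedText: "Companies", toPage: "/companies", timestampMs: 4200 },
  { fromPage: "/companies", clickedText: "About", toPage: "/about", timestampMs: 9100 },
  { fromPage: "/about", clickedText: "Home", toPage: "/", timestampMs: 14500 },
];

console.log(buildNavigationGraph(observed));
// Map(3) { '/' => Set(1) { '/companies' }, '/companies' => Set(1) { '/about' }, '/about' => Set(1) { '/' } }
```

Everything in the graph comes from what was demonstrated on screen; a page that was never visited in the recording simply has no node.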
| Capability | Screenshot Tools | Video Analysis |
|---|---|---|
| Visual layout | ✓ | ✓ |
| Multi-page apps | ✗ | ✓ |
| Navigation flow | ✗ | ✓ |
| Interaction states | ✗ | ✓ |
| State transitions | ✗ | ✓ |
| Form validation | ✗ | ✓ |
| Backend logic | ✗ | ✗ |
| Hidden screens | ✗ | ✗ |
This approach has clear boundaries:
- The system generates frontend code; API integrations, databases, and server logic are not inferred.
- If you don't show the error state in the video, it won't be generated.
- The video is the contract: unseen features don't exist in the output.
This experiment made me think that behavior-driven UI reconstruction might be a more reliable abstraction than screenshot-to-code or prompt-based generation, especially for multi-page apps and interaction-heavy interfaces.
I'm curious whether others have explored similar approaches, or see clear limitations I'm missing.
Here's how the system works in practice. Upload a screen recording of any UI; standard video formats (MP4, WebM, MOV) are accepted.
The AI processes the video frame by frame, building a temporal model of UI changes, click events, and state transitions.
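
As a rough illustration of what that temporal model could look like (a sketch under assumed inputs, not a description of the actual pipeline), consecutive frame observations can be diffed into a flat timeline of change events:

```ts
// Hypothetical per-frame observation extracted from the video.
interface FrameObservation {
  timestampMs: number;
  visibleText: string[];            // text content detected in the frame
  cursor: { x: number; y: number }; // cursor position, if visible
  clickDetected: boolean;           // e.g. inferred from a cursor-press heuristic
}

// One entry in the temporal model: what changed, and when.
interface UiChangeEvent {
  timestampMs: number;
  kind: "click" | "content-change";
  detail: string;
}

// Diff consecutive frames into a timeline of UI change events.
function buildTimeline(frames: FrameObservation[]): UiChangeEvent[] {
  const events: UiChangeEvent[] = [];
  for (let i = 1; i < frames.length; i++) {
    const prev = frames[i - 1];
    const curr = frames[i];
    if (curr.clickDetected) {
      events.push({
        timestampMs: curr.timestampMs,
        kind: "click",
        detail: `click at (${curr.cursor.x}, ${curr.cursor.y})`,
      });
    }
    // Text that appears in this frame but not the previous one signals a state change.
    const appeared = curr.visibleText.filter(t => !prev.visibleText.includes(t));
    if (appeared.length > 0) {
      events.push({
        timestampMs: curr.timestampMs,
        kind: "content-change",
        detail: `new content: ${appeared.join(", ")}`,
      });
    }
  }
  return events;
}
```

Anchoring clicks and content changes to timestamps is what lets navigation and interaction states be attributed to specific user actions later on.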
Layout hierarchy, component relationships, and navigation patterns are identified from observed behavior.
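
One plausible intermediate representation (again hypothetical, for illustration only) is a component tree in which each node is annotated with the behavior actually observed for it:

```ts
// Hypothetical intermediate representation: a layout node plus the
// behavior that was observed for it in the recording.
interface UiNode {
  role: "page" | "nav" | "list" | "link" | "button" | "text";
  label?: string;           // visible text, if any
  observedStates: string[]; // e.g. ["default", "hover"] seen in the video
  navigatesTo?: string;     // target page, if clicking it caused a transition
  children: UiNode[];
}

// A fragment of what a homepage walkthrough might yield.
const homepage: UiNode = {
  role: "page",
  label: "/",
  observedStates: ["default"],
  children: [
    {
      role: "nav",
      observedStates: ["default"],
      children: [
        { role: "link", label: "Companies", observedStates: ["default", "hover"], navigatesTo: "/companies", children: [] },
        { role: "link", label: "About", observedStates: ["default"], navigatesTo: "/about", children: [] },
      ],
    },
  ],
};
```

A node only carries a hover state or a navigation target if one was actually seen, which is what keeps the output honest about unobserved behavior.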
Finally, working HTML/CSS/JS is generated that reproduces the observed behavior, including responsive layout and interactions.
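
The generation step can then be pictured as a walk over that inferred model that emits markup only for what was demonstrated. A minimal, purely illustrative sketch (the real output is a full responsive page, not a string like this):

```ts
// Hypothetical code-generation step: walk the inferred model and emit
// markup only for elements that were actually observed in the video.
interface GeneratedElement {
  tag: "nav" | "a" | "button" | "section";
  text?: string;
  href?: string; // present only if a navigation was observed
  children: GeneratedElement[];
}

function renderHtml(el: GeneratedElement): string {
  const attrs = el.href ? ` href="${el.href}"` : "";
  const children = el.children.map(renderHtml).join("");
  return `<${el.tag}${attrs}>${el.text ?? ""}${children}</${el.tag}>`;
}

// Example: a nav bar reconstructed from two observed links.
const nav: GeneratedElement = {
  tag: "nav",
  children: [
    { tag: "a", text: "Companies", href: "/companies", children: [] },
    { tag: "a", text: "About", href: "/about", children: [] },
  ],
};

console.log(renderHtml(nav));
// <nav><a href="/companies">Companies</a><a href="/about">About</a></nav>
```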
Replay implements behavior-driven UI reconstruction. Upload your own video and see what gets generated.
Try Replay Free