An experiment in behavior-driven UI reconstruction: treating video as the source of truth instead of screenshots, text prompts, or design files.
December 2024 · 8 min read
Full screen recording → working frontend in under 60 seconds
Most existing tools try to generate UI from screenshots or text prompts. The problem is that screenshots capture appearance, but not behavior.
Interfaces exist in time: navigation, state changes, transitions, and interaction patterns are invisible in a single frame. A screenshot of Y Combinator's homepage tells you nothing about how its navigation flows between pages, which elements respond to hover or clicks, or how content changes over time.
So I built a prototype that treats video as the source of truth.
As an experiment, I recorded a short walkthrough of the Y Combinator website and rebuilt the frontend purely from the screen recording. The system analyzes UI behavior over time—layout hierarchy, navigation flow, interaction states—and reconstructs a working, responsive frontend that matches what was actually shown.
- Page relationships and navigation flow were accurately captured from watching clicks and page transitions (see the sketch after this list).
- Interaction patterns (hover states, active elements, transitions) were preserved through temporal analysis.
- The original visual flow was maintained without needing written specifications or design files.
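
To make the navigation-flow point concrete, here is a minimal sketch of how observed clicks and page transitions could be folded into a navigation graph. The types, names, and example data are hypothetical and purely illustrative, not Replay's actual internals.

```ts
// Hypothetical record of one navigation observed in the recording.
interface ObservedTransition {
  fromPage: string;    // page visible before the click
  clickedText: string; // label of the element that was clicked
  toPage: string;      // page visible after the transition
  timestampMs: number; // when it happened in the video
}

// Build a simple adjacency map: page -> pages reachable by observed clicks.
function buildNavigationGraph(transitions: ObservedTransition[]): Map<string, Set<string>> {
  const graph = new Map<string, Set<string>>();
  for (const t of transitions) {
    if (!graph.has(t.fromPage)) graph.set(t.fromPage, new Set());
    graph.get(t.fromPage)!.add(t.toPage);
  }
  return graph;
}

// Example: three clicks observed during a walkthrough.
const observed: ObservedTransition[] = [
  { fromPage: "/", clickedText: "Companies", toPage: "/companies", timestampMs: 4200 },
  { fromPage: "/companies", clickedText: "About", toPage: "/about", timestampMs: 9100 },
  { fromPage: "/about", clickedText: "Home", toPage: "/", timestampMs: 14500 },
];

console.log(buildNavigationGraph(observed));
// Map(3) { '/' => Set(1) { '/companies' }, '/companies' => Set(1) { '/about' }, '/about' => Set(1) { '/' } }
```

Everything in the graph comes from what was demonstrated on screen; a page that was never visited in the recording simply has no node.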
| Capability | Screenshot Tools | Video Analysis |
|---|---|---|
| Visual layout | ✓ | ✓ |
| Multi-page apps | ✗ | ✓ |
| Navigation flow | ✗ | ✓ |
| Interaction states | ✗ | ✓ |
| State transitions | ✗ | ✓ |
| Form validation | ✗ | ✓ |
| Backend logic | ✗ | ✗ |
| Hidden screens | ✗ | ✗ |
This approach has clear boundaries:
- The system generates frontend code; API integrations, databases, and server logic are not inferred.
- If you don't show the error state in the video, it won't be generated.
- The video is the contract: unseen features don't exist in the output.
This experiment made me think that behavior-driven UI reconstruction might be a more reliable abstraction than screenshot-to-code or prompt-based generation, especially for multi-page apps and interaction-heavy interfaces.
I'm curious whether others have explored similar approaches, or see clear limitations I'm missing.
Here's how the system works in practice. Upload a screen recording of any UI; standard video formats (MP4, WebM, MOV) are accepted.
The AI processes the video frame by frame, building a temporal model of UI changes, click events, and state transitions.
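
As a rough illustration of what that temporal model could look like (a sketch under assumed inputs, not a description of the actual pipeline), consecutive frame observations can be diffed into a flat timeline of change events:

```ts
// Hypothetical per-frame observation extracted from the video.
interface FrameObservation {
  timestampMs: number;
  visibleText: string[];            // text content detected in the frame
  cursor: { x: number; y: number }; // cursor position, if visible
  clickDetected: boolean;           // e.g. inferred from a cursor-press heuristic
}

// One entry in the temporal model: what changed, and when.
interface UiChangeEvent {
  timestampMs: number;
  kind: "click" | "content-change";
  detail: string;
}

// Diff consecutive frames into a timeline of UI change events.
function buildTimeline(frames: FrameObservation[]): UiChangeEvent[] {
  const events: UiChangeEvent[] = [];
  for (let i = 1; i < frames.length; i++) {
    const prev = frames[i - 1];
    const curr = frames[i];
    if (curr.clickDetected) {
      events.push({
        timestampMs: curr.timestampMs,
        kind: "click",
        detail: `click at (${curr.cursor.x}, ${curr.cursor.y})`,
      });
    }
    // Text that appears in this frame but not the previous one signals a state change.
    const appeared = curr.visibleText.filter(t => !prev.visibleText.includes(t));
    if (appeared.length > 0) {
      events.push({
        timestampMs: curr.timestampMs,
        kind: "content-change",
        detail: `new content: ${appeared.join(", ")}`,
      });
    }
  }
  return events;
}
```

Anchoring clicks and content changes to timestamps is what lets navigation and interaction states be attributed to specific user actions later on.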
Layout hierarchy, component relationships, and navigation patterns are identified from observed behavior.
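
One plausible intermediate representation (again hypothetical, for illustration only) is a component tree in which each node is annotated with the behavior actually observed for it:

```ts
// Hypothetical intermediate representation: a layout node plus the
// behavior that was observed for it in the recording.
interface UiNode {
  role: "page" | "nav" | "list" | "link" | "button" | "text";
  label?: string;           // visible text, if any
  observedStates: string[]; // e.g. ["default", "hover"] seen in the video
  navigatesTo?: string;     // target page, if clicking it caused a transition
  children: UiNode[];
}

// A fragment of what a homepage walkthrough might yield.
const homepage: UiNode = {
  role: "page",
  label: "/",
  observedStates: ["default"],
  children: [
    {
      role: "nav",
      observedStates: ["default"],
      children: [
        { role: "link", label: "Companies", observedStates: ["default", "hover"], navigatesTo: "/companies", children: [] },
        { role: "link", label: "About", observedStates: ["default"], navigatesTo: "/about", children: [] },
      ],
    },
  ],
};
```

A node only carries a hover state or a navigation target if one was actually seen, which is what keeps the output honest about unobserved behavior.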
Finally, working HTML/CSS/JS is generated that reproduces the observed behavior, including responsive layout and interactions.
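
The generation step can then be pictured as a walk over that inferred model that emits markup only for what was demonstrated. A minimal, purely illustrative sketch (the real output is a full responsive page, not a string like this):

```ts
// Hypothetical code-generation step: walk the inferred model and emit
// markup only for elements that were actually observed in the video.
interface GeneratedElement {
  tag: "nav" | "a" | "button" | "section";
  text?: string;
  href?: string; // present only if a navigation was observed
  children: GeneratedElement[];
}

function renderHtml(el: GeneratedElement): string {
  const attrs = el.href ? ` href="${el.href}"` : "";
  const children = el.children.map(renderHtml).join("");
  return `<${el.tag}${attrs}>${el.text ?? ""}${children}</${el.tag}>`;
}

// Example: a nav bar reconstructed from two observed links.
const nav: GeneratedElement = {
  tag: "nav",
  children: [
    { tag: "a", text: "Companies", href: "/companies", children: [] },
    { tag: "a", text: "About", href: "/about", children: [] },
  ],
};

console.log(renderHtml(nav));
// <nav><a href="/companies">Companies</a><a href="/about">About</a></nav>
```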
Replay implements behavior-driven UI reconstruction. Upload your own video and see what gets generated.
Try Replay Free