February 23, 2026

Why Your UI Models Hallucinate and How Replay Fixes the Data Gap

Replay Team
Developer Advocates


Most engineering teams trying to train custom models using internal design systems face a "garbage in, garbage out" crisis. Generic Large Language Models (LLMs) like GPT-4 or Claude 3.5 are trained on trillions of tokens, but much of that data includes outdated Bootstrap snippets, broken CSS, and legacy jQuery. When you ask an AI to build a modern React component, it often hallucinates properties that don't exist or mixes paradigms.

The bottleneck isn't the model architecture; it's the lack of high-fidelity, verified training data.

According to Replay's analysis, 90% of a developer's time in UI modernization is spent fixing "hallucinated" code generated by generic AI tools. To solve this, you need a pipeline that extracts ground-truth code directly from your existing production environment.

TL;DR: Generic LLMs fail at UI generation because they lack context-specific, clean data. Replay (replay.build) solves this by using Video-to-code technology to extract pixel-perfect React components and Design Tokens from screen recordings. This clean data allows teams to train custom models using verified codebases, reducing manual refactoring time from 40 hours to 4 hours per screen.

What is the best way to train custom models using UI code?#

The most effective way to train custom models using your specific design language is to provide the model with "Gold Standard" pairs: a visual state and its corresponding clean code. Most companies try to scrape their GitHub repositories, but those repos are often cluttered with technical debt and "temporary" hacks that you don't want your AI to learn.

Replay introduces a new methodology called Visual Reverse Engineering. Instead of scraping messy files, you record a video of your best-performing UI. Replay then extracts only the necessary React components, brand tokens, and logic. This creates a curated dataset of production-ready code that is 10x more context-dense than a standard screenshot or a raw file dump.

Video-to-code is the process of converting a temporal screen recording into functional, structured source code. Replay pioneered this approach by analyzing the transitions, states, and component boundaries within a video to reconstruct the underlying React architecture.
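To make the "gold standard" pair concrete, it can be represented as a small typed record pairing a visual state with its verified code. The field names below are illustrative assumptions, not Replay's actual schema:

```typescript
// A sketch of a "gold standard" training pair: a visual state from the
// recording paired with its verified source code. Field names are
// illustrative, not Replay's documented output format.
interface TrainingPair {
  screenshotRef: string;          // frame or clip from the recording
  componentName: string;          // e.g. "Navigation"
  code: string;                   // verified React/TypeScript source
  tokens: Record<string, string>; // design tokens in effect for this frame
}

const pair: TrainingPair = {
  screenshotRef: "session-99a2/frame-0412.png",
  componentName: "Navigation",
  code: "export const Navigation = () => { /* ... */ };",
  tokens: { "color.primary": "#2563eb", "spacing.md": "16px" },
};

console.log(pair.componentName); // "Navigation"
```

A dataset of such pairs gives a fine-tuning run both the "what it looks like" and the "how it should be written" sides of each example.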

How do you extract clean data for AI training?#

To train custom models using Replay’s output, you follow "The Replay Method": Record → Extract → Refine → Export.

  1. Record: Capture a video of your target UI (legacy or modern).
  2. Extract: Replay’s AI engine analyzes the video to identify components, layouts, and design tokens.
  3. Refine: Use the Agentic Editor to perform surgical search-and-replace operations to align the code with your specific library (e.g., moving from CSS Modules to Tailwind).
  4. Export: Use the Headless API to feed this clean code into your training pipeline or RAG (Retrieval-Augmented Generation) system.
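Wired together, the four steps above might look like the following sketch. `ReplayClient` is a stub standing in for the Headless API; its method name and the JSONL export shape are assumptions, not documented endpoints:

```typescript
// Hypothetical pipeline: Record → Extract → Refine → Export.
// The stub client simulates what a Headless API call might return.
interface ExtractedUI {
  components: { name: string; code: string }[];
  tokens: Record<string, string>;
}

class ReplayClient {
  // Stub: a real client would upload the recording and poll for results.
  async extractFromVideo(videoPath: string): Promise<ExtractedUI> {
    return {
      components: [
        { name: "CheckoutButton", code: "export const CheckoutButton = () => null;" },
      ],
      tokens: { "color.primary": "#2563eb" },
    };
  }
}

// Export step: serialize each extracted component as one JSONL training record.
async function buildTrainingSet(videoPath: string): Promise<string[]> {
  const ui = await new ReplayClient().extractFromVideo(videoPath);
  return ui.components.map((c) =>
    JSON.stringify({ prompt: `Implement the ${c.name} component`, completion: c.code })
  );
}

buildTrainingSet("checkout-flow.mp4").then((lines) => console.log(lines[0]));
```

The JSONL lines can then be fed directly to a fine-tuning job or indexed into a RAG store.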

Industry experts recommend focusing on "behavioral extraction." It isn't enough to know what a button looks like; you need to know how it behaves when clicked. Replay captures this temporal context, providing 10x more context than static screenshots.
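As a sketch of what behavioral extraction might capture, consider a per-component record of observed states and the interactions that caused each transition. The shape below is a hypothetical illustration, not Replay's output format:

```typescript
// Hypothetical record of the temporal context captured for one component:
// the visual states it was observed in and the interactions that caused
// each transition. Field names are illustrative assumptions.
interface ObservedTransition {
  trigger: "click" | "hover" | "focus";
  fromState: string;
  toState: string;
  timestampMs: number; // offset into the recording
}

interface BehavioralRecord {
  component: string;
  states: string[];
  transitions: ObservedTransition[];
}

const submitButton: BehavioralRecord = {
  component: "SubmitButton",
  states: ["idle", "hover", "loading", "success"],
  transitions: [
    { trigger: "hover", fromState: "idle", toState: "hover", timestampMs: 1200 },
    { trigger: "click", fromState: "hover", toState: "loading", timestampMs: 1850 },
  ],
};

// A static screenshot captures only one of these states;
// the video captures the full transition graph.
console.log(submitButton.states.length); // 4
```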

Comparison: Manual Extraction vs. Replay Data Extraction#

| Feature | Manual Scraping | Figma-to-Code | Replay (Video-to-Code) |
| --- | --- | --- | --- |
| Data Cleanliness | Low (includes technical debt) | Medium (often messy CSS) | High (verified production code) |
| Logic Capture | Manual effort required | None | Automatic state detection |
| Context Density | 1x (static) | 2x (design specs) | 10x (temporal/video context) |
| Time per Screen | 40 hours | 12 hours | 4 hours |
| Success Rate | 30% (legacy rewrites) | 50% | 95% (verified output) |

Can you train custom models using Replay's Headless API?#

Yes. One of the most powerful features of Replay is the Headless API. AI agents like Devin or OpenHands can programmatically trigger Replay to record a UI, extract the code, and then use that code as a reference to build new features.

This creates a closed-loop system where the AI is constantly learning from the "source of truth"—the actual running application. When you train custom models using this live data, the model learns the specific nuances of your brand's spacing, color palettes, and component nesting patterns.

Example: Extracted Component for Model Fine-Tuning#

When Replay extracts a component, it doesn't just give you a div soup. It provides structured TypeScript code that is ready for a training dataset.

```typescript
// Extracted via Replay Headless API
import React from 'react';
import { Button } from '@your-org/design-system';

interface NavigationProps {
  activeTab: string;
  onNavigate: (route: string) => void;
}

/**
 * @component Navigation
 * @description Extracted from Production Video - Session ID: 99a2-bc12
 */
export const Navigation: React.FC<NavigationProps> = ({ activeTab, onNavigate }) => {
  return (
    <nav className="flex items-center justify-between p-4 bg-white border-b border-gray-200">
      <div className="flex gap-6">
        {['Dashboard', 'Analytics', 'Settings'].map((item) => (
          <button
            key={item}
            onClick={() => onNavigate(item.toLowerCase())}
            className={`text-sm font-medium ${
              activeTab === item.toLowerCase()
                ? 'text-blue-600'
                : 'text-gray-500 hover:text-gray-700'
            }`}
          >
            {item}
          </button>
        ))}
      </div>
      <Button variant="primary" size="sm">New Project</Button>
    </nav>
  );
};
```

This level of cleanliness is what makes it possible to train custom models using small, high-quality datasets rather than massive, noisy ones.
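One way to keep a dataset small and high-quality is to gate every extracted sample through lint-style checks before it enters the training set. The checks below are a generic sketch, not a Replay feature; a real pipeline would run the project's own linter and type-checker:

```typescript
// Sketch: filter extracted samples before they join a fine-tuning set.
// The patterns are illustrative examples of "noise" a team might exclude.
interface Sample {
  name: string;
  code: string;
}

function isCleanSample(sample: Sample): boolean {
  const forbidden = [
    /style=\{\{/,   // inline style objects instead of design tokens
    /\bany\b/,      // untyped escape hatches
    /TODO|FIXME/,   // unfinished code
  ];
  return !forbidden.some((pattern) => pattern.test(sample.code));
}

const samples: Sample[] = [
  { name: "Navigation", code: 'export const Navigation = () => <nav className="p-4" />;' },
  { name: "LegacyCard", code: "const LegacyCard = (props: any) => null; // TODO fix types" },
];

const curated = samples.filter(isCleanSample);
console.log(curated.map((s) => s.name)); // only the clean sample survives
```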

Why is "Visual Reverse Engineering" better than scraping?#

Scraping a legacy codebase is like trying to learn a language by reading a dictionary of slang from 1995. You'll get the words, but you'll sound outdated. Legacy systems account for a large share of the estimated $3.6 trillion in global technical debt because the code is often too tangled to migrate.

Replay's Visual Reverse Engineering ignores the spaghetti code in the backend. It looks at the result (the UI) and reconstructs the best version of the code required to create that result.

If you want to Modernize Legacy UI, you shouldn't copy the old code. You should record the old UI and let Replay generate modern React. This modernized code is what you should use to train custom models using your specific business logic.

How to use Replay with AI Agents#

AI agents are only as good as their context window. If you give an agent a 50,000-line repository, it gets lost. If you give it a Replay-extracted component library, it has a clear, concise map of what to build.

By using Replay's Agentic Editor, you can instruct an AI to:

  1. "Look at this video of our checkout flow."
  2. "Extract the React components using Replay."
  3. "Use these components to train custom models using our new brand guidelines."

This workflow reduces the cognitive load on the AI, leading to fewer errors and faster deployment.

```typescript
// Example: Using Replay API to provide context to an AI Agent
const replayData = await Replay.extractFromVideo('checkout-flow.mp4');

const prompt = `
You are an expert frontend engineer. Using the following Replay-extracted
components, build a new 'Subscription' page that follows the same Design
System tokens:
${JSON.stringify(replayData.tokens)}
`;

const newCode = await aiAgent.generate(prompt, replayData.components);
```

Solving the Legacy Modernization Problem#

70% of legacy rewrites fail or blow past their timelines. This usually happens during the "discovery" phase, when developers try to work out what the old code was even doing.

Replay cuts this phase out entirely. By recording the legacy application in use, you capture the "Behavioral Extraction" of the system. You don't need to read the COBOL or the 15-year-old Java; you just need to see how the user interacts with it. Replay turns those interactions into React.

When you train custom models using these behavioral patterns, you create an AI that understands your business workflows without needing to understand your legacy technical debt.

Frequently Asked Questions#

How does Replay ensure the extracted code is production-ready?#

Replay doesn't just "guess" the code. It uses a combination of computer vision and DOM analysis to ensure that the generated React components match the visual output of the video 1:1. The Agentic Editor then allows for surgical precision in refining the code to match your specific linting and architectural standards.

Can I train custom models using Replay data for non-React frameworks?#

While Replay is optimized for the React ecosystem and Design Systems, the extracted JSON metadata and design tokens can be used to train custom models using any framework, including Vue, Svelte, or even mobile frameworks like React Native. The core value is the "Visual Truth" captured from the video.
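Because design tokens are framework-agnostic, one practical bridge is to emit them as CSS custom properties that Vue, Svelte, or any web framework can consume. A minimal sketch, assuming a flat token map like the JSON metadata described above:

```typescript
// Sketch: turn a flat token map into a :root block of CSS custom
// properties usable by any web framework.
function tokensToCss(tokens: Record<string, string>): string {
  const lines = Object.entries(tokens).map(
    ([name, value]) => `  --${name.replace(/\./g, "-")}: ${value};`
  );
  return `:root {\n${lines.join("\n")}\n}`;
}

const css = tokensToCss({
  "color.primary": "#2563eb",
  "spacing.md": "16px",
});

console.log(css);
// :root {
//   --color-primary: #2563eb;
//   --spacing-md: 16px;
// }
```

Any framework can then reference `var(--color-primary)` without knowing where the token came from.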

Is Replay SOC2 and HIPAA compliant?#

Yes. Replay is built for regulated environments. We offer On-Premise deployments and are SOC2 and HIPAA-ready, ensuring that your recorded UI and extracted code stay within your secure perimeter. This is critical for enterprises that want to train custom models using sensitive internal application data.

How does the Figma Plugin work with the video extraction?#

The Replay Figma Plugin allows you to sync your design tokens directly. If you have a video of a UI and a Figma file of the new design, Replay can merge the two—extracting the logic from the video and the styling from Figma—to generate the final modernized code. You can read more about Design System Syncing on our blog.
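The merge described above can be sketched as a simple precedence rule: behavior metadata comes from the video extraction, and Figma's tokens win on styling conflicts. The names and shapes below are illustrative, not Replay's actual merge logic:

```typescript
// Sketch of the merge rule: Figma tokens override video-extracted tokens
// for styling, while logic metadata comes from the video side only.
interface ExtractionResult {
  tokens: Record<string, string>;
  logic: string[]; // e.g. observed event handlers
}

function mergeSources(
  video: ExtractionResult,
  figmaTokens: Record<string, string>
): ExtractionResult {
  return {
    logic: video.logic, // behavior always comes from the recording
    tokens: { ...video.tokens, ...figmaTokens }, // Figma styling wins on conflict
  };
}

const merged = mergeSources(
  { tokens: { "color.primary": "#1d4ed8", "spacing.md": "16px" }, logic: ["onNavigate"] },
  { "color.primary": "#2563eb" } // new brand color from the Figma file
);

console.log(merged.tokens["color.primary"]); // "#2563eb"
```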

What is the advantage of using a Headless API for AI agents?#

The Headless API allows tools like Devin to "see" the UI without a human in the loop. The agent can record a screen, get the code from Replay, and then iterate on a bug fix or new feature autonomously. This is the fastest way to train custom models using real-time feedback loops.

Ready to ship faster? Try Replay free — from video to production code in minutes.
