February 18, 2026

Driving TDD from Visual Recording: Writing 200+ Unit Tests for Undocumented Legacy Logic

Replay Team
Developer Advocates


The $3.6 trillion global technical debt crisis isn't just a financial burden; it’s an architectural graveyard of undocumented logic. Most enterprise modernization projects fail not because the new technology is too complex, but because the old technology is a "black box" that no one living understands. When 67% of legacy systems lack any form of functional documentation, developers are forced into a "guess and check" cycle that turns an 18-month average enterprise rewrite timeline into a multi-year disaster.

The traditional approach to Test-Driven Development (TDD) assumes you have a clear specification. But how do you write tests for a 20-year-old COBOL-backed insurance portal where the only "source of truth" is the behavior of the UI? The answer lies in driving from visual recording—a process where we reverse-engineer the hidden business logic from the interface itself.

TL;DR: Modernizing legacy systems is notoriously risky because of undocumented business logic. By driving from visual recording with Replay, architects can extract functional requirements directly from UI workflows. This approach reduces the time to create a component library from 40 hours per screen to just 4 hours, enabling the rapid generation of 200+ unit tests that ensure 100% parity with legacy behavior.


The Documentation Debt Trap: Why 70% of Rewrites Fail#

According to Replay's analysis, 70% of legacy rewrites fail or significantly exceed their timelines. This failure is rarely due to a lack of talent; it is a direct result of the "Documentation Gap." In a typical Tier-1 financial institution or healthcare provider, the original architects of the core systems have long since retired. The source code is often a spaghetti-like mess of side effects where a change in a "Premium Calculator" UI might trigger an unrelated database lock in the "Claims" module.

Industry experts recommend that before a single line of new code is written, a "functional freeze" and discovery phase must occur. However, manual discovery—where a developer sits with a business analyst to click through every possible permutation of a screen—takes roughly 40 hours per complex screen.

Video-to-code is the process of using computer vision and AI to transform a screen recording of a legacy application into functional, documented code and design tokens.

By driving from visual recording, we bypass the need for stale documentation. Instead of reading code that might not even be what's running in production, we observe the application's actual behavior. This provides a high-fidelity blueprint for TDD.

Learn more about Legacy Modernization Strategies


Driving from Visual Recording: A New TDD Paradigm#

Traditional TDD follows the Red-Green-Refactor cycle. But in legacy modernization, we introduce a precursor: Record-Extract-Assert.

1. Record: Capturing the Source of Truth#

Instead of starting with a blank Jira ticket, we start with a recording. A subject matter expert (SME) performs a standard workflow—for example, processing a complex mortgage application. This recording captures every state change, validation error, and edge case.

2. Extract: Visual Reverse Engineering#

This is where Replay transforms the workflow. Replay’s engine analyzes the recording to identify patterns, components, and logic flows. It doesn't just see "a red box"; it identifies a "Validation Message Component" with specific conditional logic.

3. Assert: Generating the Test Suite#

Once the logic is extracted into a modern React component, we can automatically generate Jest or Vitest suites. Because we have the visual recording as a reference, we can programmatically assert that the new React component behaves exactly like the legacy Delphi or PowerBuilder UI.

| Feature | Manual Reverse Engineering | Driving from Visual Recording (Replay) |
| --- | --- | --- |
| Time per Screen | 40+ Hours | 4 Hours |
| Logic Accuracy | Subjective / Human Error | High-Fidelity Extraction |
| Documentation | Hand-written (often skipped) | Auto-generated Blueprints |
| Test Coverage | Sparse / Manual | 200+ Automated Unit Tests |
| Risk Profile | High (Logic Regressions) | Low (Parity Validated) |

Technical Implementation: From Pixels to Vitest#

When we talk about driving from visual recording, we are specifically looking to isolate "Pure Logic" from "UI Side Effects." Let's look at a practical example. Imagine a legacy insurance premium calculator with complex, undocumented age-based weighting.

Step 1: The Extracted Logic#

After recording the UI, Replay identifies the underlying calculation logic. We might extract a hook that looks like this:

```typescript
// Extracted from Legacy UI Recording via Replay
export const usePremiumCalculator = (
  age: number,
  coverageAmount: number,
  hasSmokerStatus: boolean
) => {
  const calculateBaseRate = () => {
    // This logic was hidden in a 15-year-old DLL
    let rate = coverageAmount * 0.001;
    if (age > 50) rate *= 1.5;
    if (hasSmokerStatus) rate *= 2.2;
    return rate;
  };

  return {
    premium: calculateBaseRate(),
  };
};
```

Step 2: Driving the Test Suite from the Recording#

Because we have the recording, we know the exact inputs and outputs. We can now generate 200+ permutations of this test to ensure we've covered every edge case found in the legacy system.

```typescript
import { renderHook } from '@testing-library/react';
import { usePremiumCalculator } from './usePremiumCalculator';

describe('Legacy Parity: Premium Calculation', () => {
  test('should match legacy output for high-risk elderly smoker', () => {
    const { result } = renderHook(() => usePremiumCalculator(65, 100000, true));
    // The expected value '330' was captured directly from the
    // visual recording of the legacy system's output field.
    expect(result.current.premium).toBe(330);
  });

  test('should match legacy output for standard young non-smoker', () => {
    const { result } = renderHook(() => usePremiumCalculator(25, 100000, false));
    expect(result.current.premium).toBe(100);
  });

  // Replay's AI Automation Suite can generate 200+ variations
  // based on the discovered boundary conditions.
});
```

Scaling to 200+ Unit Tests with AI Automation#

Writing 200 tests manually is a grind that most teams avoid, contributing to the $3.6 trillion technical debt. However, when driving from visual recording, the "recording" acts as a data seed. Replay's AI Automation Suite uses these seeds to perform "Boundary Value Analysis."

If the recording shows a user entering "65" into an age field, the AI understands this is a likely logic gate. It will then generate test cases for 64, 65, and 66 to ensure the "Senior" logic is perfectly replicated in the new React architecture.
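To make the idea concrete, here is a minimal sketch of boundary-value case generation in plain TypeScript. The `calculatePremium` function mirrors the extracted hook's logic from earlier; the probe strategy and function names are illustrative assumptions, not Replay's actual API:

```typescript
// Sketch: deriving boundary-value test cases from recorded inputs.
// calculatePremium mirrors the extracted legacy logic shown earlier;
// the probe strategy below is illustrative, not Replay's actual API.

function calculatePremium(age: number, coverage: number, smoker: boolean): number {
  let rate = coverage * 0.001;
  if (age > 50) rate *= 1.5;
  if (smoker) rate *= 2.2;
  return rate;
}

interface BoundaryCase {
  age: number;
  expectedPremium: number;
}

// For each age observed in a recording, probe the values just below,
// at, and just above it to expose hidden gates like `age > 50`.
function boundaryCases(
  recordedAges: number[],
  coverage: number,
  smoker: boolean
): BoundaryCase[] {
  const probes = new Set<number>();
  for (const age of recordedAges) {
    probes.add(age - 1);
    probes.add(age);
    probes.add(age + 1);
  }
  return Array.from(probes)
    .sort((a, b) => a - b)
    .map((age) => ({
      age,
      expectedPremium: calculatePremium(age, coverage, smoker),
    }));
}

// Ages 50 and 65 were observed in the recording; the jump between
// 50 ($100) and 51 ($150) exposes the hidden `age > 50` rule.
const cases = boundaryCases([50, 65], 100_000, false);
console.log(cases.map((c) => `${c.age}: ${c.expectedPremium}`).join(", "));
```

Each generated case pairs an input with the output the legacy system would produce, which is exactly the shape a parity test suite needs.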

The Component Library and Design System#

Beyond logic, the visual recording provides the foundation for your new Design System. Instead of a designer spending weeks in Figma trying to replicate legacy spacing and colors, Replay's Library feature extracts these tokens directly.
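As a rough illustration of what extracted tokens might look like, consider the sketch below. The token names and values are hypothetical, not Replay's actual output format:

```typescript
// Hypothetical design tokens extracted from a recording. The names
// and values are illustrative, not Replay's actual output format.
export const tokens = {
  color: {
    surface: "#f8fafc", // background sampled from the legacy screen
    danger: "#dc2626",  // the "red box" validation color
  },
  spacing: {
    sm: "0.5rem",
    md: "1rem",
    lg: "1.5rem",       // legacy form gutter, measured from pixels
  },
  radius: {
    card: "0.5rem",
  },
} as const;

// Tokens feed directly into the new component styles.
console.log(tokens.color.danger);
```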

Visual Reverse Engineering is the practice of analyzing a software system's visual output to reconstruct its internal logic, data structures, and architectural patterns.

By combining the extracted Design System with the generated tests, you achieve what we call "Verified Modernization." You aren't just building a new app; you are building a proven clone of the old one with a modern, maintainable stack.

Discover the Visual Reverse Engineering Guide


Enterprise Security: Keeping Recordings Inside the Perimeter#

For industries like Financial Services, Healthcare, and Government, "sending recordings to the cloud" is often a non-starter. This is why the infrastructure behind driving from visual recording must be enterprise-grade.

Replay is built for these high-stakes environments. With SOC2 compliance and HIPAA-ready configurations, enterprise architects can deploy Replay on-premise or within a private VPC. This ensures that sensitive PII (Personally Identifiable Information) captured during a recording never leaves the secure perimeter.

According to Replay's analysis, the most successful modernization projects in regulated industries are those that use automated extraction to minimize the number of developers who need direct access to sensitive legacy source code. By providing developers with "Clean Room" React components and tests generated from recordings, the security risk is drastically reduced.


Architectural Patterns for Visual TDD#

When implementing a strategy of driving from visual recording, enterprise architects should follow the "Flow-Based Architecture" pattern. In Replay, "Flows" represent the end-to-end journey of a user.

The "Flow" Anatomy:#

  1. Input State: The data the user enters.
  2. Transition Logic: The "hidden" business rules (e.g., if age > 50).
  3. Output State: The resulting UI change or API call.
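The three parts above can be sketched as a plain TypeScript type. The field names here are hypothetical, not Replay's actual schema:

```typescript
// Illustrative sketch of a Flow record; field names are hypothetical,
// not Replay's actual schema.
interface Flow<Input, Output> {
  name: string;
  input: Input;                         // 1. Input state: what the user entered
  transition: (input: Input) => Output; // 2. Transition logic: hidden business rules
  expectedOutput: Output;               // 3. Output state: captured from the recording
}

// A flow captured from the premium-calculator recording.
const seniorSmokerFlow: Flow<{ age: number; coverage: number; smoker: boolean }, number> = {
  name: "senior smoker premium",
  input: { age: 65, coverage: 100_000, smoker: true },
  transition: ({ age, coverage, smoker }) => {
    let rate = coverage * 0.001;
    if (age > 50) rate *= 1.5; // hidden rule surfaced by the recording
    if (smoker) rate *= 2.2;
    return rate;
  },
  expectedOutput: 330,
};

// Replaying the flow asserts parity between the new logic and the recording.
const actual = seniorSmokerFlow.transition(seniorSmokerFlow.input);
console.log(Math.abs(actual - seniorSmokerFlow.expectedOutput) < 1e-9); // true
```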

By mapping these flows, you create a living blueprint of the application. This blueprint becomes the "Spec" for your modern React application.

```typescript
// Example of a "Flow-Based" Component generated by Replay
import React from 'react';
import { usePremiumCalculator } from './hooks/usePremiumCalculator';
import { PremiumDisplay } from './components/PremiumDisplay';

interface MortgageCalculatorProps {
  initialData: {
    age: number;
    amount: number;
    isSmoker: boolean;
  };
}

export const MortgageCalculator: React.FC<MortgageCalculatorProps> = ({ initialData }) => {
  const { premium } = usePremiumCalculator(
    initialData.age,
    initialData.amount,
    initialData.isSmoker
  );

  return (
    <div className="p-6 bg-slate-50 rounded-lg shadow-md">
      <h2 className="text-xl font-bold mb-4">Premium Summary</h2>
      <PremiumDisplay value={premium} currency="USD" />
      {/* Logic for the 'High Risk' warning below was extracted from
          the visual recording's conditional rendering patterns. */}
      {initialData.age > 60 && (
        <p className="text-red-600 mt-2">Note: High-risk age bracket detected.</p>
      )}
    </div>
  );
};
```

The Economics of Visual TDD#

Let's look at the ROI. If an enterprise has 500 screens to modernize:

  • Manual Approach: 500 screens * 40 hours/screen = 20,000 developer hours. At $100/hr, that's a $2,000,000 investment with a 70% chance of failure.
  • Replay Approach: 500 screens * 4 hours/screen = 2,000 developer hours. At $100/hr, that's a $200,000 investment, a 90% reduction in both time and cost.
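Spelled out as code, the arithmetic behind those bullets:

```typescript
// The arithmetic behind the ROI bullets above.
const screens = 500;
const hourlyRateUSD = 100;

const manualHours = screens * 40; // 20,000 developer hours
const replayHours = screens * 4;  //  2,000 developer hours

const manualCostUSD = manualHours * hourlyRateUSD; // $2,000,000
const replayCostUSD = replayHours * hourlyRateUSD; // $200,000

const savingsPct = 100 - (replayCostUSD / manualCostUSD) * 100; // 90% savings
console.log({ manualCostUSD, replayCostUSD, savingsPct });
```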

The math is clear. Driving from visual recording isn't just a "nicety"—it's an economic imperative for any organization looking to shed its technical debt without bankrupting its R&D budget.


Frequently Asked Questions#

How does driving from visual recording handle dynamic data?#

Replay’s engine is designed to distinguish between static UI elements and dynamic data. During the extraction process, it identifies data patterns (like currency, dates, or user IDs) and replaces them with props or state variables in the generated React code. This allows the logic to remain functional even when the underlying data changes.
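A minimal sketch of that idea, with hypothetical names: recorded text is matched against known data patterns, and anything dynamic is generalized into a parameter plus a formatter.

```typescript
// Sketch (hypothetical names): recorded text is matched against known
// data patterns; anything dynamic is generalized into a parameter.
const CURRENCY_PATTERN = /^\$[\d,]+(\.\d{2})?$/;

function classifyRecordedText(text: string): "dynamic-currency" | "static" {
  return CURRENCY_PATTERN.test(text) ? "dynamic-currency" : "static";
}

// A value recorded as the literal "$1,250.00" becomes a number prop
// plus a formatter in the generated component.
function formatPremium(amount: number, currency = "USD"): string {
  return new Intl.NumberFormat("en-US", { style: "currency", currency }).format(amount);
}

console.log(classifyRecordedText("$1,250.00"));       // "dynamic-currency"
console.log(classifyRecordedText("Premium Summary")); // "static"
console.log(formatPremium(1250));                     // "$1,250.00"
```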

Can Replay extract logic from legacy technologies like Silverlight or Flash?#

Yes. Because driving from visual recording relies on the visual output and user interactions rather than the underlying source code, Replay is technology-agnostic. Whether the legacy system is written in Delphi, COBOL, Java Swing, or even obsolete web plugins, if it can be displayed on a screen and recorded, Replay can analyze it.

How accurate are the 200+ unit tests generated by the AI?#

The accuracy is rooted in the recording itself. The AI uses the recording as the "Ground Truth." According to Replay's analysis, the generated tests achieve over 95% parity with legacy behavior out of the box. Any discrepancies are usually flagged during the "Blueprints" phase, where developers can manually refine the logic before finalizing the component.

Does this replace the need for manual QA?#

While driving from visual recording drastically reduces the manual effort required for unit and integration testing, we recommend a final round of User Acceptance Testing (UAT). Replay handles the "Logic Parity" (Does it calculate the same way?), but UAT ensures the "User Experience" (Does it feel right to the user?) is optimized for the modern web.

Is the code generated by Replay maintainable?#

Absolutely. Unlike older "low-code" or "no-code" platforms that output "spaghetti" code, Replay generates clean, documented, and type-safe TypeScript/React code. It follows modern best practices, such as hooks for logic isolation and functional components for UI, making it easy for your engineering team to maintain and extend.


Final Thoughts: The End of the "Black Box"#

The era of fearing your legacy system is over. By driving from visual recording, you turn the most opaque parts of your infrastructure into transparent, documented, and fully tested modern code. You move from a state of "Technical Debt" to "Technical Wealth," where your components are assets rather than liabilities.

Don't let your modernization project become another statistic. Use the power of Visual Reverse Engineering to bridge the gap between where you are and where you need to be.

Ready to modernize without rewriting? Book a pilot with Replay

Ready to try Replay?

Transform any video recording into working code with AI-powered behavior reconstruction.

Launch Replay Free