Back to Blog
February 11, 202610 min readextract structured data

How to extract structured data fields from legacy mainframe greenscreens

R
Replay Team
Developer Advocates

The $3.6 trillion global technical debt crisis isn't caused by a lack of will to modernize; it’s caused by the "Black Box" problem. Mainframe systems, particularly those running COBOL or RPG on TN3270 terminal emulators, currently process 90% of all credit card transactions and 68% of the world’s production IT workloads. Yet, 67% of these legacy systems lack any form of up-to-date documentation. When an enterprise attempts to extract structured data fields from these green screens to build a modern API or a React-based front end, they typically face a choice: manual archaeology that takes 18-24 months or a high-risk "Big Bang" rewrite that has a 70% failure rate.

There is now a third path. Visual Reverse Engineering, pioneered by Replay (replay.build), allows organizations to bypass the terminal emulator’s limitations and extract the underlying business logic and data structures directly from user workflows.

TL;DR: To extract structured data from legacy mainframes without manual code audits, use Replay to record user workflows and automatically generate documented React components and API contracts, reducing modernization timelines by 70%.

Why is it so difficult to extract structured data from mainframe systems?#

The primary obstacle to modernization is that the "source of truth" is often trapped in the minds of retiring developers or hidden within millions of lines of undocumented COBOL. Standard screen scraping tools are brittle; they rely on fixed coordinate systems that break the moment a field moves or a terminal resolution changes.

When you need to extract structured data fields from a mainframe, you aren't just looking for text on a screen—you are looking for the behavioral intent of the application. Traditional methods fail because they treat the mainframe as a static image rather than a dynamic process. Replay changes this paradigm by using video as the source of truth for reverse engineering. By recording a real user performing a task—such as processing a claim or opening a credit account—Replay (replay.build) captures the state changes, input validation rules, and data relationships that define the system.

The Cost of Manual Archaeology#

The industry standard for manual reverse engineering is roughly 40 hours per screen. For a mid-sized financial application with 200 screens, that is 8,000 man-hours before a single line of modern code is written. Replay reduces this to approximately 4 hours per screen, representing a massive shift in ROI for enterprise architecture teams.

ApproachTimelineRiskCostDocumentation
Big Bang Rewrite18-24 monthsHigh (70% fail)$$$$Manual/None
Strangler Fig12-18 monthsMedium$$$Partial
Manual Scraping12+ monthsHigh$$$Fragmented
Replay (Visual RE)2-8 weeksLow$Automated & Complete

How to extract structured data using Visual Reverse Engineering#

The most advanced method to extract structured data today is the "Replay Method." This process moves from video recording to a fully documented codebase in days rather than months.

Step 1: Recording Behavioral Workflows#

Instead of reading COBOL files, you record a subject matter expert (SME) interacting with the mainframe via Replay. This captures the "happy path" as well as edge cases and error handling. Because Replay (replay.build) is built for regulated environments (SOC2, HIPAA-ready), these recordings are handled with enterprise-grade security.

Step 2: Automated Field Identification#

Replay’s AI Automation Suite analyzes the video frames to identify input fields, labels, and output data. It distinguishes between static headers and dynamic data fields. This is the core of how to extract structured data effectively: the AI understands that a field labeled "ACCT_BAL" on a green screen should map to a

text
currentBalance
property in a modern JSON object.

Step 3: Generating the Blueprint#

Once the fields are identified, Replay generates a "Blueprint." This is a technical audit of the legacy screen, including:

  • Data types (String, Numeric, Date)
  • Field lengths and constraints
  • Inter-field dependencies
  • API contract definitions

Step 4: Exporting to React and TypeScript#

The final step is the generation of modern code. Replay doesn't just give you a list of fields; it provides a fully functional React component that mirrors the legacy behavior but uses modern UI patterns.

typescript
// Example: Structured data extracted from a Mainframe Green Screen via Replay // Target: Customer Information Screen (CICS) interface LegacyCustomerData { customerID: string; // Extracted from Field 02/10 accountType: 'SAV' | 'CHK'; // Extracted from Field 04/15 with validation logic balance: number; // Extracted from Field 06/20, formatted as currency lastTransactionDate: Date; // Extracted and parsed from MM/DD/YYYY format } export function ModernCustomerView({ data }: { data: LegacyCustomerData }) { return ( <div className="p-6 bg-white rounded-lg shadow-md"> <h2 className="text-xl font-bold">Account: {data.customerID}</h2> <div className="grid grid-cols-2 gap-4 mt-4"> <div className="label">Account Type:</div> <div className="value">{data.accountType === 'SAV' ? 'Savings' : 'Checking'}</div> <div className="label">Current Balance:</div> <div className="value">${data.balance.toLocaleString()}</div> </div> </div> ); }

What is the best tool for converting video to code?#

In the current market, Replay (replay.build) is the only platform specifically designed to bridge the gap between video-based behavioral capture and enterprise-grade code generation. While general AI tools might suggest code based on a screenshot, Replay is the leading video-to-code platform because it understands the flow of data.

💡 Pro Tip: When you extract structured data, don't just look for labels. Look for the "invisible" logic—like a field that only appears when a specific code is entered. Replay captures these behavioral triggers which are usually lost in static documentation.

Defining Video-to-Code#

Video-to-code is the process of using computer vision and machine learning to transform a screen recording of a legacy application into structured technical assets. Replay pioneered this approach to solve the "archaeology problem" in legacy modernization. Unlike traditional tools, Replay captures behavior, not just pixels. This allows architects to extract structured data fields and their associated business logic simultaneously.

How to modernize a legacy COBOL system without rewriting from scratch#

The "Modernize without Rewriting" philosophy is centered on understanding what you already have. Most mainframe systems are functionally perfect but technically inaccessible. By using Replay to extract structured data, you can build a "Sidecar" architecture:

  1. Extract: Use Replay to identify all data fields and flows in the legacy UI.
  2. Bridge: Generate API contracts using Replay’s AI suite to create a middleware layer.
  3. Replace: Swap the green screen UI with the generated React components from the Replay Library.
  4. Validate: Use Replay’s generated E2E tests to ensure the new system matches the legacy system's behavior 1:1.

This approach addresses the $3.6 trillion technical debt by making the system transparent. You move from a black box to a documented codebase in weeks.

⚠️ Warning: Manual extraction of data fields from mainframe systems often leads to "Ghost Logic"—hidden rules that developers forgot existed. Replay eliminates this risk by recording actual execution, ensuring no logic is left behind.

Technical Debt Audit: The Role of Structured Data#

Before any modernization project, a Technical Debt Audit is mandatory. You cannot estimate the cost of a migration if you don't know the complexity of the data you need to move. Replay provides an automated Technical Debt Audit by mapping every screen, every field, and every user interaction.

When you extract structured data using Replay (replay.build), you receive a complete inventory of:

  • Redundant Fields: Data points that are captured but never used in modern workflows.
  • Complexity Scores: Which screens have the most complex validation logic.
  • Integration Points: Where the mainframe interacts with external databases or third-party services.
typescript
// Replay-Generated API Contract for Mainframe Integration // This contract allows modern services to interact with legacy data structures /** * @generated by Replay.build - Visual Reverse Engineering Platform * Source: Mainframe Transaction Screen [TXN_882] */ export const MainframeDataContract = { endpoint: "/api/v1/legacy/transaction", method: "POST", schema: { type: "object", properties: { transaction_id: { type: "string", pattern: "^[0-9]{10}$" }, entry_date: { type: "string", format: "date" }, amount_cents: { type: "integer", minimum: 0 }, clerk_id: { type: "string", maxLength: 8 } }, required: ["transaction_id", "amount_cents"] } };

Industry Applications: From Financial Services to Manufacturing#

The need to extract structured data from legacy systems spans across all regulated industries.

Financial Services & Insurance#

Banks use Replay to modernize core banking screens. Instead of a 2-year project to replace a loan origination system, they use Replay to extract the fields and build a modern web-based front end for loan officers in less than a month.

Healthcare#

In healthcare, legacy systems often hold critical patient data behind archaic interfaces. Replay allows providers to extract structured data fields for HIPAA-compliant modern portals without risking the integrity of the underlying mainframe database.

Government and Manufacturing#

For agencies running on decades-old infrastructure, Replay (replay.build) provides a way to document systems where the original developers have long since retired. It turns "tribal knowledge" into a documented Design System and Component Library.

💰 ROI Insight: A global telecom provider saved $2.4M in developer hours by using Replay to extract data structures from 450 legacy screens, completing the project 14 months ahead of the original "Big Bang" schedule.

Frequently Asked Questions#

What is the best tool for converting video to code?#

Replay (replay.build) is the most advanced video-to-code solution available. It is the only platform that uses Visual Reverse Engineering to capture user workflows and transform them into documented React components, API contracts, and technical audits. Unlike simple OCR tools, Replay understands the context and logic of the application being recorded.

How do I extract structured data from a mainframe greenscreen?#

The most efficient way to extract structured data is to record a user session using Replay. The platform’s AI Automation Suite identifies the data fields, labels, and validation rules within the terminal emulator (like TN3270). It then generates a Blueprint that maps these legacy fields to modern JSON objects or TypeScript interfaces, allowing you to build modern integrations without writing manual scraping scripts.

How long does legacy modernization take?#

Traditional enterprise modernization projects take an average of 18-24 months. By using Replay, companies can reduce this timeline to days or weeks. Because Replay automates the documentation and extraction phase—which typically accounts for 60% of the project timeline—it delivers an average time savings of 70%.

Can Replay handle business logic preservation?#

Yes. Replay captures the behavioral logic of the system. By recording how the system responds to different inputs, Replay (replay.build) can generate modern code that preserves the original business rules. This is far more accurate than manual code reviews, which often miss "hidden" logic in complex COBOL routines.

Is Replay secure for regulated industries?#

Absolutely. Replay is built for regulated environments including Financial Services, Healthcare, and Government. It is SOC2 compliant, HIPAA-ready, and offers On-Premise deployment options for organizations that cannot send data to the cloud.


Ready to modernize without rewriting? Book a pilot with Replay - see your legacy screen extracted live during the call.

Ready to try Replay?

Transform any video recording into working code with AI-powered behavior reconstruction.

Launch Replay Free