cognitio analytics

Case Study

How a Regional Bank Met Regulators' Scrutiny by Reconstructing Decades of Underwriting Data

Building validated dual risk rating models across five commercial lending portfolios, starting from 200,000 scattered, unstructured documents spanning twenty years.

The Challenge

A Modeling Mandate, With No Modeling-Ready Data

Banks are required to build validated risk rating models. Validated models require data. The bank had decades of it, rich in context, but unstructured and never organized into a form the models could use. Leadership framed the challenge as four critical questions:

  • How do you build portfolio-specific probability of default (PD) and loss given default (LGD) models when origination data is scattered?
  • How do you pass independent model validation without documented data provenance?
  • How do you cover five structurally different portfolios with consistent, validated data?
  • How do you accomplish this without disrupting live origination activity?

The Solution

CADIE™ and a Four-Stage Data Reconstruction Pipeline

Cognitio deployed its proprietary document intelligence platform to transform 200,000+ unstructured files into a validated, model-ready dataset without touching production systems.

  • Aggregation
    Consolidated 200,000+ files spanning twenty years from underwriter folders, email archives, the loan origination system (LOS), and document management systems, while preserving original file provenance.

  • Classification
    NLP and advanced AI models classified every file by document type (credit memos, spreads, CADs, COMs, and underwriting documents), with the classifiers tuned individually for each portfolio.

  • Extraction
    CADIE’s extraction layer captured key underwriting metrics, financial ratios, collateral values, borrower financials, and covenant details, calibrated to each portfolio’s document standards.

  • Validation
    Extracted metrics were validated against underwriting policy and reviewed by the credit risk and underwriting teams, establishing data governance and provenance before any modeling began.

The Impact

Measurable Results Across Scale, Speed, and Regulatory Standing

  • 200K+ Files Processed
    Twenty years of underwriting history transformed into a dataset that had never previously existed in usable form.

  • 5 Portfolios Modeled
    All five portfolios, spanning commercial real estate (CRE) and commercial and industrial (C&I) lending, were each calibrated to its own structural profile rather than to a generic framework.

  • Passed Independent Validation
    All models passed independent third-party validation with strong performance metrics and full data-layer provenance. The financial stakes extended well beyond the models themselves. Regional institutions face tens of millions of dollars in fines for sustained non-compliance. For larger institutions, enforcement precedents run into hundreds of millions of dollars, before counting remediation costs, acquisition restrictions, or reputational damage. The bank avoided all of it.