In the 2026 fintech landscape, mortgage document automation is no longer an optional competitive advantage, but a critical infrastructure requirement. Manual management of income documentation is the main bottleneck in credit granting, with underwriting times that can extend for weeks due to data entry errors and redundant human validations. At the heart of this operational revolution lies Intelligent Document Processing (IDP), the technology layer that turns unstructured data (PDFs, scans, images) into structured, actionable information exposed via API.
This technical guide explores the design of an end-to-end cloud-native pipeline for analyzing pay slips, CUD forms (today's Certificazione Unica, or Single Certification), and 730 tax returns, comparing the capabilities of AWS Textract and Google Document AI in the specific context of the Italian tax system.
1. The Challenge of Italian Formats: Beyond Traditional OCR
Traditional OCR (Optical Character Recognition) fails miserably with Italian income documentation for three main reasons:
- Layout Variability: While the CUD (Single Certification) has a standardized format from the Revenue Agency, pay slips vary drastically depending on the payroll software used (Zucchetti, TeamSystem, ADP, etc.).
- Document Quality: Crooked scans, low-resolution smartphone photos, and crumpled documents introduce noise that legacy engines cannot filter out.
- Complex Semantics: Extracting the number “25.000” is useless if the system does not distinguish between “Gross Income”, “Social Security Taxable Income”, or “Net Income”.
To solve this problem, we must implement a pipeline that combines neural OCR with NLP (Natural Language Processing) layers for semantic understanding.
2. Technology Comparison: AWS Textract vs Google Document AI

When choosing the underlying engine, the decision often falls on the two cloud giants. Here is an analysis based on benchmarks performed on datasets of Italian tax documents.
AWS Textract
Strengths: The Queries feature is a game-changer. Instead of extracting all text, you can query the document with natural language questions like “What is the net income?” or “What is the hiring date?”. Textract responds by providing the value and the exact bounding box.
Limitations: Requires robust post-processing to normalize dates and Italian currency formats (e.g., the comma as a decimal separator).
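To make the Queries idea concrete, here is a minimal sketch of how a request could be built and how the answers could be read back. It assumes the usual Textract response shape, where each QUERY block points to its QUERY_RESULT block through an ANSWER relationship. The aliases, the helper names, and the bucket and key used in the request are illustrative and not part of the original pipeline.

```python
from typing import Any

# Hypothetical query set: the wording and aliases are illustrative.
QUERIES = [
    {"Text": "What is the net income?", "Alias": "net_income"},
    {"Text": "What is the hiring date?", "Alias": "hiring_date"},
]


def build_queries_request(bucket: str, key: str) -> dict[str, Any]:
    """Build keyword arguments for a synchronous analyze_document call."""
    return {
        "Document": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {"Queries": QUERIES},
    }


def parse_query_answers(response: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map each query alias to its answer text, confidence and bounding box."""
    blocks = {block["Id"]: block for block in response.get("Blocks", [])}
    answers: dict[str, dict[str, Any]] = {}
    for block in blocks.values():
        if block.get("BlockType") != "QUERY":
            continue
        alias = block["Query"].get("Alias") or block["Query"]["Text"]
        for relationship in block.get("Relationships", []):
            if relationship["Type"] != "ANSWER":
                continue
            for answer_id in relationship["Ids"]:
                result = blocks.get(answer_id)
                if result is None:
                    continue
                answers[alias] = {
                    "text": result.get("Text", ""),
                    "confidence": result.get("Confidence"),
                    "bounding_box": result.get("Geometry", {}).get("BoundingBox"),
                }
    return answers
```

A query that Textract cannot answer simply has no entry in the returned dictionary, so the caller can treat a missing alias as a field to route to review. The answer text is still raw (for example “1.850,45”), which is why the normalization step matters.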
Google Document AI
Strengths: Offers extremely powerful pre-trained processors (Lending AI). Google’s ability to understand complex tables (such as the sections of the 730 tax return) is often superior thanks to the underlying Knowledge Graph.
Limitations: Costs tend to be higher for specialized processors, and the learning curve for fine-tuning on custom Italian documents is steeper.
3. Cloud Pipeline Architecture


We will design an event-driven serverless solution to ensure scalability and consumption-based costs. The reference architecture uses AWS as an example, but it can be mirrored on Google Cloud (GCP).
Step 1: Ingestion and Trigger
The flow begins when the user uploads the document (PDF or JPG) to an Amazon S3 Bucket (or Google Cloud Storage). It is crucial to configure the bucket with Lifecycle policies to delete sensitive documents after processing, in compliance with GDPR.
The upload event (s3:ObjectCreated) triggers an AWS Lambda (or Google Cloud Function). This function acts as an orchestrator.
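As a rough sketch of this first step, the orchestrator needs to pull the bucket and object key out of the S3 event, and the bucket needs an expiration rule. The function name, the rule ID, and the `uploads/` prefix below are assumptions made for illustration. Note that S3 lifecycle expiration works in whole days, so a pipeline that must remove a file right after processing would also delete the object explicitly.

```python
from urllib.parse import unquote_plus

# Assumed lifecycle rule: expire everything under "uploads/" after one day.
# It would be passed as LifecycleConfiguration to the boto3 S3 client's
# put_bucket_lifecycle_configuration call.
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "expire-uploaded-documents",
            "Filter": {"Prefix": "uploads/"},
            "Status": "Enabled",
            "Expiration": {"Days": 1},
        }
    ]
}


def extract_uploads(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for every ObjectCreated record in an S3 event."""
    uploads = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces become "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        uploads.append((bucket, key))
    return uploads
```

Decoding the key matters in practice: a file uploaded as “busta paga.pdf” would otherwise be looked up under the wrong name in the next step.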
Step 2: Asynchronous Processing
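For multi-page documents like the 730 tax return, synchronous processing is not a viable option: Textract's synchronous operations are designed for single-page documents, and a long-running analysis would risk hitting the Lambda timeout anyway. The Lambda must call the asynchronous API (e.g., start_document_analysis in Textract). The job ID is saved in a NoSQL database (DynamoDB) along with the “PROCESSING” status.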
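A minimal sketch of this step might look like the following. It assumes `textract` is a boto3 Textract client and `jobs_table` is a boto3 DynamoDB Table resource, both created by the caller. The item attribute names (`job_id`, `object_key`, `created_at`) and the chosen feature types are illustrative, not taken from the original design.

```python
import time


def start_analysis(textract, jobs_table, bucket: str, key: str,
                   sns_topic_arn: str, role_arn: str) -> str:
    """Start an asynchronous Textract job and record it as PROCESSING."""
    response = textract.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES", "FORMS"],
        # Textract publishes the completion message to this SNS topic.
        NotificationChannel={"SNSTopicArn": sns_topic_arn, "RoleArn": role_arn},
    )
    job_id = response["JobId"]
    jobs_table.put_item(
        Item={
            "job_id": job_id,
            "bucket": bucket,
            "object_key": key,
            "status": "PROCESSING",
            "created_at": int(time.time()),
        }
    )
    return job_id
```

Passing the clients in as arguments keeps the function easy to exercise with stand-in objects, without touching real AWS resources.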
Step 3: Extraction and NLP Post-Processing
Upon completion of the analysis, a notification on Amazon SNS/SQS triggers a second processing Lambda. Here is where the magic happens:
- Normalization: The raw extracted data is cleaned. Example: convert “1.200,50 €” to float(1200.50).
- Entity Extraction (NLP): If we use Textract Queries, we map the responses to our database fields. If we use raw OCR, we use NLP libraries (like spaCy or fine-tuned Transformer models) to identify key entities based on the spatial proximity of words.
- Business Logic: Automatic calculation of derived metrics, such as the Debt-to-Income ratio, based on the extracted data.
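The normalization and business-logic steps above can be sketched in a few lines. This version uses `Decimal` instead of the `float` mentioned in the text, a deliberate swap to avoid binary rounding on monetary values. The function names, the `dd/mm/yyyy` date format, and the choice of net monthly income as the denominator of the ratio are assumptions for illustration.

```python
import re
from datetime import date, datetime
from decimal import Decimal


def parse_italian_amount(raw: str) -> Decimal:
    """Convert an Italian-formatted amount such as '1.200,50 €' to a Decimal."""
    # Keep only digits, separators and the minus sign (drops '€' and spaces).
    cleaned = re.sub(r"[^\d.,-]", "", raw)
    # In Italian notation '.' groups thousands and ',' marks decimals.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


def parse_italian_date(raw: str) -> date:
    """Convert a 'dd/mm/yyyy' string to a date (assumed input format)."""
    return datetime.strptime(raw.strip(), "%d/%m/%Y").date()


def debt_to_income(monthly_debt: Decimal, monthly_income: Decimal) -> Decimal:
    """Return monthly debt payments divided by monthly income, to 4 decimals."""
    if monthly_income <= 0:
        raise ValueError("monthly income must be positive")
    return (monthly_debt / monthly_income).quantize(Decimal("0.0001"))
```

For instance, `parse_italian_amount("1.200,50 €")` yields `Decimal("1200.50")`, and an amount written as “25.000” is read as twenty-five thousand rather than twenty-five.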
4. Data Validation and Confidence Scores
The heart of the system’s reliability lies in the management of the Confidence Score. Each field extracted by the AI is accompanied by a confidence percentage (0-100%).
We define the operational thresholds:
- Confidence ≥ 90%: Automatic acceptance. The data flows directly into the banking CRM.
- Confidence from 60% to below 90%: “Warning” flag. The data is inserted but marked for a quick review.
- Confidence < 60%: Rejection or HITL (Human-in-the-loop) routing.
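These thresholds translate into a small routing function. The sketch below assumes that a score of exactly 90 counts as automatic acceptance and that every value from 60 up to (but not including) 90 gets the warning flag, so that no score falls between two bands. The status labels and function names are illustrative.

```python
AUTO_ACCEPT_MIN = 90.0  # at or above: automatic acceptance
WARNING_MIN = 60.0      # at or above (and below 90): accepted with a warning


def route_field(confidence: float) -> str:
    """Classify one extracted field by its confidence score (0-100)."""
    if not 0.0 <= confidence <= 100.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence >= AUTO_ACCEPT_MIN:
        return "AUTO_ACCEPT"
    if confidence >= WARNING_MIN:
        return "WARNING"
    return "HITL"


def route_document(entities: dict) -> dict:
    """Group field names by routing status for a whole document."""
    routed = {"AUTO_ACCEPT": [], "WARNING": [], "HITL": []}
    for name, entity in entities.items():
        routed[route_field(entity["confidence"])].append(name)
    return routed
```

Keeping the thresholds as named constants makes it straightforward to tune them per document type, for example being stricter on the 730 than on a pay slip.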
5. Human-in-the-loop (HITL) Workflow
Total automation is a dangerous myth in the financial sector. To manage low-confidence cases, we integrate a human review workflow (using AWS A2I or custom interfaces).
When confidence is below the threshold, the document and extracted data are sent to a review queue. A human operator sees an interface with the original document on the left and the extracted fields on the right. The operator corrects only the fields highlighted in red. Once validated, the correct data re-enters the pipeline and, crucially, is used to retrain the model, improving its future performance.
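The review loop described here could be modeled as two small steps: pick the fields to highlight, then merge the operator's corrections while keeping a record for later retraining. This is only a sketch under assumptions: the threshold value, the `reviewed_by_human` flag, the choice to set a corrected field's confidence to 100, and the shape of the training record are all hypothetical.

```python
from copy import deepcopy

REVIEW_THRESHOLD = 60.0  # assumed: same value as the lowest confidence band


def fields_needing_review(entities: dict, threshold: float = REVIEW_THRESHOLD) -> list:
    """Return the names of the fields the operator should check, sorted."""
    return sorted(
        name for name, entity in entities.items()
        if entity["confidence"] < threshold
    )


def apply_corrections(entities: dict, corrections: dict) -> tuple:
    """Merge operator corrections and collect examples for retraining.

    Returns a new entities dict (the input is left untouched) and a list of
    records pairing the model's prediction with the human-validated value.
    """
    updated = deepcopy(entities)
    training_examples = []
    for name, corrected_value in corrections.items():
        predicted = updated[name]["value"]
        updated[name].update(
            value=corrected_value,
            confidence=100.0,  # assumption: human-validated values are trusted
            reviewed_by_human=True,
        )
        training_examples.append(
            {"field": name, "predicted": predicted, "corrected": corrected_value}
        )
    return updated, training_examples
```

Returning a copy rather than mutating the input keeps the original model output available for audit, which tends to matter in a regulated setting.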
6. JSON Payload Example (Normalized Output)
Regardless of the cloud provider, the goal is to produce a standardized JSON ready for the Core Banking system:
{
  "document_id": "uuid-1234-5678",
  "document_type": "PAY_SLIP",
  "extraction_date": "2026-02-22T10:00:00Z",
  "entities": {
    "net_income": {
      "value": 1850.45,
      "currency": "EUR",
      "confidence": 98.5,
      "source_page": 1
    },
    "employee_seniority_date": {
      "value": "2018-05-01",
      "confidence": 92.0,
      "normalized": true
    },
    "fiscal_code": {
      "value": "RSSMRA80A01H501U",
      "confidence": 99.9,
      "validation_check": "PASSED"
    }
  },
  "review_required": false
}
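One plausible way to fill the `validation_check` field for `fiscal_code` is to verify the check character of the Italian codice fiscale, which is computed from the first 15 characters. The sketch below assumes that is what the check means here; it covers only the 16-character personal code and does not confirm that the code exists in any registry. The function names are illustrative.

```python
import re

# Values for characters in odd positions (1st, 3rd, ...). Index 0-9 is used
# for digits 0-9 and, equally, for letters A-J; indexes 10-25 cover K-Z.
ODD_VALUES = [1, 0, 5, 7, 9, 13, 15, 17, 19, 21, 2, 4, 18,
              20, 11, 3, 6, 8, 12, 14, 16, 10, 22, 25, 24, 23]

FISCAL_CODE_PATTERN = re.compile(r"[A-Z0-9]{15}[A-Z]")


def expected_check_char(first_15: str) -> str:
    """Compute the check letter from the first 15 characters."""
    total = 0
    for position, char in enumerate(first_15, start=1):
        index = int(char) if char.isdigit() else ord(char) - ord("A")
        # Even positions use the plain index, odd positions the lookup table.
        total += ODD_VALUES[index] if position % 2 == 1 else index
    return chr(ord("A") + total % 26)


def fiscal_code_check(code: str) -> str:
    """Return 'PASSED' or 'FAILED' for a 16-character fiscal code."""
    code = code.strip().upper()
    if not FISCAL_CODE_PATTERN.fullmatch(code):
        return "FAILED"
    return "PASSED" if expected_check_char(code[:15]) == code[15] else "FAILED"
```

A check like this catches OCR slips that a confidence score alone may miss, such as a single misread character in an otherwise high-confidence field.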
In Brief (TL;DR)
Intelligent Document Processing revolutionizes mortgage granting by transforming paper documents into structured data essential for business.
The guide compares AWS Textract and Google Document AI to overcome the layout challenges of Italian tax documents.
A well-designed serverless pipeline integrates NLP logic and automatic validation to optimize operational times and costs.
Conclusions

Implementing a mortgage document automation pipeline requires a hybrid approach that balances the raw power of cloud computing with the finesse of Italian business rules. By using services like AWS Textract or Google Document AI, integrated with rigorous validation logic and strategic human supervision, financial institutions can reduce decision times from days to minutes, offering a superior customer experience and drastically reducing operational costs.
Frequently Asked Questions

AWS Textract stands out for its Queries feature, which lets you query the document with natural-language questions to extract specific data like net income, making it ideal for variable layouts. Google Document AI, on the other hand, offers very powerful pre-trained processors that are particularly effective at understanding complex tables, such as those in 730 tax returns, although it generally entails higher costs.
Classic OCR systems fail due to the high variability of layouts generated by different payroll software and the poor quality of smartphone scans. Furthermore, they lack the semantic understanding necessary to distinguish similar numerical values, such as gross income versus social security taxable income, thus requiring an advanced approach based on neural OCR and NLP.
The human-in-the-loop (HITL) approach ensures that when artificial intelligence assigns a low confidence score to an extracted value, the document is sent to a human operator for review. Manual intervention not only corrects the specific error but also provides valuable data for model retraining, progressively improving the system’s future performance and reducing operational risks.
Intelligent Document Processing or IDP is the technological evolution that transforms unstructured documents like PDFs and images into structured data ready for banking use. In the mortgage context, it orchestrates the automatic extraction of information from CUDs and pay slips via API, reducing processing times from weeks to minutes and minimizing manual data entry errors.
Security relies on serverless architectures that minimize data persistence and on Lifecycle policies on storage services like Amazon S3 or Google Cloud Storage. These configurations ensure that documents containing personal data are deleted automatically once processing is complete and the configured retention period has expired, supporting compliance with privacy regulations such as GDPR.




