In Brief (TL;DR)
Adopting the ELT model gives Fintech institutions the complete auditability and calculation speed essential for regulatory compliance.
Integrating Google BigQuery and Apache Airflow enables scalable data ingestion while preserving the full history of financial transactions.
Using dbt treats data transformations as software code, improving governance and precision in KPI calculations.
The devil is in the details. 👇 Keep reading to discover the critical steps and practical tips to avoid mistakes.
In the 2026 data engineering landscape, choosing the right architecture for data movement is not just a matter of performance but of regulatory compliance, especially in the financial sector. When weighing ETL vs ELT for a Fintech institution's pipeline, the stakes include decimal precision, complete auditability (Lineage), and the ability to calculate risk in near real time. This technical guide explores why the ELT (Extract, Load, Transform) approach has become the de facto standard over traditional ETL, using a modern stack consisting of Google BigQuery, dbt (data build tool), and Apache Airflow.

ETL vs ELT: The Paradigm Shift in Fintech
For years, the ETL (Extract, Transform, Load) approach was dominant. Data was extracted from transactional systems (e.g., mortgage databases), transformed on an intermediate server (staging area), and finally loaded into the Data Warehouse. While secure, this approach presented significant bottlenecks: the computing power of the transformation server limited speed, and every change to business logic required complex pipeline updates before the data even landed in the warehouse.
With the advent of Cloud Data Warehouses like Google BigQuery and AWS Redshift, the paradigm has shifted towards ELT (Extract, Load, Transform). In this model:
- Extract: Data is extracted from source systems (CRM, Core Banking).
- Load: Data is loaded immediately into the Data Warehouse in raw format (Raw Data).
- Transform: Transformations take place directly inside the Warehouse, leveraging its massively parallel processing (MPP) power.
For Fintech, ELT offers a crucial advantage: auditability. Since raw data is always available in the Warehouse, it is possible to reconstruct the history of any transaction or recalculate KPIs retroactively without having to re-execute the extraction.
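As a concrete illustration of this advantage, here is a minimal sketch of a retroactive query, assuming an append-only raw table named fintech_warehouse.raw_mortgages (the same illustrative name used in the ingestion example below): an auditor can reconstruct the state of every mortgage as of a given date directly in BigQuery, without re-running any extraction.

-- Reconstruct the state of each mortgage as it was known at the end of January 2026
-- (table and column names are illustrative)
SELECT mortgage_id, outstanding_balance, updated_at
FROM (
  SELECT
    mortgage_id,
    outstanding_balance,
    updated_at,
    ROW_NUMBER() OVER (PARTITION BY mortgage_id ORDER BY updated_at DESC) AS rn
  FROM fintech_warehouse.raw_mortgages
  WHERE updated_at < TIMESTAMP('2026-02-01')
)
WHERE rn = 1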
Solution Architecture: The Modern Stack

To build a robust pipeline for mortgage management, we will use the following technology stack, considered best-practice in 2026:
- Storage & Compute: Google BigQuery (for serverless scalability).
- Transformation: dbt (to manage SQL transformations, documentation, and testing).
- Orchestration: Apache Airflow (to schedule and monitor jobs).
Phase 1: Ingestion and Orchestration with Apache Airflow

The first step is to bring data into the Data Lake/Warehouse. In a Fintech context, timeliness is key. We use Apache Airflow to orchestrate extraction from operational databases (e.g., PostgreSQL) to BigQuery.
Example of Airflow DAG for Ingestion
The following conceptual snippet shows how to configure a task to load mortgage data in “append-only” mode to maintain full history.
from airflow import DAG
from airflow.providers.google.cloud.transfers.postgres_to_gcs import PostgresToGCSOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
import pendulum

with DAG(
    'fintech_mortgage_ingestion',
    start_date=pendulum.datetime(2026, 1, 1, tz="UTC"),
    schedule='@hourly',
    catchup=False,
) as dag:
    # Incremental extraction: only rows updated since the start of the data interval
    extract_mortgages = PostgresToGCSOperator(
        task_id='extract_mortgages_raw',
        postgres_conn_id='core_banking_db',
        sql="SELECT * FROM mortgages WHERE updated_at > '{{ data_interval_start }}'",
        bucket='fintech-datalake-raw',
        filename='mortgages/{{ ds }}/mortgages.json',
        export_format='json',  # newline-delimited JSON, matching the load step below
    )

    # Append-only load into the raw layer to preserve the full transaction history
    load_to_bq = GCSToBigQueryOperator(
        task_id='load_mortgages_bq',
        bucket='fintech-datalake-raw',
        source_objects=['mortgages/{{ ds }}/mortgages.json'],
        destination_project_dataset_table='fintech_warehouse.raw_mortgages',
        write_disposition='WRITE_APPEND',  # Crucial for history
        source_format='NEWLINE_DELIMITED_JSON',
    )

    extract_mortgages >> load_to_bq
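A side note on the raw layer: the destination table can be created ahead of the first load so that partitioning and exact decimal types stay under your control. A minimal sketch, assuming for illustration that the Core Banking export is reduced to these columns:

-- Illustrative DDL for the raw layer: partition by update day, cluster by mortgage_id
CREATE TABLE IF NOT EXISTS fintech_warehouse.raw_mortgages (
  mortgage_id STRING,
  customer_id STRING,
  property_id STRING,
  outstanding_balance NUMERIC,  -- NUMERIC preserves decimal precision on amounts
  status STRING,
  updated_at TIMESTAMP
)
PARTITION BY DATE(updated_at)
CLUSTER BY mortgage_id;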
Phase 2: Transformation and Modeling with dbt
Once the data is in BigQuery, dbt comes into play. Unlike traditional stored procedures, dbt allows data transformations to be treated as software code (software engineering best practices), including versioning (Git), testing, and CI/CD.
dbt Project Structure
We organize models into three logical layers:
- Staging (stg): Light cleaning, column renaming, data type casting (a staging sketch follows this list).
- Intermediate (int): Complex business logic, joins between tables.
- Marts (fct/dim): Final tables ready for BI and regulatory reporting.
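To make the staging layer concrete, here is a hedged sketch of what stg_core_mortgages.sql might look like, assuming a dbt source named core_banking that points at the raw table loaded above (the source and column names are assumptions):

-- models/staging/stg_core_mortgages.sql (illustrative sketch)
WITH source AS (
    SELECT
        *,
        -- Keep only the latest version of each mortgage from the append-only raw layer
        ROW_NUMBER() OVER (PARTITION BY mortgage_id ORDER BY updated_at DESC) AS rn
    FROM {{ source('core_banking', 'raw_mortgages') }}
)
SELECT
    CAST(mortgage_id AS STRING) AS mortgage_id,
    CAST(customer_id AS STRING) AS customer_id,
    CAST(property_id AS STRING) AS property_id,
    CAST(outstanding_balance AS NUMERIC) AS outstanding_balance,  -- exact decimal type
    UPPER(status) AS status,
    CAST(updated_at AS TIMESTAMP) AS updated_at
FROM source
WHERE rn = 1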
Calculating Complex KPIs: Loan-to-Value (LTV) in SQL
In the mortgage sector, LTV is a critical risk indicator. Here is a dbt model (.sql file) that calculates the updated LTV and classifies risk by joining mortgage master data with property appraisals.
-- models/marts/risk/fct_mortgage_risk.sql
WITH mortgages AS (
    SELECT * FROM {{ ref('stg_core_mortgages') }}
),
appraisals AS (
    -- We take the latest available appraisal for the property
    SELECT
        property_id,
        appraisal_amount,
        appraisal_date,
        ROW_NUMBER() OVER (PARTITION BY property_id ORDER BY appraisal_date DESC) as rn
    FROM {{ ref('stg_external_appraisals') }}
)
SELECT
    m.mortgage_id,
    m.customer_id,
    m.outstanding_balance,
    a.appraisal_amount,
    -- LTV Calculation: (Outstanding Balance / Property Value) * 100
    ROUND((m.outstanding_balance / NULLIF(a.appraisal_amount, 0)) * 100, 2) as loan_to_value_ratio,
    CASE
        -- Missing or zero appraisal: flag it instead of silently falling through to low risk
        WHEN a.appraisal_amount IS NULL OR a.appraisal_amount = 0 THEN 'UNKNOWN_RISK'
        WHEN (m.outstanding_balance / NULLIF(a.appraisal_amount, 0)) > 0.80 THEN 'HIGH_RISK'
        WHEN (m.outstanding_balance / NULLIF(a.appraisal_amount, 0)) BETWEEN 0.50 AND 0.80 THEN 'MEDIUM_RISK'
        ELSE 'LOW_RISK'
    END as risk_category,
    CURRENT_TIMESTAMP() as calculated_at
FROM mortgages m
LEFT JOIN appraisals a ON m.property_id = a.property_id AND a.rn = 1
WHERE m.status = 'ACTIVE'
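A brief design note: a mart like this is usually materialized as a table and, on BigQuery, clustered on the columns most used for filtering. A hedged sketch of a config header that could sit at the top of the same file (the clustering choice is an assumption, not part of the model above):

-- Optional dbt config header for fct_mortgage_risk.sql (illustrative)
{{
    config(
        materialized='table',
        cluster_by=['risk_category', 'customer_id']
    )
}}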
Data Quality and Auditability: The Heart of Fintech
In the financial field, incorrect data can lead to penalties. The ELT approach with dbt excels in Data Quality thanks to integrated tests.
Implementing Automated Tests
In the dbt schema.yml file, we define the assertions that the data must satisfy. If a test fails, the pipeline stops or sends an alert, preventing the propagation of corrupt data.
version: 2

models:
  - name: fct_mortgage_risk
    description: "Fact table for mortgage risk"
    columns:
      - name: mortgage_id
        tests:
          - unique
          - not_null
      - name: loan_to_value_ratio
        tests:
          - not_null
          # Range test: LTV cannot be negative or absurdly high (>200%)
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 200
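Generic tests like the ones above can be complemented with singular tests: plain SQL files under tests/ that must return zero rows to pass. A hedged sketch, with an illustrative file name and a business rule consistent with the LTV model:

-- tests/assert_high_risk_has_appraisal.sql (illustrative singular test)
-- Any row returned here is a failure: a HIGH_RISK mortgage must have a positive appraisal
SELECT mortgage_id
FROM {{ ref('fct_mortgage_risk') }}
WHERE risk_category = 'HIGH_RISK'
  AND (appraisal_amount IS NULL OR appraisal_amount <= 0)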
Data Lineage
dbt automatically generates a dependency graph (DAG). For an auditor, this means being able to visually trace how a customer’s “High Risk” status was derived: from the raw table, through intermediate transformations, to the final report. This level of transparency is often mandatory in banking inspections.
Infrastructure Management and Versioning
Unlike legacy ETL pipelines based on GUIs (drag-and-drop graphical interfaces), the modern approach is Code-First.
- Version Control (Git): Every change to the LTV calculation logic is a commit in Git. We can see who changed the formula, when, and why (via the pull request).
- Isolated Environments: Thanks to dbt, every developer can run the pipeline in a sandbox environment (e.g., dbt run --target dev) on a subset of BigQuery data, without impacting production (see the sketch just after this list).
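As a hedged illustration of how a dev run can operate on a subset of data, a model can branch on the active dbt target; the seven-day sampling window below is purely an assumption for the sketch:

-- Illustrative fragment inside a staging model: limit scanned data outside production
SELECT *
FROM {{ source('core_banking', 'raw_mortgages') }}
{% if target.name == 'dev' %}
-- In the dev target, only scan recent rows to keep BigQuery costs low
WHERE updated_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
{% endif %}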
Common Troubleshooting
1. Schema Drift (Source schema change)
Problem: Core Banking adds a column to the mortgage table and the pipeline breaks.
Solution: In BigQuery, use the schema_update_options=['ALLOW_FIELD_ADDITION'] option during loading. In dbt, use macros like dbt_utils.star to select columns dynamically (a sketch follows), or implement strict schema tests that flag the change without breaking the critical flow.
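A hedged sketch of the dbt_utils.star approach, reusing the illustrative core_banking source from earlier; the excluded metadata column name is an assumption:

-- Select every source column except illustrative operational metadata, so that a newly
-- added business column flows through without editing the model by hand
SELECT
    {{ dbt_utils.star(from=source('core_banking', 'raw_mortgages'), except=['_loaded_at']) }}
FROM {{ source('core_banking', 'raw_mortgages') }}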
2. Data Latency
Problem: Property appraisal data arrives later than mortgage balances.
Solution: Implement “Late Arriving Facts” logic. Use SQL window functions (as in the ROW_NUMBER example above) to always take the latest valid data available at runtime, or model snapshot tables to historicize the exact state at month-end (a snapshot sketch follows).
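A hedged sketch of a dbt snapshot that historicizes appraisal records; the snapshot name, unique key, and source reference are assumptions:

-- snapshots/appraisals_snapshot.sql (illustrative)
{% snapshot appraisals_snapshot %}
{{
    config(
        target_schema='snapshots',
        unique_key='appraisal_id',
        strategy='timestamp',
        updated_at='appraisal_date'
    )
}}
-- Each change to an appraisal becomes a new versioned row (dbt adds dbt_valid_from / dbt_valid_to)
SELECT * FROM {{ source('external_data', 'appraisals') }}
{% endsnapshot %}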
Conclusions

The transition from ETL to ELT pipelines in the Fintech sector is not a trend but an operational necessity. Using BigQuery for low-cost storage and high computational capacity, combined with dbt for transformation governance, allows financial companies to have reliable, auditable, and timely data. Implementing this architecture requires software engineering skills applied to data, but the return on investment in compliance and business agility is substantial.
Frequently Asked Questions

What is the main difference between ETL and ELT, and why is ELT preferred in Fintech?
The main distinction concerns the timing of data transformation. In the ETL model, data is processed before loading, whereas in the ELT paradigm, raw data is loaded immediately into the Data Warehouse and transformed afterwards. In Fintech, the ELT method is preferred because it guarantees total auditability and allows historical KPIs to be recalculated without re-executing extraction from source systems.

Why combine Google BigQuery and dbt?
This combination constitutes the current standard thanks to the serverless scalability of Google BigQuery and the ability of dbt to manage SQL transformations as software code. Their joint use allows leveraging cloud computing power to process massive volumes of financial data, while ensuring versioning, automated testing, and clear documentation of business logic.

How does the ELT approach support regulatory compliance and audits?
The ELT model facilitates compliance by preserving original raw data in the Data Warehouse, making it possible to reconstruct the history of every transaction at any time. Furthermore, tools like dbt automatically generate a dependency graph, known as Data Lineage, which allows auditors to visualize exactly how a final data point was derived from the source during regulatory inspections.

How is data quality guaranteed within the pipeline?
Data integrity is ensured through automated tests integrated into the transformation code, which verify the uniqueness and consistency of values. By defining specific rules in the configuration file, the pipeline can block the process or send immediate alerts if it detects anomalies, preventing the propagation of errors into decision-making or regulatory reports.

How are risk KPIs such as Loan-to-Value calculated?
Risk indicator calculation is managed through modular SQL models inside the Data Warehouse, joining master data and property appraisals. Thanks to the ELT method, it is possible to implement logic that historicizes risk and manages data delays, ensuring that the calculation always reflects the most valid and up-to-date information available at runtime.
Sources and Further Reading
- Bank for International Settlements: Principles for effective risk data aggregation and risk reporting (BCBS 239)
- NIST Special Publication 800-145: The NIST Definition of Cloud Computing
- Wikipedia: Extract, Load, Transform (ELT) Architecture
- FFIEC IT Examination Handbook: Architecture, Infrastructure, and Operations

Did you find this article helpful? Is there another topic you'd like to see me cover?
Write it in the comments below! I take inspiration directly from your suggestions.