ETL vs ELT Pipelines in Fintech: A Complete Guide to BigQuery and dbt

Technical analysis of ETL vs ELT pipelines for Fintech. Practical guide to using BigQuery, dbt, and Airflow for Data Quality, Lineage, and mortgage auditability.

Published on Jan 12, 2026
Updated on Jan 12, 2026

In Brief (TL;DR)

Adopting the ELT model gives Fintech institutions the complete auditability and calculation speed essential for regulatory compliance.

Integrating Google BigQuery and Apache Airflow ensures scalable data ingestion while maintaining the full history of financial transactions.

Using dbt treats data transformations as software code, improving governance and precision in KPI calculations.

The devil is in the details. 👇 Keep reading to discover the critical steps and practical tips to avoid mistakes.


In the 2026 data engineering landscape, choosing the right architecture for data movement is not just a matter of performance, but of regulatory compliance, especially in the financial sector. When weighing ETL vs ELT pipelines for a Fintech institution, the stakes include decimal precision, complete auditability (lineage), and the ability to calculate risk in near real-time. This technical guide explores why the ELT (Extract, Load, Transform) approach has become the de facto standard over traditional ETL, using a modern stack consisting of Google BigQuery, dbt (data build tool), and Apache Airflow.

Figure: Comparison of ETL and ELT data architectures for banking compliance and real-time analysis on BigQuery.

ETL vs ELT: The Paradigm Shift in Fintech

For years, the ETL (Extract, Transform, Load) approach was dominant. Data was extracted from transactional systems (e.g., mortgage databases), transformed on an intermediate server (staging area), and finally loaded into the Data Warehouse. While secure, this approach presented significant bottlenecks: the computing power of the transformation server limited speed, and every change to business logic required complex pipeline updates before the data even landed in the warehouse.

With the advent of Cloud Data Warehouses like Google BigQuery and Amazon Redshift, the paradigm has shifted towards ELT (Extract, Load, Transform). In this model:

  • Extract: Data is extracted from source systems (CRM, Core Banking).
  • Load: Data is loaded immediately into the Data Warehouse in raw format (Raw Data).
  • Transform: Transformations take place directly inside the Warehouse, leveraging its massive computing power (MPP).

For Fintech, ELT offers a crucial advantage: auditability. Since raw data is always available in the Warehouse, it is possible to reconstruct the history of any transaction or recalculate KPIs retroactively without having to re-execute the extraction.
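
To make this concrete, the following query is a minimal sketch of how an auditor could reconstruct the state of every mortgage as it was known at a given point in time, without touching the source systems. It assumes the raw table fintech_warehouse.raw_mortgages is loaded append-only and carries a hypothetical _loaded_at ingestion timestamp.


-- Point-in-time reconstruction on raw, append-only data (illustrative sketch)
-- Assumption: every load appends a new row version stamped with _loaded_at
SELECT * EXCEPT (rn)
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY mortgage_id
            ORDER BY _loaded_at DESC
        ) AS rn
    FROM fintech_warehouse.raw_mortgages
    WHERE _loaded_at <= TIMESTAMP('2026-01-01 00:00:00')
) AS versions
WHERE rn = 1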


Solution Architecture: The Modern Stack


To build a robust pipeline for mortgage management, we will use the following technology stack, considered best practice in 2026:

  • Storage & Compute: Google BigQuery (for serverless scalability).
  • Transformation: dbt (to manage SQL transformations, documentation, and testing).
  • Orchestration: Apache Airflow (to schedule and monitor jobs).

Phase 1: Ingestion and Orchestration with Apache Airflow

Figure: ELT data architecture on BigQuery and dbt, optimizing banking data flows for auditability and real-time performance.

The first step is to bring data into the Data Lake/Warehouse. In a Fintech context, timeliness is key. We use Apache Airflow to orchestrate extraction from operational databases (e.g., PostgreSQL) to BigQuery.

Example of Airflow DAG for Ingestion

The following conceptual snippet shows how to configure the extraction and load tasks so that mortgage data lands in BigQuery in “append-only” mode, preserving the full history.


from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.postgres_to_gcs import PostgresToGCSOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    'fintech_mortgage_ingestion',
    start_date=datetime(2026, 1, 1),
    schedule_interval='@hourly',
    catchup=False,  # do not backfill every past hourly interval on first deploy
) as dag:

    # Incremental extract: only rows changed since the previous run
    extract_mortgages = PostgresToGCSOperator(
        task_id='extract_mortgages_raw',
        postgres_conn_id='core_banking_db',
        sql="SELECT * FROM mortgages WHERE updated_at > '{{ prev_execution_date }}'",
        bucket='fintech-datalake-raw',
        filename='mortgages/{{ ds }}/mortgages.json',
    )

    # Raw load into BigQuery: append-only to preserve the full transaction history
    load_to_bq = GCSToBigQueryOperator(
        task_id='load_mortgages_bq',
        bucket='fintech-datalake-raw',
        source_objects=['mortgages/{{ ds }}/mortgages.json'],
        destination_project_dataset_table='fintech_warehouse.raw_mortgages',
        write_disposition='WRITE_APPEND', # Crucial for history and auditability
        source_format='NEWLINE_DELIMITED_JSON',
    )

    extract_mortgages >> load_to_bq

Phase 2: Transformation and Modeling with dbt

Once the data is in BigQuery, dbt comes into play. Unlike traditional stored procedures, dbt allows data transformations to be treated as software code, applying software engineering best practices such as versioning (Git), testing, and CI/CD.

dbt Project Structure

We organize models into three logical layers:

  1. Staging (stg): Light cleaning, column renaming, data type casting.
  2. Intermediate (int): Complex business logic, joins between tables.
  3. Marts (fct/dim): Final tables ready for BI and regulatory reporting.
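
As a minimal sketch of the staging layer, a model like stg_core_mortgages would limit itself to renaming, casting, and light cleaning. Column and source names below are illustrative and assume a core_banking source declared in a sources.yml file.


-- models/staging/stg_core_mortgages.sql (illustrative sketch)
-- Light cleaning only: rename, cast, standardize; no business logic here

SELECT
    CAST(id AS STRING)                   AS mortgage_id,
    CAST(customer_ref AS STRING)         AS customer_id,
    CAST(property_ref AS STRING)         AS property_id,
    CAST(outstanding_balance AS NUMERIC) AS outstanding_balance,  -- NUMERIC preserves decimal precision
    UPPER(status)                        AS status,
    CAST(updated_at AS TIMESTAMP)        AS updated_at
FROM {{ source('core_banking', 'raw_mortgages') }}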

Calculating Complex KPIs: Loan-to-Value (LTV) in SQL

In the mortgage sector, LTV is a critical risk indicator. Here is a dbt model (a .sql file) that calculates the updated LTV and classifies risk by joining mortgage master data with property appraisals.


-- models/marts/risk/fct_mortgage_risk.sql

WITH mortgages AS (
    SELECT * FROM {{ ref('stg_core_mortgages') }}
),

appraisals AS (
    -- We take the latest available appraisal for the property
    SELECT 
        property_id,
        appraisal_amount,
        appraisal_date,
        ROW_NUMBER() OVER (PARTITION BY property_id ORDER BY appraisal_date DESC) as rn
    FROM {{ ref('stg_external_appraisals') }}
)

SELECT
    m.mortgage_id,
    m.customer_id,
    m.outstanding_balance,
    a.appraisal_amount,
    -- LTV Calculation: (Outstanding Balance / Property Value) * 100
    ROUND((m.outstanding_balance / NULLIF(a.appraisal_amount, 0)) * 100, 2) as loan_to_value_ratio,
    -- Risk thresholds are applied to the raw ratio (0.80 = an LTV of 80%).
    -- Note: rows with no valid appraisal produce a NULL LTV (flagged downstream
    -- by the not_null test in schema.yml) and fall through to the ELSE branch here.
    CASE
        WHEN (m.outstanding_balance / NULLIF(a.appraisal_amount, 0)) > 0.80 THEN 'HIGH_RISK'
        WHEN (m.outstanding_balance / NULLIF(a.appraisal_amount, 0)) BETWEEN 0.50 AND 0.80 THEN 'MEDIUM_RISK'
        ELSE 'LOW_RISK'
    END as risk_category,
    CURRENT_TIMESTAMP() as calculated_at
FROM mortgages m
LEFT JOIN appraisals a ON m.property_id = a.property_id AND a.rn = 1
WHERE m.status = 'ACTIVE'

Data Quality and Auditability: The Heart of Fintech

In the financial field, incorrect data can lead to penalties. The ELT approach with dbt excels in Data Quality thanks to integrated tests.

Implementing Automated Tests

In the dbt schema.yml file, we define the assertions that the data must satisfy. If a test fails, the pipeline stops or sends an alert, preventing the propagation of corrupt data.


version: 2

models:
  - name: fct_mortgage_risk
    description: "Fact table for mortgage risk"
    columns:
      - name: mortgage_id
        tests:
          - unique
          - not_null
      - name: loan_to_value_ratio
        tests:
          - not_null
          # Range test: LTV cannot be negative or absurdly high (>200%)
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 200
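
Schema tests like these can be complemented with dbt singular tests: plain SQL files under tests/ that must return zero rows to pass. A minimal sketch of a financial sanity check (file name and rule are illustrative) could look like this:


-- tests/assert_no_negative_outstanding_balance.sql (illustrative sketch)
-- A singular test fails if this query returns any rows
SELECT
    mortgage_id,
    outstanding_balance
FROM {{ ref('fct_mortgage_risk') }}
WHERE outstanding_balance < 0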

Data Lineage

dbt automatically generates a dependency graph (DAG). For an auditor, this means being able to visually trace how a customer’s “High Risk” status was derived: from the raw table, through intermediate transformations, to the final report. This level of transparency is often mandatory in banking inspections.


Infrastructure Management and Versioning

Unlike legacy ETL pipelines based on GUIs (drag-and-drop graphical interfaces), the modern approach is Code-First.

  • Version Control (Git): Every change to the LTV calculation logic is a commit on Git. We can know who changed the formula, when, and why (via the Pull Request).
  • Isolated Environments: Thanks to dbt, every developer can run the pipeline in a sandbox environment (e.g., dbt run --target dev) on a subset of BigQuery data, without impacting production.

Common Troubleshooting

1. Schema Drift (Source schema change)

Problem: Core Banking adds a column to the mortgage table and the pipeline breaks.
Solution: In BigQuery, use the schema_update_options=['ALLOW_FIELD_ADDITION'] option during loading. In dbt, use macros like dbt_utils.star (from the dbt_utils package) to select columns dynamically, or implement strict schema tests that warn of the change without breaking the critical flow.
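
As an illustration of the dbt_utils.star approach, a drift-tolerant staging model might select columns dynamically, excluding anything that should not propagate downstream (the excluded column name below is hypothetical):


-- models/staging/stg_core_mortgages.sql (schema-drift-tolerant sketch)
-- dbt_utils.star expands to the source's current column list at compile time,
-- so a newly added source column does not break the model
SELECT
    {{ dbt_utils.star(from=source('core_banking', 'raw_mortgages'), except=['internal_notes']) }}
FROM {{ source('core_banking', 'raw_mortgages') }}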

2. Data Latency

Problem: Property appraisal data arrives later than mortgage balances.
Solution: Implement “Late Arriving Facts” logic. Use SQL window functions (as seen in the example above with ROW_NUMBER) to always take the latest valid data available at runtime, or use dbt snapshots to historicize the exact state at month-end.
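
For the snapshot route, a dbt snapshot is one way to historicize each mortgage's state on every run (schema, keys, and column names below are illustrative):


-- snapshots/mortgages_snapshot.sql (illustrative sketch)
{% snapshot mortgages_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='mortgage_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

-- dbt adds dbt_valid_from / dbt_valid_to columns, so the state at any
-- month-end can be reconstructed later for regulatory reporting
SELECT * FROM {{ ref('stg_core_mortgages') }}

{% endsnapshot %}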

Conclusions


The transition from ETL to ELT pipelines in the Fintech sector is not a trend, but an operational necessity. Using BigQuery for low-cost storage and high computational capacity, combined with dbt for transformation governance, allows financial companies to have reliable, auditable, and timely data. Implementing this architecture requires software engineering skills applied to data, but the return on investment in terms of compliance and business agility is substantial.

Frequently Asked Questions

What distinguishes ETL from ELT in the Fintech sector?

The main distinction concerns the timing of data transformation. In the ETL model, data is processed before loading, whereas in the ELT paradigm, raw data is loaded immediately into the Data Warehouse and transformed subsequently. In Fintech, the ELT method is preferred as it guarantees total auditability and allows historical KPIs to be recalculated without re-executing extraction from source systems.

Why choose BigQuery and dbt for a modern data pipeline?

This combination constitutes the current standard thanks to the serverless scalability of Google BigQuery and the ability of dbt to manage SQL transformations as software code. Their joint use allows leveraging cloud computing power to process massive volumes of financial data, while ensuring versioning, automated testing, and clear documentation of business logic.

How does ELT architecture improve banking compliance?

The ELT model facilitates compliance by preserving original raw data in the Data Warehouse, making it possible to reconstruct the history of every transaction at any time. Furthermore, tools like dbt automatically generate a dependency graph, known as Data Lineage, which allows auditors to visualize exactly how a final data point was derived from the source during regulatory inspections.

How is data quality managed in financial transactions?

Data integrity is ensured through automated tests integrated into the transformation code, which verify the uniqueness and consistency of values. By defining specific rules in the configuration file, the pipeline can block the process or send immediate alerts if it detects anomalies, preventing the propagation of errors in decision-making or regulatory reports.

How are complex KPIs like Loan-to-Value calculated with this stack?

Risk indicator calculation is managed through modular SQL models inside the Data Warehouse, joining master data and property appraisals. Thanks to the ELT method, it is possible to implement logic that historicizes risk and manages data delays, ensuring that the calculation always reflects the most valid and updated information available at runtime.

Francesco Zinghinì

Electronic Engineer with a mission to simplify digital tech. Thanks to his background in Systems Theory, he analyzes software, hardware, and network infrastructures to offer practical guides on IT and telecommunications. Transforming technological complexity into accessible solutions.

Did you find this article helpful? Is there another topic you'd like to see me cover?
Write it in the comments below! I take inspiration directly from your suggestions.
