In Brief (TL;DR)
Adopting the ELT model gives Fintech institutions the complete auditability and calculation speed essential for regulatory compliance.
Integrating Google BigQuery and Apache Airflow enables scalable data ingestion while preserving the full history of financial transactions.
Using dbt treats data transformations as software code, improving governance and precision in KPI calculations.
The devil is in the details. 👇 Keep reading to discover the critical steps and practical tips to avoid mistakes.
In the 2026 data engineering landscape, choosing the right architecture for data movement is not just a matter of performance but of regulatory compliance, especially in the financial sector. When weighing ETL vs ELT for a Fintech institution's pipeline, the stakes include decimal precision, complete auditability (Lineage), and the ability to calculate risk in near real time. This technical guide explores why the ELT (Extract, Load, Transform) approach has become the de facto standard over traditional ETL, using a modern stack consisting of Google BigQuery, dbt (data build tool), and Apache Airflow.

ETL vs ELT: The Paradigm Shift in Fintech
For years, the ETL (Extract, Transform, Load) approach was dominant. Data was extracted from transactional systems (e.g., mortgage databases), transformed on an intermediate server (staging area), and finally loaded into the Data Warehouse. While secure, this approach presented significant bottlenecks: the computing power of the transformation server limited speed, and every change to business logic required complex pipeline updates before the data even landed in the warehouse.
With the advent of Cloud Data Warehouses like Google BigQuery and AWS Redshift, the paradigm has shifted towards ELT (Extract, Load, Transform). In this model:
- Extract: Data is extracted from source systems (CRM, Core Banking).
- Load: Data is loaded immediately into the Data Warehouse in raw format (Raw Data).
- Transform: Transformations take place directly inside the Warehouse, leveraging its massively parallel processing (MPP) power.
For Fintech, ELT offers a crucial advantage: auditability. Since raw data is always available in the Warehouse, it is possible to reconstruct the history of any transaction or recalculate KPIs retroactively without having to re-execute the extraction.
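As a concrete illustration of this advantage, here is a minimal sketch of a retroactive query, assuming an append-only raw table named fintech_warehouse.raw_mortgages (the same illustrative name used in the ingestion example below): an auditor can reconstruct the state of every mortgage as of a given date directly in BigQuery, without re-running any extraction.

-- Reconstruct the state of each mortgage as it was known at the end of January 2026
-- (table and column names are illustrative)
SELECT mortgage_id, outstanding_balance, updated_at
FROM (
  SELECT
    mortgage_id,
    outstanding_balance,
    updated_at,
    ROW_NUMBER() OVER (PARTITION BY mortgage_id ORDER BY updated_at DESC) AS rn
  FROM fintech_warehouse.raw_mortgages
  WHERE updated_at < TIMESTAMP('2026-02-01')
)
WHERE rn = 1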
Solution Architecture: The Modern Stack

To build a robust pipeline for mortgage management, we will use the following technology stack, considered best-practice in 2026:
- Storage & Compute: Google BigQuery (for serverless scalability).
- Transformation: dbt (to manage SQL transformations, documentation, and testing).
- Orchestration: Apache Airflow (to schedule and monitor jobs).
Phase 1: Ingestion and Orchestration with Apache Airflow

The first step is to bring data into the Data Lake/Warehouse. In a Fintech context, timeliness is key. We use Apache Airflow to orchestrate extraction from operational databases (e.g., PostgreSQL) to BigQuery.
Example of Airflow DAG for Ingestion
The following conceptual snippet shows how to configure a task to load mortgage data in “append-only” mode to maintain full history.
from airflow import DAG
from airflow.providers.google.cloud.transfers.postgres_to_gcs import PostgresToGCSOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
import pendulum

with DAG(
    'fintech_mortgage_ingestion',
    start_date=pendulum.datetime(2026, 1, 1, tz="UTC"),
    schedule='@hourly',
    catchup=False,
) as dag:
    # Incremental extraction: only rows updated since the start of the data interval
    extract_mortgages = PostgresToGCSOperator(
        task_id='extract_mortgages_raw',
        postgres_conn_id='core_banking_db',
        sql="SELECT * FROM mortgages WHERE updated_at > '{{ data_interval_start }}'",
        bucket='fintech-datalake-raw',
        filename='mortgages/{{ ds }}/mortgages.json',
        export_format='json',  # newline-delimited JSON, matching the load step below
    )

    # Append-only load into the raw layer to preserve the full transaction history
    load_to_bq = GCSToBigQueryOperator(
        task_id='load_mortgages_bq',
        bucket='fintech-datalake-raw',
        source_objects=['mortgages/{{ ds }}/mortgages.json'],
        destination_project_dataset_table='fintech_warehouse.raw_mortgages',
        write_disposition='WRITE_APPEND',  # Crucial for history
        source_format='NEWLINE_DELIMITED_JSON',
    )

    extract_mortgages >> load_to_bq
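A side note on the raw layer: the destination table can be created ahead of the first load so that partitioning and exact decimal types stay under your control. A minimal sketch, assuming for illustration that the Core Banking export is reduced to these columns:

-- Illustrative DDL for the raw layer: partition by update day, cluster by mortgage_id
CREATE TABLE IF NOT EXISTS fintech_warehouse.raw_mortgages (
  mortgage_id STRING,
  customer_id STRING,
  property_id STRING,
  outstanding_balance NUMERIC,  -- NUMERIC preserves decimal precision on amounts
  status STRING,
  updated_at TIMESTAMP
)
PARTITION BY DATE(updated_at)
CLUSTER BY mortgage_id;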
Phase 2: Transformation and Modeling with dbt
Once the data is in BigQuery, dbt comes into play. Unlike traditional stored procedures, dbt allows data transformations to be treated as software code (software engineering best practices), including versioning (Git), testing, and CI/CD.
dbt Project Structure
We organize models into three logical layers:
- Staging (stg): Light cleaning, column renaming, data type casting (a staging sketch follows this list).
- Intermediate (int): Complex business logic, joins between tables.
- Marts (fct/dim): Final tables ready for BI and regulatory reporting.
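To make the staging layer concrete, here is a hedged sketch of what stg_core_mortgages.sql might look like, assuming a dbt source named core_banking that points at the raw table loaded above (the source and column names are assumptions):

-- models/staging/stg_core_mortgages.sql (illustrative sketch)
WITH source AS (
    SELECT
        *,
        -- Keep only the latest version of each mortgage from the append-only raw layer
        ROW_NUMBER() OVER (PARTITION BY mortgage_id ORDER BY updated_at DESC) AS rn
    FROM {{ source('core_banking', 'raw_mortgages') }}
)
SELECT
    CAST(mortgage_id AS STRING) AS mortgage_id,
    CAST(customer_id AS STRING) AS customer_id,
    CAST(property_id AS STRING) AS property_id,
    CAST(outstanding_balance AS NUMERIC) AS outstanding_balance,  -- exact decimal type
    UPPER(status) AS status,
    CAST(updated_at AS TIMESTAMP) AS updated_at
FROM source
WHERE rn = 1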
Calculating Complex KPIs: Loan-to-Value (LTV) in SQL
In the mortgage sector, LTV is a critical risk indicator. Here is a dbt model (.sql file) that calculates the updated LTV and classifies risk by joining mortgage master data with property appraisals.
-- models/marts/risk/fct_mortgage_risk.sql
WITH mortgages AS (
    SELECT * FROM {{ ref('stg_core_mortgages') }}
),
appraisals AS (
    -- We take the latest available appraisal for the property
    SELECT
        property_id,
        appraisal_amount,
        appraisal_date,
        ROW_NUMBER() OVER (PARTITION BY property_id ORDER BY appraisal_date DESC) as rn
    FROM {{ ref('stg_external_appraisals') }}
)
SELECT
    m.mortgage_id,
    m.customer_id,
    m.outstanding_balance,
    a.appraisal_amount,
    -- LTV Calculation: (Outstanding Balance / Property Value) * 100
    ROUND((m.outstanding_balance / NULLIF(a.appraisal_amount, 0)) * 100, 2) as loan_to_value_ratio,
    CASE
        -- Missing or zero appraisal: flag it instead of silently falling through to low risk
        WHEN a.appraisal_amount IS NULL OR a.appraisal_amount = 0 THEN 'UNKNOWN_RISK'
        WHEN (m.outstanding_balance / NULLIF(a.appraisal_amount, 0)) > 0.80 THEN 'HIGH_RISK'
        WHEN (m.outstanding_balance / NULLIF(a.appraisal_amount, 0)) BETWEEN 0.50 AND 0.80 THEN 'MEDIUM_RISK'
        ELSE 'LOW_RISK'
    END as risk_category,
    CURRENT_TIMESTAMP() as calculated_at
FROM mortgages m
LEFT JOIN appraisals a ON m.property_id = a.property_id AND a.rn = 1
WHERE m.status = 'ACTIVE'
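A brief design note: a mart like this is usually materialized as a table and, on BigQuery, clustered on the columns most used for filtering. A hedged sketch of a config header that could sit at the top of the same file (the clustering choice is an assumption, not part of the model above):

-- Optional dbt config header for fct_mortgage_risk.sql (illustrative)
{{
    config(
        materialized='table',
        cluster_by=['risk_category', 'customer_id']
    )
}}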
Data Quality and Auditability: The Heart of Fintech
In the financial field, incorrect data can lead to penalties. The ELT approach with dbt excels in Data Quality thanks to integrated tests.
Implementing Automated Tests
In the dbt schema.yml file, we define the assertions that the data must satisfy. If a test fails, the pipeline stops or sends an alert, preventing the propagation of corrupt data.
version: 2

models:
  - name: fct_mortgage_risk
    description: "Fact table for mortgage risk"
    columns:
      - name: mortgage_id
        tests:
          - unique
          - not_null
      - name: loan_to_value_ratio
        tests:
          - not_null
          # Range test: LTV cannot be negative or absurdly high (>200%)
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 200
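Generic tests like the ones above can be complemented with singular tests: plain SQL files under tests/ that must return zero rows to pass. A hedged sketch, with an illustrative file name and a business rule consistent with the LTV model:

-- tests/assert_high_risk_has_appraisal.sql (illustrative singular test)
-- Any row returned here is a failure: a HIGH_RISK mortgage must have a positive appraisal
SELECT mortgage_id
FROM {{ ref('fct_mortgage_risk') }}
WHERE risk_category = 'HIGH_RISK'
  AND (appraisal_amount IS NULL OR appraisal_amount <= 0)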
Data Lineage
dbt automatically generates a dependency graph (DAG). For an auditor, this means being able to visually trace how a customer’s “High Risk” status was derived: from the raw table, through intermediate transformations, to the final report. This level of transparency is often mandatory in banking inspections.
Infrastructure Management and Versioning
Unlike legacy ETL pipelines based on GUIs (drag-and-drop graphical interfaces), the modern approach is Code-First.
- Version Control (Git): Every change to the LTV calculation logic is a commit in Git. We can see who changed the formula, when, and why (via the pull request).
- Isolated Environments: Thanks to dbt, every developer can run the pipeline in a sandbox environment (e.g., dbt run --target dev) on a subset of BigQuery data, without impacting production (see the sketch just after this list).
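As a hedged illustration of how a dev run can operate on a subset of data, a model can branch on the active dbt target; the seven-day sampling window below is purely an assumption for the sketch:

-- Illustrative fragment inside a staging model: limit scanned data outside production
SELECT *
FROM {{ source('core_banking', 'raw_mortgages') }}
{% if target.name == 'dev' %}
-- In the dev target, only scan recent rows to keep BigQuery costs low
WHERE updated_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
{% endif %}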
Common Troubleshooting
1. Schema Drift (Source schema change)
Problem: Core Banking adds a column to the mortgage table and the pipeline breaks.
Solution: In BigQuery, use the schema_update_options=['ALLOW_FIELD_ADDITION'] option during loading. In dbt, use macros like dbt_utils.star to select columns dynamically (a sketch follows), or implement strict schema tests that flag the change without breaking the critical flow.
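A hedged sketch of the dbt_utils.star approach, reusing the illustrative core_banking source from earlier; the excluded metadata column name is an assumption:

-- Select every source column except illustrative operational metadata, so that a newly
-- added business column flows through without editing the model by hand
SELECT
    {{ dbt_utils.star(from=source('core_banking', 'raw_mortgages'), except=['_loaded_at']) }}
FROM {{ source('core_banking', 'raw_mortgages') }}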
2. Data Latency
Problem: Property appraisal data arrives later than mortgage balances.
Solution: Implement “Late Arriving Facts” logic. Use SQL window functions (as in the ROW_NUMBER example above) to always take the latest valid data available at runtime, or model snapshot tables to historicize the exact state at month-end (a snapshot sketch follows).
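A hedged sketch of a dbt snapshot that historicizes appraisal records; the snapshot name, unique key, and source reference are assumptions:

-- snapshots/appraisals_snapshot.sql (illustrative)
{% snapshot appraisals_snapshot %}
{{
    config(
        target_schema='snapshots',
        unique_key='appraisal_id',
        strategy='timestamp',
        updated_at='appraisal_date'
    )
}}
-- Each change to an appraisal becomes a new versioned row (dbt adds dbt_valid_from / dbt_valid_to)
SELECT * FROM {{ source('external_data', 'appraisals') }}
{% endsnapshot %}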
Conclusions

The transition from ETL to ELT pipelines in the Fintech sector is not a trend but an operational necessity. Using BigQuery for low-cost storage and high computational capacity, combined with dbt for transformation governance, allows financial companies to have reliable, auditable, and timely data. Implementing this architecture requires software engineering skills applied to data, but the return on investment in compliance and business agility is substantial.
Frequently Asked Questions

What is the main difference between ETL and ELT, and why is ELT preferred in Fintech?
The main distinction concerns the timing of data transformation. In the ETL model, data is processed before loading, whereas in the ELT paradigm, raw data is loaded immediately into the Data Warehouse and transformed afterwards. In Fintech, the ELT method is preferred because it guarantees total auditability and allows historical KPIs to be recalculated without re-executing extraction from source systems.

Why combine Google BigQuery and dbt?
This combination constitutes the current standard thanks to the serverless scalability of Google BigQuery and the ability of dbt to manage SQL transformations as software code. Their joint use allows leveraging cloud computing power to process massive volumes of financial data, while ensuring versioning, automated testing, and clear documentation of business logic.

How does the ELT approach support regulatory compliance and audits?
The ELT model facilitates compliance by preserving original raw data in the Data Warehouse, making it possible to reconstruct the history of every transaction at any time. Furthermore, tools like dbt automatically generate a dependency graph, known as Data Lineage, which allows auditors to visualize exactly how a final data point was derived from the source during regulatory inspections.

How is data quality guaranteed within the pipeline?
Data integrity is ensured through automated tests integrated into the transformation code, which verify the uniqueness and consistency of values. By defining specific rules in the configuration file, the pipeline can block the process or send immediate alerts if it detects anomalies, preventing the propagation of errors into decision-making or regulatory reports.

How are risk KPIs such as Loan-to-Value calculated?
Risk indicator calculation is managed through modular SQL models inside the Data Warehouse, joining master data and property appraisals. Thanks to the ELT method, it is possible to implement logic that historicizes risk and manages data delays, ensuring that the calculation always reflects the most valid and up-to-date information available at runtime.
Sources and Further Reading
- Bank for International Settlements: Principles for effective risk data aggregation and risk reporting (BCBS 239)
- NIST Special Publication 800-145: The NIST Definition of Cloud Computing
- Wikipedia: Extract, Load, Transform (ELT) Architecture
- FFIEC IT Examination Handbook: Architecture, Infrastructure, and Operations

Did you find this article helpful? Is there another topic you'd like to see me cover?
Write it in the comments below! I take inspiration directly from your suggestions.