Lead Engineering: Guide to Predictive Lead Scoring with AI and CRM

Author: Francesco Zinghinì | Date: February 6, 2026

In the current landscape of credit brokerage, viewing Lead Generation as a simple marketing activity is a fatal strategic error. We are in the era of Lead Engineering, where the customer acquisition flow must be treated as a closed-loop control system. This technical guide will explore how to design and implement a predictive lead scoring engine within an advanced CRM ecosystem, such as BOMA, transforming raw behavioral data into mathematical probabilities of mortgage disbursement.

The goal is no longer to generate contacts, but to predict revenue. Using Machine Learning algorithms and a solid data architecture, we will move from the subjective intuition of sales reps to a deterministic data-driven approach.

1. System Architecture: From Tracking to Inference

To build an effective scoring model, we must first establish a data pipeline (ETL) that connects user behavior on the website with the actual outcome of the file in the CRM. The proposed architecture rests on four pillars (see the orchestration sketch after the list):

  • Data Source (Input): Google Analytics 4 (GA4) for behavioral data and mortgage simulator logs.
  • Data Warehouse (Processing): Google BigQuery for data storage and normalization.
  • Decision Engine (Core): Python scripts (hosted on Cloud Functions or Vertex AI) running XGBoost models.
  • Destination (Output): The BOMA CRM, which receives the score and orchestrates lead assignment.
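
As a mental model, the closed loop chains these four stages in a scheduled batch run. The function names below are illustrative stubs, not part of any library:

import pandas as pd

def extract_behavioral_data() -> pd.DataFrame:
    """Read GA4 events already exported to BigQuery (Data Source + Warehouse)."""
    ...

def score_leads(features: pd.DataFrame) -> pd.DataFrame:
    """Apply the trained XGBoost model and return a probability per lead (Decision Engine)."""
    ...

def push_scores_to_crm(scores: pd.DataFrame) -> None:
    """Update the predictive_score custom field on each lead in BOMA (Destination)."""
    ...

def import_outcomes_from_crm() -> None:
    """Re-import file outcomes as training labels (feedback loop, section 5)."""
    ...

def nightly_run() -> None:
    push_scores_to_crm(score_leads(extract_behavioral_data()))
    import_outcomes_from_crm()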

Technical Prerequisites

Before proceeding, ensure you have access to:

  • Google Cloud Platform account with BigQuery enabled.
  • Daily export (or streaming) from GA4 to BigQuery configured.
  • API access to the BOMA CRM (or your proprietary CRM).
  • Python 3.9+ environment with pandas, scikit-learn, xgboost libraries.

2. Data Ingestion and Feature Engineering

Predictive lead scoring relies not only on demographics (age, income) but primarily on implicit signals. In the mortgage sector, how a user interacts with the simulator is a proxy for their purchasing intent and eligibility.

Extraction from BigQuery

We need to extract user sessions and transform them into features. Here is an example SQL query to extract behavioral metrics:


SELECT
  user_pseudo_id,
  COUNTIF(event_name = 'view_mortgage_simulator') AS simulator_interactions,
  -- event_params is a repeated field in the GA4 export, so it must be unnested;
  -- 'loan_amount' is an assumed parameter key logged by the simulator.
  AVG((SELECT SAFE_CAST(ep.value.string_value AS FLOAT64)
       FROM UNNEST(event_params) AS ep
       WHERE ep.key = 'loan_amount')) AS avg_loan_amount,
  -- Span between first and last tracked event (microseconds), across all sessions.
  MAX(event_timestamp) - MIN(event_timestamp) AS activity_span_micros,
  COUNTIF(event_name = 'download_pdf_guide') AS high_intent_actions
FROM
  `project_id.analytics_123456.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20251201' AND '20260131'
GROUP BY
  user_pseudo_id
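
To pull this result into Python for modeling, a minimal sketch using the google-cloud-bigquery client could look as follows; the SQL file name is a hypothetical placeholder for the query above, and Application Default Credentials are assumed:

from pathlib import Path

from google.cloud import bigquery

# Hypothetical file containing the SQL shown above.
sql = Path("behavioral_features.sql").read_text()

# Assumes the google-cloud-bigquery library and credentials for the GCP project.
client = bigquery.Client(project="project_id")
features_df = client.query(sql).to_dataframe()
print(features_df.head())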

Defining Critical Features

For a credit scoring model, the most predictive features to engineer include the following (a minimal pandas sketch follows the list):

  • Implicit Loan-to-Value (LTV): If the user enters a requested amount and property value in the simulator, the ratio is a strong indicator of feasibility.
  • Hesitation Time: Excessive time on the rates page could indicate price sensitivity.
  • Recurrence: Number of return visits in the last 30 days.
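
To make these concrete, here is a minimal pandas sketch. The input column names (requested_amount, property_value, rates_page_seconds, visits_30d) are assumptions about the merged dataset, not fields defined by GA4 or BOMA:

import numpy as np
import pandas as pd

def engineer_features(raw: pd.DataFrame) -> pd.DataFrame:
    out = raw.copy()
    # Implicit Loan-to-Value: requested amount over declared property value.
    out["implicit_ltv"] = out["requested_amount"] / out["property_value"].replace(0, np.nan)
    # Hesitation time: seconds spent on the rates page (proxy for price sensitivity).
    out["hesitation_time"] = out["rates_page_seconds"]
    # Recurrence: number of return visits in the last 30 days.
    out["recurrence_30d"] = out["visits_30d"].fillna(0).astype(int)
    return out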

3. Algorithmic Model Development (XGBoost)

Why use XGBoost (Extreme Gradient Boosting) instead of simple logistic regression? Because behavioral data is often non-linear and contains many missing values. Decision trees handle these irregularities better and offer greater interpretability via feature importance.

Python Implementation

Below is a code example to train the model. We assume we have a DataFrame df merging GA4 data with the historical outcome of files (0 = lost, 1 = disbursed) exported from the CRM.


import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Feature and Target Separation
X = df.drop(['conversion_flag', 'user_id'], axis=1)
y = df['conversion_flag']

# Dataset split (stratified to preserve the rare positive class in both sets)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# XGBoost model configuration
model = xgb.XGBClassifier(
    objective='binary:logistic',
    n_estimators=100,
    learning_rate=0.05,
    max_depth=6,
    scale_pos_weight=10 # Crucial for imbalanced datasets (few mortgages disbursed compared to leads)
)

# Training
model.fit(X_train, y_train)

# Evaluation
preds = model.predict_proba(X_test)[:, 1]
print(f"AUC Score: {roc_auc_score(y_test, preds)}")

The scale_pos_weight parameter is fundamental in the credit sector, where the real conversion rate can fall below 2-3%. It increases the weight of errors on the positive class (disbursed mortgages), so the model does not simply learn to predict every lead as lost.
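Rather than hard-coding the value, a common heuristic is to derive scale_pos_weight from the class ratio of the training split; a quick sketch, assuming y_train from the snippet above:

# Negative-to-positive ratio of the training labels.
neg = (y_train == 0).sum()
pos = (y_train == 1).sum()
scale_pos_weight = neg / pos  # roughly 49 for a 2% conversion rate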

4. API Integration with BOMA CRM

Once the model generates a probability (e.g., 0.85), this must be sent to the CRM in real-time or in batches. In the context of BOMA, we will use REST APIs to update the predictive_score custom field.

Update Workflow

  1. The user fills out the quote request form.
  2. The backend sends data to the CRM and simultaneously queries our model (exposed via Flask/FastAPI; see the sketch after this list).
  3. The model calculates the score based on navigation history (retrieved via client_id or cookie).
  4. The system sends a PATCH request to the CRM.
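
A minimal sketch of the scoring service in step 2, using FastAPI: the endpoint path, payload fields, and saved model file name are illustrative assumptions, and the feature names must match the columns used in training.

import pandas as pd
import xgboost as xgb
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Hypothetical artifact: the classifier from section 3, persisted with model.save_model().
model = xgb.XGBClassifier()
model.load_model("lead_scoring_model.json")

class LeadFeatures(BaseModel):
    simulator_interactions: int
    avg_loan_amount: float
    high_intent_actions: int

@app.post("/score")
def score(features: LeadFeatures) -> dict:
    # Column names and order must mirror the DataFrame used for training.
    X = pd.DataFrame([features.model_dump()])
    probability = float(model.predict_proba(X)[:, 1][0])
    return {"predictive_score": round(probability * 100)}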

Example JSON Payload to BOMA:


{
  "lead_id": "102938",
  "custom_fields": {
    "predictive_score": 85,
    "score_cluster": "HOT",
    "recommended_action": "Call_Immediately"
  }
}
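
Pushing that payload from Python could look like the following sketch with the requests library; the base URL, endpoint path, and bearer token are hypothetical placeholders for the actual BOMA API contract:

import requests

# Hypothetical endpoint and credentials: replace with the real BOMA API details.
BOMA_API_URL = "https://api.boma.example/v1/leads"
API_TOKEN = "YOUR_API_TOKEN"

def push_score_to_crm(lead_id: str, score: int, cluster: str, action: str) -> None:
    payload = {
        "custom_fields": {
            "predictive_score": score,
            "score_cluster": cluster,
            "recommended_action": action,
        }
    }
    response = requests.patch(
        f"{BOMA_API_URL}/{lead_id}",
        json=payload,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=10,
    )
    response.raise_for_status()

push_score_to_crm("102938", 85, "HOT", "Call_Immediately")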

5. The Feedback Loop: Adaptive Control

The true power of systems engineering lies in feedback. A static model degrades over time (model drift). A reverse data flow must therefore be configured:

Every night, a script must extract the updated status of files from BOMA CRM (e.g., “Underwriting”, “Approval”, “Rejected”) and load it into BigQuery. This data becomes the new Ground Truth for model retraining.

  • If the model predicted 90/100 for a lead later rejected for “Insufficient Income”, the algorithm will learn to penalize similar feature combinations in future iterations.
  • This creates a self-correcting system that adapts to market changes (e.g., tightening of bank credit policies).
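
A skeleton of the nightly re-import described above could look like this; the CRM fetch, the status labels, and the BigQuery table name are assumptions to be adapted:

import pandas as pd
from google.cloud import bigquery

# Hypothetical mapping of closed BOMA file statuses to training labels.
CLOSED_STATUSES = {"Approval": 1, "Rejected": 0}

def refresh_ground_truth(crm_rows: list) -> None:
    """Load the latest file outcomes into BigQuery as labels for retraining."""
    df = pd.DataFrame(crm_rows)  # rows previously fetched from the BOMA API
    df = df[df["status"].isin(CLOSED_STATUSES)]
    df["conversion_flag"] = df["status"].map(CLOSED_STATUSES)

    client = bigquery.Client(project="project_id")
    # Hypothetical destination table consumed by the next training run.
    job = client.load_table_from_dataframe(
        df[["lead_id", "conversion_flag"]],
        "project_id.lead_scoring.ground_truth",
    )
    job.result()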

Conclusions and Impact on ROI

Implementing a predictive lead scoring system is not an academic exercise, but a financial necessity. By shifting call center resources to leads with a score > 70, credit brokerage companies can reduce Customer Acquisition Cost (CAC) by up to 40% and increase the conversion rate on processed files.

Integration between GA4, BigQuery, and an evolved CRM like BOMA represents the state of the art in 2026. It is no longer about calling all contacts as soon as possible, but calling the right contacts, with the right offer, at the right time, guided by mathematics.

Frequently Asked Questions

What is predictive lead scoring and how does it apply to mortgages?

Predictive lead scoring is a methodology that uses Machine Learning algorithms to calculate the mathematical probability that a contact turns into revenue. In the credit sector, this system analyzes user behaviors, such as interaction with the simulator, to assign a priority score, allowing consultants to focus only on files with a high probability of disbursement.

Why use XGBoost instead of logistic regression for scoring?

XGBoost is preferred because online behavioral data is often non-linear and fragmented. Unlike classic regression, the decision trees of this algorithm handle missing values better and offer greater variable interpretability, proving more effective in predicting complex outcomes like mortgage approval.

What data is needed to build an effective scoring model?

A robust model requires combining navigation data from tools like Google Analytics 4 with historical data on file outcomes stored in the CRM. The most predictive features include the implicit loan-to-value ratio, hesitation time on offers, and site visit frequency in the last 30 days.

How does the integration between GA4, BigQuery, and BOMA CRM work?

The architecture involves extracting raw data from GA4 into BigQuery for storage. Python scripts then process this data and generate a score that is sent in real time to the BOMA CRM via API. This allows the customer record to be updated with a predictive score and the best next action to be suggested to the sales rep.

How does the feedback loop improve lead generation ROI?

The feedback loop is a process that re-imports the actual sales outcome from the CRM to the artificial intelligence model. This allows the system to self-correct: if a high-score lead does not convert, the algorithm learns to penalize similar profiles in the future, reducing customer acquisition costs and increasing call center operational efficiency.