This is a PDF version of the content. For the complete and up-to-date version, visit:
https://blog.tuttosemplice.com/en/lead-engineering-guide-to-predictive-lead-scoring-with-ai-and-crm/
In the current landscape of credit brokerage, viewing Lead Generation as a simple marketing activity is a fatal strategic error. We are in the era of Lead Engineering, where the customer acquisition flow must be treated as a closed-loop control system. This technical guide will explore how to design and implement a predictive lead scoring engine within an advanced CRM ecosystem, such as BOMA, transforming raw behavioral data into mathematical probabilities of mortgage disbursement.
The goal is no longer to generate contacts, but to predict revenue. Using Machine Learning algorithms and a solid data architecture, we will move from the subjective intuition of sales reps to a probabilistic, data-driven approach.
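To build an effective scoring model, we must first establish a data pipeline (ETL) that connects user behavior on the website with the actual outcome of the file in the CRM. The proposed architecture is based on three pillars: Google Analytics 4 as the source of behavioral data, BigQuery as the storage layer for raw events, and the BOMA CRM as both the source of file outcomes and the destination of the scores.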
Before proceeding, ensure you have access to:
pandas, scikit-learn, xgboost libraries.
Predictive lead scoring relies not only on demographics (age, income) but primarily on implicit signals. In the mortgage sector, how a user interacts with the simulator is a proxy for their purchasing intent and eligibility.
We need to extract user sessions and transform them into features. Here is an example SQL query to extract behavioral metrics:
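SELECT
  user_pseudo_id,
  COUNTIF(event_name = 'view_mortgage_simulator') AS simulator_interactions,
  -- event_params is a repeated field in the GA4 export, so it must be unnested;
  -- the 'loan_amount' key is an assumed parameter name
  AVG(SAFE_CAST((
    SELECT ep.value.string_value
    FROM UNNEST(event_params) AS ep
    WHERE ep.key = 'loan_amount'
  ) AS FLOAT64)) AS avg_loan_amount,
  -- Span between the user's first and last event in the window, in microseconds
  MAX(event_timestamp) - MIN(event_timestamp) AS session_duration_micros,
  COUNTIF(event_name = 'download_pdf_guide') AS high_intent_actions
FROM
  `project_id.analytics_123456.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20251201' AND '20260131'
GROUP BY
  user_pseudo_id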
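For a credit scoring model, the most predictive variables (features) we need to engineer include the implicit debt-to-income ratio, hesitation time on offers, and site visit frequency in the last 30 days.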
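As a rough illustration of this kind of feature engineering, the sketch below derives three candidate features from simplified inputs: visit frequency in the last 30 days, hesitation time on an offer, and an implicit debt-to-income ratio. The event names `view_offer` and `request_quote`, the record layout, and the declared-income input are assumptions made for illustration, not fields confirmed by GA4 or BOMA.

```python
from datetime import datetime, timedelta


def visits_last_30_days(visit_times, now):
    """Count visits whose timestamp falls within the 30 days before `now`."""
    cutoff = now - timedelta(days=30)
    return sum(1 for t in visit_times if cutoff <= t <= now)


def hesitation_seconds(events):
    """Seconds between the first offer view and the first quote request after it.

    `events` is a list of (timestamp, event_name) tuples sorted by time.
    Returns None when either event is missing, so the model sees a missing value.
    """
    viewed_at = None
    for ts, name in events:
        if name == "view_offer" and viewed_at is None:
            viewed_at = ts
        elif name == "request_quote" and viewed_at is not None:
            return (ts - viewed_at).total_seconds()
    return None


def implicit_dti(monthly_installment, monthly_income):
    """Simulated installment divided by declared income; None if income is unusable."""
    if not monthly_income or monthly_income <= 0:
        return None
    return monthly_installment / monthly_income


# Made-up sample data for a single user
now = datetime(2026, 1, 31, 12, 0)
visits = [datetime(2025, 12, 15), datetime(2026, 1, 10), datetime(2026, 1, 30)]
events = [
    (datetime(2026, 1, 30, 10, 0, 0), "view_offer"),
    (datetime(2026, 1, 30, 10, 2, 30), "request_quote"),
]
features = {
    "visits_30d": visits_last_30_days(visits, now),
    "hesitation_s": hesitation_seconds(events),
    "implicit_dti": implicit_dti(900, 3000),
}
print(features)
```

Returning None instead of a placeholder number keeps missing behavior visible to the model, which suits tree-based learners that handle missing values themselves.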
Why use XGBoost (Extreme Gradient Boosting) instead of simple logistic regression? Because behavioral data is often non-linear and contains many missing values. Gradient-boosted decision trees capture non-linear effects without manual feature transformations, handle missing values natively, and still expose feature importance scores for inspecting what drives a prediction.
Below is a code example to train the model. We assume we have a DataFrame df merging GA4 data with the historical outcome of files (0 = lost, 1 = disbursed) exported from the CRM.
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Feature and target separation
X = df.drop(['conversion_flag', 'user_id'], axis=1)
y = df['conversion_flag']

# Stratified split: keeps the share of disbursed files equal in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# XGBoost model configuration
model = xgb.XGBClassifier(
    objective='binary:logistic',
    n_estimators=100,
    learning_rate=0.05,
    max_depth=6,
    scale_pos_weight=10,  # Crucial for imbalanced datasets (few mortgages disbursed compared to leads)
)

# Training
model.fit(X_train, y_train)

# Evaluation: AUC on the held-out set, using the predicted probability of class 1
preds = model.predict_proba(X_test)[:, 1]
print(f"AUC Score: {roc_auc_score(y_test, preds):.3f}")
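The scale_pos_weight parameter is fundamental in the credit sector, where the real conversion rate can be lower than 2-3%. It increases the weight of errors on the positive class (disbursed files), so the model is not rewarded for simply predicting "lost" for every lead. The value should be tuned on validation data rather than fixed once.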
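A minimal sketch of computing a starting value from the training labels, using the negatives-to-positives ratio that the XGBoost parameter documentation suggests as a typical value to consider. The helper name `suggested_scale_pos_weight` is hypothetical, and the sample labels are made up.

```python
def suggested_scale_pos_weight(labels):
    """Return (#negatives / #positives) for a sequence of 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive examples: cannot compute a class ratio")
    return negatives / positives


# Made-up example: 1,000 leads with a 2.5% disbursement rate
labels = [1] * 25 + [0] * 975
weight = suggested_scale_pos_weight(labels)
print(weight)  # 975 / 25
```

The result is only a starting point: it can be passed as `scale_pos_weight` and then adjusted based on validation metrics.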
Once the model generates a probability (e.g., 0.85), this must be sent to the CRM in real-time or in batches. In the context of BOMA, we will use REST APIs to update the predictive_score custom field.
client_id or cookie).
Example JSON payload to BOMA:
{
  "lead_id": "102938",
  "custom_fields": {
    "predictive_score": 85,
    "score_cluster": "HOT",
    "recommended_action": "Call_Immediately"
  }
}
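A payload of this shape could be assembled by a small helper that converts the model probability into a 0-100 score and a cluster label. In the sketch below, the cluster thresholds, the `WARM` and `COLD` labels, and the action names other than `Call_Immediately` are assumptions for illustration; only the field names come from the example above. The actual BOMA endpoint and authentication are deliberately left out, since they are not described here.

```python
import json


def build_score_payload(lead_id, probability):
    """Turn a model probability (0.0-1.0) into a CRM update payload."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    score = round(probability * 100)
    # Assumed thresholds: above 70 is HOT, 40 to 70 is WARM, below 40 is COLD
    if score > 70:
        cluster, action = "HOT", "Call_Immediately"
    elif score >= 40:
        cluster, action = "WARM", "Schedule_Call"
    else:
        cluster, action = "COLD", "Nurture_By_Email"
    return {
        "lead_id": str(lead_id),
        "custom_fields": {
            "predictive_score": score,
            "score_cluster": cluster,
            "recommended_action": action,
        },
    }


payload = build_score_payload("102938", 0.85)
body = json.dumps(payload)  # string to send as the request body
print(body)
```

Keeping the payload construction separate from the HTTP call makes the scoring rules easy to test without touching the CRM.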
The true power of systems engineering lies in feedback. A static model degrades over time (model drift). An inverse process must be configured:
Every night, a script must extract the updated status of files from BOMA CRM (e.g., “Underwriting”, “Approval”, “Rejected”) and load it into BigQuery. This data becomes the new Ground Truth for model retraining.
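The labeling part of this nightly step can be sketched as a mapping from CRM statuses to training labels. The status name `Disbursed` and the choice to leave in-progress files unlabeled are assumptions; the real BOMA status vocabulary may differ, and the extraction and BigQuery load are omitted.

```python
# Terminal statuses map to the training labels (1 = disbursed, 0 = lost).
# "Disbursed" is an assumed status name; "Rejected" appears in the text.
TERMINAL_LABELS = {"Disbursed": 1, "Rejected": 0}


def to_ground_truth(crm_rows):
    """Keep only files with a terminal status and attach a 0/1 label.

    `crm_rows` is an iterable of dicts with `lead_id` and `status` keys.
    In-progress files (e.g. "Underwriting", "Approval") are skipped because
    their final outcome is not known yet.
    """
    labeled = []
    for row in crm_rows:
        label = TERMINAL_LABELS.get(row["status"])
        if label is not None:
            labeled.append({"lead_id": row["lead_id"], "conversion_flag": label})
    return labeled


# Made-up sample export
crm_rows = [
    {"lead_id": "102938", "status": "Disbursed"},
    {"lead_id": "102939", "status": "Underwriting"},
    {"lead_id": "102940", "status": "Rejected"},
]
ground_truth = to_ground_truth(crm_rows)
print(ground_truth)
```

Excluding files that are still open avoids training on outcomes that may later flip from "in progress" to disbursed.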
Implementing a predictive lead scoring system is not an academic exercise, but a financial necessity. By shifting call center resources to leads with a score > 70, credit brokerage companies can reduce Customer Acquisition Cost (CAC) by up to 40% and increase the conversion rate on processed files.
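Whether a cutoff such as 70 is appropriate can be checked on historical data before changing call center routing. The sketch below uses a hypothetical helper and made-up sample values to report what share of leads would be called and what share of disbursed files those calls would capture.

```python
def threshold_report(scores, outcomes, threshold=70):
    """Summarize the effect of calling only leads with score > threshold.

    `scores` are 0-100 predictive scores; `outcomes` are 0/1 flags
    (1 = disbursed) for the same leads, in the same order.
    """
    if not scores or len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must be non-empty and equal length")
    called = [o for s, o in zip(scores, outcomes) if s > threshold]
    total_disbursed = sum(outcomes)
    return {
        "share_of_leads_called": len(called) / len(scores),
        "conversion_rate_called": sum(called) / len(called) if called else 0.0,
        "share_of_disbursements_captured": (
            sum(called) / total_disbursed if total_disbursed else 0.0
        ),
    }


# Made-up historical sample: 10 leads, 3 of which were disbursed
scores = [92, 85, 71, 70, 55, 40, 30, 22, 15, 8]
outcomes = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
report = threshold_report(scores, outcomes)
print(report)
```

In this toy sample, one disbursed file sits below the cutoff, which is the kind of trade-off the report is meant to surface.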
Integration between GA4, BigQuery, and an evolved CRM like BOMA represents the state of the art in 2026. It is no longer about calling all contacts as soon as possible, but calling the right contacts, with the right offer, at the right time, guided by mathematics.
Predictive lead scoring is a methodology that uses Machine Learning algorithms to calculate the mathematical probability that a contact turns into revenue. In the credit sector, this system analyzes user behaviors, such as interaction with the simulator, to assign a priority score, allowing consultants to focus only on files with a high probability of disbursement.
XGBoost is preferred because online behavioral data is often non-linear and fragmented. Unlike classic regression, the gradient-boosted decision trees of this algorithm handle missing values natively and capture non-linear effects, while feature importance scores keep the model inspectable. This makes it more effective at predicting complex outcomes like mortgage approval.
A robust model requires combining navigation data, coming from tools like Google Analytics 4, and historical data on file outcomes present in the CRM. The most predictive features include the implicit debt-to-income ratio, hesitation time on offers, and site visit frequency in the last 30 days.
The architecture involves extracting raw data from GA4 to BigQuery for storage. Python scripts then process this data and generate a score that is sent to BOMA CRM via API, in real time or in batches. This updates the customer record with a predictive score and suggests the best action to the sales rep.
The feedback loop is a process that re-imports the actual sales outcome from the CRM to the artificial intelligence model. This allows the system to self-correct: if a high-score lead does not convert, the algorithm learns to penalize similar profiles in the future, reducing customer acquisition costs and increasing call center operational efficiency.