ML · Telecom · Binary Classification

TeleChurn Intelligence

End-to-end machine learning pipeline to predict customer churn in telecommunications — from raw IBM Telco data to a production-ready XGBoost model with SHAP explainability.

7,032 Records
74% Churn Recall
11 Models Trained
8 Features Engineered
XGBoost
LightGBM
Random Forest
SMOTE
SHAP
PR-AUC Threshold
model_results.py
from xgboost import XGBClassifier

# Best model: XGBoost + tuned decision threshold
model = XGBClassifier(
    n_estimators=400,
    max_depth=5,
    learning_rate=0.05,
    scale_pos_weight=2.76,  # ≈ No Churn / Churn = 5,163 / 1,869
)

# Final Metrics (Churn Class)
recall = 0.74
precision = 0.70
f1_score = 0.72
pr_auc = 0.63
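The reported F1 can be checked against the precision and recall above, since F1 is the harmonic mean of the two:

```python
# Churn-class metrics reported above
precision, recall = 0.70, 0.74

# F1 = harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.72
```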
01 — Data Understanding

IBM Telco Customer Churn

7,032 customer records after cleaning, with 21 columns covering demographics, services subscribed, billing details, and contract type.

📦
Dataset Source
IBM
Telco Customer Churn: a publicly available IBM sample dataset describing a fictional telecom company's customer base at a single point in time.
📊
Final Shape
7,032 × 21
After dropping the 11 rows with blank TotalCharges values (7,043 → 7,032). No nulls remain in any column.
⚖️
Class Imbalance
73.5 / 26.5
No Churn (5,163) vs Churn (1,869). Accuracy is misleading here: a model that always predicts "No Churn" already scores 73.5%, so recall on the churn class is the primary business KPI.
🎯
Target Variable
Churn 0/1
'Yes'/'No' string converted to binary integer. Stratified 80/20 split preserves class ratio in both train and test sets.
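A minimal sketch of the cleaning, target-encoding, and stratified-split steps from these cards, assuming pandas and scikit-learn. The toy frame, the `prepare` helper, and `random_state=42` are illustrative, not the project's notebook code. In the raw CSV the missing TotalCharges values are blank strings, which is why the column is coerced to numeric first:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for WA_Fn-UseC_-Telco-Customer-Churn.csv: the last row mimics a
# blank TotalCharges entry, which makes pandas load the column as text.
raw = pd.DataFrame({
    "tenure": list(range(1, 41)) + [0],
    "TotalCharges": [str(20.0 * t) for t in range(1, 41)] + [" "],
    "Churn": ["Yes"] * 10 + ["No"] * 30 + ["No"],
})

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Coerce TotalCharges, drop unparseable rows, and binarize the target."""
    out = df.copy()
    # errors="coerce" turns blank strings into NaN instead of raising
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    out = out.dropna(subset=["TotalCharges"]).reset_index(drop=True)
    if out.isna().any().any():
        raise ValueError("unexpected nulls after cleaning")
    # Map the 'Yes'/'No' strings to 1/0 integers
    out["Churn"] = out["Churn"].map({"Yes": 1, "No": 0}).astype(int)
    return out

df = prepare(raw)
X, y = df.drop(columns="Churn"), df["Churn"]

# stratify=y keeps the churn share identical in train and test (25% in this toy data)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(len(df), y_train.mean(), y_test.mean())  # 40 0.25 0.25
```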
Feature Categories
Demographics: SeniorCitizen, Partner, Dependents
Services subscribed: InternetService, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies
Billing: MonthlyCharges, TotalCharges, PaymentMethod, PaperlessBilling
Contract & tenure: Contract, tenure
Target: Churn
02 — Exploratory Data Analysis

8 Key Discoveries

Thematic analyses revealing behavioral, structural, and financial drivers of churn — each with a direct business implication.

Churn Rate by Customer Segment
Ranked highest to lowest risk
Month-to-month + Electronic check: 53.7%
Fiber optic users: 41.9%
Senior citizens: 41.7%
Solo customers: 33.0%
Overall average: 26.5%
Partnered customers: 19.7%
DSL users: 19.0%
2-year contract: 2.8%
Class Distribution
No Churn (majority): 73.5% (5,163)
Churned (minority): 26.5% (1,869)
Total records: 7,032
Imbalance ratio: 2.76×
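A quick check of how the imbalance ratio relates to the model's `scale_pos_weight=2.76`, assuming the common XGBoost heuristic of negatives divided by positives. It is computed here on the full cleaned dataset; on the stratified 80% training split the ratio is essentially the same:

```python
# Class counts from the cleaned dataset
n_no_churn = 5_163
n_churn = 1_869

# Majority / minority ratio; XGBoost docs suggest sum(negative) / sum(positive)
# as a typical starting value for scale_pos_weight
imbalance_ratio = n_no_churn / n_churn
scale_pos_weight = round(imbalance_ratio, 2)

# Accuracy of a trivial model that never predicts churn
always_no_accuracy = n_no_churn / (n_no_churn + n_churn)

print(scale_pos_weight)  # 2.76
```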
🔥 The Danger Zone Matrix
Churn rate (%) by contract type × payment method
Contract       | Bank transfer (auto) | Credit card (auto) | Electronic check ⚠ | Mailed check
Month-to-Month | 34.1%                | 32.8%              | 53.7%              | 31.6%
One Year       | 9.7%                 | 10.3%              | 18.4%              | 6.9%
Two Year       | 3.4%                 | 2.2%               | 7.7%               | 0.8%
Risk bands: Critical >50% · High 25–50% · Medium 10–25% · Low <10%
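Each cell of the matrix is the mean of the 0/1 churn target for one contract/payment pair. A minimal pandas sketch on toy rows; the frame and the `risk_band` helper are illustrative assumptions, with bands taken from the legend above:

```python
import pandas as pd

# Toy rows standing in for the cleaned dataset (Churn already mapped to 0/1)
df = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["Two year"] * 4,
    "PaymentMethod": ["Electronic check", "Electronic check",
                      "Mailed check", "Mailed check"] * 2,
    "Churn": [1, 1, 0, 1, 0, 0, 0, 0],
})

# Mean of a 0/1 target per cell = churn rate for that Contract x PaymentMethod pair
matrix = (
    pd.crosstab(df["Contract"], df["PaymentMethod"], values=df["Churn"], aggfunc="mean")
    .mul(100)
    .round(1)
)

def risk_band(rate: float) -> str:
    """Map a churn rate in percent onto the matrix legend."""
    if rate > 50:
        return "Critical"
    if rate >= 25:
        return "High"
    if rate >= 10:
        return "Medium"
    return "Low"

bands = matrix.apply(lambda col: col.map(risk_band))
print(matrix)
print(bands)
```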
03 — Feature Engineering

8 Engineered Features

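Domain-driven features that encode business logic the raw columns cannot capture. Five of them (Short_Contract, Total_Services, Is_Solo, Charge_Tenure_Ratio, Is_Auto_Pay) appear in the top ten of the final model's SHAP global importance ranking.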

Contract Risk
Short_Contract
Contract == 'Month-to-month' → 1
Single strongest churn risk predictor. Customers with no long-term commitment can leave any time with zero friction.
⚡ Critical Impact
Service Depth
Total_Services
Count(add-ons == 'Yes'): 0–6
Service stickiness score. More subscribed services = higher switching cost. Churn drops from 45% → 5% with 6 services.
⚡ Critical Impact
Household
Is_Solo
No Partner AND No Dependents → 1
Encodes household isolation risk. Solo customers lack the household anchor that keeps partnered customers subscribed about 80% longer on average.
↑ High Impact
Billing
Is_Auto_Pay
'automatic' in PaymentMethod → 1
Auto-payers renew passively, without a monthly re-evaluation. Manual payers face a decision point every billing cycle, which creates a recurring opportunity to churn.
↑ High Impact
Price Signal
Charge_Spike
MonthlyCharges – (TotalCharges / tenure)
Detects recent price increases relative to historical average. A sudden spike signals dissatisfaction risk for price-sensitive customers.
→ Medium Impact
Intensity
Charge_Tenure_Ratio
MonthlyCharges / (tenure + 1)
Monthly charge scaled by tenure; the +1 avoids division by zero for brand-new customers. High values flag new customers paying a premium before they have built any loyalty.
→ Medium Impact
Protection
Service_Stickiness
Count(Security + Backup + Support)
Protective add-ons specifically reduce willingness to switch providers. Each protection service raises the emotional switching cost.
→ Medium Impact
Revenue Risk
High_Value_At_Risk
MonthlyCharges > $70 → 1
Flags customers paying more than $70/month. The rule checks price only, but this tier is largely made up of premium fiber optic customers, where high charges and fiber service combine into the highest revenue-at-risk segment.
↑ High Impact
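The eight cards above translate into a handful of pandas column operations. A minimal sketch, assuming the raw IBM column names and values; the `engineer_features` helper, the choice of the six add-on columns counted by Total_Services, and filling Charge_Spike with 0 for tenure-0 rows are illustrative assumptions rather than the project's exact code:

```python
import pandas as pd

# Assumed: the six add-on columns counted by Total_Services
ADD_ONS = ["OnlineSecurity", "OnlineBackup", "DeviceProtection",
           "TechSupport", "StreamingTV", "StreamingMovies"]
# Security + Backup + Support, per the Service_Stickiness card
PROTECTION = ["OnlineSecurity", "OnlineBackup", "TechSupport"]

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add the eight engineered features to a copy of the cleaned frame."""
    out = df.copy()
    out["Short_Contract"] = (out["Contract"] == "Month-to-month").astype(int)
    out["Total_Services"] = (out[ADD_ONS] == "Yes").sum(axis=1)
    out["Is_Solo"] = ((out["Partner"] == "No") & (out["Dependents"] == "No")).astype(int)
    out["Is_Auto_Pay"] = out["PaymentMethod"].str.contains("automatic", case=False).astype(int)
    # Historical average monthly charge; tenure 0 becomes NaN to avoid dividing by zero
    avg_charge = out["TotalCharges"] / out["tenure"].where(out["tenure"] > 0)
    out["Charge_Spike"] = (out["MonthlyCharges"] - avg_charge).fillna(0.0)
    out["Charge_Tenure_Ratio"] = out["MonthlyCharges"] / (out["tenure"] + 1)
    out["Service_Stickiness"] = (out[PROTECTION] == "Yes").sum(axis=1)
    out["High_Value_At_Risk"] = (out["MonthlyCharges"] > 70).astype(int)
    return out

customers = pd.DataFrame([
    {"Contract": "Month-to-month", "Partner": "No", "Dependents": "No",
     "PaymentMethod": "Electronic check", "tenure": 5,
     "MonthlyCharges": 90.0, "TotalCharges": 400.0,
     "OnlineSecurity": "No", "OnlineBackup": "Yes", "DeviceProtection": "No",
     "TechSupport": "No", "StreamingTV": "Yes", "StreamingMovies": "Yes"},
    {"Contract": "Two year", "Partner": "Yes", "Dependents": "No",
     "PaymentMethod": "Bank transfer (automatic)", "tenure": 0,
     "MonthlyCharges": 50.0, "TotalCharges": 0.0,
     "OnlineSecurity": "No", "OnlineBackup": "No", "DeviceProtection": "No",
     "TechSupport": "Yes", "StreamingTV": "No", "StreamingMovies": "No"},
])
feats = engineer_features(customers)
print(feats.iloc[:, -8:])
```

The first toy customer gets Charge_Spike = 90 − 400 / 5 = 10.0, a monthly charge $10 above their historical average.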
04 — Modeling Approach

6 Algorithm Families

11 model variants across six algorithm families plus a soft-voting ensemble, with systematic hyperparameter tuning, class imbalance handling, and threshold optimization.

📏
Logistic Regression
3 variants — baseline & tuned
🌿
Decision Tree
Interpretable — stakeholder-friendly
🌲
Random Forest
Ensemble — variance reduction
XGBoost
Best model — PR-AUC optimized
BEST
🔺
AdaBoost
Sequential — stump base learners
💡
LightGBM
Leaf-wise — fast & accurate
🗳️
Voting Ensemble
Soft-voting XGB+RF+LGBM
05 — Evaluation & Results

Model Scoreboard

Recall on the minority (Churn) class is the primary metric. Missing a churner costs a full customer lifetime value.

Chart: Recall ranking, churn class (higher = more churners caught before they leave)
Chart: Baseline vs best model, Logistic Regression (baseline) vs XGBoost + advanced features + tuned threshold
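The threshold step is not spelled out on this page. One common approach, sketched here under that assumption, scans the precision-recall curve and keeps the highest decision threshold that still meets a recall target. `pick_threshold`, the toy scores, and the 0.74 target are illustrative; maximizing F1 along the same curve would be an equally plausible criterion. PR-AUC is summarized with `average_precision_score`:

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

def pick_threshold(y_true, scores, target_recall):
    """Return (threshold, precision, recall) for the highest threshold meeting target_recall."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # precision/recall carry one extra final point (recall = 0) with no threshold
    ok = recall[:-1] >= target_recall
    if not ok.any():
        raise ValueError("target recall is unreachable")
    # Thresholds ascend while recall falls, so the last qualifying index is the highest threshold
    idx = np.flatnonzero(ok)[-1]
    return float(thresholds[idx]), float(precision[idx]), float(recall[idx])

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.90, 0.60])

threshold, prec, rec = pick_threshold(y_true, scores, target_recall=0.74)
y_pred = (scores >= threshold).astype(int)
pr_auc = average_precision_score(y_true, scores)
print(threshold, prec, rec)  # 0.7 1.0 0.75
```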
06 — Explainability

SHAP Feature Importance

SHapley Additive exPlanations on the best XGBoost model — global importance plus a live individual customer churn predictor.

Global Feature Importance
Mean |SHAP value| per feature. ↑ Risk marks features that raise churn risk; ↓ Safe marks features that lower it.
Short_Contract | 0.92 | ↑ Risk
tenure | 0.78 | ↓ Safe
MonthlyCharges | 0.65 | ↑ Risk
InternetService_Fiber | 0.58 | ↑ Risk
Total_Services | 0.51 | ↓ Safe
Is_Solo | 0.42 | ↑ Risk
TotalCharges | 0.38 | ↓ Safe
Charge_Tenure_Ratio | 0.31 | ↑ Risk
Is_Auto_Pay | 0.24 | ↓ Safe
SeniorCitizen | 0.18 | ↑ Risk
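The ranking above is a column-wise mean of absolute SHAP values. In practice the SHAP matrix would come from something like `shap.TreeExplainer(model).shap_values(X)` (rows = customers, columns = features); here a small toy array stands in for it. Deriving the ↑/↓ label from the sign of the correlation between a feature's values and its SHAP values is an assumed heuristic, not necessarily how the chart was labeled:

```python
import numpy as np

def global_importance(shap_values, feature_names):
    """Rank features by mean absolute SHAP value, highest first."""
    mean_abs = np.abs(shap_values).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]
    return [(feature_names[i], float(mean_abs[i])) for i in order]

def risk_direction(shap_values, X, j):
    """Assumed heuristic: higher feature values pushing SHAP up means rising risk."""
    r = np.corrcoef(X[:, j], shap_values[:, j])[0, 1]
    if np.isnan(r):  # constant feature or constant SHAP column
        return "n/a"
    return "↑ Risk" if r > 0 else "↓ Safe"

names = ["Short_Contract", "tenure"]
X = np.array([[1, 5], [0, 40], [1, 2]], dtype=float)
shap_values = np.array([[0.9, 0.1], [-0.8, -0.3], [1.0, 0.2]])  # toy values

ranking = global_importance(shap_values, names)
directions = [risk_direction(shap_values, X, j) for j in range(len(names))]
print(ranking, directions)
```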
🔮 Live Churn Risk Explorer

An interactive explorer scores a single customer profile from adjustable inputs: tenure (months), monthly charges ($), number of services bundled, and toggles for month-to-month contract, fiber optic internet, senior citizen, and auto-pay.
Example profile: tenure 6 months, $85/month, 1 service bundled
Estimated churn probability: 72%
🔴 High Risk: intervention recommended
07 — Business Insights

4 Retention Strategies

Directly derived from model findings — ranked by estimated business impact and ease of implementation.

PRIORITY 01
📝
Contract Upgrade Campaign
Highest ROI
Offer targeted discounts to month-to-month customers for switching to 1- or 2-year contracts. A single contract change can reduce churn probability from ~35% to under 10%.
35% → 10%
Churn reduction with contract upgrade
PRIORITY 02
👴
Senior Retention Program
High Revenue Recovery
Design dedicated support and pricing tiers for senior customers. They pay ~$79/month on average (a premium spend level) yet churn at 41.7%, so every lost senior customer takes high-value recurring revenue with them.
$79/mo
Avg senior revenue at 41.7% churn risk
PRIORITY 03
🚀
First-90-Days Onboarding
Long-Term Loyalty
A kernel density estimate (KDE) of tenure shows churn concentrated in the first 12 months. Structured onboarding touchpoints at Day 7, 30, and 60 aim to carry new customers through that window toward the 24-month mark, after which churn drops sharply.
24 months
Loyalty threshold — churn drops dramatically after this
PRIORITY 04
🔗
Service Cross-Sell Push
Scalable at Low Cost
Target customers with 0–1 services, the 45% churn danger zone. Customers with 3+ services churn at under 15%, so cross-selling OnlineSecurity, TechSupport, or OnlineBackup aims to move at-risk customers into that lower-risk bracket.
45% → 15%
Churn reduction from 1 service to 3+ services
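The service-count brackets behind this strategy come from grouping the churn target by bundle size. A minimal pandas sketch, assuming a Total_Services column and a 0/1 Churn column; the bucket edges and toy rows are illustrative:

```python
import pandas as pd

# Toy rows: number of add-on services and 0/1 churn outcome
df = pd.DataFrame({
    "Total_Services": [0, 1, 1, 0, 2, 2, 3, 4, 5, 6],
    "Churn":          [1, 1, 1, 0, 1, 0, 0, 0, 0, 1],
})

# Right-inclusive bins: (-1, 1] -> "0-1", (1, 2] -> "2", (2, 6] -> "3+"
buckets = pd.cut(df["Total_Services"], bins=[-1, 1, 2, 6], labels=["0-1", "2", "3+"])

# Churn rate (%) per bracket; observed=False keeps empty brackets visible
rates = df.groupby(buckets, observed=False)["Churn"].mean().mul(100).round(1)
print(rates)
```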
08 — Project Structure

Repository Layout

📁 telechurn-intelligence/
  📓 Full_EDA.ipynb ← Data cleaning, EDA, feature engineering
  📓 Model_building.ipynb ← All models, tuning, SHAP, export
  📄 WA_Fn-UseC_-Telco-Customer-Churn.csv ← Raw source (add manually)
  📄 output.csv ← Processed + feature-engineered dataset
  🤖 xgboost_churn_model_production.pkl ← Serialized model bundle
  📋 requirements.txt ← Python dependencies
  📖 README.md ← Project documentation
  🖼️ partner_churn_analysis.png ← Saved visualization
  🖼️ family_structure_churn.png ← Saved visualization
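The contents of the serialized bundle are not documented here. This sketch assumes it is a pickled dict holding the model, a decision threshold, and the feature order; the keys, the helper functions, and the `LogisticRegression` stand-in (used so the example runs without XGBoost) are all hypothetical:

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression

REQUIRED_KEYS = {"model", "threshold", "feature_names"}  # hypothetical bundle layout

def save_bundle(path, model, threshold, feature_names):
    bundle = {"model": model, "threshold": float(threshold),
              "feature_names": list(feature_names)}
    with open(path, "wb") as fh:
        pickle.dump(bundle, fh)

def load_bundle(path):
    # Only unpickle trusted files: pickle can execute arbitrary code on load
    with open(path, "rb") as fh:
        bundle = pickle.load(fh)
    missing = REQUIRED_KEYS - set(bundle)
    if missing:
        raise KeyError(f"bundle is missing {sorted(missing)}")
    return bundle

def predict_churn(bundle, X):
    """Apply the stored threshold to the model's churn probability."""
    proba = bundle["model"].predict_proba(X)[:, 1]
    return (proba >= bundle["threshold"]).astype(int)

# Demo: a LogisticRegression stand-in for the XGBoost model
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "xgboost_churn_model_production.pkl"
    save_bundle(path, model, threshold=0.5, feature_names=["tenure"])
    loaded = load_bundle(path)

preds = predict_churn(loaded, X)
print(preds)
```

Keeping the threshold and feature order next to the model means a caller cannot accidentally score with the default 0.5 cut-off or with columns in the wrong order.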