Customer churn is one of the most critical issues for telecom companies, as losing customers directly impacts revenue and growth. I built this Customer Churn Prediction App to demonstrate how machine learning can be used to predict whether a telecom customer is likely to leave the company (churn) or stay. The app allows businesses to proactively retain customers and optimize their marketing strategies while serving as an educational project showcasing end-to-end ML implementation.
Github Repo: https://github.com/mirzayasirabdullahbaig07/Customer_Churn_Prediction_Model
The main motivation was to explore how AI can solve real-world business problems. Customer churn prediction is a high-impact application of machine learning because it helps companies reduce revenue loss and improve customer retention. Additionally, I aimed to create a project that covers the entire ML lifecycle: from data preprocessing and exploratory data analysis to model building, evaluation, and deployment in a web app.
The project uses the Telco Customer Churn Dataset from IBM Sample Data, which contains both categorical and numerical features about customers:
Demographics: Gender, SeniorCitizen, Partner, Dependents
Account Information: Tenure, Contract type, Payment method, Paperless billing
Services: Phone service, Internet, Online security, Streaming services, Tech support
Charges: MonthlyCharges, TotalCharges
The target variable is Churn:
Yes → Customer will leave
No → Customer will stay
The dataset required preprocessing, including handling missing values, encoding categorical features, scaling numeric features, and addressing class imbalance using SMOTE.
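The cleaning, encoding, and scaling steps can be sketched roughly as follows. This is a minimal illustration on a tiny hand-made frame, not the repository's actual code: the helper name `preprocess`, the column lists, and the toy rows are assumptions, while the column names follow the public Telco dataset. Median imputation for blank `TotalCharges` values is also an assumption about how missing values might be handled.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

CAT_COLS = ["gender", "Contract", "Churn"]             # assumed text columns
NUM_COLS = ["tenure", "MonthlyCharges", "TotalCharges"]  # assumed numeric columns

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean, encode, and scale a Telco-style frame (illustrative helper)."""
    out = df.copy()
    # Blank strings in TotalCharges become NaN, then are filled with the median.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    out["TotalCharges"] = out["TotalCharges"].fillna(out["TotalCharges"].median())
    # Label-encode each text column; classes are sorted, so "No" -> 0, "Yes" -> 1.
    for col in CAT_COLS:
        out[col] = LabelEncoder().fit_transform(out[col])
    # Standardize the numeric columns to zero mean and unit variance.
    scaled = StandardScaler().fit_transform(out[NUM_COLS].astype(float))
    for i, col in enumerate(NUM_COLS):
        out[col] = scaled[:, i]
    return out

toy = pd.DataFrame({
    "gender": ["Female", "Male", "Male", "Female"],
    "Contract": ["Month-to-month", "Two year", "One year", "Month-to-month"],
    "tenure": [1, 34, 0, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "TotalCharges": ["29.85", "1889.5", " ", "1840.75"],  # " " mimics a missing value
    "Churn": ["No", "No", "Yes", "No"],
})
clean = preprocess(toy)
```

In a real pipeline the scaler and encoders would be fitted on the training split only and reused at prediction time, so the app transforms new customers the same way the model saw them during training.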
I explored multiple machine learning models, including Random Forest and XGBoost, to identify the best-performing approach.
Why Random Forest/XGBoost:
Handles high-dimensional data with mixed categorical and numerical features
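Relatively robust to overfitting, since both are ensembles of many trees (bagging in Random Forest, regularized boosting in XGBoost)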
Provides feature importance insights
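To make the feature importance point concrete, here is a small sketch using scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute. The data is synthetic and the labeling rule is invented purely for illustration; the feature names are borrowed from the dataset but the values are random, so nothing here reflects the project's real results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300
# Hypothetical standardized features; only the first one drives the label.
X = rng.normal(size=(n, 3))
feature_names = ["tenure", "MonthlyCharges", "noise"]
y = (X[:, 0] < 0).astype(int)  # toy rule: below-average tenure -> churn

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Pair each feature with its importance and sort from most to least important.
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

XGBoost's scikit-learn wrapper exposes a similarly named attribute, so the same ranking step should carry over with little change.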
Data Handling & Preprocessing:
Label encoding for categorical variables
Feature scaling for numerical columns
SMOTE to oversample the minority (churn) class so the model is not biased toward predicting "No"
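The idea behind SMOTE can be shown in a few lines of NumPy. The function below is a simplified sketch written for this explanation, not the imbalanced-learn implementation the project relies on: `smote_like_oversample`, its parameters, and the toy data are all assumptions. It creates each synthetic churner by picking a real minority sample and interpolating toward one of its nearest minority neighbors.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between neighbors."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_min))
        nb = neighbors[base, rng.integers(k)]
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic[i] = X_min[base] + gap * (X_min[nb] - X_min[base])
    return synthetic

# Toy data: 8 non-churners (label 0) and 4 churners (label 1).
rng = np.random.default_rng(42)
X_major = rng.normal(loc=0.0, size=(8, 2))
X_minor = rng.normal(loc=3.0, size=(4, 2))

X_new = smote_like_oversample(X_minor, n_new=len(X_major) - len(X_minor))
X_balanced = np.vstack([X_major, X_minor, X_new])
y_balanced = np.array([0] * len(X_major) + [1] * (len(X_minor) + len(X_new)))
```

With imbalanced-learn, the equivalent step is typically a call to `SMOTE().fit_resample(X, y)`. Either way, oversampling is normally applied to the training split only, so the test set keeps the real class ratio.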
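Evaluation Metrics: Accuracy, Precision, Recall, F1-Score, and AUC. Accuracy alone can mislead on an imbalanced churn dataset, so precision (how many flagged customers actually churn) and recall (how many churners are caught) are tracked as well. The models achieved high predictive performance while keeping false positives and false negatives low, which is critical in customer retention scenarios.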
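As a rough illustration of how these metrics are computed with scikit-learn, the sketch below scores a handful of made-up predictions. The labels and probabilities are invented for the example and are not results from this project; the 0.5 threshold is likewise an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Invented ground truth (1 = churn) and predicted churn probabilities.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.1, 0.2, 0.2, 0.3, 0.4, 0.6, 0.9, 0.8, 0.7, 0.45]

# Turn probabilities into hard labels with an assumed 0.5 threshold.
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

scores = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),  # penalizes false positives
    "recall": recall_score(y_true, y_pred),        # penalizes false negatives
    "f1": f1_score(y_true, y_pred),
    "auc": roc_auc_score(y_true, y_prob),          # uses probabilities, not labels
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Note that AUC is computed from the probabilities rather than the thresholded labels, which is why it can stay high even when a particular threshold misses some churners.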
The model is deployed using Streamlit, providing a user-friendly interface:
Single Customer Prediction: Users input customer details and receive a prediction (Yes → will churn, No → will stay).
Batch Prediction: Users upload a CSV file containing multiple customer profiles to receive batch predictions.
Visuals: Probability-based predictions and clear charts to help interpret results.
The app also features a modern sidebar with portfolio links and serves as an educational tool for demonstrating real-world ML deployment.
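The batch prediction flow might look something like the sketch below. It leaves out the Streamlit UI so it can run on its own: `predict_batch`, the `FEATURES` subset, the stand-in model, and the CSV contents are all hypothetical and only illustrate the shape of the logic. In the real app the CSV would come from a widget such as `st.file_uploader`, and the model would be loaded from the pickled file.

```python
import io

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["tenure", "MonthlyCharges"]  # assumed subset, for illustration only

def predict_batch(model, customers: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Append a churn probability and a Yes/No label to each customer row."""
    result = customers.copy()
    # Column 1 of predict_proba is the probability of the positive class (churn = 1).
    result["ChurnProbability"] = model.predict_proba(customers[FEATURES])[:, 1]
    result["Churn"] = [
        "Yes" if p >= threshold else "No" for p in result["ChurnProbability"]
    ]
    return result

# Stand-in model trained on made-up rows: short tenure plus high charges -> churn.
train = pd.DataFrame({
    "tenure": [1, 2, 3, 4, 40, 50, 60, 70],
    "MonthlyCharges": [90, 95, 85, 99, 30, 25, 35, 20],
    "Churn": [1, 1, 1, 1, 0, 0, 0, 0],
})
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(train[FEATURES], train["Churn"])

# Simulate an uploaded CSV file with two customer profiles.
uploaded = io.StringIO("customerID,tenure,MonthlyCharges\nA-1,2,92.5\nB-2,65,28.0\n")
predictions = predict_batch(model, pd.read_csv(uploaded))
print(predictions)
```

Returning the probability alongside the Yes/No label is what makes the probability-based visuals possible, and single-customer prediction can reuse the same function with a one-row frame.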
If asked to explain this project in an interview, you can highlight:
Problem Motivation: Reducing customer churn is critical for revenue optimization in telecom.
Dataset Understanding: Mixed categorical and numerical features; target variable is churn.
Model Choice: Random Forest and XGBoost for strong accuracy on tabular data, feature-importance insights, and handling complex feature relationships.
Techniques Used:
Data preprocessing (missing values, scaling, encoding)
Handling class imbalance (SMOTE)
Model evaluation using multiple metrics
Deployment: Interactive Streamlit app for single and batch predictions.
Learning Outcome: End-to-end ML workflow, from data analysis to deployment and visualization.
Python 3.9+ – Core programming
Streamlit – Web app deployment
Pandas & NumPy – Data processing
Matplotlib & Seaborn – Data visualization and EDA
Scikit-learn – Model training, encoding, and evaluation
XGBoost & Random Forest – Predictive modeling
Imbalanced-learn (SMOTE) – Handling class imbalance
Pickle – Model serialization
This project is for educational purposes only and should not be used for real-world business decisions without further validation.