Customer Churn Prediction Model By Mirza Yasir Abdullah Baig - Yasir Insights

  • 09 Oct 2025

Introduction

Customer churn is one of the most critical issues for telecom companies, as losing customers directly impacts revenue and growth. I built this Customer Churn Prediction App to demonstrate how machine learning can predict whether a telecom customer is likely to leave the company (churn) or stay. The app is meant to help businesses retain customers proactively and optimize their marketing strategies, and it also serves as an educational project showcasing an end-to-end ML implementation.

GitHub Repo: https://github.com/mirzayasirabdullahbaig07/Customer_Churn_Prediction_Model


Motivation & Intuition Behind the Project

The main motivation was to explore how AI can solve real-world business problems. Customer churn prediction is a high-impact application of machine learning because it helps companies reduce revenue loss and improve customer retention. Additionally, I aimed to create a project that covers the entire ML lifecycle: from data preprocessing and exploratory data analysis to model building, evaluation, and deployment in a web app.


Dataset & Features

The project uses the Telco Customer Churn Dataset from IBM Sample Data, which contains both categorical and numerical features about customers:

  • Demographics: Gender, SeniorCitizen, Partner, Dependents

  • Account Information: Tenure, Contract type, Payment method, Paperless billing

  • Services: Phone service, Internet, Online security, Streaming services, Tech support

  • Charges: MonthlyCharges, TotalCharges

The target variable is Churn:

  • Yes → Customer will leave

  • No → Customer will stay

The dataset required preprocessing: handling missing values, encoding categorical features, scaling numeric features, and addressing class imbalance with SMOTE (Synthetic Minority Over-sampling Technique).
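
The first three preprocessing steps can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the four-row frame is made-up stand-in data, `preprocess` is a hypothetical helper, and the column names `gender`, `Contract`, and `tenure` are assumed to match the public Telco CSV. It also assumes the missing values show up as blank strings in TotalCharges and fills them with the median, which is only one of several reasonable choices.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Tiny stand-in for the Telco CSV; the values are invented for illustration.
raw = pd.DataFrame({
    "gender": ["Female", "Male", "Male", "Female"],
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"],
    "tenure": [1, 34, 0, 8],
    "MonthlyCharges": [29.85, 56.95, 52.55, 99.65],
    "TotalCharges": ["29.85", "1889.5", " ", "820.5"],  # blank string = missing
    "Churn": ["No", "No", "No", "Yes"],
})

CATEGORICAL_COLS = ["gender", "Contract"]                    # assumed column names
NUMERIC_COLS = ["tenure", "MonthlyCharges", "TotalCharges"]  # assumed column names


def preprocess(df):
    """Return (features, target, fitted encoders, fitted scaler)."""
    df = df.copy()

    # 1. Missing values: coerce TotalCharges to numbers (blanks become NaN),
    #    then fill the gaps with the column median.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())

    # 2. Target: map Yes/No to 1/0 and remove it from the feature frame.
    y = df.pop("Churn").map({"No": 0, "Yes": 1})

    # 3. Label-encode each categorical column, keeping the fitted encoder
    #    so the same mapping can be reused at prediction time.
    encoders = {}
    for col in CATEGORICAL_COLS:
        encoders[col] = LabelEncoder()
        df[col] = encoders[col].fit_transform(df[col])

    # 4. Scale numeric columns to zero mean and unit variance.
    scaler = StandardScaler()
    df[NUMERIC_COLS] = scaler.fit_transform(df[NUMERIC_COLS])
    return df, y, encoders, scaler


X, y, encoders, scaler = preprocess(raw)
print(X)
print(y.tolist())
```

For brevity the sketch fits the encoders and the scaler on the whole frame. In a real pipeline they would normally be fitted on the training split only and then applied to the test split and to new customers.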


Techniques & Model Used

I explored multiple machine learning models, including Random Forest and XGBoost, to identify the best-performing approach.

  • Why Random Forest/XGBoost:

    • Handles high-dimensional data with mixed categorical and numerical features

    • Relatively robust to overfitting (averaging over many trees in Random Forest, built-in regularization in XGBoost)

    • Provides feature importance insights

  • Data Handling & Preprocessing:

    • Label encoding for categorical variables

    • Feature scaling for numerical columns

    • SMOTE to oversample the minority (churn) class, so the model is not biased toward the majority class
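
To make the SMOTE and feature-importance points concrete, here is a small sketch. The project itself uses imbalanced-learn's SMOTE; the hand-written `smote_like_oversample` below is a hypothetical helper that reimplements only the core idea (interpolating between a minority sample and one of its nearest minority neighbours) with NumPy and scikit-learn. The data is synthetic, and the 75/25 class split and all hyperparameters are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_like_oversample(X, y, k=5, seed=42):
    """Balance a binary dataset by interpolating between minority-class neighbours."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    if n_new == 0:
        return X, y
    X_min = X[y == minority]

    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]

    base = rng.integers(0, len(X_min), size=n_new)  # random minority rows
    pick = rng.integers(0, k, size=n_new)           # one neighbour slot per row
    partner = neighbours[base, pick]
    gap = rng.random((n_new, 1))                    # interpolation weight in [0, 1)
    X_new = X_min[base] + gap * (X_min[partner] - X_min[base])

    X_bal = np.vstack([X, X_new])
    y_bal = np.concatenate([y, np.full(n_new, minority)])
    return X_bal, y_bal


# Synthetic, imbalanced stand-in for the encoded and scaled Telco features.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Oversample the training split only; the test split keeps its real class ratio.
X_bal, y_bal = smote_like_oversample(X_train, y_train)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_bal, y_bal)
ranking = np.argsort(model.feature_importances_)[::-1]

print("class counts before:", np.bincount(y_train))
print("class counts after: ", np.bincount(y_bal))
print("feature indices, most important first:", ranking.tolist())
```

With imbalanced-learn installed, the helper can be replaced by `SMOTE(random_state=42).fit_resample(X_train, y_train)` from `imblearn.over_sampling`. Either way, resampling before the train/test split would leak synthetic copies of test-like points into training and inflate the scores.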

Evaluation Metrics: Accuracy, Precision, Recall, F1-Score, and AUC. Several metrics are used because accuracy alone can be misleading on an imbalanced dataset. The models achieved high predictive performance across these metrics, keeping both false positives and false negatives low, which is critical in customer retention scenarios.
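
As a rough illustration of how these five metrics can be computed with scikit-learn, the sketch below trains a Random Forest on synthetic data and scores it on a held-out split. The data, split sizes, and hyperparameters are assumptions made for the example, so the numbers it prints say nothing about the actual project results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the preprocessed churn data.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)              # hard 0/1 labels
y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive (churn) class

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),  # of predicted churners, how many churned
    "recall": recall_score(y_test, y_pred),        # of actual churners, how many were caught
    "f1": f1_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_prob),          # uses probabilities, not hard labels
}

# Confusion matrix counts: false positives and false negatives in absolute numbers.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

for name, value in metrics.items():
    print(f"{name:>9}: {value:.3f}")
print(f"false positives: {fp}, false negatives: {fn}")
```

In a retention setting a false negative (a churner the model misses) is often the costlier error, so recall on the churn class is usually worth watching alongside accuracy.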


Web App & User Experience

The model is deployed using Streamlit, providing a user-friendly interface:

  • Single Customer Prediction: Users input customer details and receive a prediction (Yes → will churn, No → will stay).

  • Batch Prediction: Users upload a CSV file containing multiple customer profiles to receive batch predictions.

  • Visuals: Probability-based predictions and clear charts to help interpret results.

The app also features a modern sidebar with portfolio links and serves as an educational tool for demonstrating real-world ML deployment.
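
The batch-prediction path can be sketched independently of the UI. In the sketch below, `predict_batch`, the `FEATURES` list, and the 0.5 threshold are hypothetical, and the model is a throwaway Random Forest trained on random numbers. In the Streamlit app the input frame would presumably come from `st.file_uploader` followed by `pd.read_csv`, and the model from a pickle file saved after training; here both are simulated in memory so the example runs on its own.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["tenure", "MonthlyCharges", "TotalCharges"]  # hypothetical feature subset


def predict_batch(model, customers, threshold=0.5):
    """Return a copy of `customers` with a churn probability and a Yes/No label."""
    proba = model.predict_proba(customers[FEATURES])[:, 1]
    result = customers.copy()
    result["churn_probability"] = proba
    result["prediction"] = np.where(proba >= threshold, "Yes", "No")
    return result


# Stand-in for the trained model: random features, label tied to short tenure.
rng = np.random.default_rng(0)
train = pd.DataFrame(rng.random((200, 3)) * [72, 120, 8000], columns=FEATURES)
labels = (train["tenure"] < 12).astype(int)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(train, labels)

# Round trip through pickle, mimicking "save after training, load at app start".
restored = pickle.loads(pickle.dumps(model))

# Stand-in for an uploaded CSV with two customer profiles.
uploaded = pd.DataFrame({
    "tenure": [2, 60],
    "MonthlyCharges": [95.0, 40.0],
    "TotalCharges": [190.0, 2400.0],
})

result = predict_batch(restored, uploaded)
print(result)
```

Returning the probability alongside the label lets the interface show how confident a prediction is instead of a bare Yes/No. Note that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.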


Interview-Ready Points

If asked to explain this project in an interview, you can highlight:

  1. Problem Motivation: Reducing customer churn is critical for revenue optimization in telecom.

  2. Dataset Understanding: Mixed categorical and numerical features; target variable is churn.

  3. Model Choice: Random Forest and XGBoost for high accuracy, interpretability through feature importances, and handling complex feature relationships.

  4. Techniques Used:

    • Data preprocessing (missing values, scaling, encoding)

    • Handling class imbalance (SMOTE)

    • Model evaluation using multiple metrics

  5. Deployment: Interactive Streamlit app for single and batch predictions.

  6. Learning Outcome: End-to-end ML workflow, from data analysis to deployment and visualization.


Tech Stack

  • Python 3.9+ – Core programming

  • Streamlit – Web app deployment

  • Pandas & NumPy – Data processing

  • Matplotlib & Seaborn – Data visualization and EDA

  • Scikit-learn – Model training, encoding, and evaluation

  • XGBoost & Random Forest – Predictive modeling

  • Imbalanced-learn (SMOTE) – Handling class imbalance

  • Pickle – Model serialization


Demo & Screenshots


Disclaimer

This project is for educational purposes only and should not be used for real-world business decisions without further validation.
