Yasir Insights
06 Dec 2025

🎗️ Breast Cancer Prediction Model By Mirza Yasir Abdullah Baig

Introduction

Breast cancer is one of the most common cancers among women worldwide, and early detection is crucial for improving survival rates. That motivated me to build a Breast Cancer Prediction App that applies machine learning to the early detection of breast tumors.

The app predicts whether a tumor is Malignant (cancerous) or Benign (non-cancerous) based on key medical features. While it cannot replace professional diagnosis, it serves as an educational and demonstration tool for machine learning applications in healthcare.

GitHub Repo: https://github.com/mirzayasirabdullahbaig07/Breast-Cancer-Prediction-Model

💡 Motivation Behind Building This

Personal Interest in Healthcare AI: I wanted to explore how AI can assist in critical real-world problems like cancer detection.
Demonstrate End-to-End ML Project Skills: The project covers data preprocessing, model building, web deployment, and an interactive UI built with Streamlit.
Portfolio & Interview Ready: A project that shows practical implementation of ML in healthcare, which is appealing to recruiters in AI/ML roles.
Simplifying Complexity for Users: Instead of 30 medical features, I reduced the input to 8 essential features for usability, showing UX awareness in ML applications.

🧠 What This Model Does

The model predicts the risk of breast cancer by classifying tumors into:

Malignant → Cancerous tumor (⚠️ consult a doctor)
Benign → Non-cancerous tumor (✅ no immediate concern)

It uses supervised machine learning to learn from past tumor data and provides a probabilistic prediction based on new inputs.

📊 Dataset Used

Dataset: Breast Cancer Wisconsin (Diagnostic) Dataset
Source: UCI Machine Learning Repository
Details:
- Total Features: 30 medical features per tumor
- Simplified Features Used: 8 key features (mean radius, mean texture, mean perimeter, mean area, mean smoothness, etc.) for easy input.
Classes:
- Malignant → Cancerous
- Benign → Non-cancerous
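
For readers who want to inspect the data, here is a minimal sketch of loading it. It assumes the copy of the Wisconsin Diagnostic dataset bundled with scikit-learn rather than a CSV downloaded from UCI, and it selects only the five features named above, because the remaining three of the eight are not listed in this post. The name NAMED_FEATURES is a placeholder introduced for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled copy of the dataset stands in for the UCI file.
data = load_breast_cancer()
X_all, y = data.data, data.target  # in this copy: 0 = malignant, 1 = benign

# Only the five features named in the post; the other three are not listed there.
NAMED_FEATURES = [
    "mean radius",
    "mean texture",
    "mean perimeter",
    "mean area",
    "mean smoothness",
]
feature_names = list(data.feature_names)
cols = [feature_names.index(name) for name in NAMED_FEATURES]
X = X_all[:, cols]

# Count how many samples fall in each class.
class_counts = {
    str(name): int(count)
    for name, count in zip(data.target_names, np.bincount(y))
}

print("all features:", X_all.shape)
print("selected features:", X.shape)
print("class counts:", class_counts)
```

The bundled copy has 569 samples with 30 features each. Note that it encodes malignant as 0 and benign as 1, which matters when reading a confusion matrix later.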

🛠️ Techniques and Workflow

The project follows an end-to-end ML pipeline:

Data Preprocessing
- Handling missing values
- Removing duplicates
- Scaling and normalizing features
- Selecting key features for prediction
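
The preprocessing steps above could look roughly like the sketch below. The tiny table uses made-up numbers purely for illustration, and median imputation plus standard scaling are assumptions: the post does not say which imputation or scaling method the project uses.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy rows with made-up values, standing in for two tumor features.
df = pd.DataFrame(
    {
        "mean radius": [14.0, 14.0, np.nan, 20.0],   # one missing value
        "mean texture": [20.0, 20.0, 18.0, 25.0],
    }
)

# Remove exact duplicate rows (the second row repeats the first).
deduped = df.drop_duplicates().reset_index(drop=True)

# Fill missing values with the column median, then scale each column
# to zero mean and unit variance. Both choices are assumptions.
preprocess = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
scaled = preprocess.fit_transform(deduped)

print(deduped.shape)
print(scaled.round(3))
```

In a real project the imputer and scaler would be fitted on the training split only and then reused on test data and user input, so that no information leaks from the evaluation set.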
Model Selection & Training
- Techniques Used:
  - Logistic Regression (Baseline)
  - Random Forest Classifier (For robustness)
  - Neural Network (TensorFlow/Keras)
- Selected Model: Random Forest Classifier (the Neural Network is the alternative if it reaches higher accuracy)
- Reason:
  - Handles non-linear relationships well
  - Provides feature importance insights
  - High accuracy on the dataset with low overfitting
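
One way to compare the candidate models is cross-validation, sketched below for the Logistic Regression baseline and the Random Forest. This is an illustration, not the project's actual training script: it uses all 30 features from scikit-learn's bundled dataset, the hyperparameters are arbitrary choices, and the neural network is left out to keep the example small.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameters here are illustrative assumptions, not tuned values.
candidates = {
    # Logistic regression benefits from scaled inputs, so it gets a scaler.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Mean accuracy over 5 folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")

best_name = max(scores, key=scores.get)
print("best:", best_name)
```

Cross-validated scores give a steadier basis for choosing a model than a single train/test split, which is helpful on a dataset of only a few hundred samples.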
Model Serialization
- Saved trained model using Pickle (trained_model.sav) for deployment
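
Saving and reloading a model with Pickle can be sketched as follows. The file name trained_model.sav comes from the project; the model settings and the temporary directory are assumptions made so the example does not overwrite a real file.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# Write to a temporary directory so an existing trained_model.sav is untouched.
model_path = Path(tempfile.mkdtemp()) / "trained_model.sav"

with open(model_path, "wb") as f:
    pickle.dump(model, f)

with open(model_path, "rb") as f:
    restored = pickle.load(f)

# The restored model should reproduce the original predictions.
same = bool((restored.predict(X) == model.predict(X)).all())
print("round trip preserved predictions:", same)
```

Two practical caveats: a pickle file should only be loaded from a trusted source, because unpickling can execute arbitrary code, and a model pickled under one scikit-learn version is not guaranteed to load correctly under another.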
Web App Development
- Framework: Streamlit
- UI Features:
  - Input form for 8 tumor features
  - Clear display of prediction result (Malignant/Benign)
  - About Me sidebar with portfolio links
- User Experience: Focused on simplicity and clarity
Prediction
- Users enter tumor features → Click Predict → Receive visual result cards
- ⚠️ Malignant → consult a doctor
- ✅ Benign → no immediate concern
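
The prediction step can be sketched independently of the UI. The helper predict_tumor below is a hypothetical name, and it works on the five features named in this post rather than the app's full set of eight, since the remaining three are not listed here. In a Streamlit app, a function like this would be called when the user clicks Predict.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Assumption: only the five features named in the post are used here.
FEATURES = [
    "mean radius",
    "mean texture",
    "mean perimeter",
    "mean area",
    "mean smoothness",
]

data = load_breast_cancer()
cols = [list(data.feature_names).index(name) for name in FEATURES]
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(data.data[:, cols], data.target)


def predict_tumor(values):
    """Hypothetical helper: map one row of feature values to a label and confidence."""
    if len(values) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} values, got {len(values)}")
    row = np.asarray(values, dtype=float).reshape(1, -1)
    label = int(model.predict(row)[0])
    proba = model.predict_proba(row)[0]
    # scikit-learn's copy of the dataset encodes 0 = malignant, 1 = benign.
    if label == 0:
        return "Malignant", float(proba[0])
    return "Benign", float(proba[1])


# Try it on the first row of the dataset, which is labeled malignant.
sample = data.data[0, cols]
result = predict_tumor(sample)
print(result)
```

Returning the class probability alongside the label lets the result card show how confident the model is instead of a bare verdict.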

💻 Tech Stack

Python 3.9+ – Programming language
Streamlit – Frontend web app development
NumPy & Pandas – Data processing
Scikit-learn – Model training and evaluation
Pickle – Model serialization for deployment
TensorFlow/Keras – Neural network model (optional)

📈 Model Performance (Interview Talking Points)

Evaluation Metrics:
- Accuracy, Precision, Recall, F1-Score
Example Results:
- Accuracy: ~97–99%
- Confusion Matrix shows minimal false negatives (critical in medical applications)
Feature Importance:
- Radius, Perimeter, and Area are the most influential features for tumor prediction
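
The metrics and feature importances above can be computed along the lines of this sketch. It assumes a Random Forest trained on all 30 features with an 80/20 stratified split, which may differ from the project's setup, so the numbers it prints will not necessarily match the results quoted above. Malignant is treated as the positive class, so a false negative is a malignant tumor predicted as benign.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, stratify=data.target, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

MALIGNANT = 0  # scikit-learn's copy encodes 0 = malignant, 1 = benign

# Rows are true classes, columns are predicted classes, in the order [0, 1].
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
false_negatives = int(cm[0, 1])  # malignant tumors predicted as benign

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, pos_label=MALIGNANT),
    "recall": recall_score(y_test, y_pred, pos_label=MALIGNANT),
    "f1": f1_score(y_test, y_pred, pos_label=MALIGNANT),
}

# Five features with the highest importance according to the forest.
top_features = sorted(
    zip(model.feature_importances_, data.feature_names), reverse=True
)[:5]

print(cm)
print("false negatives:", false_negatives)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
for importance, name in top_features:
    print(f"{name}: {importance:.3f}")
```

Recall on the malignant class is the number to watch in a medical setting, because it drops with every missed cancer, while accuracy alone can hide those misses.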

🚀 Demo

Live App: https://breastcancerprediction07.streamlit.app/

👨‍💻 About Me

Name: Mirza Yasir Abdullah Baig
Portfolio: Kaggle, LinkedIn, GitHub

❤️ Acknowledgements

⚠️ Disclaimer

This project is for educational purposes only. It should not be used as a substitute for professional medical diagnosis.
