Breast cancer is one of the most common cancers among women worldwide, and early detection is crucial for improving survival rates. With that in mind, I built a Breast Cancer Prediction App that uses machine learning to help flag breast tumors early.
The app predicts whether a tumor is Malignant (cancerous) or Benign (non-cancerous) based on key medical features. While it cannot replace professional diagnosis, it serves as an educational and demonstration tool for machine learning applications in healthcare.
GitHub Repo: https://github.com/mirzayasirabdullahbaig07/Breast-Cancer-Prediction-Model
Personal Interest in Healthcare AI: I wanted to explore how AI can assist in critical real-world problems like cancer detection.
Demonstrate End-to-End ML Project Skills: Data preprocessing, model building, and web deployment with an interactive Streamlit UI.
Portfolio & Interview Ready: A project that shows practical implementation of ML in healthcare, which is appealing to recruiters in AI/ML roles.
Simplifying Complexity for Users: Instead of asking for all 30 medical features, the app takes only 8 essential ones, which keeps the input form usable and shows UX awareness in ML applications.
The model predicts the risk of breast cancer by classifying tumors into:
Malignant → Cancerous tumor (⚠️ consult a doctor)
Benign → Non-cancerous tumor (✅ no immediate concern)
It uses supervised machine learning to learn from past tumor data and provides a probabilistic prediction based on new inputs.
Dataset: Breast Cancer Wisconsin (Diagnostic) Dataset
Source: UCI Machine Learning Repository
Details:
Total Features: 30 medical features per tumor
Simplified Features Used: 8 key features (mean radius, mean texture, mean perimeter, mean area, mean smoothness, etc.) for easy input.
Classes:
Malignant → Cancerous
Benign → Non-cancerous
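As a minimal sketch of the dataset details above, the same data ships with scikit-learn, so it can be loaded without downloading anything from UCI. The post names only five of the eight simplified features; the last three entries in `SELECTED_FEATURES` are assumptions added for illustration, not the app's confirmed inputs.

```python
from sklearn.datasets import load_breast_cancer

# The first five names come from the post. The last three are ASSUMED
# placeholders, since the post ends its feature list with "etc.".
SELECTED_FEATURES = [
    "mean radius",
    "mean texture",
    "mean perimeter",
    "mean area",
    "mean smoothness",
    "mean compactness",     # assumption
    "mean concavity",       # assumption
    "mean concave points",  # assumption
]

data = load_breast_cancer(as_frame=True)
X_full = data.data             # all 30 features, one row per tumor
X = X_full[SELECTED_FEATURES]  # simplified 8-feature view
y = data.target                # in scikit-learn's copy: 0 = malignant, 1 = benign

print("full feature matrix:", X_full.shape)
print("simplified matrix:  ", X.shape)
print(dict(zip(data.target_names, y.value_counts().sort_index().tolist())))
```

Note that scikit-learn encodes malignant as 0 and benign as 1, which is easy to get backwards when mapping predictions to labels.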
The project follows an end-to-end ML pipeline:
Data Preprocessing
Handling missing values
Removing duplicates
Scaling and normalizing features
Selecting key features for prediction
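The preprocessing steps above could look roughly like the sketch below. It assumes scikit-learn's built-in copy of the dataset, a standard-score scaler, and the first eight "mean" columns as a stand-in for the app's 8 inputs; none of these choices is confirmed by the project itself.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 30 feature columns plus a "target" column (0 = malignant, 1 = benign).
df = load_breast_cancer(as_frame=True).frame

# Handle missing values and duplicates. Dropping rows is one simple option;
# imputation would be another.
df = df.dropna().drop_duplicates()

# ASSUMPTION: the first eight "mean" columns stand in for the 8 key features.
key_features = list(df.columns[:8])
X = df[key_features]
y = df["target"]

# Split before scaling so the test set does not leak into the scaler.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on train only
X_test_scaled = scaler.transform(X_test)        # reuse the same statistics

print(key_features)
print(X_train_scaled.shape, X_test_scaled.shape)
```

If a scaler is used in training, the same fitted scaler has to be applied to user input in the deployed app, so it would need to be saved alongside the model.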
Model Selection & Training
Techniques Used:
Logistic Regression (Baseline)
Random Forest Classifier (For robustness)
Neural Network (TensorFlow/Keras)
Selected Model: Random Forest Classifier (or Neural Network depending on accuracy)
Reason:
Handles non-linear relationships well
Provides feature importance insights
High accuracy on the dataset with low overfitting
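One way to compare the candidate models is sketched below. The 5-fold cross-validation, the hyperparameters, and the 8-column feature subset are all assumptions, since the post does not say how the models were compared. The Keras network is left out to keep the example free of a TensorFlow dependency.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = X[:, :8]  # ASSUMPTION: first eight "mean" columns as the 8 key features

candidates = {
    # Logistic regression benefits from scaling, so it gets a pipeline.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    # Tree ensembles are insensitive to feature scale.
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Mean accuracy over 5 folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("selected:", best_name)
```

On a dataset this small the scores can be close, so the winner may change with the random seed or the feature subset.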
Model Serialization
Saved the trained model using Pickle (trained_model.sav) for deployment
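A minimal sketch of the save-and-reload round trip follows. The file name `trained_model.sav` comes from the post; the model settings, the feature subset, and the temporary directory (used so the example leaves no files behind) are assumptions.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X = X[:, :8]  # ASSUMPTION: first eight "mean" columns as the 8 key features
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# A real app would keep the file next to its code instead of a temp directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "trained_model.sav"

    with open(model_path, "wb") as f:
        pickle.dump(model, f)           # serialize the fitted model

    with open(model_path, "rb") as f:
        loaded_model = pickle.load(f)   # what the web app would do at startup

# The reloaded model should reproduce the original predictions exactly.
same_predictions = np.array_equal(model.predict(X), loaded_model.predict(X))
print("round trip ok:", same_predictions)
```

Pickle files can execute arbitrary code when loaded, so only unpickle files from a trusted source, and load them with the same scikit-learn version that produced them where possible.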
Web App Development
Framework: Streamlit
UI Features:
Input form for 8 tumor features
Clear display of prediction result (Malignant/Benign)
About Me sidebar with portfolio links
User Experience: Focused on simplicity and clarity
Prediction
Users enter tumor features → Click Predict → Receive visual result cards
⚠️ Malignant → consult a doctor
✅ Benign → no immediate concern
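The prediction step behind the Predict button might be sketched as below. The helper name `predict_tumor`, the 0.5 threshold, and the feature subset are hypothetical. In a Streamlit app, a function like this would typically be called from the button handler, with the result rendered as a card.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X = data.data[:, :8]  # ASSUMPTION: first eight "mean" columns as the 8 inputs
y = data.target       # 0 = malignant, 1 = benign
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)


def predict_tumor(model, features):
    """Hypothetical helper: return (label, probability of malignant).

    `features` is a sequence of 8 numeric values in training column order.
    """
    row = np.asarray(features, dtype=float).reshape(1, -1)
    proba = model.predict_proba(row)[0]
    # Look up the column for class 0 (malignant) instead of assuming order.
    malignant_idx = list(model.classes_).index(0)
    p_malignant = float(proba[malignant_idx])
    label = "Malignant" if p_malignant >= 0.5 else "Benign"
    return label, p_malignant


# Example: the first row of the dataset, which is labeled malignant.
label, p_malignant = predict_tumor(model, X[0])
print(f"{label} (p_malignant = {p_malignant:.2f})")
```

Returning the probability along with the label lets the UI show how confident the model is, not just a binary verdict.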
Python 3.9+ – Programming language
Streamlit – Frontend web app development
NumPy & Pandas – Data processing
Scikit-learn – Model training and evaluation
Pickle – Model serialization for deployment
TensorFlow/Keras – Neural network model (optional)
Evaluation Metrics:
Accuracy, Precision, Recall, F1-Score
Example Results:
Accuracy: ~97–99%
The confusion matrix shows minimal false negatives (critical in medical applications, where a missed malignant tumor is the costliest error)
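The metrics listed above can be computed along these lines. This is a sketch under assumed settings (an 80/20 stratified split, a random forest, and an 8-column feature subset), so its numbers will not necessarily match the results quoted in this post. Malignant is treated as the positive class so that recall measures how many cancerous tumors are caught.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X = X[:, :8]  # ASSUMPTION: first eight "mean" columns as the 8 key features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

# Malignant is label 0 in scikit-learn's copy, so use it as the positive class.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, pos_label=0, average="binary"
)

# Rows are true labels, columns are predictions, ordered [malignant, benign].
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
false_negatives = cm[0, 1]  # malignant tumors predicted as benign

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
print(cm)
print("false negatives:", false_negatives)
```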
Feature Importance:
Radius, perimeter, and area are the most influential features for tumor prediction
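A ranking like this can be read from a fitted random forest's `feature_importances_` attribute. The sketch below assumes the first eight "mean" columns as the feature set, so the exact ordering it prints may differ from the one reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
feature_names = list(data.feature_names[:8])  # ASSUMPTION: first eight columns
X = data.data[:, :8]
y = data.target

model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

# Impurity-based importances; they sum to 1 across all features.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, importance in ranked:
    print(f"{name:22s} {importance:.3f}")
```

Radius, perimeter, and area are strongly correlated with one another, so impurity-based importance tends to be shared among them and is not a precise measure of each feature's individual effect.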
This project is for educational purposes only. It should not be used as a substitute for professional medical diagnosis.