Abstract:
Stroke is a leading cause of death and long-term disability globally. Timely and accurate prediction of stroke risk can significantly improve patient outcomes and aid in early intervention. This project presents a web-based stroke prediction system that integrates multiple machine learning models trained on health data. Users can register and log in, enter personal and medical information, and select one of five models (Logistic Regression, SVM, Random Forest, Naïve Bayes, and Gradient Boosting) for stroke risk prediction. To address the dataset’s inherent class imbalance, SMOTE is applied during training to enhance model sensitivity. The backend is developed using Flask, with MySQL handling user authentication and data storage. The system is intended to provide reliable, scalable, and user-friendly stroke risk prediction for clinical and educational use.
Introduction:
Stroke is a medical emergency that occurs when blood flow to a part of the brain is interrupted or reduced, depriving brain tissue of oxygen and nutrients. Early detection of stroke risk factors such as hypertension, diabetes, age, and lifestyle choices plays a crucial role in prevention. With the increasing digitization of healthcare and availability of data, machine learning has shown significant potential in predicting disease outcomes.
This project leverages machine learning algorithms to predict the likelihood of a patient experiencing a stroke based on input parameters such as age, gender, BMI, average glucose level, hypertension, and smoking status. Unlike traditional approaches, which rely on a single model, this system enables users to select among multiple pre-trained models to observe different prediction results and accuracy levels.
The web application is built using the Flask framework, providing a lightweight and responsive interface for users. A MySQL database is integrated to securely manage user registration, authentication, and data storage. SMOTE (Synthetic Minority Over-sampling Technique) is used during training to resolve the class imbalance problem, which commonly biases models toward the majority class (no stroke).
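The core idea behind SMOTE can be illustrated with a short, dependency-free sketch. This is a simplified version written for explanation only, not the imbalanced-learn implementation the project lists as a requirement. The function name `smote_oversample`, its parameters, and the sample rows are all hypothetical. Each synthetic row is a random point on the line segment between a minority-class row and one of its nearest minority-class neighbours.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by squared Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between the base row and its neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class (stroke) rows: [age, average glucose level, BMI].
stroke_rows = [
    [67.0, 210.0, 36.0],
    [80.0, 105.0, 32.0],
    [74.0, 70.0, 27.0],
    [69.0, 95.0, 23.0],
]
new_rows = smote_oversample(stroke_rows, n_new=6)
print(len(new_rows), "synthetic rows generated")
```

Oversampling of this kind is normally applied to the training split only, so that evaluation data still reflects the real class distribution.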
This solution offers a comprehensive platform for patients, healthcare providers, and researchers to interactively explore predictive modeling and receive insights into stroke likelihood. By merging data science and user-centered design, the system promotes awareness, prevention, and clinical decision support for one of the world’s leading health threats.
Problem Statement:
Due to class imbalance and lack of integrated predictive tools, early detection of stroke is challenging. There’s a need for an intelligent, user-interactive system that offers reliable predictions using diverse ML models and handles medical data responsibly.
Existing System and Disadvantages:
- Manual prediction using clinical rules and risk factors
- Lack of automation and intelligent model selection
- Single-model dependency
- Poor sensitivity toward minority stroke class
- No centralized user interface for patients or providers
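The sensitivity problem listed above can be made concrete with a small worked example. The labels are hypothetical (95 no-stroke cases and 5 stroke cases, not figures from the project's dataset), and `accuracy_and_recall` is an illustrative helper. A model that always predicts "no stroke" scores high accuracy while detecting none of the stroke cases.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall of the positive class) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    positives = true_pos + false_neg
    recall = true_pos / positives if positives else 0.0
    return accuracy, recall


# Hypothetical labels: 95 no-stroke cases (0) and 5 stroke cases (1).
y_true = [0] * 95 + [1] * 5
# A degenerate model that always predicts the majority class.
always_negative = [0] * 100

acc, rec = accuracy_and_recall(y_true, always_negative)
print(f"accuracy={acc:.2f}, recall={rec:.2f}")  # accuracy=0.95, recall=0.00
```

This is why recall (sensitivity) on the stroke class is a more informative measure than overall accuracy for imbalanced data of this kind.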
Proposed System and Advantages:
- A web-based system with login, registration, and prediction features
- Integration of five different ML algorithms for comparative prediction
- SMOTE applied for class balancing to improve sensitivity (recall) on the minority stroke class
- Easy-to-use interface developed using Flask
- MySQL backend for secure user management and data storage
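The description does not say how credentials are stored in MySQL. One common approach, sketched here as an assumption rather than the project's actual method, is to store a salted PBKDF2 hash instead of the plain password. The helper names and the iteration count are illustrative. Only Python's standard library is used, and the resulting salt and digest would be saved in the user table.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for the deployment hardware


def hash_password(password, salt=None):
    """Return (salt, digest) computed with PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each new user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Registration: compute the values that would be stored for the user.
salt, stored_digest = hash_password("example-password")
# Login: check a submitted password against the stored values.
print(verify_password("example-password", salt, stored_digest))
```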
Modules:
- User Authentication Module: Register/Login using MySQL
- Model Training Module: ML training with Logistic Regression, SVM, Random Forest, Naïve Bayes, Gradient Boosting
- SMOTE Enhancement Module: Class balancing during training
- Prediction Module: Allows model selection and user input for stroke prediction
- Result Visualization Module: Displays prediction and probability
- Admin Management
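The Prediction Module's flow (validate the form input, look up the selected model, return a probability) might look roughly like the sketch below. The field names, the integer category codes, and the `ConstantModel` stand-in are assumptions made for illustration. In the real system the encoding would have to match whatever was used at training time, and the registry would hold fitted scikit-learn models loaded with joblib.

```python
GENDERS = {"male": 0, "female": 1, "other": 2}  # assumed encoding
SMOKING = {"never smoked": 0, "formerly smoked": 1, "smokes": 2, "unknown": 3}  # assumed


def encode_features(form):
    """Convert raw form strings into a numeric feature vector, or raise ValueError."""
    age = float(form["age"])
    bmi = float(form["bmi"])
    glucose = float(form["avg_glucose_level"])
    hypertension = int(form["hypertension"])
    if not 0 <= age <= 120:
        raise ValueError("age out of range")
    if bmi <= 0 or glucose <= 0:
        raise ValueError("BMI and glucose level must be positive")
    if hypertension not in (0, 1):
        raise ValueError("hypertension must be 0 or 1")
    try:
        gender = GENDERS[form["gender"].strip().lower()]
        smoking = SMOKING[form["smoking_status"].strip().lower()]
    except KeyError as exc:
        raise ValueError(f"unrecognised category: {exc}") from None
    return [age, gender, bmi, glucose, hypertension, smoking]


def predict_with(model_name, models, form):
    """Return the stroke probability from the model the user selected."""
    if model_name not in models:
        raise ValueError(f"unknown model: {model_name!r}")
    features = encode_features(form)
    # predict_proba yields one row per sample; column 1 is the positive (stroke) class.
    return float(models[model_name].predict_proba([features])[0][1])


class ConstantModel:
    """Stand-in for a fitted classifier, used only so the sketch runs on its own."""

    def __init__(self, probability):
        self.probability = probability

    def predict_proba(self, rows):
        return [[1 - self.probability, self.probability] for _ in rows]


models = {"logistic_regression": ConstantModel(0.25)}
form = {
    "age": "54", "gender": "Female", "bmi": "27.5",
    "avg_glucose_level": "110", "hypertension": "1",
    "smoking_status": "formerly smoked",
}
print(predict_with("logistic_regression", models, form))  # 0.25
```

If probabilities are to be shown for every model, note that scikit-learn's `SVC` provides `predict_proba` only when constructed with `probability=True`.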
Algorithms/Models Used:
- Logistic Regression
- Support Vector Machine (SVM)
- Random Forest Classifier
- Gaussian Naïve Bayes
- Gradient Boosting Classifier
Software Requirements:
- Python 3.8+
- Flask Framework
- MySQL Database
- HTML, CSS (for UI)
- Required Python packages: scikit-learn (imported as sklearn), joblib, PyMySQL (imported as pymysql), imbalanced-learn (imported as imblearn)
Hardware Requirements:
- 64-bit processor, 4 GB RAM (minimum)
- Localhost or Web Hosting (for deployment)
- MySQL server installed
Conclusion:
The developed system predicts stroke risk using multiple machine learning models and lets users compare predictions across them. By mitigating class imbalance through SMOTE, it improves sensitivity to stroke cases and can support medical professionals in making informed decisions. The interface is designed to be intuitive, secure, and adaptable to real-world scenarios.
Future Enhancements:
- Integration with real-time health APIs (like wearable health trackers)
- Role-based access (doctor/patient/admin)
- Model auto-update based on feedback
- Integration with EHR (Electronic Health Records)
- Cloud deployment with scalability

