Predicting Lung Cancer Using Machine Learning


Authors : Dr. A. Srinivasa Rao; Challa Manikanta; Akuri Azad; Kandagatla Rushikesh; Polavarapu Rohitha

Volume/Issue : RISEM–2025

Google Scholar : https://tinyurl.com/mv478wh5

Scribd : https://tinyurl.com/4m3t6nnj

DOI : https://doi.org/10.38124/ijisrt/25jun173

Abstract : Lung cancer remains one of the deadliest diseases in the world, largely because it is often detected late and effective early-detection techniques are lacking, so accurate and timely prediction is essential for improving patient outcomes. Conventional diagnostic methods frequently involve invasive procedures, and although medical imaging methods such as CT scans and X-rays provide useful information, the images must be interpreted by specialists, which can occasionally lead to incorrect diagnoses or delayed treatment. Lung cancer risk is also influenced by many factors, including demographic variables (such as age and gender), lifestyle behaviors (such as smoking) and clinical symptoms (such as persistent cough and other lung symptoms). To overcome these obstacles and improve predictive accuracy, this study uses machine learning to build a robust lung cancer prediction model from a large dataset of clinical, lifestyle and demographic characteristics. To ensure data quality and reliability before model training, the dataset undergoes comprehensive exploratory data analysis (EDA), preprocessing and feature scaling. Several machine learning models, including Deep Neural Networks (DNN), Decision Trees and Random Forests, are implemented and evaluated using key performance metrics such as accuracy, mean squared error (MSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). According to preliminary results, the Decision Tree and Random Forest models perform noticeably better, with accuracies of 90.32% and 93.54% respectively, whereas the DNN model performs suboptimally, with an accuracy of 12.62%. The Random Forest model, which achieves the highest accuracy, is further optimized with a Bagging Classifier to improve its performance and stability.

Keywords : CNN, Data Augmentation, Disease Detection, Plant Health, Real-Time Detection and Streamlit.
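The pipeline summarized in the abstract (feature scaling, a bagged ensemble, and evaluation with accuracy, MSE, MAE and MAPE) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the data are synthetic, and a one-split decision tree (a decision stump) stands in for the Random Forest that the study wraps in its Bagging Classifier. The function names `fit_scaler`, `apply_scaler`, `fit_stump`, `fit_bagging` and `evaluate` are hypothetical. Computing MAPE only over samples whose true label is nonzero is also an assumption, because the percentage error is undefined when the true value is 0.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def fit_scaler(rows):
    """Learn per-feature mean and standard deviation (z-score scaling)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant column: avoid division by zero
    return mus, sds


def apply_scaler(rows, mus, sds):
    """Scale each feature to (x - mean) / std using training-set statistics."""
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def fit_stump(X, y):
    """Hypothetical base learner: the single split with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (0, 1):
                # Predict `pol` when feature j <= t, otherwise the other class.
                errs = sum((pol if row[j] <= t else 1 - pol) != yi for row, yi in zip(X, y))
                if best is None or errs < best[0]:
                    best = (errs, j, t, pol)
    _, j, t, pol = best
    return lambda row: pol if row[j] <= t else 1 - pol


def fit_bagging(X, y, n_estimators=15, seed=0):
    """Bagging: fit one base learner per bootstrap sample, predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        votes = Counter(m(row) for m in models)
        return votes.most_common(1)[0][0]  # an odd n_estimators avoids binary ties

    return predict


def evaluate(y_true, y_pred):
    """Accuracy plus the error metrics named in the abstract, for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    acc = sum(t == p for t, p in pairs) / n
    mse = sum((t - p) ** 2 for t, p in pairs) / n
    mae = sum(abs(t - p) for t, p in pairs) / n
    nonzero = [(t, p) for t, p in pairs if t != 0]  # assumption: skip t == 0 for MAPE
    mape = 100 * sum(abs((t - p) / t) for t, p in nonzero) / len(nonzero) if nonzero else float("nan")
    return {"accuracy": acc, "mse": mse, "mae": mae, "mape": mape}


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in data: two numeric features, label driven mostly by the first.
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(120)]
    y = [1 if row[0] + 0.3 * rng.gauss(0, 1) > 0 else 0 for row in X]
    split = 90
    mus, sds = fit_scaler(X[:split])  # fit scaling on training rows only
    model = fit_bagging(apply_scaler(X[:split], mus, sds), y[:split])
    preds = [model(row) for row in apply_scaler(X[split:], mus, sds)]
    print(evaluate(y[split:], preds))
```

The scaler is fitted on the training rows only, so test-set statistics do not leak into training. Note that when both labels and predictions are 0/1, each per-sample error is either 0 or 1, so MSE and MAE both equal the misclassification rate, 1 - accuracy.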
