CRISP-MED-DM a Methodology of Diagnosing Breast Cancer


Authors : Bouden Halima; Noura Aknin; Achraf Taghzati; Siham Hadoudou; Mouhamed Chrayah

Volume/Issue : Volume 9 - 2024, Issue 1 - January

Google Scholar : http://tinyurl.com/32mxbprz

Scribd : http://tinyurl.com/4jswwkfu

DOI : https://doi.org/10.5281/zenodo.10553959

Abstract : This study assesses the applicability of the knowledge discovery in databases methodology, based on data mining (DM) techniques, to predicting breast cancer. Following this methodology, we compare individual classifiers and multi-classifier fusion on three breast cancer datasets, evaluating each by classification accuracy and confusion matrix using a supplied test set. The classification techniques implemented are among the best-known algorithms in this field. To obtain the most suitable results, we applied attribute selection using GainRatioAttributeEval, which measures how much each feature contributes to reducing the overall entropy. The experimental results show that no single classification technique performs best across all datasets, since the classification task is affected by the type of dataset. Multi-classifier fusion improved accuracy. Feature selection did not have a strong influence on the WDBC and WPBC datasets, but on WBC the selected attributes (Uniformity of Cell Size, Mitoses, Clump Thickness, Bare Nuclei, Single Epithelial Cell Size, Marginal Adhesion, Bland Chromatin and Class) improved accuracy.

Keywords : Data Mining Methodology, CRISP-DM, Healthcare, Breast Cancer, Classification.
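
The gain-ratio criterion named in the abstract can be illustrated with a minimal sketch. It assumes categorical (already discretized) feature values and uses the textbook definition: information gain divided by the entropy of the feature's own value distribution. The function names and the toy data are hypothetical and are not taken from the paper or from the GainRatioAttributeEval implementation.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(values, labels):
    """Textbook gain ratio of one categorical feature against class labels."""
    total = len(labels)
    # Group the class labels by the feature value they occur with.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy remaining after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information: entropy of the feature's own value distribution.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical, not from the WBC, WDBC or WPBC datasets).
labels = ["benign", "benign", "malignant", "malignant"]
informative = ["low", "low", "high", "high"]   # separates the classes fully
uninformative = ["x", "y", "x", "y"]           # carries no class information

ratio_informative = gain_ratio(informative, labels)
ratio_uninformative = gain_ratio(uninformative, labels)
```

Ranking attributes by this score and keeping the highest-scoring ones is one common way such a measure is used for attribute selection; the threshold or number of attributes kept is a separate design choice that the abstract does not specify.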

