Authors :
Bouden Halima; Noura Aknin; Achraf Taghzati; Siham Hadoudou; Mouhamed Chrayah
Volume/Issue :
Volume 9 - 2024, Issue 1 - January
Google Scholar :
http://tinyurl.com/32mxbprz
Scribd :
http://tinyurl.com/4jswwkfu
DOI :
https://doi.org/10.5281/zenodo.10553959
Abstract :
The aim of this study was to assess the applicability of
the knowledge discovery in databases (KDD) methodology,
based on data mining (DM) techniques, to breast cancer
prediction. Following this methodology, we compare
individual classifiers and multi-classifier fusion with
respect to their accuracy in detecting breast cancer on
three different datasets, using classification accuracy
and the confusion matrix under a supplied test set
evaluation. The classification techniques implemented are
among the best-known algorithms in this field. To obtain
the most suitable results, we applied attribute selection
using GainRatioAttributeEval, which measures how much each
feature contributes to decreasing the overall entropy.
The experimental results show that no single classification
technique outperforms the others on all datasets, since
classification performance depends on the type of dataset.
Multi-classifier fusion improved accuracy. Feature
selection did not have a strong influence on the WDBC and
WPBC datasets, but on the WBC dataset the selected
attributes (Uniformity of Cell Size, Mitoses, Clump
Thickness, Bare Nuclei, Single Epithelial Cell Size,
Marginal Adhesion, Bland Chromatin and Class) improved
accuracy.
Keywords :
Data Mining Methodology, CRISP-DM, Healthcare, Breast Cancer, Classification.
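The gain ratio criterion mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual definition, information gain about the class divided by the entropy of the attribute itself, and discrete attribute values. The function names and the toy data are hypothetical and are not taken from the study or its datasets.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of a discrete feature divided by its split information."""
    total = len(labels)
    # Group the class labels by the feature value they occur with.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy remaining after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information: entropy of the feature's own value distribution.
    split_info = entropy(feature_values)
    # A constant feature has zero split information; score it as 0.
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data, for illustration only.
labels = ["benign", "benign", "malignant", "malignant"]
informative = ["low", "low", "high", "high"]    # separates the classes
uninformative = ["low", "high", "low", "high"]  # independent of the class
constant = ["low", "low", "low", "low"]         # carries no information

scores = {
    "informative": gain_ratio(informative, labels),
    "uninformative": gain_ratio(uninformative, labels),
    "constant": gain_ratio(constant, labels),
}
```

Ranking attributes by such a score and keeping the highest-ranked ones is one common way to perform attribute selection; whether the study kept a fixed number of attributes or applied a threshold is not stated in the abstract.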