Using Mahout Library for Clustering Algorithm: A Case Study on Healthcare Data | International Journal of Innovative Science and Research Technology

Using Mahout Library for Clustering Algorithm: A Case Study on Healthcare Data

Authors : Dr. Divya Chauhan; Dr. Satpal

Volume/Issue : Volume 10 - 2025, Issue 11 - November

Google Scholar : https://tinyurl.com/4y9pz4nd

Scribd : https://tinyurl.com/58brdnm6

DOI : https://doi.org/10.38124/ijisrt/25nov1348


Abstract : Data mining techniques and algorithms work well on small datasets, where they analyse data in bulk to identify trends and draw conclusions. Most data mining tools, however, cannot efficiently process the very large datasets typical of big data: they cannot deliver results quickly unless the computation is distributed across multiple machines, for example in the cloud. To process data at this volume, the Hadoop ecosystem provides Mahout, a library of machine learning algorithms. This paper applies clustering algorithms from the Mahout library in a Hadoop MapReduce environment to a large real-world healthcare dataset. The three clustering algorithms used are canopy clustering, K-Means clustering, and fuzzy K-Means clustering.

Keywords : Data Mining, Big Data, Hadoop, Mahout, Clustering, Healthcare.
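
For readers unfamiliar with the three algorithms named in the abstract, the sketch below shows each one in plain, single-machine Python. It is not the authors' pipeline and it does not call Mahout, which runs these algorithms as distributed MapReduce jobs. The function names, the sample points, the thresholds t1 and t2, and the fuzziness exponent m are all illustrative assumptions. Seeding K-Means with the canopy centers is one common way to combine the algorithms; the abstract does not say whether the paper does so.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points, t1, t2):
    """Canopy clustering with a loose threshold t1 and a tight threshold t2."""
    assert t1 > t2, "the loose threshold must exceed the tight one"
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]
        # Every point within t1 of the center belongs to this canopy.
        members = [p for p in points if distance(center, p) <= t1]
        canopies.append((center, members))
        # Points within t2 can no longer start a canopy of their own.
        remaining = [p for p in remaining if distance(center, p) > t2]
    return canopies


def kmeans(points, centers, iterations=10):
    """K-Means: hard assignment to the nearest center, then recompute means."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: distance(p, centers[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else old
                   for g, old in zip(groups, centers)]
    return centers


def fuzzy_memberships(point, centers, m=2.0):
    """Degree of membership of one point in every cluster (sums to 1)."""
    dists = [distance(point, c) for c in centers]
    zeros = sum(1 for d in dists if d == 0)
    if zeros:
        # The point coincides with one or more centers.
        return [1.0 / zeros if d == 0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in dists) for di in dists]


def fuzzy_kmeans(points, centers, m=2.0, iterations=10):
    """Fuzzy K-Means: every point contributes to every center, weighted by u^m."""
    centers = list(centers)
    dims = len(points[0])
    for _ in range(iterations):
        weights = [[u ** m for u in fuzzy_memberships(p, centers, m)]
                   for p in points]
        updated = []
        for j, old in enumerate(centers):
            total = sum(w[j] for w in weights)
            if total == 0:
                updated.append(old)  # no point has weight on this center
                continue
            updated.append(tuple(
                sum(w[j] * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)))
        centers = updated
    return centers


# Illustrative data only: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]

canopies = canopy(points, t1=3.0, t2=1.5)
seeds = [center for center, _ in canopies]
km = kmeans(points, seeds)
fz = fuzzy_kmeans(points, seeds)
print(len(canopies), km, fz)
```

The canopy pass is cheap and needs no cluster count up front, which is why it can serve as a way to pick the number of clusters and their starting centers. K-Means then assigns each point to exactly one cluster, whereas fuzzy K-Means gives each point a degree of membership in every cluster.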


