A Scalable Data Mining Framework for Knowledge Discovery Using Distributed Big Data Analytics in Heterogeneous Systems

Diwakar Bhardwaj; M.V. Rajesh; Sathya arthi R; Talla Prashanthi; Dr. Sowjanya Bagadi; Mahesh Kurulekar; Sunil Thakur; Mahendran Arumugam

Authors

Diwakar Bhardwaj Department of Computer Engineering & Applications, GLA University, Mathura.
M.V. Rajesh Associate Professor, Department of CSE (Data Science), Pragati Engineering College, ADB Road, Surampalem, Near Peddapuram, Kakinada District, Andhra Pradesh, India - 533437.
Sathya arthi R Assistant Professor, Department of Management Studies, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research.
Talla Prashanthi Assistant Professor, Department of Information Technology, Vardhaman College of Engineering, Shamshabad, Hyderabad, India - 501 218.
Dr. Sowjanya Bagadi Assistant Professor, School of Business, Aditya University, Surampalem, Andhra Pradesh, Pin 533437.
Mahesh Kurulekar Assistant Professor, Civil Engineering, Vishwakarma Institute of Technology, Pune, Maharashtra, 411037.
Sunil Thakur School of Engineering & Technology, Noida International University, Uttar Pradesh 203201, India.
Mahendran Arumugam Center for Global Health Research, Saveetha Medical College, Saveetha Institute of Medical and Technical Sciences, Chennai, India.

Keywords:

Big Data Analytics, Distributed Data Mining, Knowledge Discovery, Heterogeneous Systems, Machine Learning, Apache Spark, Hadoop, Scalability, Distributed Computing.

Abstract

The massive growth of structured and unstructured data generated by cloud platforms, IoT devices, distributed networks, enterprise systems, and other sources has made big data analytics a critical area of research. Conventional data mining methods struggle with big data because of their limited scalability, high computational cost, latency, and inefficient use of resources in distributed systems. These issues call for scalable and efficient frameworks that can handle large volumes of heterogeneous data while ensuring accurate knowledge discovery. This study presents a scalable distributed data mining architecture designed to enhance knowledge discovery through big data analytics in heterogeneous systems. The framework combines Apache Hadoop and Apache Spark to provide efficient distributed data storage, parallel processing, and in-memory computation. The proposed model incorporates machine learning and data mining algorithms such as Random Forest, Decision Tree, K-Means clustering, and FP-Growth to perform classification, clustering, and pattern extraction efficiently on distributed datasets. The framework is evaluated using machine learning and distributed-system performance metrics, including accuracy, precision, recall, F1-score, execution time, scalability, throughput, and resource utilization. Experimental results show that the proposed framework substantially improves processing efficiency, scalability, and predictive performance compared with traditional centralized systems and Hadoop-based systems, and that it provides efficient and reliable knowledge discovery in large-scale heterogeneous environments.
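
The abstract does not describe implementation details. As a hedged illustration of the kind of data-parallel computation it refers to, the sketch below simulates one common way of distributing K-Means (Lloyd's algorithm): each partition independently computes per-cluster partial sums and counts (a "map" step), and those partial statistics are then combined to update the centroids (a "reduce" step). Plain Python lists stand in for the partitions of a distributed dataset, and the function names `map_partition`, `reduce_partials`, and `distributed_kmeans` are hypothetical. This is not the authors' implementation and not the Spark API.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def closest(point: Point, centroids: Sequence[Point]) -> int:
    """Return the index of the centroid nearest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )

def map_partition(partition: Sequence[Point], centroids: Sequence[Point]):
    """Map step: runs on one partition only and returns per-cluster partial sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in partition:
        i = closest(point, centroids)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += point[d]
    return sums, counts

def reduce_partials(partials, k: int, dim: int):
    """Reduce step: add up the partial sums and counts produced by every partition."""
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in partials:
        for i in range(k):
            total_counts[i] += counts[i]
            for d in range(dim):
                total_sums[i][d] += sums[i][d]
    return total_sums, total_counts

def distributed_kmeans(partitions: Sequence[Sequence[Point]], centroids: List[Point],
                       max_iter: int = 20, tol: float = 1e-9) -> List[Point]:
    """Lloyd's K-Means where each iteration is one map over partitions plus one reduce."""
    k, dim = len(centroids), len(centroids[0])
    for _ in range(max_iter):
        # Assumption: on a real cluster the partitions would be processed in parallel
        # on different workers; here they are simply visited in a loop.
        partials = [map_partition(part, centroids) for part in partitions]
        sums, counts = reduce_partials(partials, k, dim)
        # A cluster that received no points keeps its previous centroid.
        new_centroids = [
            tuple(s / counts[i] for s in sums[i]) if counts[i] else centroids[i]
            for i in range(k)
        ]
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(centroids, new_centroids)
        )
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids

# Usage: two partitions holding two well-separated groups of 2-D points.
partitions = [
    [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)],
    [(1.0, 0.0), (10.0, 11.0), (11.0, 10.0)],
]
print(distributed_kmeans(partitions, [(0.0, 0.0), (10.0, 10.0)]))
```

With this toy input the centroids converge to the two group means, (1/3, 1/3) and (31/3, 31/3). Only the per-cluster sums and counts cross partition boundaries, so the data exchanged per iteration scales with the number of clusters and the dimensionality rather than with the dataset size. This is what makes the formulation attractive for cluster frameworks; in an actual Spark job the loop over partitions would typically be replaced by a distributed aggregation.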

Published

2026-05-12

How to Cite

Bhardwaj, D., Rajesh, M. V., R, S. arthi, Prashanthi, T., Bagadi, S., Kurulekar, M., … Arumugam, M. (2026). A Scalable Data Mining Framework for Knowledge Discovery Using Distributed Big Data Analytics in Heterogeneous Systems. International Journal of Artificial Intelligence and Machine Learning, 6(2s), 499–509. Retrieved from https://svedbergopen.com/index.php/ijaiml/article/view/232

Issue

Vol. 6 No. 2s (2026): IJAIML_VOL.6_NO.2s 2026

Section

Articles

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Most read articles by the same author(s)

Dr. Mugilan D, Keerthika K, Sathya Arthi R, Ortikov Elyor Abdumajidovich, Xaitboy Uralov, Gafur Namazov, Exploring The Use of Explainable AI (XAI) For Personalized Educational Feedback Systems, International Journal of Artificial Intelligence and Machine Learning: Vol. 6 No. 2s (2026): IJAIML_VOL.6_NO.2s 2026
Dr. Nidhi Srivastava, Dr. Soumya Surath Panda, Dr. Vijay J. Upadhye, Sunil Thakur, Shailesh Kulkarni, Shanthi R, Malarvizhi S, Monisha J, An Explainable Artificial Intelligence Approach for Early Disease Prediction and Risk Assessment Using Healthcare Big Data, International Journal of Artificial Intelligence and Machine Learning: Vol. 6 No. 2s (2026): IJAIML_VOL.6_NO.2s 2026
Rakesh Kumar, Y Vijay Kumar, Kanchana K, Sameera Khan, Dr. Ravi Thangjam, Rajesh Raikwar, Paul Praveen Albert Selvakumar, Mahendran Arumugam, Reinforcement Learning-Driven Autonomous Navigation System for Mobile Robots in Unstructured and Dynamic Terrains, International Journal of Artificial Intelligence and Machine Learning: Vol. 6 No. 2s (2026): IJAIML_VOL.6_NO.2s 2026
Anshy Singh, K Saroja Rani, Monisha J, Yugandhar Manchala, Dr. Sowjanya Bagadi, Suhas Bhise, Durga Prasad, Explainable Artificial Intelligence Models for Interpretable Decision-Making in High-Stakes Applications, International Journal of Artificial Intelligence and Machine Learning: Vol. 6 No. 2s (2026): IJAIML_VOL.6_NO.2s 2026
Rohit Agarwal, Vinod Kumar Naidu Pamuluri, Sathya arthi R, Ala Rajitha, Dr. Ravi Thangjam, Prashant Anerao, Deepika Sharma, Human-AI Collaborative Systems: Cognitive Computing Approaches for Enhancing User Interaction and Decision Support, International Journal of Artificial Intelligence and Machine Learning: Vol. 6 No. 2s (2026): IJAIML_VOL.6_NO.2s 2026
