A Scalable Data Mining Framework for Knowledge Discovery Using Distributed Big Data Analytics in Heterogeneous Systems

Authors

  • Diwakar Bhardwaj, Department of Computer Engineering & Applications, GLA University, Mathura.
  • M.V. Rajesh, Associate Professor, Department of CSE (Data Science), Pragati Engineering College, ADB Road, Surampalem, Near Peddapuram, Kakinada District, Andhra Pradesh, India - 533437.
  • Sathya arthi R, Assistant Professor, Department of Management Studies, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research.
  • Talla Prashanthi, Assistant Professor, Department of Information Technology, Vardhaman College of Engineering, Shamshabad, Hyderabad, India - 501218.
  • Dr. Sowjanya Bagadi, Assistant Professor, School of Business, Aditya University, Surampalem, Andhra Pradesh, India - 533437.
  • Mahesh Kurulekar, Assistant Professor, Civil Engineering, Vishwakarma Institute of Technology, Pune, Maharashtra, India - 411037.
  • Sunil Thakur, School of Engineering & Technology, Noida International University, Uttar Pradesh, India - 203201.
  • Mahendran Arumugam, Center for Global Health Research, Saveetha Medical College, Saveetha Institute of Medical and Technical Sciences, Chennai, India.

Keywords:

Big Data Analytics, Distributed Data Mining, Knowledge Discovery, Heterogeneous Systems, Machine Learning, Apache Spark, Hadoop, Scalability, Distributed Computing.

Abstract

The rapid growth of structured and unstructured data generated by cloud platforms, IoT devices, distributed networks, and enterprise systems has made big data analytics a critical area of research. Conventional data mining methods struggle with big data because of limited scalability, high computational cost, latency, and inefficient use of resources in distributed systems. These issues call for scalable and efficient frameworks that can handle large volumes of heterogeneous data while supporting accurate knowledge discovery. This study presents a scalable distributed data mining architecture designed to improve knowledge discovery through big data analytics in heterogeneous systems. The framework combines Apache Hadoop and Apache Spark to provide efficient distributed storage, parallel processing, and in-memory computing. The proposed model incorporates machine learning and data mining algorithms such as Random Forest, Decision Tree, K-Means clustering, and FP-Growth to perform classification, clustering, and pattern extraction on distributed datasets. The framework is evaluated using machine learning and distributed-system performance metrics, including accuracy, precision, recall, F1-score, execution time, scalability, throughput, and resource utilization. Experiments show that the proposed framework substantially improves processing efficiency, scalability, and predictive performance compared with traditional centralized systems and Hadoop-based systems, and that it provides efficient and reliable knowledge discovery in large-scale heterogeneous settings.
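The classification metrics named in the abstract (accuracy, precision, recall, F1-score) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes binary labels and models data partitions as plain Python lists, where a real deployment would use Spark partitions. All function and variable names are hypothetical. Each partition is reduced to confusion counts locally, and only those small counts are merged, which mirrors how such metrics are typically aggregated in a distributed setting.

```python
from collections import Counter


def confusion_counts(pairs):
    """Count TP/FP/FN/TN for one partition of (actual, predicted) binary labels."""
    counts = Counter()
    for actual, predicted in pairs:
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1
        elif predicted == 0 and actual == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def merge_counts(partition_counts):
    """Sum the per-partition counts (the 'reduce' step)."""
    total = Counter()
    for counts in partition_counts:
        total.update(counts)
    return total


def classification_metrics(counts):
    """Derive accuracy, precision, recall, and F1 from merged confusion counts."""
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data: two partitions of (actual, predicted) pairs.
partitions = [
    [(1, 1), (0, 0), (1, 0)],
    [(0, 1), (1, 1), (0, 0)],
]
merged = merge_counts(confusion_counts(p) for p in partitions)
metrics = classification_metrics(merged)
print(metrics)
```

Because only four integers per partition cross the network in this scheme, the aggregation cost is independent of the number of records, which is one reason count-based metrics are straightforward to compute at scale.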

Published

2026-05-12

How to Cite

Bhardwaj, D., Rajesh, M., R, S. arthi, Prashanthi, T., Bagadi, D. S., Kurulekar, M., … Arumugam, M. (2026). A Scalable Data Mining Framework for Knowledge Discovery Using Distributed Big Data Analytics in Heterogeneous Systems. International Journal of Artificial Intelligence and Machine Learning, 6(2s), 499–509. Retrieved from https://svedbergopen.com/index.php/ijaiml/article/view/232
