Fuzzy Multicore Clustering of Big Data in the Hadoop Map Reduce Framework
Subject Areas : ICT
Seyed Omid Azarkasb 1 * , Seyed Hossein Khasteh 2 , Mostafa Amiri 3
1 - K.N. Toosi University of Technology
2 -
3 - K.N. Toosi University of Technology
Keywords: Big Data Clustering, Fuzzy Multicore Learning, Hadoop Map Reduce, Task Scheduling, Cloud Computing, Pattern Recognition
Abstract :
A natural way to account for overlapping clusters is to assign each data point a set of membership degrees, one per cluster. Because fuzzy clustering reduces the number of partitions and shrinks the search space, it generally incurs lower computational overhead and handles ambiguous, noisy, and outlier data well. For these reasons it is regarded as an advanced clustering method. However, fuzzy clustering methods often struggle with non-linear relationships in the data. This paper proposes a method, based on feasible ideas, that uses multicore learning within the Hadoop MapReduce framework to identify linearly inseparable clusters in complex big data structures. The multicore learning model can capture complex relationships among data, while Hadoop lets us work with a logical cluster of processing and storage nodes instead of with individual operating systems and processors. In summary, the paper presents the modeling of non-linear data relationships using multicore learning, the determination of appropriate values for the fuzzy parameterization and feasibility parameters, and an algorithm formulated in the Hadoop MapReduce model. The experiments were conducted on one of the commonly used datasets from the UCI Machine Learning Repository, as well as on the implemented CloudSim dataset simulator, and satisfactory results were obtained. According to published studies, the UCI Machine Learning Repository is suitable for regression and clustering in the analysis of large-scale datasets, while the CloudSim dataset is specifically designed for simulating cloud computing scenarios, calculating time delays, and task scheduling.
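To make the idea of membership degrees inside a map/reduce iteration concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes the textbook fuzzy c-means membership formula and, as a stand-in for the paper's non-linear similarity model, a single Gaussian kernel with the kernel-induced distance 2(1 - K(x, v)). The function names, the fuzzifier m, the kernel width sigma, and the in-memory grouping that imitates Hadoop's shuffle phase are all illustrative assumptions.

```python
import math
from collections import defaultdict


def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian (RBF) similarity between a point x and a center v."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def memberships(x, centers, m=2.0, sigma=1.0):
    """Membership degrees of x in every cluster; they sum to 1."""
    # Kernel-induced squared distance, clamped to avoid division by zero.
    d2 = [max(2.0 * (1.0 - gaussian_kernel(x, v, sigma)), 1e-12)
          for v in centers]
    power = 1.0 / (m - 1.0)
    inverse = [(1.0 / d) ** power for d in d2]
    total = sum(inverse)
    return [w / total for w in inverse]


def map_point(x, centers, m=2.0, sigma=1.0):
    """Map step: emit (cluster id, (weighted point, weight)) for one point."""
    u = memberships(x, centers, m, sigma)
    for j, v in enumerate(centers):
        weight = (u[j] ** m) * gaussian_kernel(x, v, sigma)
        yield j, ([weight * a for a in x], weight)


def reduce_cluster(values):
    """Reduce step: combine the partial sums of one cluster into a center."""
    values = list(values)
    dim = len(values[0][0])
    numerator = [0.0] * dim
    denominator = 0.0
    for weighted_point, weight in values:
        for i in range(dim):
            numerator[i] += weighted_point[i]
        denominator += weight
    return [n / denominator for n in numerator]


def iterate(points, centers, m=2.0, sigma=1.0):
    """One clustering iteration; the dict plays the role of the shuffle."""
    grouped = defaultdict(list)
    for x in points:
        for j, value in map_point(x, centers, m, sigma):
            grouped[j].append(value)
    return [reduce_cluster(grouped[j]) for j in range(len(centers))]
```

In an actual Hadoop job the current centers would be distributed to every mapper, and the driver would call the equivalent of iterate repeatedly until the centers stop moving. How the paper combines several kernels and sets its feasibility parameter is not stated in the abstract, so neither is modeled here.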