The effectiveness of big data classification control based on principal component analysis

Mostafa Abdulghafoor Mohammed, Mostafa Mahmood Akawee, Ziyad Hussien Saleh, Raed Abdulkareem Hasan, Ahmed Hussein Ali, Tole Sutikno

Abstract


Large-scale datasets are becoming more common, yet they can be challenging to understand and interpret. When dealing with big datasets, principal component analysis (PCA) is used to reduce the dimensionality of the data while maintaining interpretability and minimizing information loss. It accomplishes this by producing new uncorrelated variables that successively maximize variance. In the field of data analysis, PCA is a multivariate statistical technique commonly used to obtain rules explaining the separation of groups in a given situation. Classes are predicted using a classification algorithm, a supervised learning technique that assigns data points to categories. A classification model must be built with a classification algorithm before any successful classification can be achieved, and a variety of classification strategies can be used to make predictions. For large datasets, the dimensionality must first be reduced using the PCA approach. This article begins by introducing the basic ideas of PCA and discussing what it can and cannot do. It then describes some variants of PCA and their applications, and shows through a series of experiments how PCA improves classification performance.
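
As an illustration of the idea described above, the following is a minimal sketch of PCA computed with a singular value decomposition in NumPy. It is not the authors' implementation, and the article's experiments are not reproduced here: the function name `pca`, the synthetic data, and the choice of two components are all assumptions made for the example.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its first n_components principal components (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each feature
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Sample variance captured by each retained component.
    variances = (S ** 2)[:n_components] / (X.shape[0] - 1)
    scores = Xc @ components.T               # new, uncorrelated variables
    return scores, components, variances

# Synthetic data (assumption): 5 observed features driven by 2 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

scores, components, variances = pca(X, n_components=2)
```

The resulting `scores` have two columns instead of five, are mutually uncorrelated, and are ordered by the variance they capture; in a classification setting they would be passed to the classifier in place of the original features.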

Keywords


Big data; Classification algorithm; Interpretability; Large-scale datasets; Machine learning; Principal component analysis; Spark ecosystem



DOI: https://doi.org/10.11591/eei.v12i1.4405



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
