A Modified Overlapping Partitioning Clustering Algorithm for Categorical Data Clustering

Mohammad Alaqtash, Moayad A. Fadhil, Ali F. Al-Azzawi

Abstract


Clustering is one of the important approaches for grouping unlabeled data: it partitions the data into clusters with similar patterns. Over the past decades, many clustering algorithms have been developed for various clustering problems. The overlapping partitioning clustering (OPC) algorithm can only handle numerical data. Hence, novel clustering algorithms have been studied extensively to overcome this issue. By increasing the number of objects belonging to one cluster and the distance between cluster centers, this study aims to cluster textual data without losing the algorithm's main functions. The study was conducted on the twenty newsgroups dataset, which consists of approximately 20,000 text documents. By introducing some modifications to the traditional algorithm, clusters with an acceptable level of homogeneity and completeness were generated. The modifications were made to the pre-processing phase and the data representation, along with a number of methods that influence the primary function of the algorithm. Subsequently, the results were evaluated and compared with those of the k-means algorithm on the training and test datasets. The results indicate that the modified algorithm can successfully handle categorical data and produce satisfactory clusters.

Keywords


Categorical clustering; Cosine similarity; Feature extraction; Partitioning clustering; Unsupervised learning; Vector space model
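The keywords name a vector space model and cosine similarity, and the abstract describes clusters that may share objects. The following is a minimal illustrative sketch of those two ideas only, not the authors' modified OPC algorithm. The function names (`vectorize`, `cosine`, `overlapping_assign`), the plain term-frequency weighting, the fixed cluster centers, and the similarity threshold of 0.3 are all assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Map a document to a sparse term-frequency vector (bag of words)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def overlapping_assign(docs, centers, threshold=0.3):
    """Assign each document to every center it is similar enough to.

    A document can land in several clusters at once, which is the
    overlapping behaviour. The threshold is a hypothetical parameter.
    """
    center_vecs = [vectorize(c) for c in centers]
    assignments = []
    for doc in docs:
        vec = vectorize(doc)
        members = [i for i, c in enumerate(center_vecs)
                   if cosine(vec, c) >= threshold]
        assignments.append(members)
    return assignments


# Toy documents and hand-picked centers (made up for this example).
docs = [
    "space shuttle launch orbit",
    "hockey team wins game",
    "space team launch game",
]
centers = ["space launch orbit", "hockey game team"]

print(overlapping_assign(docs, centers))
```

In this toy input the third document shares two terms with each center, so it is assigned to both clusters, while the first two documents each fall into a single cluster.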



DOI: https://doi.org/10.11591/eei.v7i1.896
