Performance evaluation of feature selection on some ML approaches for diagnosing narcissistic personality disorder

ABSTRACT


INTRODUCTION
The problem of mental health is a serious issue in modern society. It usually refers to a person's mood, thoughts, and behavior [1]. Poor mental health is one of the leading causes of suicide [2]. A personality disorder is a type of mental health disorder that interferes with ways of thinking, understanding situations, and relating to others. A person with a personality disorder can self-destruct in any situation [3]. There are several types of personality disorders, such as narcissistic, paranoid, schizoid, antisocial, and borderline. A 2019 World Health Organization (WHO) survey shows that nearly a billion people have a mental disorder [4]. People with mental disorders are frequently stigmatized in the community, which can have a negative impact on them [5].
Narcissism is a common mental disorder [6]. The characteristics of narcissistic personality disorder (NPD) are grandiosity, a need for praise, and a lack of empathy [7]. It is becoming more common in everyday life, particularly on social media. Those suffering from NPD frequently exhibit an inability to maintain relationships and jobs. NPD causes problems in many areas of life, including relationships, work, school, and finances. People with NPD may become dissatisfied and disillusioned when they do not receive the special favors or admiration they believe they are entitled to. Others may not enjoy being around them because the relationships are not rewarding [8], which may cause people with NPD to act violently and aggressively [9]. Early personality disorder diagnosis is critical for understanding people's psychological conditions [10]. As a result, steps can be taken to help people with personality disorders [11]. Diagnosis is typically based on appearance or behavior, which is a complex and difficult task that requires the medical science of psychology [12]. It usually employs screening tests. However, screening would be costly and time-consuming for a large population. Furthermore, diagnostic procedures have the unintended consequence of discouraging healthy people from participating. Psychological problems consequently often go unrecognized and unaddressed. Machine learning (ML) is one method for predicting mental health disorders quickly and accurately. In recent years, ML techniques have been used in a variety of medical research projects, particularly in biomedicine and neuroscience, to gain a better understanding of mental health issues [13].
ML enables computers and other computing systems to learn from past data and improve autonomously without explicit human programming [11]. ML is based on the creation of computer programs that can access data and learn from it on their own. It is very effective in the healthcare industry, where there is a lot of data. In that case, the resulting prediction model will be superior, free of human error, and will reduce the time required for diagnostics. ML is a significant advancement in computer science and data processing techniques that can be used to improve almost any service [14]. It allows patterns to be formed in the data, enabling easier and more accurate identification and prediction. It is considered a highly useful tool for predicting mental health [15]. ML requires input in the form of features or variables. One of the most important issues is determining which features will lead to the best solution. Datasets used in the ML process typically contain redundant and irrelevant features that cannot improve accuracy [16], have no effect on the learning model [17], and may even degrade the learning model's performance [16]. As a result, relevant features must be chosen.
Numerous studies on feature selection (FS) techniques have been conducted recently to create datasets with the most pertinent features for the best model performance [18] and to reduce computation time [19]. FS is typically used to select only efficient features from the given input by reducing noisy data, which aids the target application [20]. Previous research has reported many FS methods that improve ML performance, including relief [21], minimum-redundancy-maximum-relevance FS (m-RMR) [22], information gain [23]-[25], and gain ratio (GR) [23], [26], [27]. To our knowledge, no research has reported the best method for each ML approach. Each method employs a distinct technique or strategy for identifying the best and most relevant features. There is also no research that compares ML performance or discusses the use of FS in NPD cases.
In this study, we compare the performance of various FS approaches (all features, information gain, and GR) against several ML methods (support vector machine (SVM), random forest classifier (RFC), and Naive Bayes). Features are selected when their weight value exceeds a threshold of 0.05, a cutoff frequently used in studies that require good accuracy with a smaller feature set [28], [29]. The research findings provide an overview of each ML approach's ability to predict NPD using features generated by FS techniques.
The rest of this paper is organized as follows: section 2 provides a brief overview of the research method. In section 3, we present the experimental results as well as a comparison of the classification and FS methods. Finally, section 4 presents the conclusion and summarizes the effectiveness of the approaches.

METHOD
This section outlines the procedures followed in this research. The research begins with determining the problem, followed by data collection. Before the data are entered into the model, they must be pre-processed. At this stage, the data are cleaned and adjusted for the next step, feature selection using information gain and GR. The selected features are then processed by the ML methods (SVM, RFC, and Naive Bayes). The overall procedure consists of several steps, as illustrated in Figure 1.

Dataset
We selected 8376 records from the data collection process, covering ages 14 to 50, with 44 features and 1 class label. The data used in this research were obtained from Open Psychometrics (https://www.kaggle.com/datasets). The dataset consists of 5330 records with the class label "yes" and 3046 with the class label "no." Using preprocessing and feature selection, the data were cleaned and reduced to the most important attributes. The method comprises three steps: preprocessing, feature selection, and applying ML to the prediction process. The 44 features used in this study are listed in Table 1.
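As an illustration, a minimal sketch of loading the dataset and checking the class balance described above might look as follows; the file name npd_dataset.csv and the column name label are assumptions, since the original file and column names are not given.

```python
import pandas as pd

# Load the NPD dataset (file name and label column are assumed for illustration).
df = pd.read_csv("npd_dataset.csv")

# 44 feature columns plus 1 class label column.
print(df.shape)  # expected: (8376, 45)

# Class distribution: 5330 "yes" vs. 3046 "no".
print(df["label"].value_counts())
```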

Feature selection
The source dataset often contains various features, some of which may or may not be important for classification [30], [31]. Unimportant features that depend on other attributes reduce prediction accuracy. A feature selection strategy must be used to overcome this and decrease the feature dimension. There are several categories of feature selection methods, namely filter, wrapper, and hybrid methods; the filter method is the most frequently used [32]. The information gain and GR methods are used in this work to find the feature subsets; both are popular filter models [29]. Each feature in the dataset is scored, and features are selected using a value limit known as the threshold (cutoff). This research uses a threshold of >0.05.
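A minimal sketch of this threshold step, assuming the per-feature weights have already been computed by information gain or GR (the weights dictionary below is hypothetical):

```python
# Hypothetical weights produced by information gain or GR for each feature.
weights = {"f01": 0.03, "f02": 0.04, "f03": 0.21, "f04": 0.18}

THRESHOLD = 0.05

# Keep only features whose weight exceeds the threshold, ranked by weight.
selected = sorted(
    (f for f, w in weights.items() if w > THRESHOLD),
    key=lambda f: weights[f],
    reverse=True,
)
print(selected)  # e.g. ['f03', 'f04']
```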

Information gain
Information gain is the change in class entropy from a previous state to the state after an attribute value is observed. It is applied here to measure how pertinent each feature is. The method originates in decision tree induction, where information gain is used as a criterion for choosing attributes. The information gain method is faster in the feature selection process than other methods [33]. Features carrying the most information are ranked highly; otherwise they are ranked low. Information gain is computed using the following steps:

Step 1: compute the entropy $H(C)$ of class $C$ before observing attribute $A$:

$H(C) = -\sum_{c \in C} p(c) \log_2 p(c)$

Step 2: compute the entropy once attribute $A$ has been observed, i.e., the entropy of the class over the subsets of the main dataset induced by the values of $A$:

$H(C \mid A) = -\sum_{a \in A} p(a) \sum_{c \in C} p(c \mid a) \log_2 p(c \mid a)$

The first two steps are essential because they supply the entropies needed in the next step to obtain the information gain.

Step 3: compute the information gain of attribute $A$ as the difference between the entropy before and after observing $A$:

$IG(A) = H(C) - H(C \mid A)$

After computing the gain of each feature and applying the threshold, the final feature set to be used for classification is chosen.
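To make the three steps concrete, here is a small self-contained sketch of information gain for one categorical feature; the toy data at the bottom are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(C) of a list of class labels (Step 1)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(A) = H(C) - H(C|A) for one categorical feature (Steps 2 and 3)."""
    n = len(labels)
    # H(C|A): weighted entropy of the label subsets induced by each feature value.
    conditional = 0.0
    for value in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional

# Toy example (hypothetical data).
feature = ["a", "a", "b", "b", "b"]
labels  = ["yes", "yes", "no", "no", "yes"]
print(round(information_gain(feature, labels), 3))  # ≈ 0.42
```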

Gain ratio
The GR normalizes the information gain [21], [34] by taking into account the number and size of the subsets produced when the data are split on an attribute [35]. In this way, the GR modifies the information gain and lessens its bias toward attributes with many values. The GR chooses an attribute based on the number and size of its branches: by accounting for the intrinsic information of a split, it corrects the information gain.
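In formula form (a standard statement of the gain ratio, written here with the same notation as the information gain above, where $S$ is the dataset and $S_a$ the subset with attribute value $a$):

$\mathrm{SplitInfo}(A) = -\sum_{a \in A} \frac{|S_a|}{|S|} \log_2 \frac{|S_a|}{|S|}$

$\mathrm{GR}(A) = \dfrac{IG(A)}{\mathrm{SplitInfo}(A)}$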

Classification
Classification is a data mining technique that is currently popular. It covers a family of methods, such as decision trees and neural networks, and these methods use various strategies to assess the available data and produce their predictions.
After the feature selection stage, once the features that affect the class label have been ranked and obtained, the next stage is classification. The accuracy, error rate, and time required in the classification process are compared across the SVM, RFC, and Naive Bayes methods. The classification model is validated using k-fold cross-validation, a method commonly used for training sets [36]. Figure 2 shows the k-fold cross-validation procedure.
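A minimal sketch of the k-fold splitting mechanics, written with scikit-learn for illustration (the paper itself uses WEKA); the synthetic data stand in for the selected NPD features:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold

# Synthetic stand-in data; the real selected NPD features are assumed elsewhere.
X, y = make_classification(n_samples=100, n_features=10, random_state=0)

# k = 10: each round trains on 9 folds and tests on the remaining fold.
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    print(f"fold {fold}: {len(train_idx)} training rows, {len(test_idx)} test rows")
```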

Performance measurement
This study uses the confusion matrix to measure accuracy and error rate. A confusion matrix of size n × n coupled to a classifier, where n is the total number of classes, displays the predicted and actual classifications. The confusion matrix for n = 2 is shown in Table 2. Calculating prediction accuracy and classification error is another method for evaluating and comparing classifiers. Both values can be obtained from the confusion matrix in Table 2 and calculated using (1) and (2).
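As an illustration of (1) and (2), a short sketch computing accuracy and error rate from the confusion-matrix counts (the counts below are hypothetical):

```python
# Hypothetical confusion-matrix counts for a two-class problem.
TP, FN, FP, TN = 530, 3, 2, 301

accuracy = (TP + TN) / (TP + TN + FP + FN)    # equation (1)
error_rate = (FP + FN) / (TP + TN + FP + FN)  # equation (2)

print(f"accuracy:   {accuracy:.4f}")
print(f"error rate: {error_rate:.4f}")
```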

RESULTS AND DISCUSSION
In the pre-processing stage of data mining, features are selected from the initial attributes through feature selection. In this research, we use the information gain and GR feature selection techniques to determine the number of features based on weight values. The first step is to calculate the weight of each feature; the features are then ranked by weight.
In this research, we use three scenarios for the performance evaluation of the classification algorithms, measuring accuracy, error rate, and computational time. First scenario: use all features. Second scenario: feature selection using the information gain technique. Third scenario: feature selection using the GR technique. The main aim was to perform a comparative analysis of using all features versus two different feature selection techniques, and to identify the feature selection technique that recommends the most relevant features. The results of feature selection can be seen in Table 3. Table 3 shows that the information gain technique produces 37 selected features, while the GR technique produces 38 features. With the information gain technique, seven features have a weight value of less than 0.05: f26, f05, f19, f24, f02, f01, and f43. With the GR technique, six features have a weight value of less than 0.05: f26, f19, f24, f02, f01, and f43. These features are removed and not used in the classification process. The information gain technique thus produces fewer features than the GR technique, meaning it removes more irrelevant features at the given threshold.
The selected features are used as the final features, as there is no further feature removal; these features are used for training and testing to measure classifier efficiency. We developed the models using WEKA (version 3.9.2). The WEKA platform simplifies the construction of several data analysis techniques and offers a Java programming language API [37]. It provides tools for classification, regression, clustering, removing superfluous attributes, creating association rules, and visualizing the dataset.
To assess classification performance, we apply k-fold cross-validation, which divides the dataset into k subsets of the same data. Ten folds are used in this study, each of approximately the same size, giving ten data subsets. For each of the ten rounds, the cross-validation test employs 9 folds for training and 1 fold for testing. Three classifiers available in WEKA, namely SVM, Naive Bayes, and RFC, were used to determine which classifier performs best. The confusion matrix for each classifier is used to evaluate how well the models performed. The results of the accuracy comparison across the three scenarios can be seen in Table 4. Based on the test findings, all three ML techniques can accurately predict NPD when a feature selection strategy is used. By applying feature selection techniques, the accuracy of each ML method increases, which shows that the choice of features affects the classification results. Feature selection chooses the features that are able to provide the best results in classification [38]. From the validation test results, the GR technique has the highest accuracy value of the three scenarios.
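A sketch of the three-classifier comparison under 10-fold cross-validation, again using scikit-learn equivalents of the WEKA classifiers for illustration (synthetic data stand in for the selected NPD features):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=37, random_state=0)

classifiers = {
    "SVM": SVC(kernel="rbf"),
    "RFC": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

# Mean 10-fold cross-validated accuracy per classifier.
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
    print(f"{name}: {scores.mean():.4f}")
```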
For all test scenarios, the RFC approach proved more accurate. The accuracy of the RFC method using all features, information gain, and GR is 99.93%, 99.96%, and 100%, respectively; RFC with GR therefore achieves the best accuracy, at 100%. Meanwhile, the Naive Bayes method has the lowest accuracy of the methods compared, because Naive Bayes is constrained by class imbalance, which affects the classification results. Due to its sensitivity to class distribution, Naive Bayes can occasionally be unsuccessful at predicting minority instances.
Furthermore, a comparison of the error rate values of the three scenarios is carried out. The error rate is calculated using the confusion matrix. The results of the error rate comparison can be seen in Figure 3. Figure 4 presents the consuming time for the ML methods before and after feature selection using information gain and GR. The test results show that Naive Bayes with information gain has the fastest consuming time; it takes only 0.22 seconds. This is because Naive Bayes determines the most suitable class by calculating the likelihood of each class for each existing attribute group. Meanwhile, the SVM approach takes more time than the other two methods, because the SVM method's job is more complicated: it uses a kernel that seeks out a hyperplane, which makes the processing time longer [39].
In addition, the test results show that the number of features affects the consuming time. The information gain technique produces fewer features than the GR technique, so it has a faster consuming time than using all the features or the GR technique.
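A simple way to reproduce this kind of timing comparison is to measure wall-clock time around a cross-validated fit; the sketch below again uses scikit-learn models and synthetic data as stand-ins for the WEKA setup:

```python
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=37, random_state=0)

# Time a full 10-fold cross-validation run per classifier.
for name, clf in [("Naive Bayes", GaussianNB()), ("SVM", SVC(kernel="rbf"))]:
    start = time.perf_counter()
    cross_val_score(clf, X, y, cv=10)
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.2f} s")
```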

CONCLUSION
The test results show that the GR feature selection technique yields the highest accuracy compared with using all features or information gain. Among the ML methods, RFC with GR has the highest accuracy, at 100%. In the consuming-time test, the Naive Bayes method with information gain is the fastest, at 0.22 seconds. The test results also show that the number of features analyzed greatly affects the processing time. Further research is therefore needed to improve feature selection performance so that it produces relevant and important features.

Figure 1. The design of the proposed method

Figure 2. The procedure of k-fold cross-validation

Table 2. Confusion matrix for n = 2
                            Predicted positive                Predicted negative
Actual positive instances   Number of true positives (TP)     Number of false negatives (FN)
Actual negative instances   Number of false positives (FP)    Number of true negatives (TN)

$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$  (1)

Meanwhile, to calculate the error rate, the following equation is used:

$\text{Error rate} = \dfrac{FP + FN}{TP + TN + FP + FN}$  (2)

Figure 3. A comparison of the error rate of the methods

Figure 4. A comparison of the consuming time of the methods

Table 3. Selected features

Table 4. A comparison of accuracy for NPD