An empirical assessment of different kernel functions on the performance of support vector machines

Received Apr 16, 2021; Revised Sep 11, 2021; Accepted Oct 22, 2021

Artificial intelligence (AI) and machine learning (ML) have influenced every part of our day-to-day activities in this era of technological advancement, making life more comfortable. Among the several AI and ML algorithms, the support vector machine (SVM) has become one of the most widely used algorithms for data mining, prediction and other AI and ML activities in several domains. The SVM's performance depends significantly on the kernel function (KF); nonetheless, there is no universally accepted basis for selecting an optimal KF for a specific domain. In this paper, we empirically investigate the effect of different KFs on SVM performance in various fields. We illustrate the performance of the SVM with different KFs through extensive experimental results. Our empirical results show that no single KF is always suitable for achieving high accuracy and generalisation in all domains. However, the Gaussian radial basis function (RBF) kernel is often the default choice. Also, if the KF parameters of the RBF and exponential RBF are optimised, they outperform the linear and sigmoid KF-based SVM methods in terms of accuracy. Besides, the linear KF is more suitable for linearly separable datasets.


INTRODUCTION
In the 21st century, artificial intelligence (AI) and its sub-disciplines such as machine learning (ML), data mining, deep learning and expert systems have experienced a rebirth following parallel developments in computing power, theoretical understanding and the availability of vast amounts of data. These techniques have taken centre stage in the technology industry, assisting in solving several challenging computing, software engineering, and operations research problems [1], [2]. They have impacted demand forecasting, financial analysis, supply chain planning, computer vision, big data analytics, customer engagement, business domain knowledge, education and many more domains. Several ML algorithms like decision trees (DTs), neural networks (NN), K-nearest neighbour (KNN), naïve Bayes (NB), random forest (RF) and support vector machine (SVM) have achieved real-world application success [3]. Among these algorithms, the SVM has been widely used recently in areas such as finance [4], [5], engineering [6] and healthcare [7]-[9].
The SVM is one of the most robust supervised ML algorithms, based on the statistical learning theory (SLT) developed by Vapnik [10]. The SVM employs the risk minimisation principle to establish the best separating hyperplane in multi-dimensional space to classify a binary outcome [11]. Initially, the SVM was designed for binary classification [12]; however, of late, the SVM is applicable for both classification and  [13], [14], random forest [15], [16], neural network [17], [18] and k-nearest neighbours [19], [20]. Notwithstanding variations in the experimental outcomes, the SVM performs on par with more traditional models in many of these studies. In contrast, other studies [13], [14], [16], [19] reported that the SVM clearly outperformed several conventional models.
When training an SVM model, there is a need to choose a KF and its associated parameters, and this is one of the biggest challenges for users of the SVM [5], [11], [21]. The KF permits the SVM (a linear machine) to transform the feature space and act as a non-linear model. The KF parameters regulate the shape of the separating margin used to classify a set of features. An accurate selection of these parameters can significantly improve the prediction accuracy of the SVM [4], [22]. However, the availability of numerous KFs makes it challenging to select the appropriate one for a specific domain. Unfortunately, several researchers (especially beginners) adopt the default SVM configuration without considering the parameters it uses (e.g., the KF).
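As a small illustration of how the KF lets a linear machine act as a non-linear model, the sketch below compares a linear and an RBF kernel on data that no straight line can separate. The synthetic two-ring dataset, the random seeds and the parameter values are assumptions made purely for illustration; they are not taken from this study.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumed toy data: two concentric rings, which are not linearly separable.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit the same SVM twice, changing only the kernel function.
scores = {}
for kernel in ("linear", "rbf"):
    model = SVC(kernel=kernel, C=1.0, gamma="scale")
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)  # test-set accuracy

print(scores)
```

On data of this shape the RBF kernel should separate the rings while the linear kernel cannot, which is the behaviour the paragraph above describes.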
Nonetheless, selecting all these parameters is necessary before using the SVM in a specific task since they are task-dependent. Besides, there is no universally accepted technique for choosing the appropriate KF and its parameters in a particular domain to attain high generalisation [12]. Additionally, the optimal regularisation parameter (C) is pivotal to obtaining accurate outcomes [4], [17]. Hence, the current study presents a comprehensive comparative analysis of several KFs on SVM performance for heart disease detection, exchange rate, and weather prediction. Furthermore, we aim to make the KF generalised for a specific domain.
Based on the above discussions, this study seeks to answer the question: which KF and associated parameters are suitable for achieving higher generalisation of the SVM on a given dataset? We hypothesised that no single KF and its associated parameters are suitable for all domain applications. The rest of this paper is organised as follows. First, section 2 presents the basic overview of the SVM, the various types of KF and the study framework. Then, we present the empirical results and discussions in section 3, followed by the study conclusions and future works in section 4.

RESEARCH METHOD

Machine learning
ML is a subfield of AI centred on creating computer algorithms capable of analysing and learning from data with intrinsic patterns and improving their accuracy over time independently. According to [23], ML can be defined graphically, as shown in Figure 1. Given some class of tasks (T) and a performance measure (P), a computer program is said to learn from experience (E) if its performance at T, as measured by P, improves with E. Typically, ML is divided into four areas [24], i.e., 1) supervised learning (SL), 2) unsupervised learning (UL), 3) evolutionary learning (EL) and 4) reinforcement learning (RL). Among these four techniques, studies [3], [25], [26] show that SL is the most commonly used by researchers and professionals owing to the availability of labelled datasets and SL's lower computational time compared with UL, RL and EL.

[Figure 1: graphical definition of ML; recoverable labels: task (T), computer program, learning algorithm]

Several SL algorithms exist, e.g., KNN, linear regression, RF, logistic regression, gradient boosted trees (GBT), NN, SVM, DT and NB. However, since the current study centres on the SVM, we briefly present its overview in the subsequent section.

Support vector machine
The SVM algorithm was developed at AT&T Bell Laboratories [10] to accurately classify binary dependent features using a unique hyperplane (H). The SVM maps input features (v) to a higher-dimensional space where the best separating hyperplane is created. The hyperplane serves as a boundary between two classes, created by maximising the margin between the support vectors of both classes [19]. The form of the hyperplane depends on the feature dimension: for one dimension, H is a point; for two dimensions, H is a straight line; for three dimensions, H is a plane; and for higher dimensions, H is a hyperplane. Figure 2 shows a simple SVM [27]. The H with the largest margin between the classes is the optimal separating hyperplane (see Figure 2), which in its standard form is

$$H:\; w^{T} v + w_0 = 0$$

where $w^{T}$ is the (transposed) weight vector and $w_0$ is called the bias value.

The SVM employs a mathematical device known as the kernel trick: a kernel function evaluates the dot product of the data in a higher-dimensional space, so the data need not be mapped there explicitly. Usually, KFs are grouped into two classes, namely (i) rotation-invariant kernels and (ii) translation-invariant kernels. Table 1 shows the kernel functions for setting up an SVM model, as defined in [12].

Lately, the SVM has gained popularity among SL algorithms for solving real-world problems such as handwriting analysis and facial analysis, specifically for pattern classification, outlier detection and regression-based applications [28], [29]. However, one critical challenge faced by SVM users is choosing the appropriate KF and its associated parameters, such as the penalty parameter (C) and KF parameters like the gamma (γ) for the RBF kernel. Studies [4], [17], [28] affirm that this is a vital step in managing a learning task with the SVM since it has a substantial effect on its accuracy. Thus, the primary problem in setting up an SVM model is how to select the KF and its allied parameter values for a specific domain application [4], [17], [28], and this paper attempts to find the best setting of these SVM parameters in three different domains. Specifically, we examine how the SVM's efficiency in predicting diabetes, fake banknotes, flower species and wheat species is affected by different KFs and other parameters. Thus, we aim to make the KFs generalised for a specific domain. It is anticipated that the outcome of this study will serve as a baseline for researchers (especially beginners) to adopt the appropriate SVM parameters for a particular domain application.

Figure 3 shows the experimental framework of this study. Five (5) different datasets were downloaded from various sources for this study; see Table 2 for details.
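To make the kernel functions discussed in this section concrete, the following sketch writes five common kernels as plain NumPy functions. The parameterisations are assumptions: the linear, polynomial, RBF and sigmoid forms follow the conventions documented for scikit-learn, and the exponential RBF is taken here as exp(-γ‖x − z‖), one commonly used form. The definitions in Table 1 of this paper may be parameterised differently, and the sample vectors are arbitrary.

```python
import numpy as np

def linear_kernel(x, z):
    # K(x, z) = <x, z>
    return float(np.dot(x, z))

def polynomial_kernel(x, z, gamma=1.0, coef0=1.0, degree=3):
    # K(x, z) = (gamma * <x, z> + coef0) ** degree
    return float((gamma * np.dot(x, z) + coef0) ** degree)

def rbf_kernel(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))

def exponential_rbf_kernel(x, z, gamma=0.5):
    # Assumed form: K(x, z) = exp(-gamma * ||x - z||), i.e. the distance is not squared.
    return float(np.exp(-gamma * np.linalg.norm(x - z)))

def sigmoid_kernel(x, z, gamma=0.5, coef0=0.0):
    # K(x, z) = tanh(gamma * <x, z> + coef0)
    return float(np.tanh(gamma * np.dot(x, z) + coef0))

# Arbitrary sample vectors, for illustration only.
x = np.array([1.0, 2.0])
z = np.array([2.0, 0.0])

for kernel in (linear_kernel, polynomial_kernel, rbf_kernel,
               exponential_rbf_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(x, z))
```

The RBF and exponential RBF kernels depend only on the difference x − z, whereas the linear, polynomial and sigmoid kernels depend on the dot product of x and z.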
Study framework
Firstly, we preprocessed each dataset separately to remove missing values, outliers and data inconsistencies. We then normalised each dataset to the range [0, 1] using the min-max function in (4):

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (4)$$

where $x'$ is the normalised value, $x$ is the value to be normalised, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the dataset.
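The min-max normalisation described above can be sketched in a few lines of NumPy. Applying it per feature (column) is an assumption, as is the handling of constant columns; the function name and the sample array are hypothetical.

```python
import numpy as np

def min_max_normalise(X):
    """Rescale each column of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = x_max - x_min
    # Assumption: a constant column has zero span, so map it to 0 instead of dividing by zero.
    span[span == 0] = 1.0
    return (X - x_min) / span

# Hypothetical data: three samples, two features on different scales.
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [4.0, 40.0]])
scaled = min_max_normalise(data)
print(scaled)
```

In practice the minimum and maximum would be computed on the training set only and reused for the test set, which is what scikit-learn's `MinMaxScaler` does when fitted on training data.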
Each normalised dataset is divided into a training set (80%) and a testing set (20%). From the different kernels discussed in Table 1, we adopted the five (5) most commonly used, namely: (i) linear, (ii) polynomial, (iii) radial basis function (RBF), (iv) exponential RBF and (v) sigmoid. Based on these five kernels, we set up an SVM model using different parameters (see Table 3). The aim is to make the kernels generalised for every dataset. 10-fold cross-validation was used for generalisation. We evaluate the performance of the SVM based on (i) the confusion matrix and accuracy for classification analysis and (ii) the root-mean-square error (RMSE) for regression analysis. The Scikit-learn library was used to implement the SVM model. The experiment was conducted on Google Colab [30], a free online platform for building ML models on powerful hardware options like GPU and TPU. Table 3 shows the details of the parameters used in modelling the SVM.

Table 3. Parameters used in modelling the SVM
Parameter | Description
Kernel | Represents the kernel type for the algorithm.
Penalty parameter (C) | The regularisation parameter; the strength of the regularisation is inversely proportional to C.
Degree (d) | Degree of the polynomial kernel function ('poly'). All other kernels ignore it.
Gamma (γ) | Kernel coefficient for 'radial basis function', 'polynomial' and 'sigmoid'.

Table 4 shows the experimental results of different KFs with several datasets. The results show that no single KF is suitable for all domains, which confirms arguments in the literature [4], [17], [28] that selecting the correct KF is a vital step in setting up an SVM model. An incorrect choice of the KF will lead to poor results by the model. Hence, the random selection of a KF is not always optimal for achieving a high SVM generalisation. The outcome suggests that the linear kernel is more suitable for small datasets. Also, the SVM is a non-parametric model, so its complexity increases with the size of the training dataset.
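The experimental setup described above (min-max scaling, an 80/20 split, 10-fold cross-validation, five kernels, accuracy and confusion matrix) can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the study's actual code: the Iris dataset stands in for the flower-species data, the parameter values and random seed are arbitrary, and the exponential RBF is supplied as a custom callable in an assumed form because scikit-learn has no built-in kernel of that name.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

def exponential_rbf(X, Y, gamma=1.0):
    # Assumed form: Gram matrix of exp(-gamma * ||x - y||); gamma=1.0 is arbitrary.
    return np.exp(-gamma * euclidean_distances(X, Y))

# Stand-in dataset (assumption), split 80% training / 20% testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

kernels = {
    "linear": "linear",
    "polynomial": "poly",
    "rbf": "rbf",
    "exponential_rbf": exponential_rbf,  # SVC accepts a callable returning a Gram matrix
    "sigmoid": "sigmoid",
}

results = {}
for name, kernel in kernels.items():
    # Scaling sits inside the pipeline so each CV fold is normalised on its own training part.
    model = make_pipeline(MinMaxScaler(), SVC(kernel=kernel, C=1.0, gamma="scale", degree=3))
    cv_accuracy = cross_val_score(model, X_train, y_train, cv=10, scoring="accuracy").mean()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "cv_accuracy": cv_accuracy,
        "test_accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, res in results.items():
    print(f"{name}: cv={res['cv_accuracy']:.3f} test={res['test_accuracy']:.3f}")
```

For a regression task, `SVR` and an RMSE computed from `mean_squared_error` would take the place of `SVC` and accuracy.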
Table 4 shows that well-chosen kernels enhance the SVM's prediction performance and generalisability. Likewise, the same kernel function on different datasets gave different results (see Table 4); this suggests that a given dataset's statistical properties can inform the choice of a KF and its parameters. Figures 4-9 show the confusion matrix plots for each KF and its parameters on the different datasets; the results affirm Table 4. From Table 4 and Figures 4-9, it can be inferred that the choice of an SVM KF highly depends on the problem at hand, i.e., what one is attempting to model. Therefore, the motivation for selecting a specific KF is intuitive and depends on the kind of information one intends to extract from the dataset. The outcome (see Table 4) affirms that correct tuning of the KF parameters in the RBF, sigmoid and exponential RBF KFs increases SVM accuracy compared with the linear kernel.
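One common way to tune the KF parameters mentioned above is an exhaustive grid search with cross-validation. The paper does not state which tuning procedure was used, so the sketch below is an assumption offered for illustration: the Iris dataset, the candidate values for C and γ, and the random seed are all arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Stand-in dataset (assumption), split 80% training / 20% testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("svm", SVC(kernel="rbf")),
])

# Hypothetical candidate values for the penalty parameter C and the RBF coefficient gamma.
param_grid = {
    "svm__C": [0.1, 1, 10, 100],
    "svm__gamma": [0.01, 0.1, 1, 10],
}

# Every (C, gamma) pair is scored with 10-fold cross-validation on the training set.
search = GridSearchCV(pipeline, param_grid, cv=10, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)  # accuracy of the refitted best model
print(best_params, test_accuracy)
```

The held-out test set is used only once, after the search, so the reported accuracy is not biased by the parameter selection.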

CONCLUSION
The SVM is one of the machine learning algorithms that has received much attention from the research community and professionals. Its success has been seen in every sector of our day-to-day activities. However, one primary concern with its implementation is selecting the most appropriate kernel function and fine-tuning its associated parameters. Several KFs can be used for different applications, but the most suitable depends on the problem at hand (domain area). This paper reviewed and examined the effect of the linear, polynomial, RBF, sigmoid, and exponential RBF kernel functions on the SVM algorithm's performance. We assessed the performance of these KFs on the SVM using the confusion matrix and accuracy for classification tasks and the RMSE for regression tasks. We observed that no single KF is suitable for all classification or regression problems. Also, an SVM with optimised kernel parameters for the RBF and exponential RBF KFs is more likely to outperform the linear and sigmoid kernel-based SVM methods in terms of accuracy. Since the performance of the KFs varies with the problem at hand, this study suggests a fused kernel method as a way forward. Hence, a systematic method and optimisation techniques such as the genetic algorithm (GA) and particle swarm optimisation (PSO) can be used to build a unified kernel. Therefore, selecting the correct kernels and finding the best logical method of merging them is a future direction.