A comparative study of classification techniques in data mining algorithms used for medical diagnosis based on DSS

ABSTRACT


INTRODUCTION
Data mining (DM) is the process of detecting patterns in vast amounts of data, or of uncovering new information from such data in the form of patterns or rules. It draws on database technology, analytics, artificial intelligence, pattern recognition, information systems, high-speed computing, and visualization. DM techniques mine different kinds of rules and patterns, such as association rules, sequential patterns, and classification trees. Before data can yield usable information, it must first undergo preparation. The main purpose of DM is to extract hidden information from a body of data, and the information gathered is useful in making decisions. Several popular DM technologies are currently used to find predictive information for a variety of applications. DM results can be presented in a variety of ways, including lists, graphical results, summary tables, and visualizations [1], [2].
The provision of high-quality services at reasonable prices is a major problem for the health sector. A high-quality service entails appropriately diagnosing and treating patients. Poor clinical judgment can have disastrous consequences, which is unacceptable. Even the most technologically advanced hospitals lack software that uses DM techniques to predict illness, although a great deal of hidden data could be turned into useful information. First, clinical analysis is recognized to be subjective: the outcome depends on the doctor providing the analysis. Second, and perhaps most crucially, examining the large quantity of information involved when an attribute selection filter is left at its default properties takes a lot of time. Finally, it is important to prioritize tuning the attribute selection parameters for more accurate results [9].
In one study, the outcomes considered were in-hospital mortality and internal medicine admission (direct hospital admission or transfer to a hospital). Four machine learning (ML) models were created: lasso regression, random forest (RF), gradient-boosted decision tree (DT), and a deep neural network, using commonly accessible triage data (such as demographic information and vital signs) as predictors in the training set (a 70% random sample). The predictive performance of the models was assessed on the test set (the remaining 30% of the data) using statistics, prospective prediction values, and decision curves. These ML models were developed for each outcome using the common triage categorization data and evaluated against the model [10]. Patients with heart disease have been predicted using a variety of DM approaches. However, using DM techniques did not eliminate the ambiguity in the data. Fuzziness was therefore added to the measured data in an effort to reduce uncertainty: a membership function was created and applied to the measured value. Additionally, an effort was made to categorize the patients using the data gathered from the medical community. The minimum-distance k-nearest neighbor (K-NN) classifier was used to divide the data into different categories. Compared with other parametric classifiers, the fuzzy K-NN classifier was found to work well [11].
In one study, a variety of classification algorithms based on factors such as age, gender, blood pressure, cholesterol, and pulse rate are used to evaluate each person's risk level. DM techniques such as naive Bayes (NB), K-NN, the DT algorithm, and neural networks are used to categorize the patient risk level. The risk level can be predicted with a high degree of accuracy when more criteria are employed [12]. Given the massive amount of data generated by the healthcare industry, ML algorithms contribute significantly to disease prediction. The leading cause of death in India is heart disease. According to the WHO, timely steps can anticipate and prevent stroke. By using ML approaches such as DT and NB together with risk variables, the study in [13] can help predict cardiovascular disease with greater accuracy. The dataset considered is the heart failure dataset, which has 13 attributes. The gathered data must be pre-processed before the performance of the approaches is examined, followed by feature selection and reduction [13]. The medical industry holds a lot of promise for using DM to find patterns hidden in medical datasets. This makes it possible to abstract knowledge for predicting heart disease using a variety of mining techniques. A survey of different single DM approaches and hybrid mining techniques was conducted to determine the most effective method for achieving high accuracy in heart disease prediction. The potential of a variety of classification strategies, including NB, support vector machine (SVM), DT, K-NN, and a hybrid classifier approach, was assessed. Analysis of several methods showed that classification-based techniques outperform earlier strategies in terms of accuracy. DM, classification, disease diagnosis, forecasting, and accuracy are some related terms [14].
This article's purpose is to determine the most important risk factors for managing kidney, liver, and chronic pancreatitis (CP) disorders and to provide diagnoses, therapies, and invalidations for each level among them. A decision support system (DSS) provides tools for measuring infection levels, locating solutions, and providing short instructions or tasks. Additionally, it indicates the appropriate steps and care to take to prevent deaths, aids medical staff working in hospitals, and concentrates on finding information sources. The primary goal of this paper is to improve the DSS based on the DT C4.5 inference system (DTCIS), the NB algorithm, and the logistic regression (LR) algorithm in order to diagnose these illnesses and to compare the algorithms' effectiveness and correct classification rates. An algorithm was developed to gracefully manage requests for medical care in order to reduce the symptoms of CP, kidney disease, and liver disease, and to control medical care flexibly in order to offer an optimal course of treatment.
The remainder of this manuscript is organized as follows. Section 2 surveys related papers on the use of classification technologies in the healthcare DSS field. Section 3 discusses the importance of medical diagnosis detection using ML. Section 4 analyzes the results. Section 5 presents the information related to the experimental disease data. Section 6 describes the DM algorithms. Section 7 describes the performance evaluation. Section 8 compares the NB, C4.5, and LR algorithms. Section 9 presents the conclusions.

RELATED WORK
Several studies focusing on medical diagnostics have been published so far. Using datasets from the UCI ML repository, these studies applied several solutions to the problem and achieved classification accuracies of 77% or more [15]. For instance, a logistic-regression-derived discriminant function used to detect cardiac illness achieved a classification accuracy of roughly 77% [16]. Bahani et al. [17] employed fuzzy support vector clustering. The algorithm used a kernel-induced measure to assign every piece of data, and the experimental findings were obtained on a well-known heart disease benchmark. Support vector machines are good predictors and detectors of ischemic heart disease (IHD), with a high degree of accuracy. Nonlinear proximal SVMs (PSVM) are used in this tree-based classifier. Based on component analysis, an expert system was created to diagnose diabetes. A cascade learning system was also developed to identify diabetes, and a fuzzy-based controller was developed to regulate blood glucose levels using expert knowledge. A stochastic model was devised to obtain variations from a dataset of self-monitored blood sugar levels [19]. Salah and Ahmed [20] designed a DSS application to assist specialists (doctors) in making challenging decisions. The DSS is based on specialists' experience and a DM extraction strategy, and it aims to help the hospital handle the COVID-19 viral pandemic gracefully and, more broadly, to define the type of disease and give a suitable health protocol indicator for the diagnosis. First, it is necessary to identify the early COVID-19 symptoms (fever, weariness, dry cough, and breathing problems) that are used to establish whether or not a person is infected with the virus.
Second, using two age factors and primary healthcare status variables such as diabetes, heart problems, or ischemia, this approach separates infected persons into several categories according to their immune response risk (very high, high, mild, and normal). These persons are assessed and expected to follow the rules of their class. The major goal of that work is to improve a DSS based on DTCIS that elegantly monitors requests for medical care in order to slow COVID-19 symptoms and contain a pandemic flare-up, reducing its impact on medical care by using a good treatment protocol to deliver proper therapy. Ahmed et al. [21] designed and built a first aid decision support system (FADSS) that provides access to practical instances that pose a risk to the general public, as well as sophisticated conditions for assessing research abilities and providing critical medical care via a graphical user interface. The design of FADSS's first-aid therapy is based on common first-aid scenarios. The approach manages first-aid therapy by modeling a framework (FADSS) that helps individuals access data about common first-aid situations as a service. The FADSS service employs a set of fifteen critical scenarios that may occur in individuals' lives, and a therapy decision is recommended for a new kind of emergency. FADSS uses computational models, DT, and DM (the C4.5 algorithm) to test information in real time in order to develop a decision-making system. When a case is truly critical, the system sends out automatic alerts via text messages and email reports. The primary goal of that research is to develop an effective tool that assists individuals and junior staff at first-aid centers in locating relevant information resources.
Although detection for emphysema using low-dose computed tomography reduces lung cancer mortality, it also has the potential to cause harm. According to current guidelines, patients should be informed of the advantages and risks of detection for emphysema before making a choice. One study compared the impact of a patient decision aid (PDA) with that of professional educational material (EDU) on decision-making outcomes for detection for emphysema among smokers. Compared with EDU, a PDA sent to people seeking help from smoking quit lines improved the quality of decisions about detection for emphysema. These improvements were in line with professional society recommendations that tobacco users make informed decisions about detection for emphysema [22].
The PDA had no effect on whether or not patients intended to be screened or on whether or not screening was completed. There is nevertheless a critical need to enhance the quality of these discussions, and the PDA was designed to supplement, not to replace, a discussion with a medical practitioner. The PDA could reach a large number of eligible tobacco users in the US if it were distributed through tobacco quit lines. Given the varied funding for tobacco quit lines, addressing the role of quit lines in distributing PDA support for detection for emphysema is critical to ensure that the intervention is widely disseminated and has a greater impact.
Medical evidence derived from research can be filtered, integrated with computerized patient histories, and turned into advice targeted at particular patients using sophisticated evidence-based information resources. The goal of this study was to see how effective an electronic clinical decision support system (CDSS) is at appraising evidence and providing health care practitioners with relevant, patient-specific suggestions at the point of treatment. However, this pragmatic randomized controlled trial (RCT) found that the clinical decision support intervention was only marginally useful in changing prescribing behaviour and service quality. The impact on adherence to suggestions was small, affecting only about 4 out of every 100 patients. The minimal criteria for adopting CDSSs are an electronic health record with a standardized conceptual and layout system that can easily manage complicated information on disease, and the capacity to link data to a CDSS that is either supplier-provided or otherwise accessible. These essential elements are increasingly prevalent preconditions for CDSS use in hospitals, and they can help to promote and sustain CDSS implementation in health authorities. A DSS is an additional feature of a health-monitoring system [23].
A DSS is a platform that collects data from many sources and presents it in creative visual representations such as graphs, geographical analyses, and maps, as well as reports from the health-care industry and other sources of information. A DSS transforms health data into actionable information that is more accessible and intelligible, so decision-makers are more motivated to use it. MEASURE Evaluation's prototype DSS is capable of integrating many different data sources and is a strong but user-friendly tool for data analysis. Data sources must remain compatible as they are strengthened and as additional sources become accessible. The MFL is the most essential data source because it allows data sources to be linked, and it therefore requires extra attention. The DSS provides decision makers with comprehensive data for monitoring programs and for preventing and controlling outbreaks.
A few investigations have focused on clinical diagnosis. Using datasets acquired from the UCI ML repository, these investigations applied a variety of approaches to the problem and achieved classification accuracies of 77% or higher [24]. One example is fuzzy support vector clustering, which was used to identify coronary artery disease.
To assign each piece of data, this method used a kernel-induced measure, and the experimental results were obtained on a well-known cardiovascular disease benchmark. Nonlinear proximal support vector machines are used in this tree-based classifier [25]. Based on synthesis analysis, Aljohani et al. [26] established an expert system to diagnose diabetes. They also created a cascade learning architecture to analyse diabetes and improved a fuzzy-based controller that incorporates expert knowledge to control blood sugar levels.
The effectiveness of CDSS has been investigated in a number of different studies. Such systems and analytic initiatives advise doctors on potentially dangerous pharmaceutical combinations. They can help to limit problems and errors, avoid misunderstandings, and improve doctors' analyses. Early warning in the event of harm may have an impact on the type of care provided and the expense involved [27]. According to a study conducted in England, implementing computer-based guidelines can improve health outcomes, and the unresolved questions raised by clinicians during clinical practice provide an opportunity to use the CDSS [28]. Four factors associated with implementing a successful CDSS were compiled from several studies: i) automating alarms and updates, ii) providing proposals at a specific location and time, iii) providing substantial suggestions, and iv) automating the complete process. These elements have an impact on the process of emergency care and treatment. A case study approach was used to demonstrate how quickly the DSS communicated the issue. Data sharing is a critical component of properly implementing a CDSS [29].
The findings reveal that doctors perceive a loss of control over their work, and of their special skills and knowledge, when they follow the advice of a CDSS through which any non-expert can gain access to clinical material specified by the doctor. As a result, professional autonomy plays an important role in doctors' decisions to use a CDSS. Furthermore, this investigation improves: i) structure, which encourages managers to assign a domain to doctors in order to facilitate effective information sharing and the implementation of intuitive CDSS, and ii) the quality of services provided to patients through the use of appropriate clinical information technology (IT) systems in clinics. Antoniadi et al. [30] looked into the most significant CDSS issues. These include computerization of the entire CDSS, integration into clinical workflows, system extensibility and maintainability, timely advice, cost-benefit analysis, and the need for architectures that allow CDSS services and components to be reused and shared.

MEDICAL DIAGNOSIS DETECTION USING MACHINE LEARNING

Machine learning techniques
ML techniques are now being utilized to identify and handle outliers, especially at the detection stage. Algorithms such as SVM, K-NN, neural networks, DT, and NB are now in use. ML is a collection of algorithms that convert data into actionable information. It works well when it complements rather than replaces a domain expert's specialized knowledge. As the name implies, a predictive model is used to predict one value from the other values in the dataset. The learning algorithm seeks to discover and model the relationship between the target and the other attributes. The process of training a predictive model is referred to as supervised learning or classification [31]. DT, NB, LR, and random forest (RF) are examples of supervised learning approaches. In this study, we develop four ML models using the LR, NB, RF, and DT algorithms, which we then analyze to identify the best model. The DT used here is built with the C4.5 algorithm.
C4.5 (known as J48 in WEKA) is an improved version of Quinlan's earlier Iterative Dichotomiser 3 (ID3) technique. The C4.5 algorithm has the advantage of being opinionated about pruning and of making many decisions automatically with very good defaults. It makes use of the concept of information entropy. The approach requires a set of input and output training pairs, with the appropriate class as the output. The result is displayed as a tree, making it human-readable. It has a variety of characteristics, including [31]: i) the C4.5 approach is capable of handling noise and missing data, ii) a large DT can be read as a set of straightforward rules, iii) the C4.5 classifier can identify which attributes are relevant to the classification and which are not, and iv) overfitting is mitigated through pruning.

Feature selection
Feature selection is a procedure for choosing a subset of significant features from a larger set and reducing the number of irrelevant and redundant features in a dataset, in order to improve classification performance and save memory [32]. Feature selection aids data interpretation, mitigates the curse of dimensionality, lowers processing requirements, improves learning accuracy, and helps distinguish which features may be essential to a particular situation [33]. There are several supervised feature selection strategies, which can be split into wrapper, filter, and embedded models. In one of the most commonly used filter approaches, each attribute in the training dataset is examined by assessing its usefulness with respect to the class (Figure 1). The higher an attribute's entropy, the more information-rich it is [33].
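As a minimal sketch of a filter-style ranking, the following assumes that an attribute's usefulness is scored by information gain, that is, the reduction in class entropy after splitting on the attribute. The text does not name the exact measure, so this choice is an assumption, and the attribute names and records are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in class entropy after partitioning on one categorical attribute."""
    total = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

def rank_features(rows, labels):
    """Return (feature_name, gain) pairs sorted from most to least informative."""
    names = list(rows[0].keys())
    gains = {name: information_gain([row[name] for row in rows], labels) for name in names}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy records; attribute names and values are illustrative only.
rows = [
    {"hypertension": "yes", "gender": "m"},
    {"hypertension": "yes", "gender": "f"},
    {"hypertension": "no", "gender": "m"},
    {"hypertension": "no", "gender": "f"},
]
labels = ["sick", "sick", "healthy", "healthy"]
ranking = rank_features(rows, labels)
```

In this toy data the hypothetical `hypertension` attribute separates the classes perfectly and ranks first, while `gender` carries no information about the class and ranks last.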

Feature weighting
Feature weighting is a viable alternative to simply keeping or eliminating a feature. The more important features are given greater weight, while the less important features are given less weight. Heavily weighted features play an essential role in the model's construction, resulting in improved accuracy. Domain knowledge of the relative relevance of features is frequently used to determine these weights. Alternatively, the weights can be chosen automatically [34].
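One common place where feature weights take effect is a distance computation, as in a K-NN classifier. The sketch below is illustrative only: the weighted Euclidean distance and the weight values are assumptions, not the weighting scheme used in this study.

```python
from math import sqrt

def weighted_distance(a, b, weights):
    """Euclidean distance in which each squared difference is scaled by its feature weight."""
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

# Hypothetical weights: the first feature is treated as four times as important.
weights = [4.0, 1.0]
d_first = weighted_distance([0.0, 0.0], [1.0, 0.0], weights)   # differs only in feature 1
d_second = weighted_distance([0.0, 0.0], [0.0, 1.0], weights)  # differs only in feature 2
```

A unit difference in the heavily weighted feature moves two records further apart than the same difference in the lightly weighted one, so the heavier feature has more influence on which neighbors are considered close.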

The dataset
The purpose of this project is to create and evaluate a CDSS for the management of patients with kidney disease, liver disease, and CP. Kidney illness is the greatest cause of death worldwide, according to one survey. Almost 830,000 individuals die in the United States alone, at a cost of about 393.5 billion dollars. The human kidneys lie on either side of the lumbar spine on the posterior abdominal wall. The kidney's key tasks include metabolic management, waste and toxin excretion, blood pressure regulation, and fluid balance maintenance. The kidney filters all of the blood in the body 20 times every hour. Whenever kidney function is reduced, the body's waste cannot be excreted, resulting in back discomfort, hypertension, kidney disorders, urethral inflammation, lethargy, sleeplessness, vertigo, hair loss, blurred eyesight, poor reaction time, sadness, fear, mental problems, and other complications. A damaged kidney also produces and secretes less erythropoietin, and patients develop anemia when their red blood cell production becomes insufficient. Kidney failure patients may present with bone fractures because the kidney helps regulate calcium and phosphate [35]. In the United Kingdom, 700,000 people have symptomatic kidney failure. Although the prognosis of kidney failure patients tends to be improving, mortality is still significant, especially in the first year. This is despite breakthroughs in renal failure treatment, which range from effective medication management to invasive procedures such as cardiac resynchronization therapy (CRT). As a result, continued research into ways to identify patients who are more likely to die early is justified. One such route is the predictive value of poor liver function tests (LFTs) in patients with newly diagnosed renal failure with lower elimination proportion (KFREF).
LFTs, primarily alkaline phosphatase (ALP) and bilirubin, were abnormal in patients with right renal failure and pericardial regurgitation as a result of liver obstruction. Similarly, ischemic hepatitis caused by cardiogenic shock results in abnormal aminotransferases. Low albumin levels, which indicate insufficient nutrition, are common in patients with cardiac cachexia. To avoid acute consequences and reduce the risk of long-term complications, hepatitis, a liver illness, requires ongoing medical care and patient self-management education. Anorexia (lack of appetite) and an increase in alkaline phosphatase levels are to blame for this. Hepatitis is one of the diseases that can be categorized [36].
CP is characterized by the progressive replacement of endocrine and exocrine pancreatic cells by fibrous tissue as acinar structures are depleted. Pancreatic exocrine insufficiency frequently occurs before endocrine insufficiency in the course of CP. Digestive enzyme secretion, and consequently enzyme activity, decreases over time, resulting in maldigestion. Weight loss is common in CP patients and is related to decreased food consumption (due to pain and/or long-term alcohol consumption) and to maldigestion of proteins, complex acids, and carbohydrates. Malnutrition is therefore widespread in CP patients, and its severity is one of the most important indicators for predicting complications and the course of the disease. In a recent study, we discovered that patients with advanced CP have significantly worse nutritional indicators. Protein calorie deficiency has been linked to lower serum amino acid concentrations [37].

The WEKA tool
WEKA is a collection of ML algorithms for DM tasks. The algorithms can be applied directly to a dataset or called from Java code. WEKA includes tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well suited to developing new ML schemes [38]. The ability to read files from a number of database formats is one of WEKA's strongest features [39]. Visualization support is the software's weakness: although the software provides visual representations of data, results, and procedures, the support available is somewhat restricted. WEKA can communicate with the R statistical package in order to improve not only the statistical analysis operations but also the visualization of statistical analyses and findings [40]. WEKA is a free, open-source application distributed under the terms of the GNU General Public License. WEKA was originally developed in C, but it has since been fully rewritten in Java and is compatible with nearly every computing platform. WEKA is simple to set up and use, featuring a graphical interface that allows for quick setup. WEKA is applicable to a wide range of DM techniques and industries. Users can use the application to find hidden information in databases and files through user-friendly interfaces and visualizations [41], [42].

EXPERIMENTAL DATA
Kidney illness, liver disorders, and CP were the three medical datasets we used. All of these datasets were taken from the ML datasets archive. The goal is to categorize illnesses and to assess classification techniques such as NB, C4.5, and LR. This experiment uses the kidney illness dataset of 340 individuals, which has 29 sequentially valued and important features, as indicated in Table 1. The liver diseases dataset contains 23 attributes and 220 instances, as shown in Tables 2 and 3. Table 4 shows the CP dataset, comprising 680 individuals with 27 variables [43]. Table 3 describes the elements of liver function and their normal values used in this study, which serve as a reference against which the values obtained from the liver patient are compared. Table 4 describes the elements of CP function and their normal values used in this study, which serve as a reference against which the values obtained from the CP patient are compared.

Attributes selection measures
ML and DM use a variety of measures to develop and evaluate models. The DT C4.5 method, the NB algorithm, and the LR algorithm have all been developed and tested on our experimental datasets. The confusion matrix produced by these algorithms can be used to assess their correctness. Precision, recall, F-measure, and receiver operating characteristic (ROC) space were the four performance measurements used [44]. The four measures are calculated from a confusion matrix (also known as a contingency table), which represents the classification results as a matrix and records a classification system's actual and predicted classifications. Two cells count correctly classified samples: the number of samples classified as true when they were actually true (TP) and the number classified as false when they were actually false (TN). The other two cells count erroneously classified samples: the number classified as false when they were actually true (FN) and the number classified as true when they were actually false (FP). Once the confusion matrices have been generated, the precision, recall, and F-measure can be easily calculated. Precision, in less formal terms, is the proportion of patients who are actually sick (true positives) among the patients who were declared diseased. Recall measures the percentage of actually sick patients who were detected, and F-measure balances precision and recall. False positive rate (FPR) and true positive rate (TPR) are the x and y axes, respectively, of ROC space, which illustrates the tradeoff between true positives and false positives.

Table 3. Liver diseases dataset: liver function tests with their normal values as used in the study [36]
No  Liver function test  Normal values
1   ALT                  10 IU/L-40 IU/L
2   ALP                  40 IU/L-130 IU/L
3   Albumin              35 g/dl-50 g/dl
4   Bilirubin            5 mg/dl-25 mg/dl
ALT: alanine aminotransferase; ALP: alkaline phosphatase

Table 4. Specified analytical values in chronic pancreatitis patients and healthy controls [37]
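The confusion-matrix measures described in this section can be sketched as follows. This is a minimal illustration using the standard definitions of precision, recall, F-measure, and FPR; the label names and the toy predictions are hypothetical and are not taken from the study's datasets.

```python
def confusion_counts(actual, predicted, positive="sick"):
    """Count TP, FP, TN, FN for a binary task given the actual and predicted labels."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def summarize(tp, fp, tn, fn):
    """Derive precision, recall (TPR), F-measure, and FPR from the four counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    return {"precision": precision, "recall": recall, "f_measure": f_measure, "fpr": fpr}

# Hypothetical labels for eight patients.
actual = ["sick", "sick", "sick", "healthy", "healthy", "healthy", "healthy", "sick"]
predicted = ["sick", "sick", "healthy", "healthy", "sick", "healthy", "healthy", "sick"]
counts = confusion_counts(actual, predicted)
metrics = summarize(*counts)
```

The pair (`fpr`, `recall`) is the classifier's single point in ROC space, with FPR on the x axis and TPR on the y axis.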

DATA MINING ALGORITHMS
Different algorithms are used in DM to transform raw data into knowledge that can be applied. As the name suggests, a classification method is used in tasks where one value must be predicted from the other values in the dataset. The learning algorithm tries to infer and model how the target and the other features relate to one another. The process of training a predictive model is called supervised learning or classification. Supervised learning techniques include DT, NB, neural networks, SVMs, and RF, for instance [45].
Four models were created in this study in order to obtain the results. After comparing NB, RF, DT, and LR, the best model was chosen. RF is a powerful and widely used method for data exploration, predictive modeling, and analysis. An RF delivers its results by combining a number of separately trained DT. The model uses learning techniques to generate a group of classifiers and then uses a voting scheme over their predictions to classify new data. In the RF approach, which comprises multiple DT, the output is produced from the individual trees [46]. It has a number of features, including: i) it offers effective and efficient methods for dealing with missing data, and ii) because overfitting is an issue in some DT, this strategy is a good remedy [46].
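The idea of combining separately trained trees by voting can be sketched as follows. This is a deliberately simplified illustration and not the RF implementation used in the study: each tree is a one-split stump trained on a bootstrap sample, and the per-split random feature subsets of a full RF are omitted. The feature values and labels are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels):
    """Fit a one-split tree: choose the (feature, threshold) with the fewest training errors."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
            right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    _, best_feature, best_threshold, best_left, best_right = best

    def stump(row):
        return best_left if row[best_feature] <= best_threshold else best_right

    return stump

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement) of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest

def predict(forest, row):
    """Classify a row by majority vote over the individual trees."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical data: two numeric features per patient.
rows = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.1, 5.5], [2.9, 3.2], [3.3, 4.1]]
labels = ["healthy", "healthy", "healthy", "sick", "sick", "sick"]
forest = train_forest(rows, labels)
low = predict(forest, [0.8, 4.0])
high = predict(forest, [3.0, 4.0])
```

Because each stump sees a different bootstrap sample, individual trees can disagree; the vote smooths out those differences, which is the intuition behind RF being less prone to overfitting than a single DT.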

Naïve Bayes
This procedure is founded on Bayes' theorem and is employed when the number of input dimensions is large. The output of a Bayesian classifier can be computed from the input. In addition, new data can be added at any time during operation to obtain a better probabilistic classifier. An NB classifier assumes that, given the class variable, the presence (or absence) of a feature is unrelated to the presence (or absence) of any other feature. The results are shown in Tables 5-7 [47].
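A minimal sketch of an NB classifier is given below. It assumes numeric features modeled with one Gaussian per feature and class (the Gaussian variant is an assumption, since the text does not state which NB variant was used), and the two feature columns and their values are hypothetical.

```python
from collections import defaultdict
from math import log, pi

def fit_gaussian_nb(rows, labels):
    """Estimate a class prior and per-feature (mean, variance) pairs for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # A small constant keeps the variance strictly positive.
            variance = sum((x - mean) ** 2 for x in column) / len(column) + 1e-9
            stats.append((mean, variance))
        model[label] = (len(members) / len(rows), stats)
    return model

def predict_nb(model, row):
    """Return the class with the highest log-posterior under the independence assumption."""
    def log_posterior(label):
        prior, stats = model[label]
        score = log(prior)
        for x, (mean, variance) in zip(row, stats):
            # Log of the Gaussian density; features are summed independently.
            score += -0.5 * log(2 * pi * variance) - (x - mean) ** 2 / (2 * variance)
        return score
    return max(model, key=log_posterior)

# Hypothetical data: two numeric measurements per patient.
rows = [[1.0, 120.0], [1.1, 125.0], [0.9, 118.0], [3.0, 150.0], [3.2, 160.0], [2.8, 155.0]]
labels = ["healthy", "healthy", "healthy", "sick", "sick", "sick"]
model = fit_gaussian_nb(rows, labels)
first = predict_nb(model, [1.0, 121.0])
second = predict_nb(model, [3.1, 152.0])
```

Summing the per-feature log-densities is exactly where the independence assumption enters: each feature contributes to the score without reference to the others.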

Logistic regression
LR is a predictive analysis technique, like other regression analyses. LR is also used to describe the data and to explain the relationship between the classes and the attributes. The results are shown in Tables 8-10 [48].
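As an illustration of how LR relates attributes to a class, the sketch below fits a binary LR model with plain stochastic gradient descent. The training procedure, learning rate, epoch count, and toy data are all assumptions for illustration and are not the settings used in the study.

```python
from math import exp

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(rows, targets, learning_rate=0.1, epochs=2000):
    """Fit weights and a bias by stochastic gradient descent on the log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for row, target in zip(rows, targets):
            error = sigmoid(bias + sum(w * x for w, x in zip(weights, row))) - target
            bias -= learning_rate * error
            weights = [w - learning_rate * error * x for w, x in zip(weights, row)]
    return weights, bias

def predict_probability(weights, bias, row):
    """Probability that the row belongs to the positive class (target 1)."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))

# Hypothetical single-feature data: target 1 marks the diseased class.
rows = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
targets = [0, 0, 0, 1, 1, 1]
weights, bias = fit_logistic(rows, targets)
p_low = predict_probability(weights, bias, [0.5])
p_high = predict_probability(weights, bias, [4.0])
```

The sign and size of each fitted weight describe how the corresponding attribute shifts the odds of the positive class, which is the descriptive role of LR mentioned above.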

Decision tree C4.5 algorithm
DT are among the most essential DM techniques and are used in measurement, computation, and ML. Using a DT as a predictive model, one can move from observations about an item (represented by branches) to conclusions about the item's target value (represented by leaves). The leaves represent class labels, and the branches represent the conjunctions of features that lead to those class labels. Regression trees are DT in which the target variable can take on continuous values (typically real numbers). Because of their comprehensibility and clarity, DT are one of the most commonly used DM tools [48]. At every node of the tree, C4.5 chooses the data attribute that most effectively splits its set of samples into subsets enriched in one class or the other [47]. Its criterion is the normalized information gain (difference in entropy) that results from selecting an attribute for splitting the data. The attribute with the highest normalized information gain is picked to make the decision. C4.5 made a number of improvements to ID3 [48].
-Taking care of both continuous and discrete properties. Creates a threshold and then divides the list into those whose attribute value is more than or equal to the threshold and those whose attribute value is less than or equal to it -Working having input parameters that are missing in training data -Dealing with qualities that have different costs -Pruning trees after they've been made. C4.5 goes back through the tree after it's been created and replaces branches that don't help with leaf nodes. When the C4.5 methods is used to the three medical datasets [48]. The results are shown in Tables 11-13.
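The splitting criterion described above can be sketched in a few lines of Python. The example computes the normalized information gain (gain ratio) of a binary threshold split on one continuous attribute and selects the best threshold. It is a simplified illustration, not the C4.5 implementation used in this study; the function names, the use of observed values as candidate thresholds, and the toy values are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Normalized information gain of splitting at value <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    p_left, p_right = len(left) / n, len(right) / n
    info_gain = entropy(labels) - p_left * entropy(left) - p_right * entropy(right)
    split_info = -(p_left * math.log2(p_left) + p_right * math.log2(p_right))
    return info_gain / split_info


def best_threshold(values, labels):
    """Try each observed value except the largest as a candidate threshold."""
    candidates = sorted(set(values))[:-1]
    if not candidates:
        return None  # the attribute is constant and cannot be split
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Hypothetical attribute values and class labels, for illustration only.
toy_values = [0.6, 0.8, 1.0, 2.5, 3.0, 4.2]
toy_labels = ["absent", "absent", "absent", "present", "present", "present"]
print(best_threshold(toy_values, toy_labels))
```

In a full tree builder, this selection would be repeated for every attribute at every node, and the attribute with the highest gain ratio would define the split.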

Classification rules
A considerable number of rules are obtained that are important for understanding the patterns and behavior of the data in the experimental dataset [49]. Using the C4.5 DT algorithm, the following pattern was discovered [50]. The following are some of the rules retrieved from the kidney disease dataset:
-Kidney disease (absence): creatinine=0.3

PERFORMANCE EVALUATION
The different types of classification errors affect the strength of a classifier in different ways, so close attention must be paid to the costs associated with errors. We therefore evaluated the results using a set of standard indicators. Precision, or positive predictive value (Pr), is the ratio of correctly classified positive cases (TP) to all cases classified as positive (TP+FP). Sensitivity, or recall (Rc), is the ratio of correctly classified positive cases (TP) to all actual positive cases (TP+FN); it is also known as the detection rate, the proportion of actual events that are correctly predicted. The accuracy, precision, recall, and F1 were calculated using (6) and (7) [44]. NB, RF, DT, and LR are four widely used DM techniques. The performance results of our evaluation measures for these four methods are shown in Table 14. These results are derived from the confusion matrices (Table 14) together with the performance measures (1)-(4). With an accuracy of 87.3% each and a precision of 89%, the DT and RF classifiers are superior to the other classification algorithms assessed, particularly for handling numerical data. In line with this result, the F1 scores of DT and RF are 88.19% and 87.47%, respectively.
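The indicators defined above follow directly from the four cells of a binary confusion matrix. The Python sketch below shows the standard formulas; the function name `classification_metrics` and the counts in the usage line are hypothetical and are not taken from Table 14.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # Pr = TP / (TP + FP)
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # Rc = TP / (TP + FN)
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=40, fp=10, fn=5, tn=45))
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high, which makes it a useful single figure when false positives and false negatives both carry a cost.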

COMPARISON OF NB, C4.5 AND LR ALGORITHMS
Algorithm designers have had a lot of success with greedy, divide-and-conquer approaches to building class descriptions. For this survey, NB, C4.5, and LR were chosen since they are reasonably fast and produce competitive classifiers. Evaluating the confusion matrices of these three methods (Table 15), we found that while LR outperforms the NB algorithm on the feature extraction indices, C4.5 outperforms both in terms of accuracy and time complexity. The run-time complexity of LR is satisfactory when compared to that of C4.5 [50].

CONCLUSION
The decision-tree algorithm is one of the most efficient classification algorithms. Its effectiveness and correct classification rate depend on the data. Each model's confusion matrix was computed using 8-fold cross validation, and the performance was evaluated using precision, recall, F-measure, and ROC space. Bagging algorithms, particularly C4.5, performed best among the examined approaches, as expected. The findings presented here make practical application more accessible, paving the way for significant progress in the treatment of kidney, liver, and CP. The survey examines the number of steps involved in data processing and the run-time complexity of the algorithms LR, C4.5, and NB. It may be inferred that the C4.5 algorithm outperforms the other two algorithms in terms of rule generation and accuracy. This shows that the C4.5 algorithm outperforms the LR and NB algorithms in terms of induction and rule generalization. The results are then saved in the decision support repository. At present, the knowledge base is restricted to a specific group of disorders. The approach has been confirmed through a case study, and the extent of modeled medical knowledge can be expanded. Furthermore, interactions between the patient's various drugs should be explored in order to improve decision support.
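The 8-fold cross validation mentioned above can be outlined as follows. This is a minimal Python sketch, not the evaluation code used in this study; the names `k_fold_indices` and `cross_validated_confusion`, and the `fit` and `predict` callables passed in by the caller, are assumptions.

```python
def k_fold_indices(n_samples, k=8):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validated_confusion(rows, labels, fit, predict, positive, k=8):
    """Accumulate one binary confusion matrix over the k held-out folds."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for train_idx, test_idx in k_fold_indices(len(rows), k):
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        for i in test_idx:
            predicted = predict(model, rows[i])
            if predicted == positive:
                counts["tp" if labels[i] == positive else "fp"] += 1
            else:
                counts["fn" if labels[i] == positive else "tn"] += 1
    return counts
```

Every record is used for testing exactly once, so the accumulated counts cover the whole dataset. In practice the records would normally be shuffled (or stratified by class) before the folds are formed; that step is omitted here for brevity.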