Classification of lung condition for early diagnosis of pneumonia and tuberculosis based on embedded system

Received Aug 26, 2020; Revised Nov 25, 2020; Accepted Apr 22, 2021

The lungs are the main organs of the respiratory system and serve as the site where oxygen and carbon dioxide are exchanged. Because lung function is so important, indications of lung disorders must be detected and diagnosed early. Research on the classification of lung conditions generally uses chest x-ray image data, which take a time-consuming procedure to obtain. In this research, an embedded system to diagnose lung conditions was designed. The system was made to be easy to use independently and to provide real-time examination results. It uses body temperature, oxygen saturation, fingernail color and lung volume as parameters for classifying lung conditions. The system distinguishes three conditions: healthy lungs, pneumonia and tuberculosis. The k-nearest neighbor method was used for classification. The dataset consisted of 51 records obtained from the hospital, each labeled with a lung condition based on the doctor's diagnosis. The proposed system has an accuracy of 88.24% in classifying lung conditions.


INTRODUCTION
The lungs are vital organs in the human body and play an important role in the respiratory and circulatory systems [1]. The main function of the lungs is to exchange oxygen and carbon dioxide in the bloodstream. This process occurs in the alveoli, which are the tiny air sacs in the lungs. The alveoli receive oxygen during inspiration, and this oxygen moves to the bloodstream through the capillaries. The oxygen-rich blood then flows throughout the body through the heart. Meanwhile, the carbon dioxide received by the alveoli from the bloodstream is excreted from the body during expiration [2]. Disorders of the alveoli cause instability in the circulatory system. The most common disease of the alveoli is pneumonia. Pneumonia occurs due to an infection caused by a virus that causes inflammation of the alveoli. As a result of this inflammation, the alveoli fill with fluid and pus. This makes the alveoli unable to provide sufficient oxygen to flow throughout the body [3].
Besides pneumonia, tuberculosis (TB) is also a dangerous disease that attacks the lungs. Tuberculosis is caused by Mycobacterium tuberculosis, which can lead to complications in the lungs [4]. Based on information provided by the World Health Organization (WHO), 10 million people worldwide were infected with tuberculosis in 2018, with a mortality rate of 15% [5]. The mortality rate of lung disease, both pneumonia and tuberculosis, can be reduced if the indication of the disease is detected and diagnosed at an early stage. Early diagnosis aims to provide fast and precise treatment to patients with a lung disease. In addition, early diagnosis reduces the risk of transmitting viruses and bacteria to susceptible persons. Early diagnosis of lung conditions is difficult to implement if conventional tests, such as sputum tests, spirometry tests, chest x-ray and CT scan, are used [6]-[9]. These tests must be carried out in adequate health care facilities, and it takes a long time to get the test results. Therefore, a more practical and real-time system is needed to detect and diagnose lung conditions. This system needs to be usable at any time, especially when early symptoms of lung disease are found. These symptoms include a persistent cough, fever, chest pain and shortness of breath. The diagnostic results obtained from this system can be used as a reference to get further treatment from a doctor.
Early diagnosis of lung conditions will provide optimal results if the appropriate parameters are used. There are several vital parameters that can be used to determine the lung conditions, including body temperature, oxygen saturation, fingernail color and lung volume [10], [11]. People with lung diseases generally have a high body temperature. This is caused by an infection that occurs in the lungs, so the hypothalamus in the brain sends signals to the skin, muscles and organs to increase body temperature in response [12]. In addition, people with lung disease also have a low percentage of oxygen saturation. Oxygen saturation is the ratio between the hemoglobin that binds oxygen (oxyHb) to the total hemoglobin in the blood. The low percentage of oxygen saturation is caused by the inability of the lungs to work properly to meet the oxygen demand in the blood [13]. Lack of oxygen in the blood can also be detected by discoloration of the fingernails. In healthy people, the fingernails have a pink color [14]. Meanwhile, people with lung disease will experience a condition called cyanosis. Cyanosis is a condition when the fingernails become pale and bluish due to a lack of oxygen in the blood. Furthermore, lung volume of people with lung disease is decreased. This condition is caused by a decrease in elasticity in the lung muscles as a result of the presence of viruses and bacteria in the lungs [15].
Research on the early diagnosis of lung conditions has been performed by Liebenlito [16]. In that research, the diagnosis of lung conditions used data in the form of chest x-ray images. Karnkawinpong [17] also used chest x-ray images to classify pulmonary tuberculosis lesions. The classification systems designed in both studies are based on computer-aided diagnosis (CAD). The use of chest x-ray images and a computer is less practical for an early detection system, because patients are required to have an x-ray examination in advance at a health care facility, which takes time. Therefore, in this research we propose an embedded system that can provide real-time classification results for lung conditions. The diagnosis of lung conditions in this system uses body temperature, oxygen saturation, fingernail color and lung volume as parameters. These four parameters can be easily acquired from the body non-invasively. The parameters used in this research were detected using various sensors. Body temperature was measured using the MLX90614 temperature sensor. Oxygen saturation was measured using the MAX30100 pulse oximetry sensor. Fingernail color was detected using the TCS3200 color sensor. Meanwhile, the lung volume was measured using a flex sensor.
This research uses the k-nearest neighbor (KNN) method in the classification process. The KNN method was chosen because it is resistant to noisy data. This method has been used by Qin [18] to classify chronic kidney disease, with an accuracy of 99.25%. In another study conducted by Wang [19], the KNN method had an accuracy of 99.67% in diagnosing epilepsy using electroencephalogram (EEG) signals. KNN was also used by Shaharum [20] to classify the severity of asthma based on the wheezing sound, with an accuracy of 97.5%. In this research, we implemented the KNN method to classify healthy lung, pneumonia and tuberculosis based on body temperature, oxygen saturation, fingernail color and lung volume.

PROPOSED METHOD
The purpose of this research was to design and implement a system for early detection of lung conditions that classifies them into healthy lung, pneumonia and tuberculosis. A diagnosis is given based on measurements from the temperature sensor, pulse oximetry sensor, color sensor and flex sensor. Each sensor acquires its data non-invasively, without the help of medical personnel. The classification is processed using the KNN method on the Arduino Mega, and the results are then displayed on the liquid crystal display (LCD). The KNN method requires storing all training data on the microcontroller. Therefore, a microcontroller with a large memory was used. The Arduino Mega has a flash memory capacity of 256 KB, SRAM of 8 KB and EEPROM of 4 KB. This specification is sufficient for processing the data used in this research.
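As a rough plausibility check of the memory claim, the sketch below estimates how much SRAM the training set would occupy. It takes the 34 training records and 6 features reported in the results section; the storage types (4-byte floats for features, 1-byte labels) are assumptions for illustration and are not stated in the paper.

```python
# Rough SRAM budget for holding the KNN training set on the microcontroller.
# Assumptions (not from the paper): each feature is stored as a 4-byte float
# and each class label as a 1-byte integer.

SRAM_BYTES = 8 * 1024      # Arduino Mega SRAM, as given in the text
BYTES_PER_FLOAT = 4        # assumed storage type for one feature value
BYTES_PER_LABEL = 1        # assumed storage type for one class label


def training_set_bytes(n_records: int, n_features: int) -> int:
    """Bytes needed for a dense feature table plus one label per record."""
    return n_records * (n_features * BYTES_PER_FLOAT + BYTES_PER_LABEL)


needed = training_set_bytes(n_records=34, n_features=6)
share = 100 * needed / SRAM_BYTES
print(f"{needed} bytes, about {share:.1f}% of SRAM")
```

Under these assumptions the table takes well under a kilobyte, which leaves most of the SRAM for sensor buffers and the distance array.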

Body temperature measurement
MLX90614 temperature sensor was used to measure body temperature in the designed system. This sensor was used by Gu [21] in his research to design a wearable device for monitoring the health condition of the elderly. The MLX90614 sensor is a contactless temperature sensor. This sensor works by utilizing radiation energy emitted by an object when it generates heat. The radiation energy is directly proportional to the heat produced. In this sensor there is a part called thermopile, which functions to convert energy radiation into a voltage. The MLX90614 sensor has an accuracy and resolution of 0.5°C and 0.02°C, respectively.
As shown in Figure 1, the main components in this system were assembled in a black box. The temperature and flex sensors were placed on the outside of the black box to simplify the measurement process, while the microcontroller, pulse oximetry sensor and color sensor were placed inside the black box. Two additional components were used, namely a pushbutton and an LCD. The pushbutton was used as a trigger to start the measurement for each sensor, while the LCD was used to display the measurement and classification results. Body temperature was measured by pointing the MLX90614 sensor at the subject's forehead. To obtain accurate measurement results, the distance between the MLX90614 sensor and the subject's forehead must be less than 5 cm.

Oxygen saturation measurement
In this research, the MAX30100 pulse oximetry sensor was used to measure oxygen saturation. This sensor has been used by Xuedan [22] to detect oxygen saturation in several parts of the body. Oxygen saturation is the percentage of hemoglobin with oxygen (oxyHb) relative to the total hemoglobin in the blood. The MAX30100 sensor module has two LEDs and a photodetector. One LED emits red light and the other emits infrared light, with wavelengths of 650 nm and 950 nm, respectively. Both light sources are directed at the part of the body where oxygen saturation is measured. Some of the light is absorbed by hemoglobin and some is reflected. OxyHb absorbs more infrared light than red light, while hemoglobin without oxygen (deoxyHb) absorbs more red light than infrared light. The reflected light is received by the photodetector. The comparison between the red and infrared light reflections is used to calculate the percentage of oxygen saturation. As shown in Figure 1, the MAX30100 sensor was placed in the hole in the black box. The hole holds the fingertip whose oxygen saturation is to be measured. The MAX30100 sensor was mounted at the bottom of the hole, so that it is in direct contact with the finger. The box is black to reduce ambient light interference, which can affect sensor accuracy.
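The comparison of red and infrared reflection described above is often expressed as a "ratio of ratios". The sketch below illustrates that idea only; it is not the authors' firmware or the sensor's internal algorithm. The function names are hypothetical, and the linear calibration SpO2 ≈ 110 − 25R is a commonly quoted empirical approximation used here as an assumption; a real device needs its own calibration curve.

```python
def ac_dc(samples):
    """Split a window of photodetector samples into pulsatile and mean parts."""
    dc = sum(samples) / len(samples)      # mean light level
    ac = max(samples) - min(samples)      # peak-to-peak pulsatile amplitude
    return ac, dc


def ratio_of_ratios(red, ir):
    """R = (AC_red / DC_red) / (AC_ir / DC_ir) over one sample window."""
    ac_red, dc_red = ac_dc(red)
    ac_ir, dc_ir = ac_dc(ir)
    return (ac_red / dc_red) / (ac_ir / dc_ir)


def estimate_spo2(red, ir):
    """Assumed linear calibration, clamped to the 0..100 % range."""
    r = ratio_of_ratios(red, ir)
    return max(0.0, min(100.0, 110.0 - 25.0 * r))


# Made-up sample windows, for illustration only.
red_window = [1000, 1010, 1000, 990]
ir_window = [2000, 2040, 2000, 1960]
print(estimate_spo2(red_window, ir_window))
```

A lower R means relatively less pulsatile absorption of red light, which corresponds to a higher share of oxyHb.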

Fingernail color detection
The TCS3200 color sensor was used to detect fingernail color in the designed system. This sensor was used by Ragul [23] to diagnose health parameters using urine color. The TCS3200 sensor is a module consisting of 4 LEDs and an array of 64 photodiodes. The array is composed of 16 photodiodes with red filters, 16 with green filters, 16 with blue filters and 16 without filters. The LEDs emit light onto the object; some light is absorbed by the object and some is reflected onto the photodiode array. Each color filter on the photodiode array is activated consecutively to detect the red, green and blue components of the object's color. Each filter is activated by setting the logic levels of the selector pins on the sensor. The photodiodes generate a current that is proportional to the intensity of the light received. The current is then converted into a frequency. To get the RGB color value of the object, a frequency counter program is needed on the microcontroller [24]. As shown in Figure 1, the TCS3200 sensor was mounted at the top of the hole in the black box. With such placement, the TCS3200 sensor can directly

Lung volume measurement
Measurement of lung volume in this research was performed using a flex sensor. This sensor has been used by Kristiani [25] to measure the respiratory rate, with an accuracy of 97.7%. The flex sensor is a variable resistor whose resistance value changes when the surface bends. A signal conditioner circuit is required to process the resistance change in the flex sensor. The first step of signal conditioning is to convert the resistance of the flex sensor to a voltage. This conversion process can be done using a Wheatstone bridge circuit. The Wheatstone bridge circuit produces a small voltage value, generally in the order of millivolts. Therefore, a second step of signal conditioning using a differential amplifier circuit is needed to amplify the voltage output from Wheatstone bridge circuit.
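The two conditioning stages can be sketched numerically as follows. This is a minimal illustration, assuming a bridge with a single variable arm and an ideal differential amplifier; the resistor values, supply voltage and gain are hypothetical and are not taken from the paper.

```python
def bridge_voltage(r_flex, r_fixed, v_supply):
    """Differential output of a Wheatstone bridge with one variable arm.

    Arm A: r_fixed on top, r_flex at the bottom.
    Arm B: two equal r_fixed resistors, so its midpoint sits at v_supply / 2.
    """
    v_a = v_supply * r_flex / (r_fixed + r_flex)
    v_b = v_supply * 0.5
    return v_a - v_b


def conditioned_voltage(r_flex, r_fixed=25_000.0, v_supply=5.0, gain=10.0):
    """Bridge output after an ideal differential amplifier (assumed gain)."""
    return gain * bridge_voltage(r_flex, r_fixed, v_supply)


# Hypothetical flex-sensor resistances: flat, then progressively bent.
for r in (25_000.0, 26_000.0, 28_000.0):
    print(r, round(conditioned_voltage(r), 3))
```

With the bridge balanced at the flat resistance, the output is zero at rest and rises as the sensor bends, which is what makes the small resistance change measurable by the ADC.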
As shown in Figure 1, the flex sensor was attached to a flexible belt outside the black box. The flexible belt was tied around the chest with the sensor on the front. The wearing of the flexible belt on the chest is shown in Figure 2. The lung volume measured in this research was the tidal volume, which is the volume of air that moves in or out during normal inhalation and exhalation. During inhalation, the chest expands and causes the flex sensor to bend. This makes the output voltage of the signal conditioner circuit increase, in proportion to the increase in the resistance of the flex sensor. During exhalation, the chest deflates and the flex sensor returns to its initial condition. This causes the output voltage of the signal conditioner circuit to decrease. Lung volume was measured by calculating the difference in the output voltage between inhalation and exhalation. The lung volume processed in the system is a digital value of the output voltage, converted using the ADC on the microcontroller.
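The inhalation/exhalation difference can be sketched as a peak-minus-trough calculation over one breath of ADC samples. This is an illustrative sketch, not the authors' code: the function names and sample values are hypothetical, and the 10-bit resolution with a 5 V reference is assumed as the default ADC configuration of the Arduino Mega.

```python
def breath_amplitude(adc_samples):
    """ADC counts at peak inhalation minus counts at full exhalation."""
    if not adc_samples:
        raise ValueError("at least one ADC sample is required")
    return max(adc_samples) - min(adc_samples)


def adc_to_volts(counts, v_ref=5.0, resolution_bits=10):
    """Convert ADC counts to volts (assumed 10-bit ADC, 5 V reference)."""
    return counts * v_ref / (2 ** resolution_bits - 1)


# Made-up samples covering one normal breath.
samples = [512, 530, 560, 541, 515]
amplitude = breath_amplitude(samples)
print(amplitude, "counts,", round(adc_to_volts(amplitude), 3), "V")
```

The value in counts is the kind of digital lung volume feature the text describes; it is a relative measure of chest expansion rather than a volume in liters.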

Data acquisition process
The data acquisition process from each sensor was triggered by the pushbutton used in the system. Pressing the pushbutton for the first time triggers the measurement of body temperature by the MLX90614 sensor. This measurement provides the body temperature feature in degrees Celsius (°C). Pressing the pushbutton for the second time triggers the oxygen saturation measurement by the MAX30100 sensor. From this measurement, the oxygen saturation feature is obtained in percent (%). The third press of the pushbutton acquires the fingernail color using the TCS3200 sensor. In this measurement, the fingernail color features are obtained in the form of R, G and B values. The last press obtains the lung volume feature from the flex sensor. After all feature values were obtained, they were used in the classification process using the KNN method, which was carried out on the Arduino Mega. The classification results are then displayed on a 16x2 LCD. The data acquisition process is illustrated in Figure 3.
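The four-press sequence can be modeled as a small state machine. The sketch below is a hypothetical illustration of that flow, not the authors' firmware; the class name, step names and the `read_sensor` callback are all assumptions introduced here.

```python
# Order of measurements, one per button press, as described in the text.
STEPS = ("temperature", "spo2", "nail_rgb", "lung_volume")


class AcquisitionSequencer:
    """Advance one measurement per button press, then expose the 6 features."""

    def __init__(self):
        self.readings = {}

    def is_complete(self):
        return len(self.readings) == len(STEPS)

    def on_button_press(self, read_sensor):
        """Run the next pending measurement and return its name."""
        if self.is_complete():
            raise RuntimeError("all four measurements were already taken")
        step = STEPS[len(self.readings)]
        self.readings[step] = read_sensor(step)
        return step

    def feature_vector(self):
        """[Temp, SpO2, R, G, B, Flex], available once all steps are done."""
        if not self.is_complete():
            raise RuntimeError("acquisition is not finished")
        r, g, b = self.readings["nail_rgb"]
        return [self.readings["temperature"], self.readings["spo2"],
                r, g, b, self.readings["lung_volume"]]


# Stand-in for real sensor drivers, with made-up values.
fake_readings = {"temperature": 36.6, "spo2": 98.0,
                 "nail_rgb": (200, 150, 140), "lung_volume": 48}
seq = AcquisitionSequencer()
while not seq.is_complete():
    seq.on_button_press(fake_readings.__getitem__)
print(seq.feature_vector())
```

Keeping the order in one tuple makes it easy to see that classification can only start after the fourth press.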

Dataset
The dataset used in this research was obtained from hospitals in Malang and Pasuruan, Indonesia. The dataset is primary data obtained by measuring parameter values directly on patients with lung disease. Each record has 6 features, namely body temperature, oxygen saturation, the R, G and B values of fingernail color, and lung volume. Each record is already labeled as healthy lung, pneumonia or tuberculosis, based on the doctor's diagnosis. The dataset consists of 51 records: 24 healthy lung, 12 pneumonia and 15 tuberculosis. Our dataset collection was limited by the Covid-19 outbreak, because the data needed are primary data that require direct contact with patients with lung disease. We could therefore only collect the 51 records obtained before the pandemic. The limitation of a small dataset can be addressed by adding data using several methods, one of which uses the mean and standard deviation of the data obtained.
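The paper does not specify how the mean and standard deviation would be used to add data. One possible reading, sketched below as an assumption, is to draw synthetic records per class from an independent normal distribution for each feature. The function name and the sample values are hypothetical, and treating features as independent and normally distributed is a simplification.

```python
import random
import statistics


def augment_class(records, n_new, rng):
    """Draw n_new synthetic records from per-feature normal distributions.

    records: real feature vectors of ONE class (at least two records).
    Each feature is sampled independently from N(mean, sample std).
    """
    columns = list(zip(*records))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return [[rng.gauss(m, s) for m, s in zip(means, stdevs)]
            for _ in range(n_new)]


# Made-up two-feature records (temperature, SpO2) for a single class.
real = [[36.5, 98.0], [37.0, 97.0], [36.8, 99.0]]
synthetic = augment_class(real, n_new=5, rng=random.Random(0))
print(len(synthetic), "synthetic records with", len(synthetic[0]), "features")
```

Synthetic records of this kind should only be added to the training set; evaluating on them would overstate accuracy.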

Classification of lung condition using k-nearest neighbor
KNN is a supervised classification method that requires labeled training data for each class. The labels serve as the learning basis for processing future data. KNN assigns a class based on the majority class among the k nearest neighbors, where k is the number of nearest neighbors used. A neighbor is a training data that lies at the nearest distance to the test data. The most commonly used measure of the distance between training data and test data is the Euclidean distance [26]. In general, the steps of the KNN method are as follows:
a. Determine the value of k.
b. Compute the distance between the test data and each training data using the Euclidean distance equation shown in (1):

$$d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (1)$$

where d is the Euclidean distance, x is the training data, y is the test data, i is the feature index and n is the number of features.
c. Sort the Euclidean distances in ascending order.
d. Determine the majority class among the k nearest neighbors as the classification result.
The classification process of lung conditions using KNN is shown in the flowchart in Figure 4. The system input is the feature values obtained from each sensor: a body temperature feature from the MLX90614 sensor, an oxygen saturation feature from the MAX30100 sensor, RGB features from the TCS3200 sensor and a lung volume feature from the flex sensor. As mentioned before, KNN is a supervised learning method. Therefore, training data are required as system input. In addition, the value of k is needed as input to determine the number of nearest neighbors used. The first process performed by the system is normalizing all feature data using the min-max method based on the formula in (2):

$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}} \qquad (2)$$
where x_norm is the normalized value, x is the raw value, and x_min and x_max are the minimum and maximum values of the raw data, respectively. This normalization is required to balance the data ranges of all features used in the Euclidean distance calculation. Without normalization, features with a narrow data range have no significant effect when summed with features that have a wide data range. After normalization, all features have values in the range 0 to 1. After the normalization process, the Euclidean distance between the test data and each training data was calculated. The resulting distances are then sorted in ascending order. The final step of the KNN process is determining the classification result based on the majority class among the k nearest neighbors.
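The normalization and KNN steps described in this section can be sketched as follows. This is a minimal illustration rather than the authors' Arduino implementation: the function names and the toy two-feature data are made up, the minimum and maximum are taken from the training data (an assumption), and ties in the vote are broken in favor of the nearer neighbor (also an assumption).

```python
import math
from collections import Counter


def fit_min_max(rows):
    """Per-feature minimum and maximum of the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def normalize(row, mins, maxs):
    """Min-max scaling; a constant feature maps to 0."""
    return [(x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, mins, maxs)]


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_classify(train_rows, train_labels, query, k):
    """Normalize, compute distances, sort ascending, take the majority vote."""
    mins, maxs = fit_min_max(train_rows)
    query_norm = normalize(query, mins, maxs)
    distances = sorted(
        (euclidean(normalize(row, mins, maxs), query_norm), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy data (temperature, SpO2): invented values, not from the paper's dataset.
train = [[36.5, 98], [36.7, 97], [38.5, 90], [38.9, 91], [37.8, 93], [38.0, 94]]
labels = ["healthy", "healthy", "pneumonia", "pneumonia",
          "tuberculosis", "tuberculosis"]
print(knn_classify(train, labels, [36.6, 98], k=3))
```

Because every distance must be computed and sorted for each query, the cost grows with the size of the training set, which is why the memory and timing limits of the microcontroller matter for this method.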

RESULTS AND DISCUSSION
The test aims to determine the accuracy of the system in classifying healthy lung, pneumonia and tuberculosis conditions. In addition, the computation time was measured to determine how fast the system can provide classification results. Data acquisition was carried out in a hospital, accompanied by a doctor as an expert. First, the doctor examined the patients and provided a diagnosis of each patient's lung condition based on the medical record data. Next, body temperature, oxygen saturation, fingernail color and lung volume were acquired using the system we designed. These data were used as input for the KNN classification method. The 51 records were divided into training data and test data with a 2:1 ratio. Thus, 34 records were used as training data and 17 records were used as test data. The accuracy of the system was obtained by comparing the system classification results with the doctor's diagnosis. The test was conducted using three different k values, namely k=3, k=5 and k=7, in order to determine the optimal k value for the classification process. The results of the system accuracy test are shown in Table 1.
As shown in Table 1, six features are used in the classification process using the KNN method with three different k values. These features are body temperature (Temp), oxygen saturation (SpO2), 3 features of fingernail color (R, G, B) and lung volume (Flex). The doctor's diagnosis in Table 1 was determined based on the patient's medical record, not based on these six features. In the test using k=3, there are 4 test data whose classification results differ from the doctor's diagnosis, which is used as the reference. Thus, an accuracy of 76.47% was obtained using k=3. The best accuracy of the system was obtained using k=5 and k=7. In these tests, only 2 test data were classified incorrectly. Thus, the system accuracy in classifying the lung condition at both k values was 88.24%.
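The reported accuracies follow directly from the number of misclassified test data out of 17. The short sketch below reproduces that arithmetic; the function name is introduced here for illustration.

```python
def accuracy_percent(n_test, n_wrong):
    """Share of test data whose predicted class matches the doctor's label."""
    return 100.0 * (n_test - n_wrong) / n_test


# Misclassification counts per k value, as stated in the text.
for k, wrong in ((3, 4), (5, 2), (7, 2)):
    print(f"k={k}: {accuracy_percent(17, wrong):.2f}%")
# prints 76.47% for k=3 and 88.24% for k=5 and k=7
```

With only 17 test data, each misclassification changes the accuracy by about 5.9 percentage points, so the difference between the k values corresponds to just two test records.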
The second test measured the system computation time. This test was carried out using the two k values that gave the highest accuracy in the previous test, namely k=5 and k=7. The computation time measured in this test is the time required for the KNN method to classify each test data. The measurement starts when the values of all parameters have been collected. The results are shown in Table 2.
From the computation test results, the average computation time was 912 ms for k=5 and 1031 ms for k=7. The use of a small k value can reduce the system computation time, because a small k value reduces the number of voting inputs in the KNN method. From this test it can be concluded that the designed system is able to provide classification results in real time. Earlier research on the early diagnosis of lung condition was performed by the authors in Maulana [14]. In that research, the diagnosis of lung condition used only body temperature and fingernail color as parameters. Both parameter values were processed using the Naive Bayes classification method, and the resulting system accuracy was 84.21%. Oxygen saturation and lung volume features were added in this latest research. In addition, this research uses the KNN method in the classification process. Although the computation time was 78 ms slower than in the previous research, this research was able to provide better accuracy in classifying lung condition.

CONCLUSION
The detection of lung conditions is very important, because the lungs have a vital function in the human body. This research proposed a system that is able to detect and classify lung conditions in real time. The classification in this system is determined based on body temperature, oxygen saturation, fingernail color and lung volume. These four parameters have a high correlation with lung conditions. This research used the KNN method to classify healthy lung, pneumonia and tuberculosis conditions. A dataset of 51 records obtained from the hospital was used in this research. Each record has a class label based on the doctor's diagnosis. In the test using training data and test data with a ratio of 2:1, the system achieved an accuracy of 88.24% in classifying lung conditions, with a computation time of 912 ms using k=5. To improve accuracy, the number of records in the dataset should be increased. In addition, the lung volume measurement system needs to be improved. In this research, the measurement of lung volume was limited to the measurement of tidal volume.