Bulletin of Electrical Engineering and Informatics

Received Jul 17, 2022; Revised Sep 30, 2022; Accepted Oct 18, 2022
Caries may be halted or reversed in their progression by early detection, better hygiene habits, and coadministered drugs. The major clinical procedures for identifying dental caries are visual-tactile examination and dental radiography. However, due to their location, approximal caries are exceedingly difficult to detect, which affects the clinical assessment. Incorrect interpretations may also hinder the diagnostic procedure. Computational approaches and technology can be used to help dentists assess caries. Teledentistry can improve dental health care by providing access to dental care services from a remote location, and it helps identify various stages of caries lesions using neural networks and internet-connected devices. This research develops an image classifier for teledentistry systems using a depthwise separable convolutional neural network. Depthwise separable convolution (DSC) lowers the number of trainable parameters and thereby reduces the computational cost of conventional convolutional neural networks (CNN). As a result, the trainable parameters of the DSC model are reduced by as much as 91.49% compared to the traditional CNN model. Several DSC models improve on conventional CNN accuracies in the training, validation, evaluation, and testing stages.


INTRODUCTION
Teledentistry has enormous potential to enhance oral health service by expanding access to the telemedicine domain through the use of digital technology, communication technology, and dentistry [1]. Compared to traditional clinical treatments, teledentistry procedures can exhibit comparable efficacy and cost [2]. Teledentistry is a branch of telemedicine that offers convenience in dental health services such as diagnosis, action plans, consultations, and follow-up via digital transmission, which may be accessible from anywhere. Teledentistry may have a substantial impact on dental health care in rural places, for example, by giving consultation advice and supporting services in remote areas, thus saving travel expenses, waiting time, and unproductive time [3]. With teledentistry, a dentist can diagnose the various stages of a caries lesion using photos of the teeth taken from a smartphone or an intraoral camera. The findings show that severe lesions and healthy teeth may both be detected with high sensitivity; for example, the detection specificities for lesions at all stages were always higher than 83.3% [4].
Caries is the consequence of a continual process of multiple demineralization and remineralization cycles. Improved hygiene habits, early detection of caries, and supportive therapy may halt or reverse the progression of the condition. Clinically, visual-tactile examination and dental radiography are the primary methods for detecting dental caries [5]. However, due to their location, approximal caries are extremely difficult to recognize, making clinical examination challenging. Furthermore, inaccurate interpretations may obstruct the diagnostic procedure. Computational methods and technologies can be employed to assist dentists in caries assessment [6]. One computational algorithm that is widely used as a dental caries diagnostic system combines Laplacian filtering, adaptive thresholding, statistical feature extraction, and a back-propagation neural network. The system shows that the back-propagation neural network can effectively classify a tooth surface as having caries or not having caries. The network was trained on 105 intra-oral digital radiographs annotated by a dentist, with a learning rate of 0.4, a momentum of 0.2, and 500 iterations, and it achieved 97.1% accuracy [7].
A graphics and intelligence-based script technology (GIST) descriptor was implemented to extract important information from dental caries images for Black's classification. Decision trees, fuzzy Sugeno, neural networks, K-nearest neighbor, AdaBoost, and Naive Bayes were applied to classify the primary and reduced features. According to the findings, the AdaBoost classifier achieved 90.92% sensitivity and 90% specificity when used to diagnose infected teeth [8]. A modified linearly adaptive particle swarm optimization (LA-PSO) demonstrated strong binary classification of dental caries versus non-caries. The features were extracted with a grey level co-occurrence matrix (GLCM) computed from each panoramic X-ray image. The combination of LA-PSO and GLCM yielded 99% accuracy [9].
Orthopantomogram (OPG) and radiovisiography (RVG) are two types of X-ray detectors used in dentistry. OPG captures the upper and lower teeth in a single image, whereas RVG produces X-ray images used to diagnose a specific tooth. Both provide X-ray images, which are the most commonly used means of diagnosing dental disorders. In medical image processing, deep learning algorithms have a wide range of uses, including the classification of dental caries. A transfer learning architecture, namely VGG16, was trained using 251 OPG and RVG X-ray dental images (1,000×1,496 pixels). The findings show that the transfer learning architecture can classify OPG and RVG X-ray dental images with an accuracy of 88.46% [10]. Deep convolutional neural networks (DCNNs) have yielded impressive results in the classification of dental caries on periapical radiographs. Using 3,000 periapical radiographic images, split into 80% for training and 20% for validation, the DCNN achieved diagnostic accuracies of 89.0%, 88.0%, and 82.0% for the premolar, molar, and combined premolar and molar models, respectively [11].
In a different study, a classification technique that combines a convolutional neural network (CNN) and a long short-term memory (LSTM) model was developed for the detection and diagnosis of dental caries in periapical dental images. The CNN-LSTM model was trained using 1,500 dental X-ray images in 300 iterations. According to the results, the optimal CNN-LSTM model achieved 96% accuracy [12]. In another study, 112 bitewing radiographs were augmented to 3,464 images to improve a CNN that classifies three labels, namely normal, incipient, and advanced. The selected transfer learning models, namely Inception and ReNet, were trained for 11,500 iterations. The highest area under the curve (AUC) value was 86.1% [6]. Based on the faster regional CNN (Faster-RCNN), an attempt was made to devise a robust approach for detecting and classifying various oral and dental disorders using OPG images. The approach detects and classifies different types of teeth (incisors, molars, premolars, and canine teeth), as well as some underlying oral anomalies, including fixed partial dentures and impacted teeth. The results show that Faster-RCNN has a detection accuracy of 91.03% [13]. CNNs were also trained on annotated data to achieve panoramic X-ray segmentation. The dataset of 1,000 images is divided into 14 classes, with each class representing a distinct dental issue. An efficient residual factorized ConvNet (ERFNet) trained over 200 epochs yielded 98% accuracy, 98% precision, 91% recall, and a 93% F1-score [14].
A modified DCNN has been studied to classify dental cavities from an open-source Kaggle dataset that is labeled as cavities and non-cavities. The modified DCNN model was trained using 74 images and 30 iterations, with binary cross-entropy loss and a learning rate of 0.001. By adjusting hyperparameters, the DCNN model obtained a maximum accuracy of 71.43% [15]. With a larger dataset of 500 images containing caries and non-caries labels, transfer learning models were trained for 100 iterations; VGG16, VGG19, InceptionV3, and ResNet50 reached 99.37%, 98.48%, 99.89%, and 98.01% accuracy, respectively [16].
This research proposes a classifier to support the teledentistry system shown in Figure 1 by identifying images captured with an intraoral camera. The controller converts the input images into tensors for the classifier. The result is transmitted to the web server, where it is stored and displayed on the front-end as a smart teledentistry service using the deep learning method. Therefore, this research proposes a new modified model which simplifies the conventional CNN architecture, namely the depthwise separable

Datasets
To ensure that the model's performance remains stable, the dataset is divided into 10% random testing distribution and 90% training distribution. The training distribution is also divided into 67% as the training set, 23% as the validation set, and 10% as the evaluation set. Figure 2 shows the dental caries dataset.
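The nested split described above can be sketched in a few lines of Python. This is only an illustration: the sample count of 1,000, the seeded shuffle, and the rounding rule are assumptions made here, not details given in the text.

```python
import random


def split_indices(n_samples, seed=0):
    """Split sample indices into train/validation/evaluation/test subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle (assumption)

    # 10% of all samples form the held-out testing distribution.
    n_test = round(0.10 * n_samples)
    test, rest = indices[:n_test], indices[n_test:]

    # The remaining 90% is divided 67% / 23% / 10%.
    n_train = round(0.67 * len(rest))
    n_val = round(0.23 * len(rest))
    train = rest[:n_train]
    validation = rest[n_train:n_train + n_val]
    evaluation = rest[n_train + n_val:]  # whatever is left, about 10%

    return {"train": train, "validation": validation,
            "evaluation": evaluation, "test": test}


splits = split_indices(1000)  # hypothetical dataset size
sizes = {name: len(idx) for name, idx in splits.items()}
print(sizes)
# {'train': 603, 'validation': 207, 'evaluation': 90, 'test': 100}
```

Relative to the whole dataset, this corresponds to roughly 60.3% training, 20.7% validation, 9% evaluation, and 10% testing.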

Figure 2. Dental caries dataset
The dataset was augmented with ImageDataGenerator, which offers standardization, rotation, shifts, flips, rescaling, and other features. ImageDataGenerator performs real-time data augmentation, which means that the model receives fresh variants of the images during training while using less memory. The ImageDataGenerator was configured with a validation split of 23%, random flips, a rotation range of 2 degrees, a zoom range of 0.1, and a rescale factor of 1/255.
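The augmentation settings above map onto keyword arguments of Keras' `ImageDataGenerator`. The following is a minimal sketch rather than the authors' code: the directory layout, the batch size of 32, the `categorical` class mode, and the helper name `build_generators` are assumptions, and TensorFlow is imported lazily so that the configuration can be inspected without it. Recent TensorFlow releases mark `ImageDataGenerator` as deprecated in favor of preprocessing layers.

```python
# Augmentation settings taken from the text; the keys are keyword
# arguments of tensorflow.keras.preprocessing.image.ImageDataGenerator.
AUGMENTATION_CONFIG = {
    "rescale": 1.0 / 255,      # map pixel values from [0, 255] to [0, 1]
    "rotation_range": 2,       # random rotations of up to 2 degrees
    "zoom_range": 0.1,         # random zoom factor in [0.9, 1.1]
    "horizontal_flip": True,   # random left-right flips
    "vertical_flip": True,     # random up-down flips
    "validation_split": 0.23,  # fraction of images reserved for validation
}


def build_generators(data_dir, image_size=(224, 224), batch_size=32, seed=0):
    """Return (training, validation) iterators over class subfolders of data_dir.

    data_dir, batch_size and class_mode are illustrative assumptions.
    """
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(**AUGMENTATION_CONFIG)
    common = dict(target_size=image_size, batch_size=batch_size,
                  class_mode="categorical", seed=seed)
    train = datagen.flow_from_directory(data_dir, subset="training", **common)
    val = datagen.flow_from_directory(data_dir, subset="validation", **common)
    return train, val
```

Note that with a single generator the random transformations are applied to the validation subset as well; a separate generator that only rescales is a common alternative when unaugmented validation images are preferred.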

Depthwise separable convolutional neural network
CNN is commonly utilized as a classifier for image classification. A CNN reduces the image dimensions through downsampling in its feature extraction stage, and the output of the feature extraction is flattened to form the classifier input [17], [18]. Figure 3 shows a conventional CNN architecture. The output of feature extraction can be formulated as [19]:

$$y_{h,w,n} = \sum_{i=0}^{K-1} \sum_{j=0}^{K-1} \sum_{d=1}^{D} f_{i,j,d,n}\, x_{h+i,\,w+j,\,d} \tag{1}$$

where $x$ is the input tensor, indexed by height ($h$), width ($w$), and depth ($d$); $f$ is a bank of $N$ filters with a $K \times K$ kernel; and the entries $x_{h+i,\,w+j,\,d}$ form the receptive field in $x$ at spatial location $(h, w)$. Therefore, the total number of trainable parameters of the feature extraction kernel, omitting bias terms, can be formulated as [20]:

$$P_{\mathrm{CNN}} = K \cdot K \cdot D \cdot N \tag{2}$$

Figure 3. Conventional CNN architecture [15]-[18], [21]

Given the number of convolutional parameters in the kernel that must be computed, CNN models for high-resolution images demand additional memory allocation. As a result, many CNN models can be simplified by decreasing the number of convolutional trainable parameters. Using DSC, the convolution layer of a conventional CNN is optimized to reduce computational costs by lowering the number of trainable parameters while maintaining equivalent results [22], [23]. The downsample architecture of DSC is represented in Figure 4.

As shown in Figure 4, DSC consists of a depthwise filter and a pointwise filter, which are formulated as:

$$\hat{y}_{h,w,d} = \sum_{i=0}^{K-1} \sum_{j=0}^{K-1} g_{i,j,d}\, x_{h+i,\,w+j,\,d}, \qquad y_{h,w,n} = \sum_{d=1}^{D} p_{d,n}\, \hat{y}_{h,w,d} \tag{3}$$

where $g$ denotes the $K \times K$ depthwise filter, applied to each input channel separately, and $p$ denotes the $1 \times 1$ convolution, namely the pointwise filter. Therefore, the total number of trainable parameters of a DSC layer is formulated as [24]:

$$P_{\mathrm{DSC}} = K \cdot K \cdot D + D \cdot N \tag{4}$$

Comparing (2) and (4), DSC has fewer trainable parameters than the conventional CNN, which is formulated as [20], [24]:

$$\frac{P_{\mathrm{DSC}}}{P_{\mathrm{CNN}}} = \frac{K^2 D + D N}{K^2 D N} = \frac{1}{N} + \frac{1}{K^2} \tag{5}$$

Based on (5), DSC reduces the trainable parameters of the convolution layer, since the ratio is below 1 whenever $N > 1$ and $K > 1$.
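As a worked instance of the parameter comparison above, consider a single layer with a $K \times K$ kernel, $D$ input channels, and $N$ filters. The symbols and the layer size ($K = 3$, $D = 32$, $N = 64$) are chosen here purely for illustration, and bias terms are ignored:

$$
\begin{aligned}
P_{\mathrm{CNN}} &= K^2 D N = 3^2 \cdot 32 \cdot 64 = 18{,}432 \\
P_{\mathrm{DSC}} &= K^2 D + D N = 288 + 2{,}048 = 2{,}336 \\
\frac{P_{\mathrm{DSC}}}{P_{\mathrm{CNN}}} &= \frac{2{,}336}{18{,}432} = \frac{1}{64} + \frac{1}{9} \approx 0.127
\end{aligned}
$$

Under these assumptions the separable layer needs about 87.3% fewer trainable parameters than the standard convolution, and the saving is bounded mainly by the $1/K^2$ term once $N$ is large.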

Hyperparameters
This research proposes a depthwise separable convolution variant of the conventional CNN architecture. As initial hyperparameters of the training process, the image size was set to 224×224 pixels, with 100 iterations and a dropout of 0.5. Augmentation was used to avoid overfitting; the setup consists of a rescale of 1/255, a rotation range of 2 degrees, a zoom range of 0.1, horizontal flip, and vertical flip. The first proposed architecture contains 3 convolutional layers with 32, 32, and 64 filters, respectively. These layers use a kernel size of 3 and ReLU as the activation function. The max-pooling layer was constructed using a 2×2 pool size and placed in front of the convolution layer. The fully connected part was designed with 2 layers, where the last layer represents the desired image labels. Figure 5(a) depicts the first proposed conventional CNN architecture, while Figure 5(b) depicts the DSC architecture. The second proposed architecture contains 4 convolution layers with 32 filters each, kernel sizes of 5, 5, 1, and 5, and strides of 4×4, 1×1, 1×1, and 4×4. The fully connected layers use the same setup as in the first proposed architecture. The second proposed conventional CNN architecture is shown in Figure 6(a), whereas the DSC architecture is shown in Figure 6(b). Based on Figure 5, the DSC reduced the trainable parameters of the first proposed architecture by 83.79%, and based on Figure 6, the DSC reduced the trainable parameters of the second proposed architecture by 90.92%. The fully connected layers have similar trainable parameters in each proposed architecture. The proposed models are then compared to the models whose architectures are shown in Table 1. Odd-numbered models (except model-11) are conventional CNN models, whereas even-numbered models (except model-12) are the DSC counterparts of the odd-numbered models. That is, model-2 is the DSC model derived from model-1, model-4 from model-3, model-6 from model-5, and so on. Model-11 and model-12 are transfer learning models, namely InceptionV3 and VGG16, respectively.
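To make the parameter reduction for the first architecture concrete, the sketch below counts the feature-extraction parameters of three stacked layers with 32, 32, and 64 filters and a kernel size of 3. Several things here are assumptions that the text does not state: a 3-channel (RGB) input, one bias per filter, Keras-style accounting for separable layers (depthwise weights plus pointwise weights plus bias), and which layers are replaced by DSC. Two hypothetical variants are compared for that last point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvSpec:
    kernel: int       # square kernel size K
    in_channels: int  # input depth D
    filters: int      # number of output filters N
    separable: bool   # True: depthwise separable, False: standard


def conv_params(spec, use_bias=True):
    """Trainable parameters of one convolution layer."""
    k2 = spec.kernel * spec.kernel
    bias = spec.filters if use_bias else 0
    if spec.separable:
        # K*K*D depthwise weights plus D*N pointwise weights
        return k2 * spec.in_channels + spec.in_channels * spec.filters + bias
    # K*K*D*N weights of a standard convolution
    return k2 * spec.in_channels * spec.filters + bias


def feature_extractor_params(filters, kernel, in_channels, separable_mask):
    """Sum the parameters of stacked layers, chaining channel depths."""
    total, depth = 0, in_channels
    for n_filters, separable in zip(filters, separable_mask):
        total += conv_params(ConvSpec(kernel, depth, n_filters, separable))
        depth = n_filters
    return total


def reduction(cnn_count, dsc_count):
    """Percentage of parameters saved relative to the standard CNN."""
    return 100.0 * (1.0 - dsc_count / cnn_count)


FILTERS = (32, 32, 64)  # first proposed architecture
cnn = feature_extractor_params(FILTERS, 3, 3, (False, False, False))
dsc_all = feature_extractor_params(FILTERS, 3, 3, (True, True, True))
dsc_keep_first = feature_extractor_params(FILTERS, 3, 3, (False, True, True))

print(cnn, dsc_all, dsc_keep_first)                 # 28640 3899 4640
print(f"{reduction(cnn, dsc_all):.2f}")             # 86.39
print(f"{reduction(cnn, dsc_keep_first):.2f}")      # 83.80
```

Under these assumptions, the variant that keeps the first layer as a standard convolution lands near the reduction reported for the first architecture. This is only suggestive, since the text does not say which layers were replaced, and other input depths or bias conventions would shift the figures.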

Model performance
The models were run on an Intel(R) Core i5-6300HQ CPU at 2.30 GHz with 16 GB RAM and an Nvidia GeForce GTX 960M with 4 GB VRAM. The classifiers were trained using the data distribution described in the Datasets section, which includes the training, validation, and evaluation splits. The model performance is shown in Table 2. Based on Table 2, the DSC optimization serves only to reduce the computational cost, as shown in the trainable parameter column. The trainable parameter reduction is calculated from the parameter counts of the feature extraction layers. The reduction of the DSC layers is 72.58% for model-2 against model-1, 88.82% for model-4 against model-3, 83.80% for model-6 against model-5, 90.92% for model-8 against model-7, 48.44% for model-10 against model-9, 91.49% for model-14 against model-13, and 45.55% for model-16 against model-15. Meanwhile, the accuracy of the DSC models remains comparable to that of the conventional CNN models. Figure 7(a) shows the training accuracy graph with the highest performance, and the training loss graph with the highest performance is depicted in Figure 7(b).

The proportion of correct classifications was the primary performance variable. The secondary performance indicator was the visualization of CNN-focused features in the related image regions [26]. Based on Table 1, the confusion matrices can be utilized to visualize the consistency of the model performance on the test data distribution, that is, data on which the models were never trained. We computed the precision, recall, and F1-score for the normal labels (N) and the caries labels. True positives (TP) and true negatives (TN) are cases where the predicted class matches the actual class. When the actual class is false but the predicted class is true, this is known as a false positive (FP). Situations where the actual class is true but the predicted class is false are known as false negatives (FN). Precision is the proportion of correctly predicted positive observations among all predicted positive observations. Recall (sensitivity, or true positive rate) is the ratio of correctly predicted positive observations to all observations in the actual class. The F1-score is the harmonic mean of precision and recall; as a result, it takes into account both false positives and false negatives. The metrics derived from the confusion matrices can be formulated as [26]:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{F1} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
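The metric definitions above can be checked with a small helper. This is an illustrative sketch: the function name and the confusion-matrix counts are hypothetical and are not results from the paper, and the zero-division guards are a convention chosen here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1-score and accuracy from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        # harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical counts for the positive (caries) class on a 100-image test set.
metrics = classification_metrics(tp=45, fp=5, fn=3, tn=47)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For these made-up counts, precision is 45/50, recall is 45/48, and the F1-score equals 2TP / (2TP + FP + FN) = 90/98, which shows how a single score penalizes both false positives and false negatives.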