TBNet: learning from scratch and limited training data, a CNN based tuberculosis bacilli detection

ABSTRACT


INTRODUCTION
Tuberculosis (TB) is caused by the bacillus Mycobacterium tuberculosis [1]. Although it is a curable disease, the death rate of TB was as high as 13.57% in 2018 [1]. Moreover, one-third of the estimated incident cases (3 million) remain unknown to the health system because of underreporting of detected cases and underdiagnosis [1]. Most of the underreported and underdiagnosed cases occurred in India (25%), Nigeria (12%), Indonesia (10%), and the Philippines (8%) [1]. Therefore, improvements in the accessibility of TB diagnosis and treatment are urgently required in these countries.
There are several established methods to diagnose TB, namely microscopic analysis, polymerase chain reaction (PCR), and electronic nose systems [2]. Among these, sputum smear microscopy is the most widely used, especially in developing countries, because it is simple, low cost, and easy to maintain [3], [4]. A prepared and stained sputum specimen is analyzed manually under a microscope. The TB bacillus is a gram-positive bacillus, so bacilli can be detected on gram-stained smear specimens from a patient with TB [5]. The bacilli are counted manually, so the count depends on the technician's observation. Generally, a laboratory technician needs about 40 min to 3 h to examine 100 fields of view on each prepared specimen [5], [6]. The number of detected TB bacilli is used to determine the diagnosis [2]. A prompt and precise TB diagnosis method is essential to guide individual treatment and prevent further contagion [7]. The manual microscopic method is tiring and time-consuming, and it yields variable sensitivity and a high false-negative rate [8], [9].
Conventional algorithms that process raw images without learning are limited. Deep learning performs substantially better than conventional methods because it learns features automatically and quickly [10], [11]. It achieves better accuracy in image classification, semantic segmentation, object detection, and simultaneous localization and mapping (SLAM) [12]. The convolutional neural network (CNN) is the most popular deep learning-based method owing to its remarkable improvement in prediction performance when big data and plentiful computing resources are available [11]. CNNs have pushed the boundaries of what was possible; for example, automatic CNN-based detection of TB bacilli successfully identified both single and touching bacilli [13]. Most recent advances in object detection rely on fine-tuning CNNs pre-trained on ImageNet [14]. This process generates the final model quickly and requires far fewer instance-level annotated training data than the classification task [15]. Yet several limitations are undeniable, i.e., limited design space for network structures, learning/optimization bias, and domain mismatch [15]. The deeply supervised object detector (DSOD) framework was developed to overcome these problems; it can learn object detectors from scratch [16]. To the best of the authors' knowledge, no research on TB bacilli detection based on the DSOD framework has been reported. This gap presents a research opportunity. Therefore, in this work, we propose an enhanced architecture of the DSOD framework for TB bacilli detection based on a deep learning model.
Collecting TB sputum smear images for a deep learning dataset is not easy. The domain-specific nature of the data frequently limits dataset preparation. The usage rights for medical resources are often held exclusively by a single institution, and the number of TB patients varies from area to area [1], making it impossible to assemble one huge dataset. In contrast, deep learning methods often need images on the order of thousands to train a model. Another difficulty is preparing the image annotations for the ground truth. The complexity of ground truth preparation increases from image classification to object detection to segmentation. This work is usually done by humans; therefore, the more images are processed, the more obstacles are encountered.
In this paper, we propose a newly developed architecture with deep supervision for TB bacilli detection, trained with limited resources. First, we prepare a sputum smear dataset consisting of normal and overstained microscopic images. Then, we present a deep learning model with deep supervision designed specifically for TB detection, called TBNet. We also provide a performance comparison between TBNet and the SSD, MobileNet SSD, PeleeNet, and DSOD models. The qualitative and quantitative studies show that the new model performs better in detecting TB bacilli.

RELATED WORKS
Much research on TB bacilli detection has been reported. Rulaningtyas et al. [17] developed an automatic classification of TB bacilli using a neural network. First, feature extraction was performed to find the morphology (shape) of the TB bacilli. The features were arranged into a vector and submitted to the neural network, and the bacilli were then classified using the backpropagation method. Although this research showed good classification results, the method used handcrafted feature vectors to discriminate bacilli pixels from non-bacilli pixels based on morphological shape. Therefore, its performance depends heavily on the bacilli features [13]. Khutlang et al. [4] proposed automatic detection of TB bacilli based on two one-class classifiers. The first-stage classification used a one-class pixel classifier. The output objects were filtered by object area, and features (Fourier, moment, eccentricity) were extracted from the remaining objects. Then, the second one-class object classification was performed on different feature sets. The mixture of Gaussians gave the best result in the first-stage classification, but the accuracy of object outline detection was low, resulting in a low percentage of correctly classified pixels (75.74%). Ghosh et al. [18] proposed automatic TB diagnosis using a hybrid approach (crisp and fuzzy data representation). The sputum image was pre-processed and then segmented using a gradient-based region growing technique to find accurate contours of the TB bacilli. Then, the features (shape, color, and granularity) of the bacilli were extracted to generate individual fuzzy classifications. Finally, the individual classifications were combined to strengthen the diagnosis. The result showed fairly high sensitivity (93.9%) and specificity (88.2%). Unfortunately, the method failed to identify overlapping bacilli.
Recently, deep learning-based research has taken an active part in medical image analysis, including image classification, object detection, and image segmentation. Computer-aided systems for TB diagnosis have also been influenced by this advance. Quinn et al. [19] presented a method for detecting TB bacilli in microscope fields of view captured with a mobile phone. They divided each image into many small patches that were fed into a CNN-based network, and the model classified each patch. This process is known to consume substantial computational resources because the system must compute a representation for every bounding box in the image. They also relied on prior-knowledge training before training on TB bacilli for the classification task. The authors of [13] used a two-fold segmentation to detect TB bacilli. In the first step, foreground and background were separated using Otsu's method to extract pixels with the color tendency of bacilli. In the next step, a CNN-based segmentation method classified the pixels inside the patches containing the objects. With this approach, end-to-end model training could not be achieved. Although their model setup did not need prior training, the accuracy of their TB bacilli segmentation relies on the color feature classification in the first step. In a more recent paper, Trilaksana et al. [20] proposed the use of Faster R-CNN to address the patch division step before the classification process. Their approach still depends on transfer learning from prior knowledge to produce the result.
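The first step of the two-fold segmentation in [13] separates foreground from background with Otsu's method. The following is a minimal sketch of that thresholding idea on a toy list of 8-bit intensities. It is not the implementation used in [13]: the function name, the toy data, and the use of a single intensity channel are assumptions made only for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the level t that maximizes the between-class variance.

    Pixels with value <= t form one class and pixels > t form the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of intensities in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty, nothing left to split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Toy data (hypothetical): a dark cluster and a bright cluster of intensities.
toy = [30, 35, 40, 32, 38] * 10 + [200, 210, 190, 205] * 5
t = otsu_threshold(toy)
mask = [p > t for p in toy]  # True for pixels in the brighter class
print(t, sum(mask))
```

Which of the two classes corresponds to bacilli depends on the color channel that is thresholded, which this sketch does not model.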
Automatic CNN-based TB bacilli detection relies on the database. Costa et al. [21] introduced a two-part database of sputum smear images, i.e., i) an autofocus database and ii) a segmentation and classification database. Segmentation and classification were performed by annotating the objects with geometric shapes: a circle for a true bacillus, a rectangle for an agglomerated bacillus, and a polygon for a doubtful bacillus. However, the segmentation and classification of the 'agglomerated bacillus' and 'doubtful bacillus' categories are still not satisfactory. Shah et al. [22] introduced the Ziehl-Neelsen sputum smear microscopy image database, which consists of seven categories of datasets. The database can be used to develop efficient algorithms for autofocusing, autostitching, and automatic bacilli segmentation and grading. Even so, the number of images is limited for automatic TB detection. Moreover, the available database annotates images of clean smears, which differ from common sputum smear slides in Indonesia. The nature of the image background is one of the important factors that affect TB detection performance. Therefore, Trilaksana et al. [20] introduced a sputum smear image database with diverse smearing backgrounds, from clear and definitive bacilli to highly cluttered and stained bluish backgrounds. There are three kinds of annotated objects, i.e., TB bacteria, non-TB bacteria, and stain residues. However, further development is needed to add more data. Commonly, the sputum smear images from laboratory technicians are overstained because of the dye's quality and the specimen preparation process. In this work, we introduce our database, which has been adjusted to this medical fact.

TBNET ARCHITECTURE
In this section, we introduce our CNN-based model to detect tuberculosis bacilli. Our model comprises two parts: a feature extraction block followed by a prediction block.
The feature extraction part is a slight adjustment of lightweight color depth semantic segmentation (LICODS) [23]. The main difference is that our network consists of only one branch, which takes a single input image. The first layer in the comb block involves two convolutional layers and one max pooling layer. Here, we use a depth of 64 for each convolutional filter, compared with 14 in the original network. Denser filters relay more information to the later stages of the network. Further, we handle the propagated information differently from [23]. Right after the pooling layer in the transition block, we divide the main branch into three separate sub-branches, in contrast with the original LICODS [23]. The first sub-branch conveys the information right after the first expansion block. The second sub-branch squeezes the feature maps to a depth of 64. The third sub-branch is the expansion block part. At the end of the first expansion block, the second and third sub-branches are concatenated to collect multi-feature-map results. Expansion block two follows expansion block one, with one fewer depthwise convolutional layer. This pattern is repeated three times. The first sub-branch is then merged with the result of the three expansion blocks.
The next block in our network is the prediction block. Here, we employ an approach similar to DSOD [15]: multiple scales correspond to feature maps of different sizes and are combined with the dense prediction structure. We carefully follow the dense prediction structure presented in [15]. Our network uses a higher-resolution feature map in the prediction structure than the original dense structure; the largest feature map resolution is 60×77. We increased the feature map resolution to handle small objects in digital sputum images. The prediction structure is positioned after the prediction block, except for the largest feature maps, which are connected to the convolution layer after the expansion block concatenation. Each prediction layer comprises multi-scale information forwarded from different stages of the network.
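To make the multi-scale dense prediction idea more concrete, the sketch below lists one possible pyramid of prediction scales starting from the 60×77 feature map mentioned above. It follows our reading of the dense prediction structure of DSOD [15], in which each scale after the first learns half of its feature maps from the previous scale and reuses the other half from a down-sampled copy of the previous scale. The number of scales (6), the channel count (256), and the halving with rounding up are assumptions for illustration only; they are not taken from the TBNet configuration.

```python
import math


def prediction_scales(first_size, num_scales, channels):
    """Return (size, learned_channels, reused_channels) for each prediction scale.

    Assumes every scale halves the previous resolution (rounded up) and that,
    from the second scale on, half of the channels are learned and half are
    reused from the down-sampled previous scale.
    """
    a, b = first_size
    scales = [((a, b), channels, 0)]  # first scale has nothing earlier to reuse
    for _ in range(num_scales - 1):
        a, b = math.ceil(a / 2), math.ceil(b / 2)
        scales.append(((a, b), channels // 2, channels // 2))
    return scales


# Hypothetical configuration: 6 scales, 256 channels per scale.
scales = prediction_scales((60, 77), num_scales=6, channels=256)
for size, learned, reused in scales:
    print(size, learned, reused)
```

Under these assumptions the resolutions run from 60×77 down to 2×3, which illustrates why a larger first feature map leaves more spatial detail for small objects such as bacilli.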

DATASETS
TB bacilli expectorated in human sputum are visible individually or in clusters. Under a microscope, Mycobacterium tuberculosis within a stained sputum smear is seen as a red-purplish blob with a rectangular shape over a blue background. Figures 1(a) and (b) show sputum smear images. The mycobacterial cell wall comprises a substance composed of mycolic acids. These are β-hydroxy carboxylic acids with chain lengths of up to 90 carbon atoms. The property of acid fastness is related to the carbon chain length of the mycolic acid found in any species [24]. The mycolic acid raises a barrier to dye entry; this problem is usually overcome by adding a lipophilic agent to a concentrated aqueous solution and partly by heating [25].
We use TB-positive stained sputum smears as slides. Twenty different images are acquired from spatially shifted fields of view. We prepare a microscope to observe the sputum smear on the object glass and turn on the computer connected to the microscope. On the computer, we open the standard cellSens application, which is used to capture the sputum microscopic images, and set it to match the magnification of the microscope, which is 1000×. After everything is ready, we place the sputum smear on the microscope stage, then observe and shift the preparation to move to the next layer. We record the digital form of the microscope's field of view under the supervision of two physicians.
Our tuberculosis database (TBDB) contains 350 images. Our data portray routine smear examinations performed by experts at a medical facility. The preparation process affects the evenness of the stained smear. One example is the spreading of the patient's sputum on the middle third of the slide: variation in the pressure applied to the applicator changes the stain intensity. Another example is the quality of the dye. Therefore, the images in the TBDB are composed of normal-stained and over-stained sputum smears.
We prepare the ground truth for TB bacilli detection through human annotation. We use two labels to annotate the images: TB bacilli and debris. Overlapping bacilli are annotated one bacillus after another, with no distinguishing mark for any of them. We mark each area containing one of these labels in the image using the labelImg program. Figures 2(a) and (b) show the annotated sputum smear images. The data are validated by an expert during the annotation process. Afterward, the marking result is checked by a researcher with expertise in TB sputum smear examination. This two-step validation ensures the correctness of the ground truth. We obtain a total of 3,102 bacilli with varying shape and color intensity. The intensity ranges from faint to strong red-purplish. The bacillus shapes also vary, from single bacilli to groups of bacilli.
Figure 2. TB bacilli after the annotation process on sputum smear images: (a) TB bacilli appear with their original dye and clear shape in front of a faint background and (b) TB bacilli appear with slightly red-purplish
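As an illustration of how such annotations can be read back for training or counting, the sketch below parses one annotation in the Pascal VOC XML layout, which is the default output format of labelImg. That the TBDB annotations were saved in this format is an assumption, and the file name, label strings, and box coordinates in the example are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def read_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) from one VOC-style annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(float(bndbox.findtext(key)))
            for key in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label,) + coords)
    return boxes


# Hypothetical annotation with two bacilli and one debris object.
EXAMPLE = """<annotation>
  <filename>smear_001.jpg</filename>
  <object><name>TB bacilli</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>150</xmax><ymax>95</ymax></bndbox>
  </object>
  <object><name>TB bacilli</name>
    <bndbox><xmin>140</xmin><ymin>90</ymin><xmax>170</xmax><ymax>104</ymax></bndbox>
  </object>
  <object><name>debris</name>
    <bndbox><xmin>300</xmin><ymin>210</ymin><xmax>330</xmax><ymax>250</ymax></bndbox>
  </object>
</annotation>"""

boxes = read_boxes(EXAMPLE)
label_counts = Counter(label for label, *_ in boxes)
print(label_counts)
```

Summing such per-image counts over all annotation files is one way to obtain a dataset-wide total per label; the two overlapping boxes in the example mirror the rule that overlapping bacilli are annotated one after another.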

RESULTS
In this section, we present our findings and results for the proposed TBNet in detecting TB bacilli in sputum smear images. We analyzed our network using the metrics stated in [25]. Equation (1) expresses the precision, which describes the model's capability to recover only the correct objects, together with their locations, across all detection results. A true positive (TP) is an object predicted correctly with an intersection over union (IoU) value above some threshold. A false positive (FP) is an object predicted incorrectly.

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (1)$$

Equation (2) expresses the recall, which is the model's capability to recover all the relevant objects under observation. A false negative (FN) is a ground truth object that the model fails to detect.

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (2)$$

Equation (3) is used to calculate the IoU, where $B_p$ is a predicted bounding box and $B_{gt}$ is a ground truth bounding box.

$$\text{IoU} = \frac{\text{area}(B_p \cap B_{gt})}{\text{area}(B_p \cup B_{gt})} \qquad (3)$$

Subsequently, a precision × recall curve is composed for each predicted class. Lastly, the average precision (AP) is calculated with (5). Table 2 shows the quantitative detection results of several models for comparison.
We evaluated and compared the quantitative results of TBNet against several leading object detection models, namely SSD, MobileNetSSD, and PeleeNet. We ran our test images in a single pass through each trained model. Our model performs comparably well throughout our tests. The mAP of TBNet differs by a significant 3% from that of our base model, DSOD, and by more than 10% from that of conventionally trained models such as SSD and PeleeNet. This trend holds for the average precision of each class. Our approach of doubling the feature map resolution in the final block relative to the original DSOD performs outstandingly. Figures 3(a) to (j) (in the appendix) show the detailed performance of each object detection model in our tests. The other models in our comparison failed to detect TB bacilli individually, except for the SSD detection model in Figure 4(d). This behavior is expected because our model and SSD share a common prediction block basis. However, TBNet's backbone comprises more efficient feature map filter parameters, and it adopts the dense prediction structure inherited from DSOD. MobileNetSSD performs poorly in detecting objects across our test images; we therefore exclude its detection results.
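The precision, recall, and IoU definitions used in this section can be illustrated with a small worked example. The sketch below is not the evaluation code used for Table 2: the IoU threshold of 0.5, the greedy matching of detections to unmatched ground truth boxes in order of confidence, and all box coordinates are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, iou_threshold=0.5):
    """Precision and recall for one class in one image.

    detections: list of (confidence, box); ground_truth: list of boxes.
    Each ground truth box can be matched by at most one detection.
    """
    matched = set()
    tp = fp = 0
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue
            value = iou(box, gt_box)
            if value > best_iou:
                best_iou, best_idx = value, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
        else:
            fp += 1
    fn = len(ground_truth) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical boxes: two ground truth bacilli and three detections.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (1, 1, 10, 10)), (0.8, (50, 50, 60, 60)), (0.6, (20, 20, 30, 31))]
precision, recall = precision_recall(dets, gt)
print(precision, recall)
```

In this toy case two detections match a ground truth box and one does not, so the precision is 2/3 and the recall is 1. Repeating the computation while sweeping the confidence cut-off yields the points of a precision × recall curve.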

CONCLUSION
In this paper, we presented our strategy for developing a CNN-based model to detect objects, specifically TB bacilli and debris, in sputum smear images. Our strategy comprises a CNN backbone that accommodates a fast training method without any prior knowledge of the dataset. We also gave attention to understanding how to construct a model architecture with limited resources. Our results show that the model fulfils our initial intention of training a model minimally. Furthermore, the limited number of training images did not negatively affect model performance. This training approach supports computer vision methods, especially convolutional neural networks, as an efficient detection method for limited image datasets and training sessions.

ACKNOWLEDGEMENTS
The authors would like to thank the Faculty of Science and Technology, Universitas Airlangga, for financial support through the Penelitian Unggulan Fakultas (PUF) scheme.

Figure 1. Sputum smear images after the dye has been fixed, examined under the microscope: (a) sputum smear image with a light blue background and (b) sputum smear image with a darker blue background

Figures 4(a) to (d) provide a qualitative comparison of TBNet's performance. Here, we use a test image taken from outside our datasets. The result is similar to our previous test: TBNet performs well across our 1000 test images. Figure 4(a) shows that TBNet managed to separate two debris objects inside the image quite well. This performance continues for TB bacilli detection: two TB bacilli in close proximity are detected individually.
Bulletin of Electr Eng & Inf, ISSN: 2302-9285

APPENDIX
Figure 3. The left-side figures show the precision × recall curve for debris detection, and the right-side figures show the precision × recall curve for TB bacilli detection: (a) and (b) TBNet

Table 2. Sputum smear object detection results (%)