Vision-based autonomous mapping and exploration on robot tracked vehicle

ABSTRACT


INTRODUCTION
Computer vision is an interdisciplinary scientific field concerned with how a computer can perceive the real world and acquire a high-level understanding of the data it captures [1]. It can further assist engineering and project management through the processing and analysis of digital images and the extraction of high-dimensional data from the real world to support decision making on the received information. Computer vision has been used in tracking people's movement [2], productivity analysis [3], progress monitoring [4], and health and safety monitoring [5]. It has matured to the point that some computer vision systems can create their own world, known as virtual reality (VR). VR has become popular enough to constitute another reality existing within its own space, a feat made possible by the computer vision research and technology vigorously explored by researchers [6]. Vision-based mapping is widely used in the virtual world, where the user's movement is determined by the position of the VR headset, referencing between VR and the real world. The same approach applies to real-world vision-based mapping, which requires a set of points and references that make up the map through the calculation and estimation of digital image processing [7]. Within the framework of this project, an autonomous vision-based mapping algorithm is designed. Images containing real-world information are recognized and digitally processed to create a map through a number of calculations and estimates. The structure of the paper is as follows: the literature review is presented in section 2, the methodology in section 3, and the results and conclusion in sections 4 and 5, respectively.

LITERATURE REVIEW
Simultaneous localization and mapping (SLAM) [8] builds a map online, with no prior knowledge of the robot's location during data gathering; the map is subsequently used by the robot to navigate through the environment. A SLAM algorithm involves odometry, landmark prediction, landmark extraction, data association and matching, pose estimation, and map update, all used in a cyclic fashion. Kalman filters are commonly used in SLAM, in particular extended Kalman filters (EKF) [9]-[11], and Rao-Blackwellized particle filters (RBPF) are also routinely used [12], [13]. SLAM favors raw range-scan sensors and features for landmark extraction; the sensors most widely used are laser-based and sonar-based, which are well suited to SLAM. There is also visual SLAM, which provides heavy data gathering that comes with noise and uncertainty [9]. Figure 1 shows the result for SLAM.
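The EKF cycle underlying these methods (predict from odometry, then correct from a landmark observation) can be sketched in a minimal form. The one-dimensional world, noise values, and linear models below are illustrative assumptions for clarity, not the configuration of any cited system.

```python
import numpy as np

# State: [robot position, landmark position] (1-D world for clarity).
x = np.array([0.0, 5.0])          # initial guess: landmark believed at 5 m
P = np.diag([0.01, 4.0])          # landmark position is very uncertain

Q = np.diag([0.05, 0.0])          # motion noise (landmark is static)
R = 0.1                           # range-measurement noise variance

def predict(x, P, u):
    """Motion update: the robot moves by odometry increment u."""
    F = np.eye(2)
    x = x + np.array([u, 0.0])
    P = F @ P @ F.T + Q
    return x, P

def update(x, P, z):
    """Correction: z is the measured range from robot to landmark."""
    H = np.array([[-1.0, 1.0]])   # Jacobian of h(x) = landmark - robot
    y = z - (x[1] - x[0])         # innovation
    S = H @ P @ H.T + R
    K = (P @ H.T) / S             # Kalman gain
    x = x + K.flatten() * y
    P = (np.eye(2) - K @ H) @ P
    return x, P

# The robot drives toward a landmark that is truly at 6 m.
true_robot, true_lm = 0.0, 6.0
rng = np.random.default_rng(0)
for _ in range(20):
    u = 0.2
    true_robot += u
    x, P = predict(x, P, u)
    z = true_lm - true_robot + rng.normal(0, np.sqrt(R))
    x, P = update(x, P, z)
```

After twenty predict-update cycles the landmark estimate converges near its true position and its covariance entry shrinks well below the initial prior, illustrating the cyclic fashion of the SLAM loop described above.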
Stereo parallel tracking and mapping (S-PTAM) [14] is a more robust, flexible, and accurate visual SLAM. S-PTAM operates by matching landmarks across pairs of synchronized views, recovering the real depth of an accurate map. By tracking the robot frame after frame, it improves the depth estimation and the track of the robot pose. S-PTAM divides the work into two tasks running in parallel: camera tracking and map optimization. The main characteristic of S-PTAM is that it solves the SLAM problem in a heavily parallelized way, achieving real-time performance while minimizing inter-thread dependency. By using wheel odometry, point initialization, and mapping and tracking under stereo constraints, the accuracy and robustness of the system improve. Global consistency can be improved by a maintenance process, running in an independent thread, that iteratively refines the map in the local co-visible area. S-PTAM relies on matching local image features for localization and mapping. The good features to track (GFTT) [15] algorithm is selected to detect the image key-points, and the binary robust invariant scalable keypoints (BRISK) [16] extractor describes their features. S-PTAM pose tracking consists of four sequential steps: matching, pose refinement, keyframe selection, and map point creation. Figure 2 shows the result of S-PTAM. Leonardis et al.
[17] use a visual odometry algorithm (eVO) which aims to estimate the robot's future position from its starting point. This algorithm achieves two tasks. First, mapping, which consists of providing a map with a limited number of points localized in space using interest points [18] and features from accelerated segment test (FAST) [19]. Second, localization, i.e., temporal tracking of 2D points from the 3D image and matching of 2D points in the image with the Kanade-Lucas-Tomasi (KLT) tracker [20]. By minimizing the re-projection error, the position and orientation of the camera are computed using the random sample consensus (RANSAC) [21] procedure. The robot is equipped with a stereo camera pair, and the images are processed using a dense stereo algorithm [22], [23]. The first step is rectification, which warps the original left images so that epipolar lines are aligned with the image rows. The second step is pre-filtering, using a Laplacian-of-Gaussian or bilateral filter [24], [25], which smooths the rectified images with an edge-preserving filter to remove high-frequency components. The third step is correlation, using the sum of absolute differences (SAD) and the census transform (CENSUS) [26], which constructs the disparity image by locating matching pixels in the left and right pre-filtered images. Figure 3 shows the result of eVO.
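The SAD correlation step can be illustrated with a small brute-force sketch; the window size, disparity range, and synthetic image pair below are assumptions chosen for demonstration only.

```python
import numpy as np

def sad_disparity(left, right, window=3, max_disp=8):
    """Brute-force SAD block matching: for each pixel in the left image,
    find the horizontal shift into the right image with the smallest
    sum of absolute differences over a square window."""
    h, w = left.shape
    half = window // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y-half:y+half+1, x-half:x+half+1]
            costs = [np.abs(patch - right[y-half:y+half+1,
                                          x-d-half:x-d+half+1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp

# Synthetic rectified pair: the right image is the left shifted 4 pixels,
# so the true disparity is 4 wherever the pattern overlaps.
rng = np.random.default_rng(1)
left = rng.integers(0, 255, (20, 40)).astype(np.float64)
right = np.roll(left, -4, axis=1)
d = sad_disparity(left, right)
```

Production stereo pipelines replace this quadratic search with optimized correlation (and add the pre-filtering described above), but the cost function is the same.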

METHOD
3.1. Develop autonomous mapping and exploration algorithm
There are seven steps in this algorithm, covering what is necessary for the algorithm to work properly and produce the desired outcome. The first step acquires images from a designated directory for processing. The images are converted to grayscale to improve image recognition accuracy. Then, the region of interest is determined by four points defined in the picture to further improve the reading of the object. The speeded up robust features (SURF) detector is used to detect points of interest in the image and to match those points across images. The algorithm then triangulates the distance from the object to the camera using the previously defined camera parameters. The entire process is used to output a dense construct map. Figure 4 shows the autonomous mapping and exploration algorithm flowchart.

Image acquisition
Using a Raspberry Pi 3 connected to MATLAB via the hardware support package and Simulink, the image data taken from the OV5647 camera were recorded in a designated directory. From the directory, the pictures were captured and processed. In total, five images were taken with a slight angular difference between the images and the object.

Grayscale in image processing
Early photographic images were monochrome, represented in black and white. Nowadays, color images are the default as a result of the evolution of digital photography. For accurate image recognition and processing, a grayscale conversion is necessary. A grayscale image emerges from a colored image [27], [28] that is gamma-compressed by first applying gamma expansion; afterwards, the weighted sum of the RGB color space can be applied to the linear color components to calculate the linear luminance.
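The weighted-sum luminance computation described above can be sketched as follows, assuming the standard Rec. 709 luminance coefficients and channels already converted to linear light.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Linear-light luminance as the weighted sum of RGB channels.
    Weights are the Rec. 709 luminance coefficients; for a
    gamma-compressed image the channels would first be linearized
    (gamma expansion) before taking this weighted sum."""
    weights = np.array([0.2126, 0.7152, 0.0722])
    return rgb @ weights

# A pure-white pixel maps to full luminance; pure green to 0.7152.
pixels = np.array([[1.0, 1.0, 1.0],
                   [0.0, 1.0, 0.0]])
gray = rgb_to_gray(pixels)
```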

Region of interest in image processing
The region of interest (ROI) is a selected region of an image that emphasizes the focus of the image. Within the ROI lie the points of interest that form the shape of the desired object in the image. In this work, the ROI is selected as a square [29]-[31] in the image to determine the focus of the object. The purpose of an effective object detector is the precise determination of the location, extent, and shape of the object.
SURF detection; SURF approximates the Hessian matrix and relies on integral images to reduce computation time. The use of the Hessian matrix is known to increase accuracy and reduce computational time. There are four steps to SURF:
Step 1: find image interest points using the Hessian matrix.
Step 2: find the major interest points in scale space using non-maximal suppression.
Step 3: find the feature direction to produce rotationally invariant features.
Step 4: produce the feature vectors.
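Step 1 can be illustrated directly. SURF itself approximates the second derivatives with box filters evaluated on an integral image for speed, so the plain finite-difference version below is a simplified stand-in, applied to an assumed synthetic blob.

```python
import numpy as np

def hessian_response(img):
    """Determinant of the Hessian at every pixel, via finite
    differences; high responses mark blob-like interest points.
    (SURF approximates these second derivatives with box filters
    on an integral image, which is much faster.)"""
    dy, dx = np.gradient(img)
    dyy, dyx = np.gradient(dy)
    dxy, dxx = np.gradient(dx)
    return dxx * dyy - dxy * dyx

# A bright Gaussian blob gives its strongest response at the centre.
y, x = np.mgrid[0:31, 0:31]
blob = np.exp(-((x - 15.0) ** 2 + (y - 15.0) ** 2) / 18.0)
resp = hessian_response(blob)
peak = np.unravel_index(np.argmax(resp), resp.shape)
```

Steps 2-4 (scale-space non-maximal suppression, orientation assignment, and descriptor construction) build on these responses.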

Matching features from multiple images
Feature matching is crucial for detecting the same point in two different images. It requires similar points of interest in the two images for the matching to execute. Matching between two images brings out the true corners of multiple images and creates one true image. Matching features were used in this task as the post-processing of SURF detection.
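A common way to realize such matching is nearest-neighbour search over descriptors with a ratio test; the sketch below assumes descriptors have already been extracted and uses synthetic data in place of real SURF descriptors.

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.8):
    """Nearest-neighbour descriptor matching with a ratio test:
    accept a match only when the best distance is clearly smaller
    than the second best, filtering out ambiguous matches."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

# Three descriptors in image 1; image 2 has noisy copies in a new order.
rng = np.random.default_rng(2)
desc1 = rng.normal(size=(3, 8))
desc2 = desc1[[2, 0, 1]] + rng.normal(scale=0.01, size=(3, 8))
matches = match_features(desc1, desc2)
```

Each accepted pair corresponds to the cross-and-circle correspondences drawn between the two images in the results section.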

Triangulation in map estimation of images
Triangulation in image processing is a method used to estimate the positions of the camera and the object in the images. The images were shot from two slightly different angles in order to create a disparity between them. The distance was then calculated using a triangulation formula based on the two angles of the lens and the position of the object.
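For a rectified stereo pair, this triangulation reduces to the similar-triangles relation Z = fB/d. The focal length, baseline, and disparity below are assumed example values, chosen only so the result lands on the 50 cm test distance used later.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from a rectified stereo pair by similar triangles:
    Z = f * B / d, where f is the focal length in pixels, B the
    camera baseline in metres, and d the disparity in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Assumed parameters: 700 px focal length, 6 cm baseline, 84 px disparity.
depth = stereo_depth(700, 0.06, 84)   # 0.5 m, i.e. 50 cm
```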

Analyse the accuracy of the autonomous mapping and exploration algorithm
In this experiment, the data collected from the algorithm were displayed on the 3D dense construct map. The coordinates of the interest points appearing on the 3D map were analyzed and compared against real-life data; the calculation was based on the distance and position between coordinates and the real-life parameters. The experiment could then be judged accurate or not accurate. It was conducted on two different objects in order to analyze the accuracy precisely: a shoebox, as shown in Figure 5, and a chess piece, as shown in Figure 6, each placed 50 cm from the camera. The points were then calculated using the 2D geometry formulas of squares and rectangles. Once the square had been identified, the planar surface from the dense construct was shown and a conclusion on the accuracy of the algorithm was discussed. From these formulas, the length and width of the surface on the object were also calculated, and the areas were compared for accuracy.

Preliminary task: validation of camera parameters
In the experimental setup, the distance between the object and the camera was measured, as were the height, width, and length of the object; the measured camera parameters serve as reference. Figure 7 shows the mean error from the calibration of the camera parameters, which at 0.44 represents an accuracy of 60%. Figure 8 shows images of a checkerboard projected in 3D mapping from the camera. The plane that appears in Figure 8 represents the surface of the board as seen from the camera at various angles. By calibrating the camera, the estimation of the camera parameters increases in accuracy.

Develop autonomous mapping and exploration algorithm
In this part of the results, the images from the task were taken step by step before the final result was shown in 3D projection. Both the box and the chess piece are shown. The mapping and exploration are validated with SURF detection, matching features, and the dense construct 3D map. The validation is based on detecting the points of interest in the image and matching them between images; the detected points of interest are later used in feature matching to match up the images' points of interest. The algorithm then proceeds with triangulation, calculating the distance between the object and the camera using the camera parameters set beforehand.

Validation of SURF detection
Figure 9 shows the box images containing points of interest for SURF detection, and Figure 10 shows the chess piece images containing points of interest for SURF detection. The SURF detector detects only the 50 strongest points recognized by the features; every detected point is marked with a cross and a circle around the cross, as shown in Figures 9 and 10.

Validation of matching features
This part shows the matching of points of interest between two images. The comparison between the earlier and the later image creates a line, a circle, and a cross that represent the matching features of the two images: the cross is the earlier image's point of interest and the circle is the later image's point of interest. Figure 11 shows the matching points of the box images and Figure 12 shows the matching points of the chess piece images.

Validation of dense construct 3D map
Figure 13 shows a dense 3D construct built from multiple images of the box. In Figure 13, the red point is the position of the camera, and the dispersed colors ranging over green, blue, and yellow mark the points of interest extracted from the multiple images. The values in the dense construct are scaled down by a factor of ten, in centimeters per cube. Figure 14 shows the 3D construct of the chess piece.

Analyse the accuracy of the autonomous mapping and exploration algorithm
The dense constructs from Figures 13 and 14 were coordinated and marked to the closest points. Based on Figure 15, the length and height of the box were measured to determine the area of the box facing the camera. In Figure 16, the planar surface was extracted from the coordinates and calculated, taking the height from point 4 to point 1 as the height difference owing to the smaller difference along the X axis. The details of the differences in points and the surface of the shoebox are shown in Tables 1 and 2. The difference in surface was 50.4 cm² of surface area, giving an accuracy for the box estimation in the 3D dense construct of 299.6/350 × 100 = 85.6%. For the chess piece, Figure 17 shows the planar surface on the chess piece; the board of the chess piece serves as the plane for the accuracy test. From Figure 14, the result appears to have a slight distortion in the planar rotation, the plane tilting backward away from the camera, and the calculation of the plane accounted for this backward tilt. Figure 18 shows the coordinates of the four points of the plane. Because the plane on the chessboard surface was tilted backward, the accuracy of the map was lower, but the plane still needed to be calculated to confirm the agreement between the plane and the chessboard. In Table 3, the plane surface area was calculated, with the lengths of the four sides of the plane rounded. Tables 3 and 4 show how the plane surface and the measured chessboard area were calculated. The difference in surface was 389 cm² of surface area, giving an accuracy for the chessboard estimation in the 3D dense construct of 700/1089 × 100 = 64.27%.
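The two accuracy figures follow from dividing the planar area estimated on the dense construct by the measured surface area; the sketch below reproduces that arithmetic using the areas reported in the text.

```python
def area_accuracy(estimated_area, true_area):
    """Percentage agreement between the surface area recovered from
    the 3D dense construct and the physically measured surface."""
    return estimated_area / true_area * 100

# Shoebox face: 299.6 cm^2 estimated vs 350 cm^2 measured
# (difference 50.4 cm^2).
box_acc = area_accuracy(299.6, 350.0)      # ~85.6 %
# Chessboard: 700 cm^2 estimated vs 1089 cm^2 measured
# (difference 389 cm^2).
chess_acc = area_accuracy(700.0, 1089.0)   # ~64.3 %
```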

CONCLUSION
In conclusion, a vision-based autonomous mapping and exploration algorithm was developed using eVO through the lens of a mono vision sensor as input. The controller design for the robot is mandatory in order to acquire the image data. Based on the results, the robot moved as intended using the Raspberry Pi 3 as the receiver for the controller, and it was able to move according to the commands sent to it. The box accuracy test in the 3D projection was a success, showing 85.6% accuracy with respect to the real world, while the chess piece accuracy test was unsatisfactory, showing 64.27% accuracy with a backward tilt of 62.65° from the camera.
This project used a box and a chess piece, both small objects. It is recommended that larger samples be processed by the algorithm and the accuracy tested. The region of interest in this project was set manually with human assistance, which can render the result inaccurate when the region of interest does not cover the object completely. Future projects therefore need a more robust region of interest in order to obtain an accurate result from the digital image.

Figure 4. Autonomous mapping and exploration algorithm flowchart



ISSN: 2302-9285, Bulletin of Electr Eng & Inf, Vol. 12, No. 6, December 2023: 3901-3910

Figure 9. SURF on shoebox
Figure 10. SURF on chess piece

Figure 11. Matching detection on shoebox
Figure 12. Matching detection on chess piece

Figure 13. 3D construct of shoebox
Figure 14. 3D construct of chess piece

Figure 15. Surface for shoebox
Figure 16. Planar for surface shoebox

Figure 17. Surface of chess board
Figure 18. Planar for chess board

Table 1. Difference in points

Table 2. Surface of the shoebox

Table 3. The planar surface