RT-RCT: an online tool for real-time retrieval of connected things

Received Feb 22, 2021 Revised Jun 14, 2021 Accepted Aug 3, 2021

In recent years, the internet of things (IoT) has become a large and fast-growing area of innovation and engineering. IoT devices are deployed in many fields and offer advanced services that help users monitor and control objects remotely. The IoT has special characteristics, such as dynamic behavior, data variety and huge scale, which pose a major challenge for retrieval technologies, and for real-time retrieval in particular. This paper addresses the real-time retrieval of connected things and proposes an online tool that retrieves these things and their descriptive data using the network port scanning technique. The tool performs well under normal conditions; however, more tests are needed to evaluate and improve the proposed solution. The tool appears promising and can serve network administrators and IT security managers as a reference for developing new security mechanisms and reinforcing security.


INTRODUCTION
Information and communication technologies (ICT) have developed significantly, especially in terms of hardware miniaturization, cost reduction and energy consumption optimization. This progress enables the interconnection of a large number of physical objects over the Internet, forming what is called the internet of things (IoT). The IoT makes it possible to interact with these objects through sensors, actuators and smart applications, which can help users in areas such as transport, logistics, health care and agriculture [1].
The IoT turns static objects into intelligent ones that can share information and communicate with other devices autonomously [2]. Running IoT technology involves many hardware and software elements, such as sensors, GPS, cameras and applications [3]. IoT devices are deployed in areas such as e-tracking, e-commerce, e-home and e-health, and the IoT has therefore been a research focus over the last ten years [3]. These devices produce large volumes of heterogeneous data, and their state changes very quickly.
The Internet is a popular global information system in which users find relevant information using search engines (SE). An SE is a type of software that organizes content collected from many resources [4]. Searching in IoT networks has a different goal from that of typical search engines, because users want to operate the objects locally or remotely. This difference calls for a new design concept for an IoT search engine, which is not simple because it requires new techniques for crawling, indexing, storing and querying [5].
Many IoT search engines [6]-[9] are designed to search for, retrieve and identify connected things. Shodan.io is a search engine designed by programmer John Matherly in 2009. It interrogates device ports and grabs the resulting banners, then indexes them with the corresponding public IP address in an internal database for future lookups [10]. Another popular IoT search engine is Censys, which collects all the data it can about devices connected to the IPv4 Internet. It uses the open-source port scanner ZMap together with ZGrab and stores everything it retrieves in a database, which is then accessible through a web interface, an API or downloadable plain-text listings [11]. Thingful can be described as a "discoverable search engine" that gives its users a geographic index of connected objects around the world (https://thingful.net/). Thingful states that it can index across multiple IoT networks and infrastructures, because it can locate the geographical position of objects and devices [12]. However, these search engines do not fully meet the need, because device state changes quickly and their results are complex. A new mechanism for IoT device retrieval is therefore required, one that addresses real-time retrieval, fast response and accurate results.
This work proposes an online tool for the real-time retrieval of connected things worldwide, together with descriptive information about these devices, based on the network port scanning technique. The paper first introduces the basic concepts related to the development of the proposed tool. Section 3 presents the requirement specification and the proposed tool. Section 4 presents and discusses the results. Finally, the conclusion summarizes the work and outlines future improvements.

BACKGROUND
Internet of things and connected things
The IoT is a giant infrastructure that enables machine-to-machine communication and the remote monitoring and control of objects and devices in many fields and applications, such as industry, agriculture, healthcare and education. It is a network of connected things that can gather and share information about the way they are used and about the environment around them. The IoT has been the main focus of many research works [13]-[18] in recent years.
Connected things refer to smart devices, autonomous electronic devices that may be connected to each other in a network, mobile devices, and computing devices that are typically small enough to be handheld. These things are connected through various wired and wireless networks and protocols (Wi-Fi, 3G and 4G networks, etc.) and are usually monitored and controlled remotely. They commonly embed technologies such as processing chips, software and sensors.
Things, in the IoT sense, can refer to a wide variety of devices, such as heart monitoring implants, biochip transponders on farm animals, cameras streaming live feeds of wild animals in coastal waters, automobiles with built-in sensors, DNA analysis devices for environmental, food and pathogen monitoring, or field operation devices that assist firefighters in search and rescue operations. Legal scholars suggest regarding "things" as an "inextricable mixture of hardware, software, data and service" [19].

Searching
Websites that index and classify other websites according to their keywords, descriptions and contents, and that make it easier and faster to reach relevant search results, are called search engines [20]. Since their appearance in the 1990s, they have enjoyed great success and have changed the way information is retrieved. A search engine is a tool based on a set of algorithms that lets users search and access a huge amount of web information easily and obtain well-organized results. These engines have become smarter through the integration of methods such as machine learning for classifying results and interpreting requests.
The IoT has special features that pose great challenges for traditional search engines. To address these issues and continue the success of search engines as large numbers of IoT devices join the Web every day, a new evolution of these tools has appeared: IoT search engines [21]-[23]. They provide a search tool that can find connected devices and information about them, and they also address a set of IoT issues.
With the emergence of the internet of things, challenges relating to network security, device management, device status, access control and anomaly detection lead managers and administrators of IoT infrastructure to consider designing and developing new supporting mechanisms. IoT search engines [26] are useful here because they can identify devices and services connected to the Internet, as well as vulnerable devices, and because they allow users to learn and search for information about the IoT.

Network port scanning
Network analysis is a technique that scans network ports as a form of vulnerability analysis and is usually used for security assessment and system maintenance. It is also one of the main techniques attackers use to gather information.
Network scanning consists of network port scanning as well as vulnerability scanning [27]. Network port scanning refers to sending data packets over the network to specific service port numbers on a computing system (for example, port 23 for Telnet, port 80 for HTTP, and so on). The purpose is to identify the network services available on that particular system. Network port scanning, like vulnerability scanning, is an information-gathering technique; however, when carried out by anonymous individuals, it is viewed as a prelude to an attack. Network scanning processes, such as port scans and ping sweeps, return details about which IP addresses map to active live hosts and the type of services they provide [28]. Scanning can be done easily with available tools such as Nmap [29], Angry IP Scanner [30] and Advanced Port Scanner [31].
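To make the idea of port scanning concrete, the following is a minimal sketch of a TCP connect scan written with Python's standard socket module. It is only an illustration of the general technique, not the scanning method of the tools cited above; the function names `is_port_open` and `scan_ports` and the 0.5 second timeout are assumptions chosen for this example. Such a scan should only be run against hosts one is authorized to test.

```python
import socket


def is_port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper for illustration: a full TCP connect is the
    simplest (and most easily detected) form of port scan.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise,
        # so a closed or filtered port does not raise an exception.
        return sock.connect_ex((host, port)) == 0


def scan_ports(host, ports, timeout=0.5):
    """Probe each port in turn and map port number -> open (True/False)."""
    return {port: is_port_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Probe the Telnet and HTTP ports mentioned in the text on the local machine.
    for port, is_open in scan_ports("127.0.0.1", [23, 80]).items():
        print(port, "open" if is_open else "closed or filtered")
```

A sequential scan like this one spends most of its time waiting for timeouts, which is one reason dedicated tools such as Nmap are preferred in practice.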

Multiprocessing
Multiprocessing, or parallel processing, is a type of processing that runs a set of tasks simultaneously on multiple processors (Figure 1). It aims to get more work done in a shorter period of time and to reduce overall processing time compared with serial processing. It is typically used when very high speed is required to process a large volume of data. Multiprocessing distributes a large, complex task across multiple smaller computations, where each sub-process has a dedicated CPU and memory slot. It refers to the ability of a system to support more than one processor at the same time, each working independently.

Figure 1. Multiprocessing
Multiprocessing can be used to improve existing solutions by reducing their processing time, as in the work of Li et al. [32]. They developed an efficient guide RNA library design tool called MultiGuideScan, a multiprocessing version of the GuideScan software, which was developed to design CRISPR guide RNA libraries for effective genome editing of coding and noncoding genomic regions [32]. Experiments show that, using 32 processes, MultiGuideScan speeds up the design of guide RNA libraries by about 9 to 12 times compared with the original GuideScan.
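The principle described in this subsection, splitting one large task into smaller computations that run on separate processors, can be sketched with Python's standard multiprocessing module. The workload below (counting prime numbers) is a hypothetical stand-in chosen only because it is CPU-bound and easy to split; the function names and the default of four workers are assumptions for this example and are unrelated to MultiGuideScan.

```python
import math
from multiprocessing import Pool


def is_prime(n):
    """Trial-division primality test (deliberately simple, CPU-bound)."""
    if n < 2:
        return False
    for divisor in range(2, math.isqrt(n) + 1):
        if n % divisor == 0:
            return False
    return True


def count_primes(numbers):
    """Serial unit of work: count the primes in one chunk."""
    return sum(1 for n in numbers if is_prime(n))


def split_into_chunks(items, parts):
    """Split items into at most `parts` contiguous chunks of similar size."""
    items = list(items)
    size = max(1, math.ceil(len(items) / parts))
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_count_primes(numbers, workers=4):
    """Distribute the chunks over a pool of worker processes and add up
    the partial counts returned by each process."""
    chunks = split_into_chunks(numbers, workers)
    with Pool(processes=workers) as pool:
        return sum(pool.map(count_primes, chunks))
```

When used from a script, `parallel_count_primes(range(1, 200_000))` should be called under an `if __name__ == "__main__":` guard, which the multiprocessing module requires on platforms that start workers by spawning a fresh interpreter. The result is the same as `count_primes` on the whole range; only the elapsed time changes.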

RESEARCH METHOD
The main idea of this work is to propose a retrieval tool that gives users all currently available information about each requested thing with the minimum possible delay. Information about the requested devices is collected in real time using the network port scanning technique, specifically through the python-nmap library. The data are collected from a set of scans, each responsible for retrieving a specific set of information, which can take significant time. To reduce data collection time, we run the scans in parallel so that all of them execute at the same time and finish as quickly as possible.
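One possible way to run several python-nmap scans at the same time is sketched below. This is an illustration under stated assumptions, not the tool's actual implementation: the split into three scans and the Nmap options in `SCAN_PROFILES` are hypothetical, and a thread pool is used instead of separate Python processes because each python-nmap call already launches Nmap as an external process. The scanner is passed in through `scanner_factory` so that the sketch can be exercised without Nmap installed.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical division of the work: one Nmap run per kind of information.
SCAN_PROFILES = {
    "state": "-sn",         # host discovery only, no port scan
    "services": "-sV -F",   # service/version detection on common ports
    "os": "-O",             # OS detection (normally needs root privileges)
}


def _default_scanner():
    # Imported lazily so the module loads even without python-nmap installed.
    import nmap
    return nmap.PortScanner()


def run_scan(host, arguments, scanner_factory=_default_scanner):
    """Run one Nmap scan and return the result dictionary from python-nmap."""
    scanner = scanner_factory()
    return scanner.scan(hosts=host, arguments=arguments)


def run_parallel_scans(host, profiles=SCAN_PROFILES, scanner_factory=_default_scanner):
    """Launch every scan profile at once and gather results by profile name."""
    with ThreadPoolExecutor(max_workers=max(1, len(profiles))) as pool:
        futures = {
            name: pool.submit(run_scan, host, arguments, scanner_factory)
            for name, arguments in profiles.items()
        }
        # result() blocks until the corresponding scan has finished.
        return {name: future.result() for name, future in futures.items()}
```

With this layout the total waiting time is close to that of the slowest scan rather than the sum of all scans, which is the effect the parallel scans are meant to achieve.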

Software requirement specification
Requirement specification is the first step in developing a tool or application. This subsection presents the essential requirements for our solution: a) provide a simple, easy-to-use GUI, b) provide accurate and understandable results, c) provide as much of the available information about connected things as possible, d) allow users to perform real-time retrieval, e) require no technical knowledge from users, f) be free, with no registration required, g) allow an unlimited number of searches, h) give full access to results.

Proposed algorithm
The flowchart of the proposed algorithm is shown in Figure 2. It includes the following steps:
a. Send the user query to the server. The query specifies the target host, which can be given as an IP address or a hostname.
b. Launch a set of scans in parallel to find information about the requested device. Each scan is responsible for collecting specific data. Scans that take more time are divided into sub-scans whenever possible; otherwise, similar scans are launched in parallel.
c. Collect the results generated by the scans.
   Case 1: collect the results from all scans and combine them, then move on to the next step.
   Case 2: collect the results of each scan separately, then move on to the next step.
d. Extract the relevant information to display from the collected data and send it to the user.
   Case 1: extract information from the combined results of all scans.
   Case 2: extract information from the results of each scan as they are received separately.
e. Display the search results in an ergonomic and understandable way on the system interface.
f. Because the information changes dynamically, relaunch all scans after a specified period of time and automatically refresh the page content for as long as the user stays on the results interface.
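The control flow of the steps above can be sketched as follows. This is a simplified illustration rather than the tool's code: each scan is represented by a plain function that takes a host and returns a dictionary, the names `collect_combined`, `collect_incrementally` and `refresh_loop` are hypothetical, and the number of refresh rounds is bounded here only so that the example terminates.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def collect_combined(host, scans):
    """Case 1: wait for all scans, then merge their results into one dict."""
    with ThreadPoolExecutor(max_workers=max(1, len(scans))) as pool:
        futures = [pool.submit(scan, host) for scan in scans.values()]
        combined = {}
        for future in futures:
            combined.update(future.result())
    return combined


def collect_incrementally(host, scans):
    """Case 2: yield (scan name, result) as soon as each scan finishes."""
    with ThreadPoolExecutor(max_workers=max(1, len(scans))) as pool:
        futures = {pool.submit(scan, host): name for name, scan in scans.items()}
        for future in as_completed(futures):
            yield futures[future], future.result()


def refresh_loop(host, scans, interval_s, rounds, on_update):
    """Step f: relaunch all scans periodically and push fresh results.

    `rounds` bounds the loop for this sketch; a real service would instead
    stop when the user leaves the results page.
    """
    for _ in range(rounds):
        on_update(collect_combined(host, scans))
        time.sleep(interval_s)


if __name__ == "__main__":
    # Stand-in scans returning fixed values, used only to show the flow.
    demo_scans = {
        "state": lambda host: {"state": "up"},
        "ports": lambda host: {"ports": [80]},
    }
    refresh_loop("192.0.2.1", demo_scans, interval_s=0, rounds=2, on_update=print)
```

Case 1 is simpler to display because the page is built once from complete data, while Case 2 lets the interface show partial results before the slowest scan has finished.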

RESULTS AND DISCUSSION
The proposed retrieval tool is designed and developed as a web application, which makes it easy to use, requires no prior installation, and guarantees that users always run the latest version. The web application was developed using an open-source micro framework for web development in Python (Flask [33]) together with other technologies and Python libraries such as python-nmap [34]. The application is based on two main interfaces. The first, the user interface (Figure 3), offers a simple and ergonomic way for users to retrieve current data about connected things. The second, the results interface, is displayed shortly after the user's request. It shows current information about state, ports, protocols, OS, device type, hostnames and addresses (Figure 4 and Figure 5), and offers a useful and clear visualization of all available data collected in real time.
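As an illustration of how scan data could be reduced to the fields listed for the results interface and served by Flask, consider the sketch below. It is not the application's source code. The input is assumed to be shaped like the per-host record that python-nmap returns (keys such as `status`, `tcp`, `osmatch`, `hostnames` and `addresses`); the function names, the `/things/<host>` route and the JSON response format are hypothetical choices for this example.

```python
def summarize_host(record):
    """Reduce one per-host scan record to the fields shown to the user."""
    protocols = [proto for proto in ("tcp", "udp") if proto in record]
    ports = [
        {
            "port": port,
            "protocol": proto,
            "state": info.get("state"),
            "service": info.get("name"),
        }
        for proto in protocols
        for port, info in sorted(record[proto].items())
    ]
    # Use the first OS match as the best guess when OS detection ran.
    os_matches = record.get("osmatch", [])
    best_match = os_matches[0] if os_matches else {}
    os_classes = best_match.get("osclass", [])
    return {
        "state": record.get("status", {}).get("state", "unknown"),
        "ports": ports,
        "protocols": protocols,
        "os": best_match.get("name"),
        "device_type": os_classes[0].get("type") if os_classes else None,
        "hostnames": [h["name"] for h in record.get("hostnames", []) if h.get("name")],
        "addresses": record.get("addresses", {}),
    }


def create_app(scan_host):
    """Build a small Flask app; `scan_host(host)` must return a host record."""
    from flask import Flask, jsonify  # imported lazily; requires Flask

    app = Flask(__name__)

    @app.route("/things/<host>")
    def results(host):
        # Scan on every request so the response reflects the current state.
        return jsonify(summarize_host(scan_host(host)))

    return app
```

Passing the scanning function into `create_app` keeps the web layer independent of how the data are collected, so the same interface could later be fed by other data sources.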

CONCLUSION
This work has resulted in the design of a new tool for the real-time retrieval of connected things. The main objective of the tool is to allow real-time, online retrieval of connected devices using network port scanning, which collects data about these things in real time. The important information is extracted from the collected data and presented so that it is understandable to all users. In future work, we will attempt to improve the results and evaluate the performance of the proposed tool. To this end, we will perform a set of tests on parallel retrieval and response time, use other resources for data collection and improve security.