Efficient incremental data backup using the Unison synchronization approach

ABSTRACT


INTRODUCTION
Nowadays, the internet has grown into a communication system that connects people around the world [1], and the growth of the global computer network has had an economic impact as well [2]. A major problem for computer systems is the risk of attack by intruders over the network or by computer viruses, which can damage data and make it unusable. Data backup and data security are therefore essential for all business sectors [3]. Computer data backup spans many technologies, each with different features [4], and commercial backup products and services have been created that use these technologies in unconventional ways. Three important aspects make data secure over a computer network: confidentiality, integrity, and availability [5].
Confidentiality is protection against unauthorized data access, such as snooping or wiretapping by an intruder [6]. Integrity is protection against alteration of the data, so that it arrives at the recipient exactly as it was sent. Availability is the prevention of service interruption, so that data remain accessible for legitimate use.
However, the major problem with full backup is that it takes a long time to complete and requires large storage resources. Restoring archived data is also difficult because the required information must first be collected. As a result, incremental backup plays an important role in data backup for computer network systems [7]. Various techniques, such as synchronous and asynchronous transmission, have been widely applied to perform incremental backup [8], and duplicate data storage offers the widest range of options in terms of backup execution, implementation, and dynamics [9]. This paper therefore proposes a new data backup technique that focuses on incremental backup using the Unison synchronization technique. The proposed technique combines Unison, a file-synchronization tool, with load balancing file synchronization management (LFSM) for traffic management and content analysis. The article presents methods for improving the efficiency of the backup environment at Rajamangala University of Technology Isan and for adapting it to changing systems. The rest of this paper is organized as follows. Section 2 discusses the background and related work on data backup systems. Section 3 explains the synchronization technique. Section 4 describes the methodology, including Unison and LFSM. Experimental results are presented in Section 5. Finally, Section 6 concludes the paper.

BACKGROUND AND RELATED WORKS
Many data backup techniques have been proposed; they can be categorized as full backup, differential backup, and incremental backup.

Full backup
A full backup is the starting point for, and a prerequisite of, all other backup methods, because it contains a complete copy of all the folders and files in the storage space. Because everything is held in a single backup, it offers the fastest and simplest restoration [10]. However, making full backups all the time imposes a considerable workload on the computer network. Full backup is therefore usually limited to a weekly or monthly schedule because of the large volume of data that must be copied, and the backup repository must have a large storage capacity, as shown in Figure 1 [11].
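To make the cost of a full backup concrete, the following is a minimal sketch, assuming a plain directory tree is being backed up to a local repository folder. The function name `full_backup` and the `full-<timestamp>` folder naming are hypothetical illustrations, not part of the proposed system.

```python
import shutil
import time
from pathlib import Path


def full_backup(source: str, repository: str) -> Path:
    """Copy every folder and file under `source` into a new snapshot folder.

    Hypothetical helper: each run creates repository/full-YYYYmmdd-HHMMSS, so a
    restore needs only this single snapshot, but every run copies all data.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(repository) / f"full-{stamp}"
    # copytree creates `target` and refuses to overwrite an existing folder,
    # so two runs within the same second would fail rather than mix snapshots.
    shutil.copytree(source, target)
    return target
```

Because the whole tree is copied on every run, the time and space cost grows with the total data size rather than with the amount of change, which is why full backups are normally scheduled weekly or monthly.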

Differential backup
Differential backup starts with an initial full backup and then saves all data changes made since that last full backup. It offers a fast recovery time because only the full backup and the latest differential backup are needed to restore the entire data repository. Differential backup is also faster than full backup, because each backup operation copies only the data changed since the last full backup. However, its restore operation is slower than that of a full backup because two backups, the full backup and the latest differential backup, must be restored, as shown in Figure 2 [12].
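The two rules above (what a differential backup copies, and what a differential restore needs) can be sketched as follows. This is an illustration under stated assumptions: files are represented only by their modification times, and backups by name strings; the function names are hypothetical. Real backup tools may instead detect changes with archive flags or checksums.

```python
from typing import Dict, List


def differential_selection(mtimes: Dict[str, float], last_full: float) -> List[str]:
    """Files a differential backup copies: everything changed since the last FULL backup.

    `mtimes` maps a file path to its modification time (an assumed change signal).
    """
    return sorted(path for path, mtime in mtimes.items() if mtime > last_full)


def differential_restore_set(full: str, differentials: List[str]) -> List[str]:
    """A differential restore needs only the full backup plus the latest differential."""
    # differentials[-1:] is empty when no differential backup exists yet.
    return [full] + differentials[-1:]
```

Since the reference point stays fixed at the last full backup, each differential grows until the next full backup, but a restore never needs more than two pieces.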

Incremental backup
Incremental backup first makes one full backup of the data source, stored in a single backup file; each subsequent backup copies only the portions that have changed since the last backup operation. This reduces backup time and requires less storage space, because only changed files are backed up. Recovery requires the most recent full backup as well as all incremental backups up to the restore point. The weakness of incremental backup is that if any increment in the chain is missing or corrupted, full recovery is impossible, as shown in Figure 3 [13].
A recurring problem is the trade-off between the cost of normal backup operations and the cost of recovery after a failure [14]. Overall, full backup requires a large amount of free storage media, and the backup process takes a long time. Nakamura et al. proposed a stochastic model of incremental backup that describes the behaviour of a database system; the model approximates an actual database system from the point of view of the cost of backup operations [15]. Incremental file backups are comparatively easy to restore once the file recovery process is entered. Incremental database recovery involves two steps: restoring the full backup, and then reading the latest versions of the additional files [16]. This paper proposes a synchronization-based backup model with incremental backup. The experiments were designed to show that, under normal conditions, the cost of the backup operation can be lower than that of the other backup methods.
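The chain dependency described above, and why a single missing increment breaks full recovery, can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: backups are represented by name strings, and `None` stands for a missing or corrupted backup.

```python
from typing import List, Optional


def incremental_restore_chain(full: Optional[str],
                              increments: List[Optional[str]]) -> List[str]:
    """Return the backups to apply, oldest first, to restore the latest state.

    `None` models a missing or corrupted backup. Each increment holds only the
    changes since the previous backup, so one gap makes full recovery impossible.
    """
    if full is None:
        raise ValueError("full backup missing: recovery impossible")
    chain: List[str] = [full]
    for position, increment in enumerate(increments, start=1):
        if increment is None:
            raise ValueError(f"increment {position} missing: full recovery impossible")
        chain.append(increment)
    return chain
```

Restoring to an earlier restore point would stop the chain at that increment; everything before it must still be intact.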

SYNCHRONIZE FRAMEWORK
Data synchronization is a process widely used by specialized software to back up data and to make sure that multiple locations contain the same data [17]. There are two types of data transmission technique: synchronous and asynchronous data transmission [18].

Synchronous
Synchronous data transmission is a method in which the sender and receiver must first establish common timing before any data is exchanged. Usually, a communication session between the sender and receiver must be established, together with an agreement on which party is in control [19]. Once the session is established, the two parties exchange data in real time. The transmitter and receiver share a common clock, so their internal clock pulses stay synchronized throughout the communication. Once the connection is correctly synchronized, data transmission can begin immediately. The receiver counts the bits sent over a period of time and then reassembles them into bytes. Synchronous transmission therefore works well when large amounts of data must be transferred quickly from one site to another. Data synchronization, in turn, is the process of synchronizing data between two or more devices and automatically propagating changes between them to keep the data consistent and uniform [20].
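The receiver behaviour described above (counting clocked bits and regrouping them into bytes) can be sketched as follows. The sketch assumes most-significant-bit-first ordering and a stream that is already bit-aligned; real synchronous links define their own bit order and framing, and the function name is hypothetical.

```python
from typing import List


def bits_to_bytes(bits: List[int]) -> bytes:
    """Regroup a synchronous bit stream into bytes (most significant bit first).

    With a shared clock there are no per-character start/stop bits, so the
    receiver simply counts bits and cuts the stream every 8 clock ticks.
    """
    if len(bits) % 8 != 0:
        raise ValueError("bit count is not a multiple of 8; clocks may be out of step")
    out = bytearray()
    for start in range(0, len(bits), 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit  # shift in the next bit
        out.append(value)
    return bytes(out)
```

For example, the 16 bits 01001000 01101001 decode to the two ASCII characters "Hi".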

Asynchronous
Asynchronous transmission is a type of data transmission that does not rely on a shared clock and does not provide a continuous data flow. In addition, the sender and receiver do not negotiate the parameters of the data exchange in advance. Instead, the sender inserts an extra bit before and after each unit of data to mark where that unit starts and ends; these start and stop bits tell the receiver where each piece of data begins and ends [21]. Asynchronous transfer therefore works well when timing is not critical, because the transmitter and receiver operate on independent clocks.
Asynchronous transmission is widely used for communication over physical media and works well over reliable transfer media [22].
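The start/stop-bit framing described above can be sketched as follows, assuming the common serial-port convention of one start bit (0), eight data bits sent least significant bit first, and one stop bit (1), with the idle line held at 1. These conventions and the function names are assumptions for illustration, not details given in the text.

```python
from typing import List


def frame_byte(value: int) -> List[int]:
    """Wrap one byte as start bit (0), 8 data bits LSB first, stop bit (1)."""
    return [0] + [(value >> i) & 1 for i in range(8)] + [1]


def unframe(bits: List[int]) -> bytes:
    """Recover bytes from a stream of 10-bit frames separated by idle 1s."""
    out = bytearray()
    i = 0
    while i < len(bits):
        # The idle line is held at 1; a 0 marks the start bit of a new character.
        if bits[i] == 1:
            i += 1
            continue
        frame = bits[i:i + 10]
        if len(frame) < 10 or frame[9] != 1:
            raise ValueError(f"framing error at bit {i}")
        # frame[1:9] holds the data bits, least significant first.
        out.append(sum(bit << k for k, bit in enumerate(frame[1:9])))
        i += 10
    return bytes(out)
```

Because every character carries its own start and stop bits, the gaps between characters can be of any length, which is why no shared clock is needed.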

METHODOLOGY
The proposed data synchronization system was developed as a shell script that sends commands to the Unison program [23]. Load balancing file synchronization management (LFSM), which distributes network traffic across multiple computing resources, is introduced as shown in Figure 4. The incremental backup system runs on Linux, specifically Debian 8.10 server. The backup storage server has an Intel Xeon CPU at 2.00 GHz, 32 GB of RAM, and a 360 GB SAS hard disk. The incremental backup and recovery functions operate on file data stored on an ext4 file system. Synchronization of data files to each server follows the reconciliation algorithm below, in which O is the state recorded at the last synchronization and A and B are the two current replicas:

sync(O, A, B) =
    if A = B then (A, B)                  (replicas already match: finish)
    else if A = O then (B, B)             (A unchanged since O: take B)
    else if B = O then (A, A)             (B unchanged since O: take A)
    else if A = MISSING then (A, B)       (delete/edit conflict)
    else if B = MISSING then (A, B)       (delete/edit conflict)
    else if ATOMIC ∈ dom(A) ∪ dom(B) and dom(A) ≠ dom(O) and dom(B) ≠ dom(O) and dom(A) ≠ dom(B)
        then (A, B)                       (conflict: leave both unchanged)
    else
        for each child k, let (A_k, B_k) = sync(O(k), A(k), B(k)) in
        let A' = { k ↦ A_k } in
        let B' = { k ↦ B_k } in
        (A', B')
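The reconciliation rule above can be made concrete with a small runnable sketch. It assumes a simplified replica model that is not from the paper: a file is its contents (a string), a directory is a dict from child names to replicas, and `None` stands for MISSING. The ATOMIC condition is simplified to "both sides changed and at least one side is a file"; conflicting paths are collected in a list instead of being shown to a user.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical replica model: str = file contents, dict = directory, None = MISSING.
Tree = Union[None, str, Dict[str, "Tree"]]


def sync(o: Tree, a: Tree, b: Tree, path: str = "/",
         conflicts: Optional[List[str]] = None) -> Tuple[Tree, Tree]:
    """Reconcile replicas A and B against their last common state O."""
    if conflicts is None:
        conflicts = []
    if a == b:                     # already identical: nothing to do
        return a, b
    if a == o:                     # only B changed: propagate B to A
        return b, b
    if b == o:                     # only A changed: propagate A to B
        return a, a
    if a is None or b is None:     # delete/edit conflict: leave both as they are
        conflicts.append(path)
        return a, b
    if isinstance(a, str) or isinstance(b, str):
        # Both sides changed and at least one is a file (simplified ATOMIC case).
        conflicts.append(path)
        return a, b
    # Both are directories: recurse on every child name seen in O, A or B.
    old = o if isinstance(o, dict) else {}
    new_a: Dict[str, Tree] = {}
    new_b: Dict[str, Tree] = {}
    for k in sorted(set(old) | set(a) | set(b)):
        child = path.rstrip("/") + "/" + k
        ak, bk = sync(old.get(k), a.get(k), b.get(k), child, conflicts)
        if ak is not None:
            new_a[k] = ak
        if bk is not None:
            new_b[k] = bk
    return new_a, new_b
```

For example, if one replica edits a file, the other adds a new file, and both edit a third file differently, the first two changes propagate to both sides while the third is reported as a conflict and left untouched.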
The Unison command has the following structure:

unison [Source Directory] ssh://[Server IP Address]/[Destination Directory]
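A shell script or scheduler has to assemble this command for each destination server. The sketch below builds the argument list in Python; the function name and the example paths and address are hypothetical. It relies on two documented Unison behaviours: in a root of the form `ssh://host/path` the path is relative to the remote home directory, while `ssh://host//path` is absolute, and the `-batch` preference makes Unison run without asking questions, as an unattended backup job would require.

```python
from typing import List


def unison_command(source_dir: str, server_ip: str, dest_dir: str,
                   batch: bool = True) -> List[str]:
    """Build the argument list for: unison SOURCE ssh://SERVER/DEST.

    An absolute dest_dir such as /srv/backup yields ssh://host//srv/backup,
    which Unison treats as an absolute path on the remote host.
    """
    cmd = ["unison", source_dir, f"ssh://{server_ip}/{dest_dir}"]
    if batch:
        cmd.append("-batch")  # non-interactive: ask no questions
    return cmd
```

On a host with Unison installed and SSH keys configured, `subprocess.run(unison_command("/home/data", "192.0.2.10", "backup"), check=True)` would start the synchronization; passing a list rather than a shell string avoids quoting problems with unusual directory names.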
The synchronization process calculates the size of a new file [24] and its checksum, and sends only the new blocks that were added to the new file relative to the old file, as shown in Figure 5. The synchronization system then uses a rolling checksum to check the accuracy of the information within the file [25]. Data are compared using (1):

$$a(k,l) = \left( \sum_{i=k}^{l} X_i \right) \bmod M \qquad (1)$$
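The rolling checksum mentioned above can be sketched as follows, under the assumption that it is the rsync-style weak checksum described by the equations in this section: a(k,l) is the byte sum of the block X_k ... X_l, b(k,l) is the position-weighted sum, and s(k,l) = a(k,l) + 2^16 b(k,l), all modulo M = 2^16. The function names are hypothetical. The key property is that both sums can be updated in constant time when the window slides by one byte.

```python
from typing import List, Tuple

M = 2 ** 16  # modulus from the text: M = 2^16


def weak_checksum(block: bytes) -> Tuple[int, int, int]:
    """Return (a, b, s) for the block X_k..X_l (rsync-style weak checksum)."""
    n = len(block)
    a = sum(block) % M
    # Weight (l - i + 1): the first byte counts n times, the last byte once.
    b = sum((n - j) * x for j, x in enumerate(block)) % M
    return a, b, a + M * b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int, int]:
    """Slide a length-n window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b, a + M * b


def rolling_sums(data: bytes, n: int) -> List[int]:
    """Checksum s of every length-n window of data, updated incrementally."""
    if n <= 0 or len(data) < n:
        return []
    a, b, s = weak_checksum(data[:n])
    sums = [s]
    for k in range(len(data) - n):
        a, b, s = roll(a, b, data[k], data[k + n], n)
        sums.append(s)
    return sums


def find_block(data: bytes, block: bytes) -> int:
    """Offset of the first window of data equal to block, or -1."""
    target = weak_checksum(block)[2]
    for offset, s in enumerate(rolling_sums(data, len(block))):
        # Weak sums can collide, so confirm a candidate before accepting it
        # (rsync uses a stronger hash here; a byte comparison stands in for it).
        if s == target and data[offset:offset + len(block)] == block:
            return offset
    return -1
```

In a block-based transfer, the receiver would compute `weak_checksum` for each block of its old file, and the sender would scan the new file with `rolling_sums`, sending literal bytes only where no old block matches.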
The examination stage compares the size of the source data with that of the destination to detect data changes: the file size is recalculated, and the resulting difference is sent to the destination. To speed up the comparison, the sums are taken modulo M, with M = 2^{16}, as in (2):

$$b(k,l) = \left( \sum_{i=k}^{l} (l - i + 1)\, X_i \right) \bmod M \qquad (2)$$

$$s(k,l) = a(k,l) + 2^{16}\, b(k,l)$$

where a(k,l) is given by (1), X_k ... X_l is the block of data being backed up, and s(k,l) is the resulting checksum used to validate that block [26].
LFSM checks system status with a shell script, check_service.sh in the proposed method, that checks the status of named services. The script checks the LFSM system every 10 seconds and writes the connection status to a file named status.txt, where 0 denotes normal and 1 denotes abnormal.
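The check_service.sh script itself is not reproduced in the paper. The following Python sketch shows one way the described behaviour (probe every 10 seconds, record 0 for normal and 1 for abnormal in status.txt) could work. It assumes a systemd-based host (Debian 8 uses systemd by default) and probes a service with `systemctl is-active --quiet`, which exits with status 0 when the unit is active. The function names, the service name, and the choice to overwrite status.txt on each check rather than append to it are all hypothetical.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, Optional


def systemd_probe(service: str) -> bool:
    """True if the systemd unit is active (assumes systemctl is available)."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", service])
    return result.returncode == 0


def check_once(service: str, status_file: Path,
               probe: Callable[[str], bool] = systemd_probe) -> int:
    """Write 0 (normal) or 1 (abnormal) to status_file and return it."""
    status = 0 if probe(service) else 1
    status_file.write_text(f"{status}\n")  # overwrite: file holds the latest status
    return status


def monitor(service: str, status_file: Path, interval: float = 10.0,
            rounds: Optional[int] = None,
            probe: Callable[[str], bool] = systemd_probe) -> None:
    """Repeat check_once every `interval` seconds (forever if rounds is None)."""
    done = 0
    while rounds is None or done < rounds:
        check_once(service, status_file, probe)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Passing the probe as a parameter keeps the scheduling and status-file logic independent of how a service is checked, so the same loop could test an SSH connection or a Unison process instead.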

EXPERIMENT RESULTS
The performance of the proposed system was tested by measuring the synchronization duration. LFSM was configured to check file quality and to control the workload distribution system, as shown by the message log in Figure 6. The validation step automatically detected changes to the files on the source machine using a set of shell-script commands. The data validation method synchronized the file properties to every server in the workload distribution system by sending commands through the Unison program over the Secure Shell (SSH) protocol. The test compared two configurations, a single file and multiple data files, with sizes of 1 MB, 5 MB, 10 MB, 100 MB, 200 MB, 500 MB, 700 MB, and 1 GB. The test data for the proposed method are shown in Figure 7. The single-file transfer-rate results showed that the most effective size was a 600 MB data file. The analysis found that, because synchronization was limited to 60 seconds, data transmission to the server did not complete and the received file did not reach its correct size. When the transmission took longer than the specified time, the program cancelled the original synchronization and had to send the same data file again. This behaviour can be explained in terms of throughput [27], [28]: the amount of data, or the number of packets per second (pps), that can be transferred from one place to another in a given period of time. In the multiple-file synchronization test, each test already contained 1 MB of original data spread across its files. The multiple-file test data for the proposed method are shown in Figure 9. The multiple-file trial showed that the number of files did not affect data transmission.
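The cancel-and-resend behaviour and the throughput argument above can be sketched as follows. The paper does not say which component enforced the 60-second limit, so this is a hypothetical wrapper: it runs a synchronization command with `subprocess.run(..., timeout=...)`, which kills the child process and raises `subprocess.TimeoutExpired` when the limit is exceeded, and then starts the whole transfer again. The helper `required_throughput_mb_s` is simple arithmetic: the minimum sustained rate at which a file of a given size can finish within the window.

```python
import subprocess
from typing import List


def sync_with_retry(cmd: List[str], timeout: float = 60.0, max_attempts: int = 3) -> int:
    """Run a sync command, restarting it from scratch whenever it exceeds the timeout.

    Returns the attempt number that finished. A non-zero exit status raises
    subprocess.CalledProcessError; if every attempt times out, TimeoutError is raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            subprocess.run(cmd, check=True, timeout=timeout)
            return attempt
        except subprocess.TimeoutExpired:
            # The partial transfer is abandoned and the same file is sent again,
            # so a file that cannot finish within the window never completes.
            continue
    raise TimeoutError(f"sync did not finish within {timeout} s in {max_attempts} attempts")


def required_throughput_mb_s(size_mb: float, timeout_s: float) -> float:
    """Minimum sustained throughput (MB/s) for a file to finish inside the window."""
    return size_mb / timeout_s
```

For example, a 600 MB file can finish within a 60-second window only if the link sustains at least 10 MB/s; below that rate, each retry repeats the same incomplete transfer, which matches the failure mode observed in the experiment.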
The amount of data, by contrast, is an important factor: too much data significantly affects transmission, which can affect the system and cause errors in the process. The consistency of backup time for the proposed method and other techniques is compared in Figure 10. The test database initially contained 112 MB of files. As shown in Table 1, which compares the proposed method with other backup methods, the proposed method can improve the performance of data storage and data file backup.

CONCLUSION
The backup method proposed in this paper performs data synchronization using a set of shell-script commands in LFSM together with Unison. The novelty of this work is the step that checks the correctness of data file changes before synchronization, which helps prevent errors in backup systems. The synchronization experiments varied only the data size of a file, from 1 MB to 1 GB. The results showed that the proposed method can improve the performance of data storage and of the data backup system, taking an average of 0.238 s to synchronize a 1 MB file.