An optimized algorithm for peak detection in noisy periodic and quasi-periodic signals

ABSTRACT


INTRODUCTION
The peaks and troughs detection is a very important task in many real-world applications in different fields such as bioinformatics [1], mass spectrometry [2], image processing [3] and signal processing [4], [5].Because trough detection is equivalent to peak detection when we implement the peak detection for the additive inverse of the data, the algorithms are usually designed for peak detection.In signal processing, peak detection is the key step.However, the challenge of the peak detection is usually caused by noise where the light peaks are detected and the strong peaks are ignored.
Nowadays, with the explosion of smart devices and automatic intelligent control systems along with the ever-increasing volume of data, signal analysis is more difficult because of the increase in noise as well as computational requirements [6], [7].Many solutions have been proposed for peak detection in signal processing in the literature ranging from the adaptive threshold method (ATM) [8], wavelet transform techniques [9], [10], hidden Markov models [11], k-means clustering [12], and entropy-based techniques [13] to the automatic chromatographic peak detection (ACPD) [14], the peak of Shannon energy envelope (PSEE) [15], mountaineer's method [16], automatic multi-scale peak detection (AMPD) [17], [18] and ATPD [19].But most of them are designed for a specific type of signals.Only a few algorithms have been developed for general signals.
Some algorithms are designed only for specific signals.For instance, ACPD is designed for chromatographic signals [20]- [22], and the PSEE [23]- [25] are designed for electrocardiogram (ECG) signals, while the ATM and others [16], [26]- [28] are designed for photoplethysmorgraphy (PPG) signals.Some other algorithms (but fewer) are designed for peak detection in general such as AMPD and ATPD [13],  [29], [30].In which, the AMPD has more advantages because it operates without any parameters and quite robust regardless high and low frequency noise and detects peaks accurately in quasi-periodic signals.
Besides the accuracy, the computational complexity of peak detection algorithms is also main concern for researchers, especially in big data analysis for real-world problems today.Therefore, our aim is to modify the original AMPD, an algorithm that is suitable for many type of signals without any parameters, to improve its runtime performance and the memory storage requirements.The modified version is then compared to the original AMPD in both peak detection accuracy and runtime performance.Finally, some suggestions for improving peak detection algorithms are recommended.

METHOD
In this section, we take an overview of the original AMPD and point out some necessary optimizations for it.Finally, some improvements are proposed.

Overview of the original AMPD
Given  = ( 1 ,  2 , . . .,   ) is a sequence of observations from a univariate signal which the periodic or quasi-periodic peaks are involved.At the first stage, the AMPD calculates the local maxima scalogram (LMS)  ×  matrix where  = [/2] − 1 (notice that [] is the smallest integer that is not less than ) and  is the length of .For each  in {1,2, . . ., }, the  -th rows of LMS allocates all the local maxima positions of  at scale  (the level of zoom for peak finding) at time  by assigning its element, [, ], equal to zero if   >  − and   >  + for each  =  + 1 to  − .Otherwise, [, ] =  +  (where  is a random number from [0, 1]-uniformly distribution and  ∈ ℝ is a constant).In [17],  is fixed by 1.The th row of the MLS is calculated as in Figure 1 for  =  + 1 to  − .

Figure 1. LMS calculation of the AMPD
The first row of LMS has the highest resolution of the potential peak points and it reduces to the "low resolution" for each a half of the length of the signal.The second step is cutting the LMS by the  -th row to get LMSr matrix containing the  first rows of LMS, where  is the scale with greatest number of potential peaks (i.e. the greatest number of zeros).At the final stage of AMPD, the column-wise standard deviation of each column of LMSr is calculated.The position of the column where the standard deviation is equal to zero identify the maxima locations of.The Figure 2 (Figure 1 in [17]) shows the steps of the AMPD algorithm and Figure 3 (Figure 2 in [17]) visualizes the AMPD process by an example.

Some necessary optimizations for AMPD
There are two main observations are necessary optimized for the AMPD.Firstly, the most computational expensive in AMPD comes from calculating the LMS matrix.bound of memory requirements.Moreover, the LMS using the scales up to  = [/2] − 1 is unnecessary.In the example given in Figures 3(a) to (f) (see [17]) for more detail), we can see that the number of necessary scales in the LMS is at the dramatically low rate (i.e.only first 55 significant scales against 1500 in total).Secondly, the calculation of column-wise standard deviations to find the columns with zero value in the final stage is equivalent to finding the intersection of  first rows of the LMS matrix.Therefore, the calculation of uniform random number for fulling the void of the LMS is very computationally expensive.

Some improvements of the T-AMPD
In this section, we propose some solutions to overcome the obstacles of the original AMPD mentioned in section 2.2, the modified algorithm called T-AMPD.For  valued reducing, we can consider setting it with a fixed number.For example, if the length of  is greater than 5000 then set  = [5000/2] − 1, otherwise,  = [/2] − 1 as used in the R package ampd in [31].In other respects, if we call  1 the search space at the scale  then  1 = {  , . . .,  − } (see Figure 1).Suppose that  0 ( 0 ≥ 1) is the number of desirable peaks need to be detected in the signal and  is the last scale that [, ] containing all of these peaks then they are all of  1 .Therefore, the highest frequency of quasi-periodic signal is equal to: for  0 ≥ 2 in general.
Because  is the last scale that [, ] containing all of the true peaks, there is not any peak in  2 ∖  1 and  3 ∖  1 .It means that  ≥ .Now from (1) we have: In general cases, the computational resource in this proposal with  0 = 2 is dramatically cut down in comparison to the original algorithm.The suitable  scales with different  0 are illustrated as in Figure 4 where an example of MLS calculation was given by [17].When  0 = 1 then  = [/2] − 1 as in the original AMPD.But this is unusual case in most of real-world applications.Notice that if  0 is greater than the number of true peaks in the signal, high rate of false positive errors can happen.From the number of peaks is unknown and depend on the highest frequency of the signal, we can choose  0 by small number (i.e. 0 = 2,3, ...) or  0 = [/  ] if the information about the highest frequency of the signal,   , is known.This case is available in many facts (i.e., ECG, PPG signals).
Figure 4.The suitable  with different  0 (an example of [17]) For optimization of the LMS calculation, for each  in 1 to, the original AMPD (i.e. in [17] and another modified AMPD in [18]) finds the local maxima in the search space  1 by taking each  in  + 1 to  −  and checking whether   >  − and   >  −+1 .If it is true, the value [, ] = 0 and [, ] =  +  otherwise.This way causes much waste of computer resources while it is equivalent to comparing the elements of  1 with  2 and  3 correspondingly (see the Figure 1 for more details).From  1 ,  2 and  3 have the same length, instead of using two for-loops for finding the local maxima, we can compare the elements of  1 with  2 and  1 with  3 simultaneously using the built-in operator for comparing two vectors which is available in many programming language such as R or MATLAB (i.e. and  1 >  3 directly).This reduces the runtime complexity from ( 2 ) of the original AMPD to () of the T-AMPD.The positions of  which matched the condition are then stored in a vector, called . [], with auto length (i.e.number of these positions).This length is surely less than .Hence, we have a list of  vectors containing the potential peaks for each scale instead of a  ×  matrix.
Finally, let  be the first index of the vector among the vectors in the list where .[] has the greatest length in the vector list, then  is the scale with the most potential peaks.The intersection of all these first  vectors give the vector, called , including the indies of the true detected peaks.

RESULTS AND DISCUSSION
To demonstrate the efficiency of T-AMPD in comparison to the original AMPD, we run both algorithms on several signals and compare them in accuracy (i.e.false positive detection errors (FPDEs) and false negative detection errors (FNDEs)) and runtime performance.The algorithms are analysed in R on a 2.60GHz Intel Core i7-4510U CPU with 8GB RAM running on Window 10 v.20H2.For the up-to-date of the algorithms, the original AMPD is implemented by the ampd function in the R package called ampd [31] which has been optimized by its authors.Furthermore, the splitting size for peak detection of AMPD in ampd package is restricted by 5000 for signals with the length greater than 5000 and adapted for at least 10 local maxima are included (see the reference manual of the package in [31] for more details).Hence, T-AMPD will be implemented with  0 = 1 which is equivalent to the original AMPD,  0 = 10 for the common .The runtime for each case is 95% confident interval with 100 times of run-simulation.The first time series for comparison is the total monthly number of sunspots [17].The time series includes 3177 values (< 5000) ranging from 1/1947 to 9/2013 is enclosed in the R package called datasets.The results of comparison are given in Table 1.
Table 1.Comparion of T-AMPD and AMPD for sunspot data We can see that with only  0 = 1, the runtime of T-AMPD is much reduced as about 75% the runtime of AMPD and decreases as  0 increases.With  0 = 10, the runtime ratio of T-AMPD and AMPD is at a very impressive rate, The T-AMPD is 4 times faster than the AMPD.The first false positive detection error is caught when  0 is up to 66 where the runtime is very small with 25 times faster.Figure 5 shows all 23 true peaks of the signal and the value of  0 when the first FPDE is found.

Figure 5. T-AMPD for monthly sunspot data
The second time series for comparison is the length of day (LOD) data.In this data, the duration of a day according to the SI units definition (1 day=86,400s) is compared with the astronomically determined duration of a day on earth.We can obtain the data from the International Earth Rotation and Reference Service (IERS) at https://hpiers.obspm.fr/iers/eop/eopc04/with 1461 days (< 5000) from 2008 to 2011.The analysis shows that the results of T-AMPD with  0 from 1 to 207 have identical detection rates to AMPD where all of them have one false negative detection error.All the T-AMDPs with  0 from 208 to 729 have no false detection occurred (see Figure 6).The false positive detection error only occurs when  0 from 730 above.The runtime performance of T-AMDP against AMDP is given in Table 2.The dramatically improve ment in runtime of T-AMPD is the same to the sunspot analysis case.We can see that with  0 = 10 (for general cases), the T-AMPD is more than 5 times faster than the original AMPD.From these peaks we are easily to estimate the average period of the longest days is 13.6 days which is adapted with the value reported in previous literature [32].Finally, we analysis with a low-noise electrocardiography (ECG) time series with 200000 values (> 5000) downloaded from https://plux.info/sensors/277-electrocardiogram-ecg.html.Unfortunately, the original AMPD cannot run on our device in default setting because it cannot allocate the vector of too large size needed for the matrix LMS.Our T-AMPD still run well regardless the size of the scanning window though the runtime is a bit long.For fairly comparison, we set the splitting=TRUE for AMPD in order to limit the scanning window by 5000 which ensures the algorithm run normally, and our T-AMPD is also limited the scanning window in the same size.The results are given in Table 3.For the ECG signal with a large number of values as in this case, we can see in Table 3 that T-AMPD has approximately a duration of 144.036s (∼2.4 mins) which is approximately a half of AMPD runtime with  0 = 1, and it is only approximately 27.12s with  0 = 10 which is more than 8 times faster than AMPD. Figure 7 shows the detected true peaks of the ECG signal run by T-AMPD at  0 = 10.We can see all 229 true peaks are detected from 200000 observations without any errors, exactly the same as the original AMPD.

CONCLUSION
In this paper, we have proposed some important adjustments for the AMPD algorithm in improving its runtime performance while maintaining the accuracy by introducing the desirable number of detected peaks parameter, called  0 .The experimental results show that even with  0 = 1, our algorithm (T-AMPD) greatly improves the computation time with at least 2 times faster in comparison to the AMPD's runtime, and it gets even much better with larger but reasonable values of  0 .There is a way that the T-AMPD can be improved, that is we can fix "the best"  0 for the algorithm.This "the best"  0 as commented in section 2.3 that it depends on the highest frequency   of the signal.In most practical areas, this   can be approximated by technical devices.

Figure 3 .
Figure 3.An example of applying the AMPD to a simulated signal, (a) LMS, (b) rescaled LMS, (c) number of maxima for each scale, (d) standard deviation for each column of rescaled LMS, (e) true peaks, and (f) zoomed true peaks
Bulletin of Electr Eng & Inf ISSN: 2302-9285  An optimized algorithm for peak detection in noisy periodic and quasi-periodic signals (Luc Tri Tuyen) 2029 requirement of the AMPD and to the first one which meets at least one false positive detection error (the T-AMPD with different  0 is denoted by T-AMPD(n))

Figure 7 .
Figure 7. T-AMPD for ECG data with all 229 true peaks are detected at  0 = 10

Table 2 .
Comparion of T-AMPD and AMPD for LOD data

Table 3 .
Comparion of T-AMPD and AMPD for ECG data