Sample Entropy
Introduction
The findings of deterministic dynamics in seemingly random physical processes have excited biological researchers who collect time series data. The best tools for this kind of analysis require extremely long, noise-free data sets that are not available from biological experiments. Nonetheless, one such tool has found widespread use. In 1991, Pincus adapted the notion of “entropy” for real-world use.1 In this context, entropy means order, regularity, or complexity, and has roots in the works of Shannon, Kolmogorov, Sinai, Eckmann and Ruelle, and Grassberger and co-workers. The idea is that time series with repeating elements arise from more ordered systems and would reasonably be characterized by a low value of entropy. Were the data sets infinite and perfect, it would be possible to determine a precise value of entropy. Biological data sets are neither, but Pincus had the important insight that even an imperfect estimate of entropy could be used to rank sets of time series in their hierarchy of order. He introduced approximate entropy (ApEn), and many papers have since appeared drawing conclusions about the relative order of physiological processes.
In principle, the calculation is simple enough, and is shown schematically in Fig. 1. ApEn quantifies the negative natural logarithm of the conditional probability (CP) that a short epoch of data, or template, is repeated during the time series. Having selected a template of length m points, one identifies other templates that are arbitrarily similar and determines which of these remain arbitrarily similar at the next, or (m + 1)st, point. “Arbitrarily similar” means that points are within a tolerance r of each other, where r is usually selected as a multiple of the standard deviation (SD). The negative logarithm of the conditional probability is calculated for each possible template and the results are averaged. If the data are ordered, then templates that are similar for m points are often similar for m + 1 points, CP approaches 1, and the negative logarithm and entropy approach 0.
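The template-matching step just described can be sketched in code for a single template. This is an illustrative helper, not the authors' reference implementation; the function name `template_cp` and the defaults m = 2 and r = 0.2·SD are choices made here for the example, and self-matching follows the ApEn convention:

```python
import numpy as np

def template_cp(u, i, m=2, r_factor=0.2):
    """CP that templates matching u[i:i+m] within tolerance r also
    match at the (m + 1)st point. Self-matches are counted, following
    the ApEn convention, so the denominator is never zero."""
    u = np.asarray(u, dtype=float)
    r = r_factor * u.std()
    # All templates of length m that have a following (m + 1)st point
    x = np.array([u[j:j + m] for j in range(len(u) - m)])
    match_m = np.max(np.abs(x - x[i]), axis=1) <= r   # similar for m points
    next_close = np.abs(u[m:] - u[i + m]) <= r        # still similar at the next point
    return np.sum(match_m & next_close) / np.sum(match_m)
```

Averaging −ln of this conditional probability over all templates yields ApEn; for a perfectly repetitive series every match at length m persists at length m + 1, so the CP is 1 and the contribution to entropy is 0.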
The concepts are solid and the potential utility is great. We found, however, that there are practical issues of great importance in implementing the algorithm. These findings motivated us to develop sample entropy (SampEn) as an alternative method for entropy estimation in real world data. In this chapter, we first overview the problems of ApEn and how SampEn addresses them. We next present a formal implementation of SampEn and discuss the practical issues of optimization of parameters and data filtering. This is followed by a discussion of the difficulties with short data sets and nonstationary data. We end with comments on interpretation of entropy estimates and a direct comparison of ApEn and SampEn. The algorithms discussed are available at www.Physionet.org. For full details, we refer the reader to our original papers.2, 3
Motivation for SampEn Analysis
In our initial implementation of ApEn analysis of heart rate dynamics, we encountered practical questions.
1. What if some templates have no matches, and the CPs are not defined? Pincus follows the teaching of Eckmann and Ruelle and allows templates to match themselves. Thus if there are no other matches, the CP is 1 and ApEn is 0, a report of perfect order. If there are only a few template matches, then the result is biased toward 0, and the bias resolves with lengthening data sets and more matches.
Sample Entropy Calculation
As a statistic, SampEn(m, r, N) depends on three parameters. The first, m, determines the length of the vectors to be considered in the analysis. That is, given N data points {u(j): 1 ≤ j ≤ N}, form the N − m + 1 vectors xm(i) for {i ∣ 1 ≤ i ≤ N − m + 1}, where xm(i) = {u(i + k): 0 ≤ k ≤ m − 1} is the vector of m data points from u(i) to u(i + m − 1). The distance between two vectors, denoted d[xm(i), xm(k)], is defined to be max{∣u(i + j) − u(k + j)∣: 0 ≤ j ≤ m − 1}, the maximum difference between their corresponding scalar components.
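These definitions translate directly into code. The sketch below is ours (the helper names `templates` and `d` are not from the chapter), but it implements exactly the vector formation and maximum-difference distance defined above:

```python
import numpy as np

def templates(u, m):
    """The N - m + 1 overlapping vectors x_m(i) = (u(i), ..., u(i + m - 1))."""
    u = np.asarray(u, dtype=float)
    return np.array([u[i:i + m] for i in range(len(u) - m + 1)])

def d(a, b):
    """d[x_m(i), x_m(k)]: the maximum componentwise absolute difference."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.max(np.abs(a - b)))
```

For example, with u = (3, 1, 4, 1, 5) and m = 2 the templates are (3, 1), (1, 4), (4, 1), and (1, 5), and d[(3, 1), (1, 4)] = max(2, 3) = 3.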
Optimizing Parameters
Having decided how to manage the data, the next task is to optimize the parameters for SampEn(m, r, N) by some rational strategy. For most current applications the parameters must be fixed to allow valid comparisons. Circumstances that may indicate varying parameters will be discussed separately. The parameter N is usually taken as the size of the data set. Since N determines the range of SampEn(m, r, N) by providing an upper limit for B, care must be used when comparing epochs of differing length.
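As a concrete instance of fixing the parameters, the values m = 2 and r = 0.2·SD are the common convention in the heart rate literature. The sketch below simply encodes that convention; the names `M`, `R_FACTOR`, and `tolerance` are ours:

```python
import numpy as np

# Conventional fixed choices, applied identically to every record so that
# SampEn(m, r, N) values remain comparable across subjects and epochs.
M = 2            # template length
R_FACTOR = 0.2   # tolerance as a fraction of each record's SD

def tolerance(u, r_factor=R_FACTOR):
    """r scaled to the record's own SD, making the comparison scale-free."""
    return r_factor * float(np.std(np.asarray(u, dtype=float)))
```

Because r is recomputed from each record's SD while the factor stays fixed, rescaling a record (say, ms versus s) leaves its entropy estimate unchanged.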
Short Data Sets
The analysis of very short data sets may call for extra care. SampEn(m, r, N) is bounded by 0 and ln[(N − m)(N − m − 1)/2]. Thus short data sets will have a decreased range. They will also have smaller values of B and thus less precise statistics. This will be exacerbated by the fact that in a small data set a higher proportion of comparisons involves overlapping templates, a factor that will tend to inflate the variance of SampEn(m, r, N). We have also shown previously that for short sets of
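The range bound can be computed directly. The function below is a sketch of that bound only (the name `sampen_range` is ours); it shows numerically how the attainable range shrinks with N:

```python
import math

def sampen_range(N, m=2):
    """SampEn(m, r, N) lies in [0, ln((N - m)(N - m - 1)/2)].
    The maximum corresponds to the smallest nonzero conditional
    probability estimate, 2 / [(N - m)(N - m - 1)], so shorter
    records compress the range of values the statistic can take."""
    return 0.0, math.log((N - m) * (N - m - 1) / 2)
```

For example, halving a record from 100 to 50 points lowers the ceiling on SampEn(2, r, N), so entropy estimates from epochs of different lengths are not directly comparable.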
Interpretation of SampEn
SampEn was originally intended as a measure of the order in a time series. We have noted, however, that low SampEn statistics, indicative of high CP estimates, cannot be assumed to imply a high degree of order. There are in general two distinct mechanisms for generating high CP estimates. The first is that genuine order has been detected. The second derives from the fact that r is usually taken as a proportion of the standard deviation of the series, thus rendering the analysis scale free. When
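The second mechanism can be illustrated numerically. The construction below is ours, not from the chapter: a few large outliers inflate the SD, and with it the tolerance r, so ordinary samples fall within tolerance of one another far more often even though the underlying noise is unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(300)   # featureless noise
spiky = x.copy()
spiky[::50] += 25.0            # six large outliers inflate the SD

r_plain = 0.2 * x.std()
r_spiky = 0.2 * spiky.std()    # same rule, much wider tolerance

# Fraction of point pairs within tolerance of one another: with the
# inflated r, matches (and hence CP) rise with no increase in order.
diffs = np.abs(x[:, None] - x[None, :])
frac_plain = np.mean(diffs <= r_plain)
frac_spiky = np.mean(diffs <= r_spiky)
```

The higher match fraction under `r_spiky` drives the CP estimate up and SampEn down, a low entropy that reflects outlier-inflated tolerance rather than genuine order.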
ApEn and SampEn
We now summarize the differences between sample entropy and approximate entropy and discuss possible bridges between the two approaches. Let Bi denote the number of template matches with xm(i) and Ai the number of template matches with xm+1(i). The ratio pi = Ai/Bi is an estimate of the conditional probability that the point u(j + m) is within r of u(i + m) given that xm(j) matches xm(i). ApEn is calculated as the average of −ln pi over all N − m templates, and is thus the negative average natural logarithm of the estimated conditional probabilities.
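The contrast can be made concrete by computing both statistics on the same record. Both functions below are illustrative sketches under the conventional m = 2, r = 0.2·SD (names and defaults are ours): ApEn counts self-matches, so its logarithms are always defined but biased toward 0, while SampEn excludes self-matches and takes a single ratio of total pair counts (it assumes at least one match at each length):

```python
import numpy as np

def apen(u, m=2, r_factor=0.2):
    """ApEn: phi(m) - phi(m + 1), with self-matches included in the counts."""
    u = np.asarray(u, dtype=float)
    r, N = r_factor * u.std(), len(u)

    def phi(mm):
        x = np.array([u[i:i + mm] for i in range(N - mm + 1)])
        counts = np.array([np.sum(np.max(np.abs(x - x[i]), axis=1) <= r)
                           for i in range(len(x))])
        return np.mean(np.log(counts / len(x)))

    return phi(m) - phi(m + 1)

def sampen(u, m=2, r_factor=0.2):
    """SampEn: -ln(A/B), where A and B are total template-pair matches
    at lengths m + 1 and m, with self-matches excluded."""
    u = np.asarray(u, dtype=float)
    r, N = r_factor * u.std(), len(u)

    def pairs(mm):
        # N - m templates at both lengths, so every x_m(i) has an x_{m+1}(i)
        x = np.array([u[i:i + mm] for i in range(N - m)])
        return sum(int(np.sum(np.max(np.abs(x[i + 1:] - x[i]), axis=1) <= r))
                   for i in range(len(x) - 1))

    return -np.log(pairs(m + 1) / pairs(m))
```

On a short random record the two disagree markedly: the self-matches pull the ApEn estimate well below the SampEn estimate of the same series.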
References (9)
- et al., Physica D (2000)
- Proc. Natl. Acad. Sci. USA (1991)
- et al., Am. J. Physiol. (2002)
- et al., Am. J. Physiol. (2000)