Elsevier

Methods in Enzymology, Volume 384, 2004, Pages 172-184

Sample Entropy

https://doi.org/10.1016/S0076-6879(04)84011-4

Publisher Summary

This chapter presents the Sample Entropy (SampEn) algorithm as an alternative method for entropy estimation in real-world data. The chapter discusses the problems of the approximate entropy (ApEn) algorithm and how SampEn addresses them. ApEn is optimally suited to measure the Gaussianity of a distribution and a process. Burg's theorem establishes that the maximum entropy for a random process with finite variance is attained by a Gaussian process; thus, ApEn values departing from the theoretical maximum indicate a lack of Gaussianity. SampEn can also be used effectively as a measure of Gaussianity, though its maximum occurs for non-Gaussian random processes. Modifications to the way ApEn handles zero and small numbers of matches can help minimize its bias and begin to approach the statistical stability of sample entropy. A formal implementation of SampEn is presented in this chapter. The practical issues of optimization of parameters and data filtering are also explained, followed by a discussion of the difficulties with short data sets and nonstationary data. Interpretation of entropy estimates is discussed, and a direct comparison of ApEn and SampEn is also presented.

Introduction

Findings of deterministic dynamics in seemingly random physical processes have excited biological researchers who collect time series data. The best tools for this kind of analysis require extremely long and noise-free data sets that are not available from biological experiments. Nonetheless, one such tool has found widespread use. In 1991, Pincus adapted the notion of “entropy” for real-world use.1 In this context, entropy refers to order, regularity, or complexity, and has roots in the works of Shannon, Kolmogorov, Sinai, Eckmann and Ruelle, and Grassberger and co-workers. The idea is that time series with repeating elements arise from more ordered systems, and would be reasonably characterized by a low value of entropy. Were the data sets infinite and perfect, it would be possible to determine a precise value of entropy. Biological data sets are neither, but Pincus had the important insight that even an imperfect estimate of entropy could be used to rank sets of time series in their hierarchy of order. He introduced approximate entropy (ApEn), and many papers have appeared drawing conclusions about the relative order of physiological processes.

In principle, the calculation is simple enough, and is shown schematically in Fig. 1. ApEn quantifies the negative natural logarithm of the conditional probability (CP) that a short epoch of data, or template, is repeated during the time series. Having selected a template of length m points, one identifies other templates that are arbitrarily similar and determines which of these remain arbitrarily similar for the next, or (m + 1)st, point. “Arbitrarily similar” means that points are within a tolerance r of each other, where r is usually selected as a multiple of the standard deviation (SD). The negative logarithm of the conditional probability is calculated for each possible template and the results averaged. If the data are ordered, then templates that are similar for m points are often similar for m + 1 points, CP approaches 1, and the negative logarithm and entropy approach 0.
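As a toy illustration of this template-matching idea (the series, m, and r below are our own illustrative choices, not the chapter's), the following Python sketch forms m-point templates and tests which of them are “arbitrarily similar” to the first one:

```python
import numpy as np

# A short series containing near-repeats of the opening pattern
# (values are illustrative, not from the chapter).
series = np.array([0.0, 1.0, 0.1, 1.1, 0.0, 1.0, 0.05, 1.05])
m = 2                    # template length
r = 0.2 * series.std()   # tolerance r as a multiple of the SD

template = series[:m]    # the first m-point template
# All other m-point templates in the series.
others = [series[i:i + m] for i in range(1, len(series) - m + 1)]
# A template matches when every pair of corresponding points is within r.
matches = [np.max(np.abs(t - template)) <= r for t in others]
```

Here the exact repeat starting at position 4 matches, while the out-of-phase template [1.0, 0.1] does not.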

The concepts are solid and the potential utility is great. We found, however, that there are practical issues of great importance in implementing the algorithm. These findings motivated us to develop sample entropy (SampEn) as an alternative method for entropy estimation in real-world data. In this chapter, we first review the problems of ApEn and how SampEn addresses them. We next present a formal implementation of SampEn and discuss the practical issues of optimization of parameters and data filtering. This is followed by a discussion of the difficulties with short data sets and nonstationary data. We end with comments on interpretation of entropy estimates and a direct comparison of ApEn and SampEn. The algorithms discussed are available at www.Physionet.org. For full details, we refer the reader to our original papers.2, 3

Section snippets

Motivation for SampEn Analysis

In our initial implementation of ApEn analysis of heart rate dynamics, we encountered practical questions.

1. What if some templates have no matches, and the CPs are not defined? Pincus follows the teaching of Eckmann and Ruelle and allows templates to match themselves. Thus if there are no other matches, the CP is 1 and ApEn is 0, a report of perfect order. If there are only a few template matches, then the result is biased toward 0, and the bias resolves with lengthening data sets and more

Sample Entropy Calculation

As a statistic, SampEn(m, r, N) depends on three parameters. The first, m, determines the length of the vectors to be considered in the analysis. That is, given N data points {u(j): 1 ≤ j ≤ N}, form the N − m + 1 vectors x_m(i) for {i ∣ 1 ≤ i ≤ N − m + 1}, where x_m(i) = {u(i + k): 0 ≤ k ≤ m − 1} is the vector of m data points from u(i) to u(i + m − 1). The distance between two vectors, denoted d[x_m(i), x_m(k)], is defined to be max{∣u(i + j) − u(k + j)∣: 0 ≤ j ≤ m − 1}, the maximum difference between
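The definitions above can be sketched in Python (a minimal illustration, not the optimized PhysioNet implementation; the function name and the use of N − m templates of each length are our own choices):

```python
import numpy as np

def sampen(u, m, r):
    """Sketch of SampEn(m, r, N): the negative natural logarithm of the
    ratio of (m+1)-point template matches to m-point template matches,
    counted over distinct pairs so that no template matches itself."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    # Use N - m templates of each length so every m-point template
    # has an (m+1)-point extension.
    xm = np.array([u[i:i + m] for i in range(N - m)])
    xm1 = np.array([u[i:i + m + 1] for i in range(N - m)])

    def count_pairs(x):
        count = 0
        for i in range(len(x) - 1):
            # d[x(i), x(k)] is the maximum componentwise difference.
            d = np.max(np.abs(x[i + 1:] - x[i]), axis=1)
            count += int(np.sum(d <= r))
        return count

    B = count_pairs(xm)    # pairs matching for m points
    A = count_pairs(xm1)   # pairs still matching for m + 1 points
    return -np.log(A / B)
```

For a perfectly repetitive series, every m-point match remains a match at m + 1 points, so A = B and SampEn is 0.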

Optimizing Parameters

Having decided how to manage the data, the next task is to optimize the parameters of SampEn(m, r, N) by some rational strategy. For most current applications, the parameters must be fixed to allow valid comparisons. Circumstances that may indicate varying parameters will be discussed separately. The parameter N is usually taken as the size of the data set. Since N determines the range of SampEn(m, r, N) by providing an upper limit for B, care must be used when comparing epochs of differing

Short Data Sets

The analysis of very short data sets may call for extra care. SampEn(m, r, N) is bounded below by 0 and above by ln[(N − m)(N − m − 1)/2]. Thus short data sets will have a decreased range. They will also have smaller values of B, and thus less precise statistics. This is exacerbated by the fact that in a small data set a higher proportion of comparisons involves overlapping templates, a factor that tends to inflate the variance of SampEn(m, r, N). We have also shown previously that for short sets of
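As a numeric sanity check on this range (the helper name below is our own), the upper bound follows from the smallest nonzero conditional probability: all (N − m)(N − m − 1)/2 distinct template pairs match at length m, but only one of them still matches at length m + 1:

```python
import math

def sampen_max(N, m):
    """Largest attainable SampEn(m, r, N): minus the log of the smallest
    nonzero conditional probability, 1 / [(N - m)(N - m - 1) / 2]."""
    return math.log((N - m) * (N - m - 1) / 2)
```

For N = 100 and m = 2 the bound is ln(98 · 97 / 2) ≈ 8.47, and it shrinks quickly for shorter records (≈ 5.9 at N = 30), illustrating the decreased range for short data sets.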

Interpretation of SampEn

SampEn was originally intended as a measure of the order in a time series. We have noted, however, that low SampEn statistics, indicative of high CP estimates, cannot be assumed to imply a high degree of order. There are in general two distinct mechanisms for generating high CP estimates. The first is that genuine order has been detected. The second derives from the fact that r is usually taken as a proportion of the standard deviation of the series, thus rendering the analysis scale free. When

ApEn and SampEn

We now summarize the differences between sample entropy and approximate entropy and discuss possible bridges between the two approaches. Let B_i denote the number of template matches with x_m(i) and A_i denote the number of template matches with x_{m+1}(i). The number p_i = A_i/B_i is an estimate of the conditional probability that u(j + m) is within r of u(i + m), given that x_m(j) matches x_m(i). ApEn is calculated by

ApEn(m, r, N) = −(1/(N − m)) Σ_{i=1}^{N−m} log(A_i/B_i)

and is the negative average natural logarithm
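The formula above can be sketched directly in Python (a minimal illustration with a function name of our choosing; self-matches are included in A_i and B_i, so every conditional probability is defined):

```python
import numpy as np

def apen(u, m, r):
    """Sketch of ApEn(m, r, N): the negative average natural logarithm
    of the per-template conditional probabilities A_i / B_i, with each
    template allowed to match itself."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    xm = np.array([u[i:i + m] for i in range(N - m)])
    xm1 = np.array([u[i:i + m + 1] for i in range(N - m)])
    logs = []
    for i in range(N - m):
        # Self-matches count (distance 0 <= r), so B_i >= 1 and A_i >= 1.
        Bi = np.sum(np.max(np.abs(xm - xm[i]), axis=1) <= r)
        Ai = np.sum(np.max(np.abs(xm1 - xm1[i]), axis=1) <= r)
        logs.append(np.log(Ai / Bi))
    return -np.mean(logs)
```

For a template with no other matches, A_i = B_i = 1 and the term contributes log 1 = 0, which is the bias toward a report of perfect order discussed earlier.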

References (9)

  • T. Schreiber et al., Physica D (2000)
  • S.M. Pincus, Proc. Natl. Acad. Sci. USA (1991)
  • D.E. Lake et al., Am. J. Physiol. (2002)
  • J.S. Richman et al., Am. J. Physiol. (2000)

There are more references available in the full text version of this article.
