Elsevier

Methods in Enzymology, Volume 384, 2004, Pages 172-184

Sample Entropy

https://doi.org/10.1016/S0076-6879(04)84011-4

Publisher Summary

This chapter presents the Sample Entropy (SampEn) algorithm as an alternative method for entropy estimation in real-world data. The chapter discusses the problems of the approximate entropy (ApEn) algorithm and how SampEn addresses them. ApEn is optimally suited to measure the Gaussianity of a distribution and a process. Burg's theorem establishes that the maximum entropy for a random process with finite variance is attained by a Gaussian process; thus, ApEn values departing from the theoretical maximum indicate a lack of Gaussianity. SampEn can also be used effectively as a measure of Gaussianity, though its maximum occurs for non-Gaussian random processes. Modifications to the way ApEn handles zero and small numbers of matches can help minimize its bias and begin to approach the statistical stability of sample entropy. A formal implementation of SampEn is presented in this chapter. The practical issues of optimization of parameters and data filtering are also explained, followed by a discussion of the difficulties with short data sets and nonstationary data. Interpretation of entropy estimates is discussed, and a direct comparison of ApEn and SampEn is also presented.

Introduction

Findings of deterministic dynamics in seemingly random physical processes have excited biological researchers who collect time series data. The best tools for this kind of analysis require extremely long and noise-free data sets that are not available from biological experiments. Nonetheless, one such tool has found widespread use. In 1991, Pincus adapted the notion of “entropy” for real-world use.1 In this context, entropy refers to order, regularity, or complexity, and has roots in the works of Shannon, Kolmogorov, Sinai, Eckmann and Ruelle, and Grassberger and co-workers. The idea is that time series with repeating elements arise from more ordered systems, and would be reasonably characterized by a low value of entropy. Were the data sets infinite and perfect, it would be possible to determine a precise value of entropy. Biological data sets are neither, but Pincus had the important insight that even an imperfect estimate of entropy could be used to rank sets of time series in their hierarchy of order. He introduced approximate entropy (ApEn), and many papers have appeared drawing conclusions about the relative order of physiological processes.

In principle, the calculation is simple enough, and is shown schematically in Fig. 1. ApEn quantifies the negative natural logarithm of the conditional probability (CP) that a short epoch of data, or template, is repeated during the time series. Having selected a template of length m points, one identifies other templates that are arbitrarily similar and determines which of these remain arbitrarily similar for the next, or (m + 1)st, point. “Arbitrarily similar” means that points are within a tolerance r of each other, where r is usually selected as a multiple of the standard deviation (SD). The negative logarithm of the conditional probability is calculated for each possible template and the results averaged. If the data are ordered, then templates that are similar for m points are often similar for m + 1 points, CP approaches 1, and the negative logarithm and entropy approach 0.
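As a toy illustration of this template-matching idea (the series, m, and r below are our own illustrative choices, not the chapter's), the following Python sketch forms m-point templates and tests which of them are “arbitrarily similar” to the first one:

```python
import numpy as np

# A short series containing near-repeats of the opening pattern
# (values are illustrative, not from the chapter).
series = np.array([0.0, 1.0, 0.1, 1.1, 0.0, 1.0, 0.05, 1.05])
m = 2                    # template length
r = 0.2 * series.std()   # tolerance r as a multiple of the SD

template = series[:m]    # the first m-point template
# All other m-point templates in the series.
others = [series[i:i + m] for i in range(1, len(series) - m + 1)]
# A template matches when every pair of corresponding points is within r.
matches = [np.max(np.abs(t - template)) <= r for t in others]
```

Here the exact repeat starting at position 4 matches, while the out-of-phase template [1.0, 0.1] does not.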

The concepts are solid and the potential utility is great. We found, however, that there are practical issues of great importance in implementing the algorithm. These findings motivated us to develop sample entropy (SampEn) as an alternative method for entropy estimation in real-world data. In this chapter, we first review the problems of ApEn and how SampEn addresses them. We next present a formal implementation of SampEn and discuss the practical issues of optimization of parameters and data filtering. This is followed by a discussion of the difficulties with short data sets and nonstationary data. We end with comments on interpretation of entropy estimates and a direct comparison of ApEn and SampEn. The algorithms discussed are available at www.Physionet.org. For full details, we refer the reader to our original papers.2, 3

Section snippets

Motivation for SampEn Analysis

In our initial implementation of ApEn analysis of heart rate dynamics, we encountered practical questions.

1. What if some templates have no matches, and the CPs are not defined? Pincus follows the teaching of Eckmann and Ruelle and allows templates to match themselves. Thus if there are no other matches, the CP is 1 and ApEn is 0, a report of perfect order. If there are only a few template matches, then the result is biased toward 0, and the bias resolves with lengthening data sets and more

Sample Entropy Calculation

As a statistic, SampEn(m, r, N) depends on three parameters. The first, m, determines the length of the vectors to be considered in the analysis. That is, given N data points {u(j): 1 ≤ j ≤ N}, form the N − m + 1 vectors x_m(i) for {i ∣ 1 ≤ i ≤ N − m + 1}, where x_m(i) = {u(i + k): 0 ≤ k ≤ m − 1} is the vector of m data points from u(i) to u(i + m − 1). The distance between two vectors, denoted d[x_m(i), x_m(k)], is defined to be max{∣u(i + j) − u(k + j)∣: 0 ≤ j ≤ m − 1}, the maximum difference between
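The definitions above can be sketched in Python (a minimal illustration, not the optimized PhysioNet implementation; the function name and the use of N − m templates of each length are our own choices):

```python
import numpy as np

def sampen(u, m, r):
    """Sketch of SampEn(m, r, N): the negative natural logarithm of the
    ratio of (m+1)-point template matches to m-point template matches,
    counted over distinct pairs so that no template matches itself."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    # Use N - m templates of each length so every m-point template
    # has an (m+1)-point extension.
    xm = np.array([u[i:i + m] for i in range(N - m)])
    xm1 = np.array([u[i:i + m + 1] for i in range(N - m)])

    def count_pairs(x):
        count = 0
        for i in range(len(x) - 1):
            # d[x(i), x(k)] is the maximum componentwise difference.
            d = np.max(np.abs(x[i + 1:] - x[i]), axis=1)
            count += int(np.sum(d <= r))
        return count

    B = count_pairs(xm)    # pairs matching for m points
    A = count_pairs(xm1)   # pairs still matching for m + 1 points
    return -np.log(A / B)
```

For a perfectly repetitive series, every m-point match remains a match at m + 1 points, so A = B and SampEn is 0.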

Optimizing Parameters

Having decided how to manage the data, the next task is to optimize the parameters of SampEn(m, r, N) by some rational strategy. For most current applications, the parameters must be fixed to allow valid comparisons. Circumstances that may indicate varying parameters will be discussed separately. The parameter N is usually taken as the size of the data set. Since N determines the range of SampEn(m, r, N) by providing an upper limit for B, care must be used when comparing epochs of differing

Short Data Sets

The analysis of very short data sets may call for extra care. SampEn(m, r, N) is bounded below by 0 and above by ln[(N − m)(N − m − 1)/2]. Thus short data sets will have a decreased range. They will also have smaller values of B, and thus less precise statistics. This is exacerbated by the fact that in a small data set a higher proportion of comparisons involves overlapping templates, a factor that tends to inflate the variance of SampEn(m, r, N). We have also shown previously that for short sets of
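As a numeric sanity check on this range (the helper name below is our own), the upper bound follows from the smallest nonzero conditional probability: all (N − m)(N − m − 1)/2 distinct template pairs match at length m, but only one of them still matches at length m + 1:

```python
import math

def sampen_max(N, m):
    """Largest attainable SampEn(m, r, N): minus the log of the smallest
    nonzero conditional probability, 1 / [(N - m)(N - m - 1) / 2]."""
    return math.log((N - m) * (N - m - 1) / 2)
```

For N = 100 and m = 2 the bound is ln(98 · 97 / 2) ≈ 8.47, and it shrinks quickly for shorter records (≈ 5.9 at N = 30), illustrating the decreased range for short data sets.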

Interpretation of SampEn

SampEn was originally intended as a measure of the order in a time series. We have noted, however, that low SampEn statistics, indicative of high CP estimates, cannot be assumed to imply a high degree of order. There are in general two distinct mechanisms for generating high CP estimates. The first is that genuine order has been detected. The second derives from the fact that r is usually taken as a proportion of the standard deviation of the series, thus rendering the analysis scale free. When

ApEn and SampEn

We now summarize the differences between sample entropy and approximate entropy and discuss possible bridges between the two approaches. Let B_i denote the number of template matches with x_m(i) and A_i denote the number of template matches with x_{m+1}(i). The number p_i = A_i/B_i is an estimate of the conditional probability that u(j + m) is within r of u(i + m), given that x_m(j) matches x_m(i). ApEn is calculated by

ApEn(m, r, N) = −(1/(N − m)) Σ_{i=1}^{N−m} log(A_i/B_i)

and is the negative average natural logarithm
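The formula above can be sketched directly in Python (a minimal illustration with a function name of our choosing; self-matches are included in A_i and B_i, so every conditional probability is defined):

```python
import numpy as np

def apen(u, m, r):
    """Sketch of ApEn(m, r, N): the negative average natural logarithm
    of the per-template conditional probabilities A_i / B_i, with each
    template allowed to match itself."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    xm = np.array([u[i:i + m] for i in range(N - m)])
    xm1 = np.array([u[i:i + m + 1] for i in range(N - m)])
    logs = []
    for i in range(N - m):
        # Self-matches count (distance 0 <= r), so B_i >= 1 and A_i >= 1.
        Bi = np.sum(np.max(np.abs(xm - xm[i]), axis=1) <= r)
        Ai = np.sum(np.max(np.abs(xm1 - xm1[i]), axis=1) <= r)
        logs.append(np.log(Ai / Bi))
    return -np.mean(logs)
```

For a template with no other matches, A_i = B_i = 1 and the term contributes log 1 = 0, which is the bias toward a report of perfect order discussed earlier.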

References (9)

  • T. Schreiber et al., Physica D (2000)
  • S.M. Pincus, Proc. Natl. Acad. Sci. USA (1991)
  • D.E. Lake et al., Am. J. Physiol. (2002)
  • J.S. Richman et al., Am. J. Physiol. (2000)

There are more references available in the full text version of this article.
