# Analysis of temporal correlation in heart rate variability through maximum entropy principle in a minimal pairwise glassy model ### Summary of experimental data

In this Section, we give some details about the data and the quantities considered in our analysis. This research has been accomplished as a part of the Project MATCH (Mathematical Advanced Tools to Catch Heart-Rate-Variability), a scientific cooperation among Ascoli Piceno Hospital (APH), POLISA, University of Calabria and University of Salento: Holter recordings have been acquired at APH and their data-base built at POLISA during the period 2016–2019.

The database is made of ECG recordings on (M=348) patients, wearing an Holter device for nominal 24 h. From these recordings we extract the RR series

begin{aligned} { mathbf {r}(i) }_{i=1,ldots ,M} = { r_n (i)}_{begin{array}{c} i=1,ldots ,M \ n=1,ldots ,N_i ,end{array}} end{aligned}

(1)

where i labels the patient and n labels the number of beats in each sequence (which is order of (10^5) and depends on the patient). Patients belong to three classes, according to their clinical status: healthy individuals (H), individuals with atrial fibrillation (AF) and individuals with congestive heart failure (hereafter simplified as cardiac decompensation) (CD). Their number is (M_H = 149), (M_{AF}=139), and (M_{CD} =60), respectively; of course, (M=M_H + M_{AF}+M_{CD}). In Fig. 1 (left) we show examples of the series (mathbf {r}(i)) for three patients belonging to the different classes.

In order to make a meaningful comparison of the variability among the RR series (mathbf {r} (i)) of different patients, we standardize them with respect to their temporal mean and standard deviation, so that the study of HRV is recast in the study of fluctuations of the standardized RR series around the null-value. More precisely, we introduce

begin{aligned} z_n(i) = frac{r_n(i) – langle {mathbf {r}}(i) rangle }{text {std} [mathbf {r}(i)] }, ,, text {for} ,, n=1, ldots , N end{aligned}

(2)

or, in vectorial notation,

begin{aligned} mathbf {z}(i) = frac{mathbf {r}(i) – langle mathbf {r}(i) rangle }{text {std} [mathbf {r}(i)] }, end{aligned}

(3)

where we defined

begin{aligned} langle mathbf {r} (i) rangle = frac{1}{N_i} sum _{n=1}^{N_i} r_n (i), quad langle mathbf {r}^2(i) rangle = frac{1}{N_i} sum _{n=1}^{N_i} r^2_n (i), quad text {std}[mathbf {r}(i)] = sqrt{langle mathbf {r}^2(i) rangle – langle mathbf {r} (i) rangle ^2}. end{aligned}

(4)

The raw histograms for the standardized inter-beat intervals in the three classes of patients are shown in Fig. 2: notice that the frequency distributions exhibit heavy-tails.

We consider the points in the standardized RR series as random variables sampled by a hidden stochastic process, in such a way that the value of (z_n(i)) at a given step n depends in principle on all the values ({z_m(i)}_{m<n}) taken in the previous steps (m<n) since the beginning of sampling. From this perspective, a meaningful observable to look at is the auto-correlation function at a distance (tau), defined as

begin{aligned} C(i, tau ) = frac{1}{N_i – tau } sum _{n=1}^{N_i -tau } big ( z_n(i) -langle {z}(i) rangle _{+}big )big (z_{n+tau }(i)- langle {z}(i)rangle _{-}big ), end{aligned}

(5)

where (langle {z}(i) rangle _{+} = frac{1}{N_i-tau } sum _{n=1}^{N_i-tau } z_n (i)) and (langle {z}(i)rangle _{-} = frac{1}{N_i-tau } sum _{n=tau }^{N_i} z_n (i)). Given the standardization over the whole segment ([1, N_i]), as long as (tau ll N_i), we expect that (langle {z}(i)rangle _{+}) and (langle {z}(i)rangle _{-}) are both close to zero and shall be neglected in the following (indeed, we checked that this is the case, since (langle {z}(i)rangle _{+},langle {z}(i)rangle _{-} sim 10^{-15}div 10^{-17})). Then, the auto-correlation function we measure simply reduces to

begin{aligned} C(i, tau ) = frac{1}{N_i – tau } sum _{n=1}^{N_i – tau } z_n(i) z_{n+tau }(i) . end{aligned}

(6)

Some examples of the autocorrelation function for patients of the three classes are reported in the right plot of Fig. 1, where we stress that the autocorrelation is non-null over a large (tau) window and its shape is patient-dependent. The last property stems from the 1/f noise shown by HRV which, in turns, yields to dominant contributions in the low-frequency domain of the RR series.

Finally, we introduce a further average operation, this time on the sample of patients, namely, we define

begin{aligned} mathbb {E}_{class}(mathbf {z})= & {} frac{1}{M_{class}} sum _{i in class} mathbf {z} (i), end{aligned}

(7)

begin{aligned} mathbb {E}_{class}(mathbf {z^2})= & {} frac{1}{M_{class}} sum _{i in class} mathbf {z}^2 (i),end{aligned}

(8)

begin{aligned} mathbb {E}_{class}(C(tau ))= & {} frac{1}{M_{class}} sum _{i in class}C(i, tau ) . end{aligned}

(9)

where (class in { H, AF, CD}) and with “(i in class)” we mean all the indices corresponding to patients belonging to a certain class. In the following we will consider the vectors (mathbf {z}) as random variables sampled from an unknown probability distribution (P^{true}_{class}(mathbf {z})), which we will estimate by the probability distribution (P_{class}(mathbf {z})) characterized by a minimal structure and such that its first and second moments are quantitatively comparable with (mathbb {E}_{class}(mathbf {r})), (mathbb {E}_{class}(mathbf {z^2})) and (mathbb {E}_{class}(C(tau ))), respectively.

### Discussion on the model and on the inferential procedure

Our atomic variable is the sequence ({ z_1, z_2, ldots , z_N }) and, as anticipated above, we denote with (P(mathbf {z})) the related probability distribution emerging from the inferential operations on the sample of experimental data. The Shannon entropy (widetilde{H}[P(mathbf {z})]) associated to (P(mathbf {z})) is

begin{aligned} widetilde{H}[P(mathbf {z})] = – int {mathrm {d} mathbf {z} P(mathbf {z}) ln P(mathbf {z})}. end{aligned}

(10)

According to the maximum entropy principle, we look for the distribution (P(mathbf {z})) that maximizes (widetilde{H}[P(mathbf {z})]) and such that its moments match those evaluated experimentally, in particular, here the we choose to apply the constraints on the one-point and two-points correlation function that is, (mathbb {E}_{class}(mathbf {z})), (mathbb {E}_{class}(mathbf {z^2})) and (mathbb {E}_{class}(C(tau ))), respectively. To lighten the notation hereafter these moments shall be referred to simply as, respectively, (mu ^{(1)}), (mu ^{(2)}) and (C(tau )), without specifying the class. In fact, the inferential procedure works analogously regardless of the class, the latter affecting only the quantitative value of the parameters occurring in (P(mathbf {z})). Constraints are set via Lagrange multipliers ((lambda _0, lambda _1, lambda _2,lambda _tau )) in such a way that the problem is recast in the maximization of the functional

begin{aligned} widetilde{H}_{lambda _0,lambda _1,lambda _2,lambda _tau }[P(mathbf {z})]&=widetilde{H}[P(mathbf {z})]+ lambda _0left( int {mathrm {d} mathbf {z} , P(mathbf {z})}-1right) nonumber \&quad + lambda _1left( sum _{n=1}^{N}int {mathrm {d} mathbf {z} , P(mathbf {z}) z_n }-Nmu ^{(1)}right) + lambda _2left( sum _{n=1}^{N}int {mathrm {d} mathbf {z} , P(mathbf {z}) {z}_n^2}-Nmu ^{(2)}right) nonumber \&quad + sum _{tau =1}^{N}lambda _tau left( sum _{n=1}^{N-tau }int {mathrm {d} mathbf {z} , P(mathbf {z}) {z}_n z_{n+tau }}-(N-tau )C(tau )right) , end{aligned}

(11)

where integration is made over (mathbb {R}^N). Note that, while the derivation with respect to (lambda _1), (lambda _2) and (lambda _tau) ensure, respectively, the agreement between the theory and the experiments at the two lowest orders, i.e. the temporal average (mu ^{(1)}), the second moment (mu ^{(2)}) and the auto-correlation function (C(tau )), (lambda _0) guarantees that (P(mathbf {z})) is normalized, so that (P(mathbf {z})) is a probability distribution function. In the asymptotic limit of long sampling ((N rightarrow infty)) and under a stationarity hypothesis (see4 for a similar treatment and the section dedicated to Methods for the proof), the solution of the extremization procedure, returning the probability of observing a certain sequence (mathbf {z}), is given by

begin{aligned} P({z_n}_{n=1}^infty )=frac{1}{Z} left( prod _{n=1}^{infty } P_0 (z_n)right) exp left( sum _{n=1}^infty sum _{tau =1}^infty J(tau )z_n z_{n+tau }+hsum _{n=1}^infty z_nright) , end{aligned}

(12)

where h and (J(tau )) can be estimated from available data (vide infra). Here, (P_0) is the (mathscr {N}(0,1)) distribution and plays the role of prior for the variable (z_n), the parameter (J(tau )) represents the pairwise interaction between elements at non-zero distance (tau) in the series (notice that each element occurs to be coupled to any other), and the parameter h represents the bias possibly affecting the single value in the sequence (and it is expected to be zero as we standardized the RR series). The factor Z plays here as a normalization constant, like the partition function in the statistical mechanics setting10. Notice that the interaction between two elements (r_n) and (r_m) ((n>m)) depends on the distance (tau =n-m), but not on the particular couple considered. This stems from a“stationary hypothesis”, meaning that one-point and two-point correlation functions calculated on a segment spanning (O(tau ll N)) elements along the series are approximately the same and since the starting time of sampling is arbitrary, we get that (J(n,m) = J(m-n)).

The standard inference setup for the model parameters is based on a Maximum (log-)Likelihood Estimation (MLE), i.e. the maximization of the function

begin{aligned} mathscr {L}(mathbf {J}, h vert mathscr {D})= -frac{1}{M} sum _{mathbf {z} in mathscr {D}} log P(mathbf {z}vert mathbf {J},h), end{aligned}

(13)

where (mathscr {D}) is the time-series database (of a given class) and where we made clear the dependence of P on the model parameters. However, such an approach requires the computation of the whole partition function Z, which is numerically hard in this case. Then, we chose to adopt as objective function for the inference procedure the pseudo-(log-)likelihood function4:

begin{aligned} mathscr {L}(mathbf {J}, h vert mathscr {D})=-frac{1}{M} sum _{mathbf {z} in mathscr {D}}log P(z_{L+1}vert {z_n}_{n=1}^L), end{aligned}

(14)

that is, given L observations in a fixed time-series (mathbf {z}), we maximize the conditional probability to observe the value (z_{L+1}) at the successive time step (i.e., we are introducing a cut-off in the interaction range). Further, we make two main modifications with respect to the standard pseudo-likelihood approach: (i) in order to use the entire available time-series in our database, we also adopt a window average procedure; (ii) we add regularization terms in order to prevent divergence for the model parameters. A detailed discussion is reported in Appendix 4.2. Our objective function is therefore given by

begin{aligned} mathscr {L}^{(text {reg})} (mathbf {J}, h vert mathscr {D})=frac{1}{M} sum _{mathbf {z} in mathscr {D}}left[ -frac{1}{2(N-T)}sum _{n=T}^{N-1}big (z_{n+1}-h-sum _{tau =1}^{T}J(tau ) z_{n+1-tau } big )^2-frac{lambda }{2}h^2-frac{lambda }{2}sum _{tau =1}^T f(tau ) J(tau )^2right] , end{aligned}

(15)

where T is the largest (tau) we want to consider (namely T must be larger that the maximal decorrelation time), (lambda) is the regularization weight and (f(tau )) is a temporal regularizer that prevents the elements of (J(tau )) to get too large for large (tau) (see Appendix 4.2 for a detailed description).

The inference method allows us to determine the values of the parameters (J(tau )) and h as well as their uncertainties (sigma _{J(tau )}) and (sigma _h). As for the parameter h, due to series standardization, its value, evaluated over the different classes, is expected to be vanishing (this is indeed the case as it turns out to be (hsim 10^{-3}) with a related uncertainty of the same order). As for the pairwise couplings, we find that for all the classes considered, (J(tau )) is significantly non-zero only for relatively small values of (tau), with a cutoff at (T sim 10^2), and, for a given (tau < T), the coupling does not display a definite sign, that is, for pairs ((z_n, z_{n+tau })) and ((z_m, z_{m+tau })) at the same distance (tau) the related couplings can be of opposite signs. These results are shown in Fig. 3: in the left column we reported the inferred (J(tau )) with the associated uncertainties for all (tau), and in each panel in the right we reported the frequency distributions for the first values of (tau) as examples.

In order to study how decorrelation of RR intervals takes place, it is interesting to study how the interaction vector (J(tau )) (regardless of its sign) vanishes as the delay time (tau) increases. We found that the long delay time behavior of the magnitude (i.e. disregarding the signs and oscillatory characteristics) of the interactions is well-described by a power law of the form

begin{aligned} |J(tau )|_text {leading}sim {A}cdot tau ^{-beta }. end{aligned}

(16)

We thus fitted the tails of the inferred (J(tau )) with this trial function (the range of fitted points is chosen in order to maximize the adjusted (R^2) score). In Table 1, we report best-fit parameters, the adjusted (R^2) score and the reduced (chi ^2). It is interesting to note that the scaling parameter (beta) is around 1, meaning that, for each of the three classes, the leading behavior of the interactions at large (tau) is (sim 1/tau) (as we will see later, the same scaling also characterize the power spectral density in the Fourier domain, see Fig. 7). In the upper row of Fig. 4, we depicted with red circles the results of the inference procedure (once taken their absolute values), while the best fit of the general trend is represented with dashed black lines. In the lower row of the same figure, we also reported the residuals of the experimental values with respect to the best fit (normalized to the corresponding uncertainty). Apart from the first few points in the AF and CD cases, we see that the residuals are distributed in a range of at most (2 sigma _{J(tau )}) (where (sigma _{J(tau )}) is the standard deviation at each (tau) point), and, in particular, for sufficiently long (tau) (where oscillations are softened), experimental values are always contained in the range ([-sigma _{J(tau )},sigma _{J(tau )}]), implying that the leading behavior of the delayed interaction is well-captured by (sim 1/tau) noise both in our datasets as well as in the model’s prediction.

To summarize, here we highlight the main results.

• Couplings can be both positive and negative (see Fig. 3), defining the heart as a complex glassy system.

• The coupling magnitude decays in (tau) as a power-law whose leading order is (sim 1/tau) (see Fig. 4).

• The coupling magnitude displays a sharp scaling (1/tau) solely in healthy patients, while for the remaining patients it display a bump in the short time-scales (see Fig. 3 and the residual plots in Fig. 4).

• By fitting data via Eq. (16), we obtain estimates for the power-law exponent (beta) (as reported in Table 1): interestingly, different classes are associated to different best-fit values of (beta), in such a way that classification of cardiac failures via HRV via this route seems possible.

### Discussion on the model and on the generalization procedure

Once the model and the related parameters are inferred for each of the three classes, we can use the original sequences ({ mathbf {z} }) to generate synthetic sequences ({tilde{mathbf{z}}}) of length (N-T). The procedure followed to get the synthetic sequence is briefly described hereafter, while an extensive explanation is provided in App. 4.3.

For any class, we consider our estimate for (J(tau )), along with the estimate (sigma _{J(tau )}) of its uncertainty, and we build the noisy estimate for (J(tau )), that is (bar{J}(tau )=J(tau ) + delta J(tau )) where (delta J(tau )= eta sigma _{J(tau )}) and (eta) is a (mathscr {N}(0,1)) random variable. Next, taken a certain ({ mathbf {z} }), we convolve it with (bar{J}(tau )) and this returns ({tilde{mathbf{z}}}). Of course, due to the initial standardization of the RR series, the inference procedure returned a vanishing bias parameter h, hence the synthetic series will also be centered at zero. However, a synthetic sequence is no longer standardized and this is done by hand.

Then, it is natural to compare the synthetic sequences and the experimental ones. We generate a sample of data with the same size of the experimental data available, and we compute the empirical cumulative distribution function for both the experimental and synthetic data in order to compare them: results are reported in Fig. 5. In the first row, we directly compare the experimental (red solid line) and the synthetic (black dashed line) cumulative distributions highlighting an excellent agreement for all the classes. This is then corroborated by checking the probability plots in the same figure (second row): here, the red solid line shows the synthetic cumulative distribution versus experimental cumulative distribution, while the black dashed curve is the identity line. The green regions in the plot are confidence intervals with (p=0.95).

Next, we test whether the model is able to effectively capture correlation in the RR series, in particular by comparing experimental auto-correlation functions and their predicted counterparts. However, since autocorrelation functions are individual-dependent, starting from a single (randomly chosen) RR series we generate 100 synthetic series with different realizations of the (bar{J}(tau )) according to the above mentioned procedure (see also App. 4.3). In this way, we can use our estimation of the uncertainties on (J(tau )) in order to give a confidence interval for our predictions. In Fig. 6 we compare the auto-correlation functions for the experimental series and for the synthetic series; for the latter we also highlight the confidence interval with (p=0.68). More precisely, we depict the experimental (red solid line) and theoretical (black dashed line) auto-correlation functions and see that the former always fall inside the confidence interval of the re-sampled series (the green region). Thus, we can conclude that our inferred minimal pairwise model is able to effectively capture the temporal autocorrelation in the RR series.

As a final comment, we also looked at the power spectrum density (PSD) of the provided datasets ({mathbf {z}}) that, as expected (see e.g.,12,13,21), displays the long tail 1/f (see Fig. (7), upper panel) and we made the following comparison: for all the patients, we evaluated its PSD and in the region ([10^{-4}, 10^{-2}]) Heartbeat(^{-1}) we fit with a power-law

begin{aligned} text {PSD}(f) = alpha cdot f^{-gamma } end{aligned}

where (gamma sim 1) and its value is taken as the x-coordinate of that patient in the lower panels of Fig. (7). The corresponding y-value is obtained by calculating the PSD over 100 synthetic RR-series generated by convolution with the empirical series playing as seed and using as value of (varvec{J}) the one pertaining to the class the patient belongs to (H, AF, CD); results are in good agreement on the diagonal.