# Seventy-year long record of monthly water balance estimates for Earth’s largest lake system

#### ByAUTHOR

Aug 21, 2020

Figure 2 shows a schematic of the Bayesian inference approach encoded in the L2SWBM. The following sections elaborate on the independent data sources used as inputs, and describe the components of the L2SWBM in greater detail.

### A compilation of multiple data sources for the Great Lakes water balance

Multiple hydro-climate datasets are available to represent the water balance of the Great Lakes31 ranging from gauge-based aggregated data43,52 to model simulations47 and remote sensing products40. However, they were mostly developed independently with limited consideration of fidelity to the water balance31. This inconsistency among available data sets is a long-standing challenge facing Great Lakes hydrologic research41, and has motivated the development of the L2SWBM50. To inform the Bayesian framework encoded within the L2SWBM, we selected eight independent data sources, including:

• Beginning of month (BOM) water levels (H) for each of the Great Lakes, provided by the binational Coordinating Committee on Great Lakes Basic Hydraulic and Hydrologic Data (referred to as “Coordinating Committee”, or CCGLBHHD, hereafter; for more information about this ad-hoc group, please see Gronewold, et al.31). Lake wide-average water levels were calculated as the arithmetic mean of daily water level measurements over a subset of in situ gauges located across the coastline of each lake. A greater detail of the underlying data sets is discussed in Gronewold, et al.31.

• Diversions (D) into, or out of, each lake were provided by the Coordinating Committee31. Diversions include the Long Lac and Ogoki Diversion into Lake Superior, the Chicago Diversion from Lake Michigan-Huron, and the Welland Canal, which diverts water from Lake Erie to Lake Ontario.

• Connecting channel flows (I or Q) are obtained from two independent data sources. The first dataset was estimated by the Coordinating Committee using a variety of methods such as stage-fall discharge equations or aggregation of discrete flow measurements31. The second dataset was measured using Acoustic Doppler Velocity Meters, which were installed across International Gauging Stations (IGS) maintained by the United States Geological Survey and Water Survey Canada31.

• Over-lake precipitation (P) is obtained from four data sources: (i) the NOAA-GLERL Great Lakes Monthly Hydrometeorological Database (GLM-HMD)43; (ii) output of the Great Lakes Advanced Hydrologic Prediction System (AHPS)53, which is operated by the United States Army Corps of Engineers (USACE); (iii) National Weather Service Multisensor Precipitation Estimates (NWS MPE)54; and (iv) Meteorological Service of Canada’s Canadian Precipitation Analysis (CaPA)40,55.

• Over-lake evaporation (E) is obtained from three data sources: (i) the NOAA-GLERL GLM-HMD43; (ii) output of the USACE AHPS41; (iii) the Environment and Climate Change Canada’s Water Cycle Prediction System (ECCC WCPS)39; and output of the NOAA-GLERL Finite-Volume Community Ocean Model (FVCOM)56.

• Tributary lateral runoff (R) is obtained from three data sources: (i) the NOAA-GLERL GLM-HMD43; (ii) output of the USACE AHPS41; and (iii) the ECCC WCPS39.

Table 1 provides a summary of these data sets and indicates which data set was used to estimate the prior distributions and likelihood functions. Besides the data sets included in Table 1, there are other regional31 and global57,58 data products that have been identified for potential applications of the L2SWBM on the water balance of the Great Lakes (and other large lakes) in the future.

### The Large Lakes Statistical Water Balance Model (L2SWBM)

The L2SWBM uses a conventional water balance model to constrain component estimates, ensuring that the water balance can be closed over multiple timespans for the Great Lakes system. For Lake Superior, Lake Michigan-Huron, Lake Erie, and Lake Ontario, changes in storage over a specific time window were defined using Eq. (1).

$$Delta {H}_{j,w}={H}_{j+w}-{H}_{j}=mathop{sum }limits_{i=j}^{j+w-1}left({P}_{i}-{E}_{i}+{R}_{i}+{I}_{i}-{Q}_{i}+{D}_{i}+{varepsilon }_{i}right)$$

(1)

where: ΔH: change in lake storage over w months, i.e. from month j to month j + w (mm);

P: over-lake precipitation (mm);

E: over-lake evaporation (mm);

R: lateral tributary lake inflow (mm);

I: inflow from upstream lake (m3/s);

Q: outflow to downstream lake (m3/s);

D: inter-basin diversions (to or from a specific lake) and consumptive uses (m3/s);

ε: process error term representing water level changes not explained by the other components (mm) such as ground-water fluxes or glacial isostatic rebound59.

We note that the L2SWBM code converts I, Q, and D from flow rate (m3/s) to lake-depth (mm) using lake surface area whenever required (e.g., the unit of millimetre is required to calculate water balance closure). The sign of D depends on whether water is diverted to (positive values) or from (negative values) a specific lake. In addition, this study used a rolling window of w = 12, which generally leads to better results regarding water balance closure48,49.

Over Lake St. Clair, which has a substantially smaller surface area relative to the other four lakes, the combined effect of inflow (from Lake Michigan-Huron via the St. Clair River) and outflow (to Lake Erie via the Detroit River) generally dominates the hydrologic cycle. Therefore, only net basin supply (NBS = PE + R) was modelled, and the water balance equation for Lake St. Clair was modified as below.

$$Delta {H}_{j,w}={H}_{j+w}-{H}_{j}=mathop{sum }limits_{i=ji=j}^{j+w-1}left(NB{S}_{i}+{Q}_{MH{U}_{i}}-{Q}_{i}+{varepsilon }_{i}right)$$

(2)

where QMHU is the outflow from Lake Michigan-Huron while the other variables are defined following those of Eq. 1.

Each water balance component was then inferred through a Bayesian approach, in which the “true” value of a variable (e.g., over-lake precipitation for Lake Superior) at a specific time-step (e.g., Jan 2019) was probabilistically estimated using a prior probability distribution and likelihood functions parameterized from multiple independent data sources. The following section will describe our approach to parameterizing the Great Lakes water balance using the L2SWBM. It is informative to note that the following sections share some similarity to the recent publication on the L2SWBM48. However, we also included more details on specific modifications (e.g., data used to derive the L2SWBM parameters) in our application to derive a seventy-year long record for the Great Lakes water balance.

#### Prior distributions of water balance components

We first modelled each water balance component with a probability distribution family, representing a “prior belief” of the possible range of values. The parameters of these distributions were empirically estimated from historical data spanning from 1950 to 2019 (presented in Table 1). Specifically, over-lake evaporation (E), connecting-channel inflow (I), connecting-channel outflow (Q), diversions (D) as well as net basin supply (NBS; for Lake St. Clair) corresponding to each calendar month m ((min [1,12])) were modelled with a normal distribution:

$$pi left({E}_{m}right)={rm{N}}left({mu }_{E,m},{tau }_{E,m}/2right)$$

(3)

$$pi left({I}_{m}right)={rm{N}}left({mu }_{I,m},{tau }_{I,m}right)$$

(4)

$$pi left({Q}_{m}right)={rm{N}}left({mu }_{Q,{m}_{t}},{tau }_{Q,m}right)$$

(5)

$$pi left({D}_{m}right)={rm{N}}left({mu }_{D,{m}_{t}},{tau }_{D,m}right)$$

(6)

$$pi left(NB{S}_{m}right)={rm{N}}left({mu }_{NBS,m},{tau }_{NBS,m}right)$$

(7)

where the mean (μ) and precision (τ) parameters were calculated empirically from historical data. The use of the precision ((tau =1/{sigma }^{2})) rather than the variance (({sigma }^{2})) in this study is the conventional practice for Bayesian inference60. We note that the precision of the prior probability distribution for E was divided by two (i.e., the variance was doubled) as showed in Eq. 3. This modification allowed a broader range of feasible values to account for a potential shift of evaporation in a warming climate48.

Lateral runoff (R) drained to each lake from the corresponding basin for each calendar month m ((min [1,12])), which is always positive, was then modelled with a lognormal prior probability distribution:

$$pi left({R}_{t}right)={rm{LN}}left({mu }_{{rm{ln}}(R),m},{tau }_{{rm{ln}}(R),m}right)$$

(8)

where t is a specific time step, and prior mean (({mu }_{{rm{ln}}(R),m})) and precision (({tau }_{{rm{ln}}(R),m})) were calculated for each calendar month m using historical data records for that month. For example, at time step (t) January 2019, we have m equals 1 and the lateral runoff is modelled using mean and precision calculated from all observed January runoff values.

We modelled over-lake precipitation (P) using a gamma probability distribution, where the distribution parameters for each calendar month m were also calculated empirically from historical data.

$$pi left({P}_{m}right)={rm{Ga}}left({psi }_{m}^{1},{psi }_{m}^{2}right)$$

(9)

The shape (({psi }^{1})) and rate (({psi }^{2})) parameters of the gamma distribution were defined as below (following Thom61).

$${psi }_{m}^{1}=frac{1}{4{phi }_{m}}left(1+sqrt{1+frac{4{phi }_{m}}{3}}right)$$

(10)

$${phi }_{m}=lnleft({mu }_{P,m}right)-{mu }_{{rm{ln}}(P),m}$$

(11)

$${psi }_{m}^{2}={psi }_{m}^{1}/{mu }_{P,m}$$

(12)

where ({mu }_{P,m}) (Eq. 11), and ({mu }_{{rm{ln}}(P),m}) (Eq. 12) are respectively the mean of historical precipitation, and the mean of the logarithm of precipitation for calendar month m.

The error term ε in Eq. 1 and Eq. 2 was also modelled using a vague normal prior probability distribution following Gelman62 across all calendar months:

$$pi left({varepsilon }_{m}right)={rm{N}}left(0,0.01right)$$

(13)

#### Likelihood functions for analysis period

To derive the likelihood functions for the analysis period, data from multiple data sources spanning over the 1950–2019 period was used (note that the temporal coverage varies substantially across the data sets, as presented in Table 1).

For changes in lake storage over a period of w months, the likelihood function was defined as:

$${y}_{Delta {H}_{j,w}}={y}_{{H}_{j+w}}-{y}_{{H}_{j}} sim Nleft(Delta {H}_{j,w},{tau }_{Delta {H}_{j,w}}right)$$

(14)

in which the observed change in storage over a rolling window of length w months (({y}_{Delta {H}_{j,w}})) is the difference between water level measurements (yH) at the beginning of month j + w and month j. We modelled this value with a normal distribution with mean (Delta {H}_{j,w}) and precision ({tau }_{Delta {H}_{j,w}}).

The likelihood functions for water balance components on the right hand side of Eq. 1 (Eq. 2 for Lake St. Clair) follow a normal distribution:

$${y}_{t,theta }^{n} sim {rm{N}}left({theta }_{t}^{n}+{eta }_{theta ,{m}_{t}}^{n},{tau }_{t,theta }^{n}right)$$

(15)

where (theta in (P,E,R,I,Q,D,NBS)), ({y}_{t,theta }^{n}) is data source (nin [1,N]) for component θ (which has N independent data sources) at time step t (e.g., t = Jan 2019; note that mt = 1 in this case); ({eta }_{theta ,{m}_{t}}^{n}) is the bias of data source number nth in calendar month m ((min [1,12])) and ({tau }_{t,theta }^{n}) is the precision of data source number nth at time step t.

Similar to other applications of the L2SWBM50,51, the precision of changes in lake storage (({tau }_{Delta {H}_{j,w}})) and the precision of data sources of each water balance component over each time step (({tau }_{t,theta }^{n}); (theta in P,E,R,I,Q,D,NBS)) were modelled with a gamma prior probability distribution with both shape and scale parameters equal 0.1:

$${tau }_{Delta {H}_{j,w}}={rm{Ga}}left(0.1,0.1right)$$

(16)

$${tau }_{t,theta }^{n}={rm{Ga}}left(0.1,0.1right)$$

(17)

Except for channel flows (of which the bias is relatively low for most lakes46), the bias of each contributing data set was modelled using a normal distribution with mean 0 and precision 0.01 (i.e., a standard deviation of 10):

$$pi left({eta }_{theta ,{m}_{t}}^{n}right)={rm{N}}left(0,0.01right)$$

(18)

#### Statistical inference of water balance components

To infer new estimates for the Great Lakes water balance over the 1950–2019 period, the L2SWBM was used to encode the prior distributions and likelihood functions estimated from available independent datasets into a JAGS (Just Another Gibbs Sampler) model inference routine63, which is an open-source successor to BUGS (Bayesian inference Using Gibbs Sampling)64. The ‘rjags’ package within the R statistical software environment65 was then used to simulate the JAGS model over 1,000,000 Markov Chain Monte Carlo (MCMC) iterations using two parallel MCMC chains. We omitted the first 500,000 iterations as a “burn-in” period. The remaining 500,000 iterations were then thinned at 250-iteration intervals to retain the final subset of 2,000 iterations. The 95% credible interval of the final subset was used to infer a feasible range for each water balance component.

It is informative to note that historical estimates of over-lake evaporation are not readily available from 1900 to 1949. As a result, this study only used data spanning from 1950 to 2019 to inform the statistical inference. To ensure that no observation was used to estimate both the prior distribution and the likelihood function (and thus would be favoured by the L2SWBM) at any specific time step, we used the following approach to derive the prior distributions:

• For the analysis period from 1950 to 1984: the prior distributions are generated from historical data covering the 1985–2019 period.

• For the analysis period from 1985 to 2019: the prior distributions are generated from historical data covering the 1950–1984 period.