### Regionally-downscaled climate change projections

We used a management strategy evaluation (MSE) applied to ensemble projections of a climate-enhanced multispecies stock assessment within the integrated modeling framework of the Alaska Climate Change Integrated Modeling project (ACLIM)^{19}. For this, six high resolution downscaled projections of oceanographic and lower trophic level conditions in the Bering Sea (using the Regional Ocean Modeling System^{49,50}) were coupled to the BESTNPZ nutrient-phytoplankton-zooplankton model^{51}; we refer to this model complex throughout this paper as the Bering10K ROMSNPZ, or just ROMSNPZ, model. Boundary conditions were driven by three global general circulation models (GFDL-ESM2M^{52}, CESM1^{53}, and MIROC-ESM^{54}) projected (2006–2099) under the high-baseline emission scenario Representative Concentration Pathway 8.5 (RCP 8.5) and midrange global carbon mitigation (RCP 4.5; note, that for CESM under RCP 4.5, projections from 2080–2100 were unavailable so conditions from 2080–2099 were held constant at 2080 conditions for that scenario only) future scenarios from the Coupled Model Intercomparison Project phase 5 (CMIP5)^{29,55}. Hermann et al.^{30,56} also report on downscaled hindcasts of oceanographic and lower trophic conditions in the EBS from 1970–2012 (see refs. ^{30,56,57} for detailed descriptions of model evaluation and performance). For each downscaled model simulation, we replicated the National Marine Fisheries Service Alaska Fisheries Science Center annual summer bottom-trawl survey in time and space in the ROMSNPZ model (using historical mean survey date at each latitude and longitude of each gridded survey station) to derive estimates of sea surface and bottom temperatures (Fig. 1). We additionally used a polygon mask of the survey area to estimate the average zooplankton abundance in the system during spring, summer, winter, and fall months. These indices were derived for each climate projection scenario, as well as a persistence scenario where conditions were held constant at the average of those for 2006–2017 from a hindcast simulation. All index projections were bias corrected to the 2006–2017 hindcast period using the delta method assuming unequal variance in the GCM projections and hindcast^{58} such that:

$$T_{mathop {{{mathrm{fut}}}}limits^prime ,y} = bar T_{{mathrm{hind}},overrightarrow {scriptstyle{mathrm{ref}}} } + frac{{sigma _{{mathrm{hind}},overrightarrow {scriptstyle{mathrm{ref}}} }}}{{sigma _{{mathrm{fut}}, overrightarrow{scriptstyle{mathrm{ ref}}} }}}left( {T_{{mathrm{fut}},y} – bar T_{{mathrm{fut}},overrightarrow {scriptstyle{mathrm{ref}}} }} right)$$

(1)

where (T_{mathop {{{mathrm{fut}}}}limits^prime ,y}) is the bias-corrected projected timeseries, (T_{{mathrm{fut}},y}) is the raw projected timeseries, (bar T_{{mathrm{hind}},overrightarrow {scriptstyle{mathrm{ref}}} }) is the mean of the hindcast during the reference years (overrightarrow {{mathrm{ref}}}) (2006–2017), (bar T_{{mathrm{fut}},overrightarrow {scriptstyle{mathrm{ref}}} }) is the mean of the raw projected timeseries during the reference years (overrightarrow {{mathrm{ref}}}), (sigma _{{mathrm{hind}},overrightarrow {scriptstyle{mathrm{ref}}} }) is the standard deviation of the hindcast during the reference years (overrightarrow {{mathrm{ref}}}), (sigma _{{mathrm{fut}},overrightarrow {scriptstyle{mathrm{ref}}} }) is the standard deviation of the raw projection timeseries during the reference years (overrightarrow {{mathrm{ref}}}).

### Climate-enhanced multispecies stock assessment model

Bias-corrected indices were then used as covariates in the climate-enhanced multispecies stock assessment model for the Bering Sea (hereafter CEATTLE)^{59} to evaluate the performance of alternative management approaches on future fish biomass and catch. CEATTLE is a climate-enhanced multispecies statistical age-structured assessment model with parameters for growth that are functions of temperature (i.e., temperature-specific average weight-at-age) and predation that are functions of temperature (via a bioenergetics-based predation sub-model)^{59,60,61}. Since 2016, the model has been used operationally in the Bering sea as a supplement to the annual BSAI pollock stock assessment^{61}. Various configurations of CEATTLE are possible; for this study we chose one where temperature-specific predator and prey interactions influenced natural mortality, temperature influenced weight-at-age, and the spawner-recruit relationship was a function of physical and biological future conditions as well as random variability (i.e., a climate-informed multispecies model). We fit the model using penalized maximum likelihood to survey biomass, diet, and fishery harvest data for three groundfish species pollock, Pacific cod, and arrowtooth flounder from the EBS in the EBS over the period 1979–2017. We also used the Bering10K ROMSNPZ model to produce detailed hindcasts of temperature for the period 1970–2017. We used hindcast-extracted timeseries from the ROMSNPZ model and CEATTLE model estimates of recruitment ((R_{i,y,l})) and spawning biomass ((B_{i,y – 1})) in hindcast year *y* for each species *i* to fit a climate-enhanced logistic recruitment per spawner model^{36}, such that:

$$ln left( {hat R_{i,y}} right) = alpha _i – beta _{0,i}B_{i,y – 1} + ln left( {B_{i,y – 1}} right) + {{{bf{B}}}}_i{{{bf{X}}}} + varepsilon _{i,y}$$

(2)

where ({{{bf{B}}}}_i{{{bf{X}}_l}}) is the summed product of each covariate parameter (beta _{ij}) and the corresponding environmental covariate (X_{j,y}) for each bias-corrected environmental index (j = ( {1,2…n_j} )). We selected indices representative of ecological conditions important for groundfish recruitment in the Bering sea^{39}; spring and fall large zooplankton abundances, survey replicated bottom temperature, and extent of the residual cold pool of extremely dense and cold sea water that persists across the EBS shelf following spring sea ice retreat. We assumed normally distributed (in log space) residual errors for each species ((varepsilon _{i,y} sim Nleft( {0,sigma _i^2} right))). The CEATTLE model was then projected forward where ROMSNPZ indices from individual projections drove growth, predation, and recruitment in each future simulation year^{36,62}.

### Evaluation of harvest management approaches

Previous authors have defined EM (i.e., the incorporation of ecosystem information into marine resource management) as a continuum between two paradigms of management and focus^{18}. On one end is within-sector single-species management that considers ecosystem information (EAFM) and on the other is cross-sectoral whole of ecosystem management (i.e., EBM). EBFM is intermediate between these and is defined by quantitative incorporation of ecosystem interactions into assessment models and target setting (EBFM). Most fisheries management in the Bering Sea can be characterized as EBFM or EAFM, with increasing trends toward cross-sectoral coordination at the scale of EBM. Here we focus on one aspect on this scale of potential management options, operational EBFM and EAFM as captured through the CEATTLE multispecies stock assessment model and harvest policies decisions made annually under the constraint of the 2 MT cap (modeled via the ATTACH model).

MSE is a process of “assessing the consequences of a range of management strategies or options and presenting the results in a way which lays bare the tradeoffs in performance across a range of management objectives”^{63}. MSE has been frequently used to evaluate alternative management strategies based on single-species estimation methods^{64}. It is increasingly used to evaluate ecosystem management performance, although these evaluations are far less commonplace due to the complexity of modeling and assessing the performance of ecosystem level metrics^{64}. Importantly, MSE “does not seek to proscribe an optimal strategy or decision”^{63}, rather it aims to describe the uncertainty and tradeoffs inherent in alternative strategies and scenarios. In this case, through a series of workshops, we worked with managers and stakeholders to identify priority scenarios and outputs^{19}. From this, risk, sensitivity, and uncertainty under contrasting climate scenarios were requested outputs of the analysis, as was the performance of current climate-naive EBFM policies.

A key component of MSE is identifying and quantifying uncertainty (i.e., process, observation, estimation, model, and implementation error) and representing it using an operating model. In the case of this MSE, the focus was on process error uncertainty due to variation in recruitment about the fitted stock-recruitment relationship, one major source of model error in the form of alternative climate scenarios, and implementation error. The MSE does not account for estimation error (uncertainty in the parameters of the operating model) nor observation error. This is because the estimates of recruitment and spawning biomass from CEATTLE for the BSAI are very precise (see Fig. 10 in ref. ^{60}) and the estimation and operating models are therefore very similar. Thus, CEATTLE is the operating model for this MSE and implicitly the estimation method. In this approach we assume that while allowing for observation error would have increased overall error, the effect would have been minor compared to the investigated uncertainties. Future analyses using a full MSE (i.e., separate operating and estimation models) could evaluate the effect of observation error, but perhaps more importantly, the potential for model error, whereby the population dynamics model (on which the estimation method is based) differs from that of the operating model such that the estimates on which management decisions are made are biased relative to the true values in the operating model.

Given this we summarized the relative change in catch and biomass for the three species in the model under the following fishing scenarios (Fig. 1): (a) projections without harvest ((F_{i,y} = 0)) in each year *y* of scenario *l* for each species *i*, (b) projections under target harvest rate (Supplementary Fig. 7 left) and with a sloping harvest control rule (HCR) (Supplementary Fig. 7 right), (c) as in 2 but with the constraint of a 2 MT cap applied dynamically to the three focal species only.

Under the North Pacific Fishery Management Council (NPFMC) constraint of the 2 MT cap on cumulative total annual catch, realized harvest (i.e., catch) and specification of individual species harvest limits known as Total Allowable Catch (TAC; metric tons) are a function of the acceptable biological catch (ABC) for the given species, as well as ABC of other valuable species in the aggregate complex^{19,34} (https://github.com/amandafaig/catchfunction). TAC must be set at or below ABC for each species, therefore TAC of individual species are traded-off with one another to avoid exceeding the 2 MT cap. From 1981 to 1983, the TAC of pollock was reduced significantly below the ABC and in 1984 the 2 MT cap became part of the BSAI fishery management plan^{21,34,65}. Pacific cod regulations have changed markedly over recent decades and it was only in the 1990s that in many years the catch and TAC approached its ABC. Thus, we used the socioeconomic ATTACH model (the R package ATTACHv1.6.0 is available with permission at https://github.com/amandafaig/catchfunction^{34}) to model realized catch in each simulation year as a function of CEATTLE assessment estimates of ABC (tons) for pollock, Pacific cod, and arrowtooth flounder under future projections (2018–2100). This entailed three steps for each future simulation year (y):

- 1.
project the population forward from (y – 1) to (y) using estimated parameters from the multispecies mode of the CEATTLE model fit to data from 1979 to 2017 and recruitment based on biomass in simulation year (y) and future environmental covariates from the ROMSNPZ model downscaled projections (see “Methods” above) to determine ({mathrm{ABC}}_{i,y,l}) for each species (

*i*) under each scenario (*l*) given the sloping harvest control rule for pollock, Pacific cod, and arrowtooth flounder in each simulation year (y); - 2.
use ({mathrm{ABC}}_{i,y,l}) of each species from step 1 as inputs to the ATTACH model in order to determine the North Pacific Marine Fishery Council Total Allowable Catch (TAC

_{i,y,l}) for the given simulation*l*year*y*; - 3.
use TAC

_{i,y,l}from step 2 to estimate catch (tons) in the simulation year (Fig. 1); remove catch from the population and advance the simulation forward 1 year.

### Determine the annual ABC

We used end-of-century projections (2095–2099) to derive a maximum sustainable yield (MSY) proxy for future harvest recommendations (ABC_{i,y,l}) for each scenario *l*. To replicate current management, we used a climate-specific harvest control rule that uses climate-naive unfished and target spawning biomass reference points ((B_{0,i}) and (B_{{mathrm{target},i}}), respectively) and corresponding harvest rates ((F_{i,y} = 0) and (F_{i,y} = F_{{mathrm{target}}}) and (B_{i,y,l}) in each simulation *l* year (y) for each species (i)

$${mathrm{ABC}}_{i,y,l} = mathop {sum }limits_a^{A_i} left( {frac{{S_{i,a}F_{{mathrm{ABC}},i,y,l}}}{{Z_{i,a,y,l}}}left( {1 – e^{ – Z_{i,a,y,l}}} right)N_{i,a,y,l}W_{i,a,y,l}} right)$$

(3)

where (W_{i,a,y,l}), (N_{i,a,y}), and (Z_{i,a,y,l}) is the climate-simulation specific annual weight, number, and mortality (i.e., influenced through temperature effects on recruitment, predation, and growth) at age (a) for (A_i) ages in the model, and (S_{i,a}) is the average fishery age selectivity from the estimation period 1979–2017^{59,60}. (F_{{mathrm{ABC}},i,y,l}) and was determined in each simulation timestep using an iterative approach^{66} whereby we: (i) first determined average (B_{0,i}) values in years 2095–2099 by projecting the model forward without harvest (i.e., (F_{i,y} = 0)) for each species under the persistence scenario. We then (ii) iteratively solved for the harvest rate that results in an average spawning biomass ((B_{i,y})) during 2095–2099, that is, 40% of (B_{0,i}) (i.e., (F_{{mathrm{target}},i})) for pollock and Pacific cod simultaneously, with arrowtooth flounder (F_{i,y}) set to the historical average (as historical *F* for arrowtooth flounder ≪(F_{40% })); once (F_{{mathrm{target}},i}) for pollock and Pacific cod were found, we then iteratively solved for (F_{{mathrm{target}},i}) for arrowtooth flounder (Supplementary Fig. 7 left panel)^{59,60}. Last, (iii) to derive a climate-informed ({mathrm{ABC}}_{i,y,l}) in each simulation year, the North Pacific Marine Fisheries Council (hereafter, “Council”) Tier 3 sloping harvest control rule with an ecosystem cutoff at (B_{20% }) was applied to adjust (F_{{mathrm{ABC}},i,y,l}) lower than (F_{{mathrm{target}},i}) if the simulation specific (climate-informed) (B_{i,y,l}) was lower than 40% of the climate-naive (B_{0,i}) at the start of a given year or set to 0 if (B_{i,y,l} ,<, 20% B_{0,i}); (F_{{mathrm{ABC}},i,y} = F_{{mathrm{target}},i,y}) for the remainder of the simulations where (B_{i,y,l} ge 40% B_{0,i}))^{21}.

This approach follows the status quo Council reviewed multispecies assessment methodology^{60} and represents a precautionary approach that minimizes inflation of ABC due to predator release^{59} and also minimizes potential non-intuitive compound effects of climate change and fishing under declining conditions (i.e., whereby a climate-informed (B_0) declines with climate change and produces a lower target ((B_{{mathrm{40% }}})) biomass). While beyond the scope of this study, future simulations might further explore the performance of alternative (B_0) approaches (e.g., periodically updated climate-informed (B_0), annually varying (B_0) with climate-penalized (B_{{mathrm{40% }}})).

### Simulate TAC

ATTACH estimates the TAC that would be set by the North Pacific Fishery Management Council, and the subsequent catch that would be harvested by the commercial fishery, based on historical data and assuming current policies and priorities remain relatively unchanged in projections. The impacts of existing policies such as Amendment 80 (i.e., A80, which created multispecies cooperatives), the American Fisheries Act (i.e., AFA, which created pollock cooperatives), and large spatial closures are included and evaluated in the retrospective analysis. Future scenarios will explore relaxation or alteration of these underlying assumptions^{19}. ATTACH first estimates the TAC from the specified ABC through an ensemble of three log-linear regressions and seemingly unrelated regressions (SUR) where normally distributed error terms (varepsilon _{i,y}^{{mathrm{TAC}}}) are independent across time, but may have cross-equation contemporaneous correlations^{67}. Specifically, the models are statistically fit to historical ABC and TAC data from 1992 to 2017 such that:

$$ln left( {{mathrm{TAC}}_{i,y}} right) = , alpha _i + beta _i {mathrm{ln}}left( {{mathrm{ABC}}_{i,y}} right) + mathop {sum }limits_{j = 1}^{N_j} gamma _{ij}{mathrm{ABC}}_{j,y} + mathop {sum }limits_{k = 1}^{N_k} theta _{ik}I_k + varepsilon _{i,y}^{{mathrm{TAC}}}, \ qquad{mathrm{where}},varepsilon _{i,y}^{{mathrm{TAC}}}sim Nleft( {0,sigma _i^{{mathrm{TAC}}}} right)$$

(4)

where the harvest limit for species (i) in a given historical year (y) (({mathrm{TAC}}_{i,y})) is a function of the assessment model-based ABC (({mathrm{ABC}}_{i,y}), in metric tons) for the species (i), (alpha _i) is the log-linear intercept and (beta _i), (gamma _{ij}), and (theta _{ik}) are coefficients for ABC and policy covariates (I_k) (e.g., closures, A80, AFA). (varepsilon _{i,y}^{{mathrm{TAC}}}) is the residual error and is log-normally distributed. The residuals of equations estimated as a SUR system are assumed to be correlated, and this is used to more efficiently estimate the regression coefficients. This parameterization assumes exogenous shocks affect all included species to varying degrees. The mean relationship between TAC and ABC (Eq. 4) was used to simulate TAC_{i,y,l} in each projection year using inputs of ABC_{i,y,l} from the CEATTLE model (Eq. 3). Historically, in all but two years the sum of TAC across species has equaled 2 MT exactly, thus we imposed an addition constraint; if the cumulative predicted TACs from the ensemble exceeded 2 MT, the ({mathrm{TAC}}_{i,y,l}) of each species was proportionally reduced to satisfy the constraint of a 2 MT limit on the sum of all ({mathrm{TAC}}_{i,y,l}). Of note, this step simulates the current management regime, and is not an optimization.

### Simulate annual harvest (catch)

Similarly, ATTACH uses an ensemble of three models to estimate catch biomass ((C_{i,y}); tons) for a given target species (i) as a function of TAC_{i,y} and sometimes the TAC_{j,y} of 1 to 2 additional species *j*, as well as relevant policy/events in a given year:

$$ln left( {C_{i,y}} right) = , alpha _i + beta _iln left( {{mathrm{TAC}}_{i,y}} right) + mathop {sum }limits_{j = 1}^{N_j} gamma _{ij}{mathrm{TAC}}_{j,y} + mathop {sum }limits_{k = 1}^{N_k} theta _{ik}I_k + varepsilon _{i,y}^C, \ qquad {mathrm{where}},varepsilon _{i,y}^Csim Nleft( {0,sigma _i^C} right)$$

(5)

The three models in the ensemble differ in their error structure for catch: model 1 assumes each of the log-linear equations are independent, model 2 has two groups of linked SUR (representing species that are caught concurrently), and model 3 has three groups of linked SUR (representing a third group, on top of the two of model 2, of species whose catches are linked). As in (Eq. 4), (alpha _i) is the log-linear intercept and (beta _i), (gamma _{ij}), and (theta _{ik}) are coefficients for TAC and policy covariates (I_k) and (varepsilon _{i,y}^C) is the residual error and is log-normally distributed and assumed to be correlated across linked SUR.

The authors of the ATTACH model evaluated the performance of the TAC and catch prediction models using leave-one-out cross-validation (LOO-CV)^{34}. For each year, they estimated the coefficients of each model using data from all but that year, and then used those estimators to predict the TAC or catch of the omitted year. They then calculated the difference between the predicted and actual catch for each year (1992–2017) to evaluate the models using a variety of metrics: simple sum of differences, sum of percent differences, sum of squared differences (weighted by value, ABC, TAC, and catch, respectively), the sum of squared percent differences. They also evaluated those metrics for the final models (trained on the entire data set). They found that the ATTACH model performed best for species targeted by directed fisheries, as is the case for the species in this study, while predictive skill of the model was weaker for less economically valuable species. In all species, however, the ATTACH model performed better than assuming catch is equal to ABC (Supplementary Fig. 8).

For the MSE application in this study ({mathrm{ABC}}_{i,y,l}) and ({mathrm{ABC}}_{j,y,l}) of pollock, Pacific cod, and arrowtooth flounder output from the CEATTLE model were used as inputs into the ATTACH function for each simulation year, while the ({mathrm{ABC}}_{j,y}) of the remaining non-CEATTLE species were set to their respective historical averages (1992–2017). Pollock, Pacific cod, and arrowtooth flounder ABCs were used to predict ({mathrm{TAC}}_{i,y,l}) for each species, which in turn was used to predict catch (C_{i,y,l}) using estimated coefficients from the regressions described above (Eq. 4, 5). The (C_{i,y,l}) of pollock, Pacific cod, and arrowtooth were then removed from the population of each species in the simulation year (y) (by calculating the effective harvest rate (F_{i,y,l}) given the input catch (C_{i,y,l})) and the simulation was rolled forward 1 timestep.

Under this approach, TAC and catch estimates from ATTACH change solely in response to changes in pollock, Pacific cod, and arrowtooth flounder input ABCs. This assumption is significant but difficult to assess without considering alternative models that resolve the population dynamics of the remaining species managed under the 2 MT cap along with their biological interactions and potential future changes in relative harvest values. That said, the three species in CEATTLE are known to strongly interact, account for a large proportion of total EBS fish biomass, and strongly drive TAC allocation. Future sensitivity analyses, possibly using more speciose food web models^{32,68} would be useful to quantify this sensitivity.

### Evaluating management performance and risk

We evaluated the performance of these various management strategies under moderate and high-baseline climate futures. We explored temporal patterns in risk defined as the probability of a 10% decline in biomass or catch relative to the persistence baseline scenario for each harvest scenario. We also evaluated the probability of severe decline and collapse, defined as a greater than 50 and 80% decline (respectively) in biomass or catch in climate change scenarios (relative to the persistence scenario).

### Threshold and tipping point analysis

We use ecosystem threshold analyses^{69,70} to identify thresholds and tipping points for each species in the non-linear relationship between catch (response) and temperature (pressure). Using the multispecies model estimates of recruitment for each species ((i)) for each future year (*y*) of the simulation and each future scenario ((l)) we drew random samples from the log-normally distributed parameter estimate for climate-enhanced recruitment relationships and projected the model forward under the 2 MT cap management scenario. This formed 100 replicates of each future scenario for each species under no fishing and no cap simulations, and 30 replicates under the 2 MT cap. We calculated the change in catch (({Delta}C_{i,y,l})) relative to the persistence scenario ((l = 1)) as:

$${Delta}C_{i,y,l} = left( {C_{i,y,l} – C_{i,y,l = 1}} right)/C_{i,y,l = 1}$$

(6)

We then pooled the full set of ({Delta}C_{i,y,l}) and corresponding bottom temperatures (*T*) to estimate the threshold for the ({Delta}C_{i,y,l}sim fleft( T right)) relationship. Following Samhouri et al.^{69}, we fit general additive models using thin plate regression spline smoothing terms, using the mgcv^{71} package in R, with the basis dimension set to 4 to avoid overfitting. We then calculated the first and second derivative of the smoothing function s(x), as well as the 95% CI (as the 2.75 and 0.975 quantiles) of the smoothing and its second derivative by bootstrapping (*n* = 1000) the residuals. Last, the inflection or tipping point was defined as the temperature whereby the second derivative (s(x)”) changed sign and the 95% CI of the second derivative (smoothed with a local polynomial regression smoother using the loess() function in R with the span set to 10%) was most different from zero^{69}.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.