### Experimental design

Raw data used in the manuscript were collected by the ARC-SGI of South Africa in open field test plots. The raw data include observed dryland wheat yields matched by location with daily minimum and maximum temperatures and total precipitation recorded during the growing season at near-by weather stations. Weather station data were downloaded from NASA GSOD using GSODR^{59}. From the raw data, we only include wheat trial locations that have a weather station within 75 km and at least five years of wheat field trials, and wheat cultivars must appear in at least two trial years. It is possible that there are slight differences in the weather station observations and the actual weather at wheat trial locations, particularly with respect to precipitation. However, the weather station observations in the study region appear representative based on climatic norms^{60} and are the best available data for capturing daily extremes. This results in 18,881 wheat yield observations from ARC-SGI spanning 17 locations and 71 cultivars from 1998 to 2014.

The yield and weather data vary substantially in-sample, which supports robust estimation of wheat yield responses to extreme and average weather conditions (Supplementary Tables 1, 2; Supplementary Figs. 1 and 2). The growing season for each location-year-cultivar is defined by the planting and harvest dates, and typically span mid-May to late October. Planting and flowering dates are observed, not estimated. Flowering is defined as the day at which 50 percent flowering occurs. Harvest dates are not observed and thus inferred using a rule of 30 days after the observed flowering date, which provided consistent results with other alternatives discussed in the robustness checks below. Phenological information were collected by both ARC staff and wheat producers based on a field observation conducted once per weekday for ARC run stations and daily for producer fields. The temperature bins are calculated from maximum and minimum temperatures using a sinusoidal interpolation of temperature exposure within each day and span 5 °C intervals. Total days (24 h) spent within intervals for the entire season are summed into eight temperature exposure bins. All negative temperatures are summed into a single bin, as well as all temperatures above 30 °C. Notably, exposures greater than 30 °C occur substantially more in the Free State compared to the Western Cape.

### Statistical analysis

The preferred regression model specifies log wheat yield as a function of location, cultivar, and year fixed effects, as well as a quadratic polynomial for cumulative precipitation and the eight temperature bins mentioned above^{61}. The weather variables are seasonal aggregates from the observed planting date to the inferred harvest date. The highest temperature bin of >30 °C represents exposures known to negatively affect wheat yields^{62,63,64}. Average exposures across bins are provided for the two main dryland wheat-growing provinces, the Free State and Western Cape, in Supplementary Fig. 1. We considered simplified models that include linear and quadratic trends instead of year fixed effects, or (alternatively) omitting the temperature bins in favor of average temperatures, and found that they substantially reduced model performance (Supplementary Tables 3, 4). In addition, we considered extensions of the model that added pre-season precipitation (30 days before planting), or (alternatively) a cubic polynomial for in-season precipitation instead of a quadratic, and found that they also did not improve model performance. The preferred model is specified in Eq. (1):

$$y_{ijt} = alpha _i + alpha _j + alpha _t + beta _1p_{ijt} + beta _2p_{ijt}^2 + mathop {sum}limits_{k = 1}^8 {delta _k} Bin_{ijkt} + varepsilon _{ijt},$$

(1)

where *y*_{ijt} is log yield for cultivar *i* in location *j* in year *t*. Fixed effects (α) are included separately for cultivars, locations, and years. The weather variables include a quadratic polynomial effect for cumulative precipitation *p*_{ijt} and the nonlinear effect of weather across temperature bins *Bin*_{ijkt}.

There is likely a large amount of spatial correlation among the error terms of the model across cultivars in the same location, as well as across locations more generally. One could cluster standard errors by year to account for all spatial correlations, however there are 17 years in the data which is a questionably small number of clusters^{65,66}. Instead we cluster errors by year-province as there are only two provinces in the data, Western Cape and the Free State, and their boundaries are several hundred kilometers apart. This method accounts for correlations among the regressors which can also bias standard errors. Cameron and Miller (2015)^{65} report the variance inflation factor in their equation 6 as 1 + ρx ρu (*N* – 1), where *N* is the cluster size, ρu is the within-cluster correlation of the regression errors, and ρx is the within-cluster correlation of the regressor. Note that spatial correlation of the regressors can bias regression standard errors downward even if the errors are only slightly correlated. Just under their equation 6, Cameron and Miller (2015)^{65} cite a study in which the correlation of the errors was small at 0.03 but the inflation factor was 13 because the regressors were highly correlated.

To better investigate the role that spatial correlation is playing in this analysis, Moran’s I was calculated for each year in the data for both the log yield observations and the regression errors from the preferred model above. The averages of the Moran’s I across years is presented in Table 1. The distance of 1 km captures the within-trial correlations, whereas the distances 100, 500, and 1000 km capture broader groupings. Positive correlation exists in the log yield data and it is highest within-trial as expected. The correlation remains positive as distance increases but dilutes to its smallest value at 1000 km. The regression purges much of the correlation from the data as indicated by the Moran’s I for the errors, although some remains. As noted above, the clustered errors may still produce an adjustment by increasing the variances compared with classical Ordinary Least Squares which does not account for correlations. For example, the standard error on our measure of heat (the >30 °C bin) is 0.0196 under clustering but 0.00433 without clustering (i.e., robust standard errors). This suggests an inflation factor of approximately 20 which is quite large and important for adequately representing the statistical uncertainty in our warming impacts.

Heterogeneous cultivar-level temperature effects are investigated to assess the potential for climate change adaptation via cultivar selection. The preferred specification was modified to account for differences in cultivar effects using the following multilevel model^{66} specified in Eq. (2):

$$y_{ijt} = alpha _i + alpha _j + alpha _t + beta _1p_{ijt} + beta _2p_{ijt}^2 + mathop {sum}limits_{k = 1}^8 {delta _k} Bin_{ijkt} + u_iBin_{ij8t} + varepsilon _{ijt},$$

(2)

where we extend the preferred model to include a random slope (*u*_{i}) across cultivars for the highest temperature bin (30 °C+). Note that the fixed effects from the preferred model are include here as dummy variables in the fixed portion of the multilevel model. The only random effect in the multilevel model is for the effect of the >30 °C bin.

Warming impacts are based on uniform changes in the daily temperature data. For example, we use the observed (historical) daily minimum and maximum temperatures and increase them by 1 °C and then re-calculate the growing season bins for all locations and years^{3,43,45}. Averaging these across years and locations then provides a shifted climate to simulate yield change based on the initial regression model parameters and yield estimates. The impacts are calculated as (100left[ {e^{left( {{boldsymbol{Bin}},1 – {boldsymbol{Bin}},0} right)delta } – 1} right])where ** Bin** is a vector of the temperature bins for shifted (1) and baseline (0) climate. The same steps are repeated for the 2 and 3 °C warming scenarios as well. Estimates from the regression in Supplementary Table 3 are used for δ. The point estimation for warming scenarios relies on the Delta Method of asymptotic approximation for large samples as implemented via the nlcom command in Stata version 16.

### Robustness checks

The first robustness check we consider is replacing the temperature bins with a quadratic specification of seasonal average temperatures. Interestingly, a two-tailed joint test under this model implies that temperatures do not have a statistically significant effect on yields (F(2,30) = 0.57, *p* = 0.5716), thereby suggesting that seasonal averages cannot capture yield reductions associated with heat above 30 °C as in our preferred model. The seasonal average model generates misleadingly small warming impacts (Supplementary Fig. 4).

Next we investigate the appropriateness of the equally spaced five degree exposure bins by examining three alternatives: (i) bins of length three degrees, (ii) bins of length five degrees but with a threshold of 29 °C and, separately, (iii) a threshold of 31 °C. We find that all three alternatives produce similar marginal effects of temperatures and warming impacts as our preferred model (Supplementary Figs. 5 and 6).

Under our preferred model the parameters for precipitation and precipitation squared are statistically significant for a two-tailed joint test (F(2,30) = 9.43, *p* = 0.0007). We find that the yield effects of precipitation are not trivial as a one standard deviation reduction in cumulative rainfall below the average level is associated with a 9.6% yield reduction. To more directly investigate the differentiated impacts of drought and heat, the precipitation component was modified to include the quadratic function (as in the preferred model) along with an indicator variable that takes on a value of “1” when cumulative precipitation is below the 10th percentile of all observed rainfall data. This indicator captures low rainfall conditions likely associated with droughts, and findings suggest the effect of 10th percentile rainfall is an 18% yield reduction (Delta Method = −2.95, *p* = 0.003). The inclusion of the additional low-rainfall control variable produced similar marginal effects of temperatures and warming impacts as our preferred model (Supplementary Figs. 7 and 8). In addition, we consider controlling for the seasonal variation of precipitation as in Rowhani et al. (2011)^{67}, but found a similar pattern of results for the temperature and warming effects (Supplementary Figs. 8 and 9). Thus, the high temperature effect and precipitation effect seem well differentiated from each other, likely due to the location and year fixed effects that control for (among other things) locations with a more drought-prone climate and widespread droughts across locations within years.

It is essential that cultivars in the data experience sufficient heat exposure to capture the temperature effects, especially when we estimate the cultivar-specific heat effects. Within the sample, every cultivar was exposed to temperatures above 30 °C ranging from 4 to 115 h. Not every cultivar was exposed to temperatures above 30 °C at every location, but cultivars with no exposure above 30 °C at every location account for less than 10 percent of observations. Nonetheless, as a robustness check for the warming impact estimates we drop cultivar-years not experiencing exposures above 30 °C and re-estimate the model. We find similar marginal effects of temperatures and warming impacts as our preferred model (Supplementary Figs. 9 and 10).

We also consider whether allowing the temperature and precipitation effects to vary within season affects the warming impacts. We separate the growing season into three stages: (i) planting to 20 days before flowering to capture the vegetative stage, (ii) 20 days before to 10 days after the flowering date to capture the flowering stage, and (iii) 10 days after flowering to the end of season to capture the grain-filling stage. We then re-estimate the model including stage-specific measures of the precipitation and temperature variables, and find that warming impacts are very similar to those from our preferred model approach (Supplementary Fig. 11).

Next, we analyze whether cultivars developed from specific breeders provide differential heat effects by interacting the temperature bin variable for exposures above 30 °C with dummy variables for each of the three breeders represented in our data: Pannar, Sensako, and the South African ARC-SGI. A two-sided joint test of these interactions suggest that the heat effects do differ across breeders for *n* = 18,629 yield observations with breeder information (F(2,30) = 6.68, *p* = 0.004), however the magnitude of the differences are small and the warming impacts are similar across all three breeders (Supplementary Figs. 12 and 13). We also consider whether heat effects differ across the spring, facultative, and winter wheat cultivars represented in the data using the same dummy variable approach. A two-sided joint test suggests a lack of statistical significance for these differences (F(2,30) = 2.20, *p* = 0.128), and the temperature and warming effects are similar across all three types (Supplementary Figs. 13 and 14).

Another robustness check interacts the temperature bin variable for exposures above 30 °C with a continuous variable for the year that each cultivar was publicly released. The in-sample release years span 1984–2012 and we again find a lack of statistical significance for the interaction with a two-tailed test (t(30) = 0.53, *p* = 0.471) coupled with similar temperature and warming effects (Supplementary Figs. 13 and 15).

The robustness of weather station data was tested by including all available weather stations within 200 km (regardless of missing data) for every wheat trial location using distance-weighting (1/distance^{2}) of the weather observations at the location-year-day level. This increased the number of field trial sites to 32 (some were dropped before because of missing weather data) and the number of unique weather stations to 107. The number of stations matched to a particular site ranged from 12 to 30. We then re-estimate the model using these alternative data and find that the temperature and warming effects are similar to the preferred model (Supplementary Figs. 16 and 17). It is also possible that this distance-weighted interpolation approach is overly simplistic, thereby introducing measurement error that can bias estimates. This type of error would likely affect precipitation more than temperature due to its more localized nature, so we replace our measure of rainfall with that of the gridded Climate Hazards group Infrared Precipitation with Stations (CHIRPS) dataset^{68}. We re-estimate the model and find that the temperature and warming effects are again similar to the preferred model (Supplementary Figs. 16 and 17).

Some studies have shown that wheat maturity occurs more quickly under heat stress^{62,69}. Thus, to test our assumption of a flowering-to-harvest time of 30 days at each location-year, we use this expanded weather station data and re-calculate the temperature bins for a shorter 20 day maturity period. We define the optimal maturity length by running separate regressions of log yield on the weather covariates for each location-year in the data. Each iteration produces two measures of R-squared, one for each of the two maturity lengths, and the higher one is used for that location-year. We find that 30 days is optimal for approximately 2/3 of the location-years (Supplementary Fig. 18). A regression of the improvement in R-squared from varying the maturity length on the occurrence of temperatures above 30 °C suggests that a one percent increase in heat occurrence only improves model fit by approximately 0.001 percent. In addition, we find that optimizing the maturity lengths by location-year produces similar temperature and warming effects as the preferred model (Supplementary Figs. 16 and 17).

Expanding the weather data also provides an opportunity to consider the potential effects of shifting planting dates. Producers may adapt to increasing heat stress by planting earlier to avoid critical periods of heat exposure. To test the implications of this adaptation, warming impacts were simulated based on the initial temperature impacts with different weather variables created by planting date shifts at 7 and 14 days earlier with fixed (days-to-flowering and days-to-harvest) season lengths. For +1 °C, shifting planting dates to 14 days earlier provides approximately one percent reduction in the warming impact on yields, while for +3 °C a 14 day earlier planting date may reduce impacts by about four percent (Supplementary Fig. 19).

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.