# Development and validation of the CHIRTS-daily quasi-global high-resolution daily temperature data set

Sep 14, 2020

### Motivation

Assessing weather-related hazards in a changing climate requires data sets that have accurate high-resolution spatial mean fields, good performance in data-sparse regions, and limited sources of non-stationary errors. Many scientists are interested in the impacts associated with a degree or two of warming. But non-stationary errors—errors that create time-varying biases in data sets—can easily be as large or larger than this climate change signal. Spatial fidelity also matters. High-resolution mean fields are important because impacts to health, agriculture, and other sectors are always local and typically non-linear. Impacts on humans or crops will be related to extremes in specific locations, and these impacts will often be strongly related to the variations in the absolute value of the weather variable under consideration. A 1- or 2-degree change in mean temperature, for example, can dramatically alter the number of days exceeding some specific temperature threshold. This spatial accuracy is important for monitoring extreme events on a year-to-year basis, and for assessing the impacts of climate change. As demonstrated below, data sets that are too cool may also underestimate changes in the frequency of heat waves.

Another important consideration is performance in areas with low densities of publicly available weather stations. Climate hazards typically convolve weather-related shocks, human exposure, and human vulnerability. The most vulnerable populations, and those with the most rapidly expanding populations and exposure, are often in areas (like Africa, Central America, and parts of Asia) with few available in situ weather observations.

A third consideration is consistency and stationarity, both of which can be important in data-sparse areas, where the spatial location and density of available in situ observations change over time, and typically decline. When weather station observations are blended with spatially explicit background fields, non-stationary systematic errors can arise through changes in the observational network, discontinuities in the spatially explicit background fields, or both. Accurate high-resolution mean fields can reduce inhomogeneities arising from shifts in observational networks. When the spatially explicit background fields track closely with the in situ observations, disruptions associated with changing networks are minimized. Spurious systematic non-stationary errors in the background fields, however, can create large incorrect changes.

In general, there are two main approaches to overcoming these limitations: the creation of tailored data sets that combine satellite proxy information with weather station observations, or alternately, the use of ready-made modern reanalysis systems.

Drought early warning systems, especially those focused on data-sparse regions, must grapple with these issues. While the accurate and early identification of drought conditions can trigger mitigation activities that save lives and livelihoods, consistent, accurate, high-resolution data sets are required to make such assessments. For 20 years, scientists at the University of California, Santa Barbara’s (UCSB) Climate Hazards Center (CHC) have focused on developing high-quality precipitation estimates suitable for supporting famine early warning, and crop and hydrologic modeling in data-sparse regions. The result of this work, the Climate Hazards center InfraRed Precipitation with Stations (CHIRPS5), is now one of the most widely used products for global drought monitoring. CHIRPS has been adopted by the World Food Programme, the US Agency for International Development’s (USAID) Famine Early Warning Systems Network (FEWS NET, www.fews.net), the European Union, the Food and Agriculture Organization (FAO), and a host of regional and national agencies. The spatial resolution, accuracy, and consistency of CHIRPS also make it widely useful for applications such as crop insurance and climate change studies. In a typical month, more than 700 unique users download more than 100 gigabytes of CHIRPS data. The CHIRPS data set is hosted at the CHC, whose large computational capacity arises from ongoing support for and by FEWS NET. Every month, CHIRPS, along with many other valuable environmental data sets, helps FEWS NET guide billions of humanitarian assistance dollars to millions of extremely food-insecure people.

At present, there is a dearth of accurate information supporting the monitoring and evaluation of extreme temperatures in many food-insecure regions. Such extremes can wilt crops or decimate livestock herds, setting the stage for famine. Yet our ability to track these extremes in countries without weather station observations remains limited.

To address this limitation, the CHC has developed a modeling philosophy based on a geostatistical framework that decomposes environmental variables into static mean fields and time-varying anomaly fields. There are two stages to this process: a monthly Tmax algorithm, described in the CHIRTSmax manuscript1 and illustrated in Fig. 1a, and a daily disaggregation procedure, described in Fig. 1b and in the following sections. In the CHC’s approach, great attention is given to building the high-resolution (0.05° × 0.05°) mean fields. The CHC’s method for this (Moving Window Regression, or MWR) builds localized regression models using large sets of in situ climate normals. Complicated statistical modeling is supported by the fact that there are typically many more available stations to estimate the long-term average conditions than there are to represent variations on a given day or month. The CHC’s MWR process also makes unique use of high-resolution satellite mean fields as predictors. The MWR approach and satellite-means allow the CHC climatologies to perform well in data-sparse regions, and even in regions with complex topography.

Within the CHC’s approach, temporal variations are represented by combinations of in situ observations and geostationary satellite-based thermal infrared (TIR) observations. The monthly CHIRTSmax product uses a unique cloud-screening process to produce accurate global 2-meter Tmax anomalies. This accuracy arises through a maximum-compositing process similar to that used in developing gridded vegetation index data sets, such as the Normalized Difference Vegetation Index. Atmospheric water vapor can cause spurious declines in greenness indices. In the case of surface temperatures, partial cloud cover can reduce the temperatures observed by a satellite. In both cases the signature of the contamination is known to suppress the signal. Taking maximum composites over a period of time, therefore, can be used to minimize contamination. For the CHIRTSmax, this process provides a robust global and very high resolution (0.05°) set of monthly TIR-based temperature anomalies. These anomalies, when combined with the CHC’s high-resolution climatology, provide an accurate and consistent source of estimates, even when there are no nearby weather stations.

Unfortunately, the CHC’s maximum compositing approach cannot work on daily data, because at daily time steps it is difficult to distinguish TIR signal contributions from the land surface and clouds with measurements from only the 11 μm band provided by the GridSat6 data set—hence the need for a different approach to disaggregation (Fig. 1b). For this, the CHC uses modern reanalyses. Modern reanalyses use atmospheric models and assimilation schemes to merge vast quantities of information to produce physically based syntheses that provide a complete description of the land and atmosphere. For example, ERA5 uses a four-dimensional assimilation scheme to assimilate satellite radiances from 25 infrared and microwave sources and satellite scatterometer data from four sources. This rich set of data sources provides valuable information about land surface temperatures, soil moisture, atmospheric water vapor, atmospheric air temperatures, precipitation, clouds, and atmospheric circulation anomalies.

This rich set of information, and all the benefits accruing from physically modeling the Earth’s systems, provides an excellent source of information about diurnal temperature variations. At the same time, it should be recognized that the inclusion of these multiple data sources also creates a stream of input data that is heterogeneous in time. For example, many infrared and microwave-based sounders and profilers only appear late in the data record, typically arriving in the late 1990s or early 2000s. Even relatively consistent imagery coming from geostationary satellites can be substantially influenced by inter-satellite calibration issues or orbital changes in any given satellite. Reanalyses that ingest station data, furthermore, face threats related to large shifts in the station data that go into these reanalyses. Both changes in the satellite systems and observation networks can alter the local energy and water budget, potentially introducing spurious random errors.

The four sources of information used in the CHIRTSmax contribute in different ways (Fig. 1a,b). The spatial mean fields provide local context. The carefully validated and curated monthly station and TIR temperature anomalies provide a consistent source of climate information, carefully constructed to reduce potential non-stationary systematic errors. The Berkeley Earth organization (www.berkeleyearth.org) was founded in 2012 to collect, quality control, and analyze an integrated set of global air temperature observations. Details on this data set and methods can be found at http://berkeleyearth.org/methodology. Finally, ERA5 reanalysis information is used to disaggregate within a specific month, and greatly reduces any possible issues associated with changes in reanalysis inputs.

### CHIRTSmax

The monthly CHIRTSmax product is the foundational data set from which the CHIRTS-daily products are developed. CHIRTSmax combines three components: a high-resolution (0.05° × 0.05°) climatology, interpolated in situ temperature anomaly fields, and remotely sensed infrared land surface emissions anomalies based on GridSat6 B1 Thermal Infrared geostationary weather satellite observations. Complete details can be found in the CHIRTSmax manuscript1, though a brief description is provided in this subsection and Fig. 1a for completeness.

There are three components that are combined to create the CHIRTSmax:

1. CHTclim, a high-resolution (0.05° × 0.05°) monthly maximum temperature (Tmax) climatology developed using Moving Window Regression5 with FAO station normals, ERA5 long-term average 2-meter temperatures, latitude, longitude, and elevation as predictors.

2. CHIRTmax, a high-resolution (0.05° × 0.05°) monthly time series of satellite-based Tmax anomalies.

3. CHTSmax, a high-resolution (0.05° × 0.05°) monthly time series of interpolated monthly Tmax anomalies based on Berkeley Earth (http://www.berkeleyearth.org) and Global Telecommunication System Tmax air observations.

Let \(\bar{C}\) denote the long-term average (CHTclim). Let \(I^{\prime}\) and \(S^{\prime}\) denote, respectively, the CHIRTmax and CHTSmax anomalies from their individual long-term means. Then, the final CHIRTSmax estimate \(T\) is a weighted linear combination of these three components, as follows:

$$T=\bar{C}+\alpha I^{\prime}+\beta S^{\prime}$$

where α and β are weights that sum to 1 and are derived using the expected variance explained by the CHTSmax and CHIRTmax estimates. The variance explained by the CHTSmax component is based on an empirical covariogram and the distance to the closest station. The variance explained by the CHIRTmax component is assumed to be 0.25. The weights α and β are proportional to these variance values. The final CHIRTSmax estimate, therefore, is an adjusted version of the climatology (CHTclim). The adjustment is based upon a weighted combination of satellite-derived estimates of Tmax anomalies (CHIRTmax) and interpolation-based estimates of station-observed Tmax anomalies (CHTSmax). In data-sparse regions, the satellite-derived anomalies will receive greater weight than their interpolation-based counterparts. Conversely, in regions with high station density, the interpolation-based anomalies will receive the greater weight, effectively leveraging the strengths of both data sources.
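The weighting can be illustrated with a minimal Python sketch. It assumes the station variance explained is already known for a pixel (the covariogram that produces it is not reproduced here), and the function and argument names are hypothetical, chosen only for illustration.

```python
def blend_weights(var_station, var_satellite=0.25):
    """Return (alpha, beta), the weights on the satellite and station anomalies.

    var_station: variance explained by the interpolated station anomalies
        (CHTSmax) at this pixel. Assumed to be supplied by the caller.
    var_satellite: variance explained by the satellite anomalies (CHIRTmax),
        taken as 0.25 following the text.
    """
    total = var_satellite + var_station
    if total <= 0:
        raise ValueError("explained variances must sum to a positive value")
    alpha = var_satellite / total  # weight on I' (CHIRTmax)
    beta = var_station / total     # weight on S' (CHTSmax)
    return alpha, beta


def chirtsmax_estimate(clim, sat_anom, stn_anom, var_station, var_satellite=0.25):
    """Evaluate T = C_bar + alpha * I' + beta * S' for a single pixel and month."""
    alpha, beta = blend_weights(var_station, var_satellite)
    return clim + alpha * sat_anom + beta * stn_anom


# Hypothetical values: same climatology and anomalies, different station coverage.
far_from_stations = chirtsmax_estimate(30.0, 2.0, 1.0, var_station=0.05)
near_stations = chirtsmax_estimate(30.0, 2.0, 1.0, var_station=0.75)
print(far_from_stations, near_stations)
```

With a low station variance explained, the result is pulled toward the satellite anomaly; with a high one, it is pulled toward the station anomaly.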

### Downscaling ERA5

Daily temperatures from the ERA5 are critical to developing the CHIRTS-daily products, as they define the relative evolution of daily temperatures within a given month. The most apparent limitation to using the ERA5 simulations in tandem with CHIRTSmax is the difference in spatial scales between the data products. The spatial scale of CHIRTSmax is approximately 5 km by 5 km (0.05° × 0.05°), while that of ERA5 is approximately 25 km by 25 km (0.25° × 0.25°). To bridge this gap and facilitate the merging of the two data products, the ERA5 simulations are downscaled using bilinear interpolation in the Interactive Data Language (IDL7) using the CONGRID command. Maximum and minimum temperatures for each day are treated independently in the downscaling procedure. Additionally, there is no explicit treatment of day-to-day temporal dependence. We assume that the inherent temporal autocorrelation is captured by the ERA5 simulations and is preserved in the downscaling routine. We also assume the dependence between maximum and minimum temperatures on a given day is preserved in the downscaling process. Ultimately, the decision to use the ERA5 simulations to disaggregate the monthly CHIRTSmax to daily scale is motivated by the latency of the product. Our goal is to provide updates to the CHIRTS-daily product with minimal delay. Collaboration with partners at the National Oceanic and Atmospheric Administration (NOAA) should ensure timely updates to the monthly CHIRTSmax product. These updated CHIRTSmax products will then be disaggregated with ERA5, providing a much-needed source of information that can be used to monitor extreme temperature conditions. These conditions can have dire impacts on human and livestock health and crops, while also setting the stage for potentially extensive wildfires.
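For readers unfamiliar with bilinear interpolation, the following pure-Python sketch shows the general idea of resampling a coarse grid onto a grid five times finer (0.25° to 0.05°). It is not a reproduction of the IDL CONGRID routine: the cell-centre alignment and the clamping at the grid edges are assumptions made for this illustration, and the function name is hypothetical.

```python
def bilinear_downscale(coarse, factor=5):
    """Bilinearly interpolate a 2-D grid (list of rows) onto a finer grid.

    Each coarse value is treated as sitting at its cell centre. Fine cells
    that fall outside the outermost coarse centres are clamped to the edge.
    """
    n_rows, n_cols = len(coarse), len(coarse[0])

    def source_position(fine_index, n_coarse):
        # Fine-cell centre expressed in coarse-cell index units, clamped.
        pos = (fine_index + 0.5) / factor - 0.5
        return min(max(pos, 0.0), n_coarse - 1.0)

    fine = []
    for i in range(n_rows * factor):
        y = source_position(i, n_rows)
        y0 = int(y)
        y1 = min(y0 + 1, n_rows - 1)
        wy = y - y0
        row = []
        for j in range(n_cols * factor):
            x = source_position(j, n_cols)
            x0 = int(x)
            x1 = min(x0 + 1, n_cols - 1)
            wx = x - x0
            # Interpolate along x on the two bracketing rows, then along y.
            top = (1 - wx) * coarse[y0][x0] + wx * coarse[y0][x1]
            bottom = (1 - wx) * coarse[y1][x0] + wx * coarse[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        fine.append(row)
    return fine


# Hypothetical 2 x 2 coarse field resampled to 10 x 10.
coarse_tmax = [[0.0, 10.0], [20.0, 30.0]]
fine_tmax = bilinear_downscale(coarse_tmax, factor=5)
print(len(fine_tmax), len(fine_tmax[0]))
```

In keeping with the text, Tmax and Tmin fields for each day would be passed through such a routine separately.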

### CHIRTS-daily

To produce the CHIRTS-daily Tmax (CHIRTSX) values, the downscaled ERA5 Tmax are first translated into anomalies from the monthly ERA5 Tmax average. These daily Tmax anomalies are then added to each month’s CHIRTSmax value. The resulting CHIRTSX product thus varies on monthly timescales with the CHIRTSmax while tracking the day-to-day variations of the ERA5 reanalysis. The ERA5 Tmax and Tmin are then used to determine the daily diurnal temperature range (DTR) at each 0.05° pixel. DTR is then used to produce CHIRTS-daily Tmin (CHIRTSN) by subtracting the DTR from CHIRTSX.

The steps taken to produce the CHIRTS-daily temperature fields are summarized as follows.

1. Compute the DTR using the downscaled ERA5 fields:

   \(DTR_{t}=ERA5_{X_{t}}-ERA5_{N_{t}}\) for \(t=1,\ldots ,T\)

2. Convert the downscaled ERA5 Tmax to anomalies:

   \(ERA5_{X_{t}}^{m,anom}=ERA5_{X_{t}}^{m}-ERA5_{X}^{m}\) for \(t=1,\ldots ,T\) and \(m=1,\ldots ,M\)

3. Apply the results of Step 2 to the monthly CHIRTSmax values to produce CHIRTSX:

   \(CHIRTS_{X_{t}}^{m}=CHIRTS_{max}^{m}+ERA5_{X_{t}}^{m,anom}\) for \(t=1,\ldots ,T\) and \(m=1,\ldots ,M\)

4. Apply the results of Step 1 to CHIRTSX to produce CHIRTSN:

   \(CHIRTS_{N_{t}}=CHIRTS_{X_{t}}-DTR_{t}\) for \(t=1,\ldots ,T\)

In the above equations, T represents the total number of days in the CHIRTS-daily data record, where (1, 2, 3, …, T-2, T-1, T) represents (Jan 1 1983, Jan 2 1983, Jan 3 1983, …, Dec 29 2016, Dec 30 2016, Dec 31 2016); M represents the total number of months in the CHIRTSmax data record, where (1, 2, 3, …, M-2, M-1, M) represents (Jan 1983, Feb 1983, Mar 1983, … Oct 2016, Nov 2016, Dec 2016).
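The four steps can be sketched for a single pixel and a single month as follows. This is a minimal illustration, not the production code: the function name and the sample values are hypothetical, and the inputs are assumed to be already-downscaled daily ERA5 values.

```python
def disaggregate_month(era5_tmax, era5_tmin, chirtsmax_month):
    """Disaggregate one monthly CHIRTSmax value to daily Tmax and Tmin.

    era5_tmax, era5_tmin: daily downscaled ERA5 Tmax and Tmin for one month
        at one pixel (lists of equal length).
    chirtsmax_month: the monthly CHIRTSmax value for the same pixel.
    Returns (chirts_x, chirts_n), the daily CHIRTSX and CHIRTSN series.
    """
    if not era5_tmax or len(era5_tmax) != len(era5_tmin):
        raise ValueError("daily Tmax and Tmin must be non-empty and equal in length")

    # Step 1: diurnal temperature range from ERA5.
    dtr = [x - n for x, n in zip(era5_tmax, era5_tmin)]

    # Step 2: daily ERA5 Tmax anomalies from the monthly ERA5 Tmax mean.
    monthly_mean = sum(era5_tmax) / len(era5_tmax)
    anomalies = [x - monthly_mean for x in era5_tmax]

    # Step 3: add the anomalies to the monthly CHIRTSmax value.
    chirts_x = [chirtsmax_month + a for a in anomalies]

    # Step 4: subtract the DTR to obtain the daily minimum.
    chirts_n = [x - d for x, d in zip(chirts_x, dtr)]
    return chirts_x, chirts_n


# Hypothetical three-day "month" for illustration.
tmax_daily, tmin_daily = disaggregate_month(
    era5_tmax=[30.0, 32.0, 34.0],
    era5_tmin=[20.0, 21.0, 22.0],
    chirtsmax_month=33.0,
)
print(tmax_daily, tmin_daily)
```

By construction, the monthly mean of the daily CHIRTSX values equals the monthly CHIRTSmax value, while the day-to-day variability and the diurnal range come entirely from ERA5.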

### Ancillary data fields and Heat Index calculation

As a convenience to end users, several ancillary daily variables have been derived from the ERA5 archive and provided at a downscaled 0.05° resolution matching that of CHIRTS-daily. The downscaling procedure used to produce these ancillary data fields is the same as that used to downscale the ERA5 temperature fields (i.e., the CONGRID command in IDL). These data have not received additional validation but are provided to facilitate research by end users. The ERA5 data set, on which these fields are based, has been widely used and validated. Hourly temperature and dew point temperature from the ERA5 data set are used to estimate relative humidity (RH). These derived RH values, along with CHIRTS-daily, are used to calculate the Heat Index (HI) using the series of equations and rules provided by the National Weather Service (NWS).

Relative humidity (RH) is estimated from temperature and dew point temperature using the Magnus equation8, as follows:

$$RH\cong 100\ast \exp \left(c\ast b\ast \frac{\left(TD-T\right)}{\left(c+T\right)\ast \left(c+TD\right)}\right)$$

where T is the air temperature, TD is the dew point temperature, and the constants are defined as b = 17.625 and c = 243.048.
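As an illustration, the Magnus approximation can be evaluated with a few lines of Python. The constants are those given in the text; the assumption that both temperatures are in degrees Celsius is ours (it is the usual convention for constants of this form), and the function name is hypothetical.

```python
import math

B = 17.625   # constant b from the text
C = 243.048  # constant c from the text


def relative_humidity(t_c, td_c):
    """Estimate relative humidity (%) from temperature and dew point.

    t_c: air temperature, assumed to be in degrees Celsius.
    td_c: dew point temperature, assumed to be in degrees Celsius.
    """
    exponent = C * B * (td_c - t_c) / ((C + t_c) * (C + td_c))
    return 100.0 * math.exp(exponent)


# Saturated air (dew point equal to temperature) gives 100 %.
print(relative_humidity(25.0, 25.0))
# A 10-degree dew point depression at 30 degrees gives roughly 55 %.
print(round(relative_humidity(30.0, 20.0), 1))
```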

The HI is calculated using the equations provided by the NWS. The main HI equation is a refinement of the multiple regression analysis from a 1990 NWS Technical Attachment (SR 90-23). This regression modeled a set of estimated “apparent temperatures” based on a model of human biophysical thermal temperature equilibria9. The following set of equations, referred to as the Rothfusz regression, is abstracted from the NWS website (https://www.wpc.ncep.noaa.gov/html/heatindex_equation.shtml):

$$\begin{array}{lll}HI & = & -42.379+2.04901523\ast T+10.14333127\ast RH-0.22475541\ast T\ast RH\\ & & -0.00683783\ast T^{2}-0.05481717\ast RH^{2}+0.00122874\ast T^{2}\ast RH\\ & & +0.00085282\ast T\ast RH^{2}-0.00000199\ast T^{2}\ast RH^{2}\end{array}$$

where T is temperature in degrees Fahrenheit (F) and RH is relative humidity in percent. HI is the heat index expressed as an apparent temperature in degrees F. If the RH is less than 13% and the temperature is between 80 and 112 degrees F, then the following adjustment is subtracted from HI:

$$ADJ_{1}=\frac{13-RH}{4}\ast \sqrt{\frac{17-\mathrm{abs}\left(T-95\right)}{17}}$$

If the RH is greater than 85% and the temperature is between 80 and 87 degrees F, then the following adjustment is added to HI:

$$ADJ_{2}=\frac{RH-85}{10}\ast \frac{87-T}{5}$$

The Rothfusz regression is not appropriate when conditions of temperature and humidity warrant a heat index value below 80 degrees F. In those cases, we mask and flag the cells where these conditions occurred.
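The regression and its two adjustments can be sketched in Python as below. The coefficients follow the NWS page cited above. Two points are assumptions made for this illustration rather than details confirmed by the text: the temperature bounds are treated as inclusive, and a cell is masked (returned as `None`) when the computed heat index falls below 80 degrees F. Function names are hypothetical.

```python
import math


def rothfusz(t_f, rh):
    """Unadjusted Rothfusz regression; t_f in degrees F, rh in percent."""
    return (-42.379 + 2.04901523 * t_f + 10.14333127 * rh
            - 0.22475541 * t_f * rh
            - 0.00683783 * t_f ** 2 - 0.05481717 * rh ** 2
            + 0.00122874 * t_f ** 2 * rh
            + 0.00085282 * t_f * rh ** 2
            - 0.00000199 * t_f ** 2 * rh ** 2)


def heat_index_f(t_f, rh):
    """Heat index in degrees F, or None where the regression is not appropriate."""
    hi = rothfusz(t_f, rh)
    if rh < 13 and 80 <= t_f <= 112:
        # Low-humidity adjustment (ADJ1) is subtracted.
        hi -= (13 - rh) / 4 * math.sqrt((17 - abs(t_f - 95)) / 17)
    elif rh > 85 and 80 <= t_f <= 87:
        # High-humidity adjustment (ADJ2) is added.
        hi += (rh - 85) / 10 * (87 - t_f) / 5
    # Assumption: mask the cell when the resulting value is below 80 F.
    return hi if hi >= 80 else None


print(heat_index_f(90.0, 50.0))  # hot, moderately humid conditions
print(heat_index_f(70.0, 50.0))  # mild conditions, masked
```

Because CHIRTS-daily temperatures would need to be expressed in degrees Fahrenheit before being passed to such a function, a unit conversion step is implied upstream of this sketch.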