AbstractA data set comprising rainfall-runoff data gathered at 31 Agricultural Research Service experimental watersheds was used to explore curve number calibration. This exploration focused on the calibrated value and goodness-of-fit as a function of several items: calibration approach, precipitation event threshold, data ordering approach, and initial abstraction value. Calibration methods explored were least-squares, the National Engineering Handbook (NEH) median, and an asymptotic approach. Results were quantified for events exceeding two precipitation thresholds: 0 and 25.4 mm. Natural and frequency-matched data ordering methods were analyzed. Initial abstraction ratios of 0.05 and 0.20 were examined. Findings showed that the least-squares calibration approach applied directly to rainfall-runoff data performed best. Initial abstraction ratios clearly influenced the magnitude of the calibrated curve number. However, neither ratio outperformed the other in terms of goodness-of-fit of predicted runoff to observed runoff. Precipitation threshold experiments produced mixed results, with neither threshold level producing a clearly superior model fit. Frequency-matching was not considered to be a valid analysis approach, but was contrasted with naturally ordered results, indicating a bias toward producing calibrated curve numbers that were 5–10 points larger.IntroductionRainfall-runoff methods from the Natural Resources Conservation Service (NRCS) have served the hydrologic community for many decades. These methods estimate runoff from actual and synthetic rainfall for purposes ranging from flood forecasting to hydrologic design. The first step in using these methods is to determine the curve number (CN), which is a value between 1 and 100, for the modeled watershed. The CN modulates the conversion of rainfall volume into runoff volume, with larger CNs estimating greater runoff. Forested watersheds and other natural systems that allow considerable infiltration may have CNs as low as the 40s and 50s, whereas highly urbanized systems with impervious surfaces may have CNs in the 80s or 90s. Determination of the NRCS CN is often the first of several calculations required in rainfall-runoff modeling, with subsequent modeling seeking to quantify the timing of the runoff peak and/or the shape of the overall runoff hydrograph resulting from a rainfall event. If the CN poorly models the runoff volume, the subsequent modeling of the peak discharge will be poor.Although tables quantifying CNs as a function of land use and underlying soil type have long existed (SCS 1973, 1986), there is new interest in CN quantification, resulting primarily from efforts over the last decade to revisit the long-standing initial abstraction ratio, λ (Fu et al. 2011; Galbetti et al. 2022; Lal et al. 2017; Lim et al. 2006; Menberu et al. 2015; Yuan et al. 2014). The observation commonly made to support a smaller initial abstraction ratio is that the long-standing value of λ=0.20 underpredicts runoff for small precipitation events. The initial abstraction ratio and CN are not independent. A smaller (larger) initial abstraction ratio will result in a smaller (larger) calibrated CN value for a given rainfall-runoff data set.This paper examined multiple elements that collectively contribute to the calibration of a CN value from a set of rainfall-runoff observations. Decisions about what elements to employ and how to apply them influence the CN that is calibrated. Goodness-of-fit, bias, and defensibility of the methods used were analyzed. This paper discusses the core principles that should guide a recommended approach for future CN calibration efforts.BackgroundThe NRCS curve number method dates to 1954, and was developed and calibrated using observations of rainfall-runoff data gathered from small, primarily agricultural watersheds instrumented by the Agricultural Research Service (ARS).The central equation of the NRCS rainfall-runoff method is (1) Q=(P−λS)2P+(1−λ)S,if P>λSQ=0,otherwisewhere Q = runoff (mm); P = precipitation, i.e., rainfall depth, originally and commonly taken over a 24-h duration (mm); and S = storage (mm). The term λS represents the initial abstraction, which is the depth of rainfall captured by depression storage, leaf litter, and watershed surfaces. The parameter λ is the initial abstraction ratio. This fraction of the precipitation is not available for runoff. If P is less than λS, then Eq. (1) does not apply, and Q is zero. The long-standing value of λ is 0.20, although studies have proposed that λ is smaller, with a value of 0.05 the most commonly suggested alternative (D’Asaro et al. 2014; Galbetti et al. 2022; Hawkins et al. 2009; Lal et al. 2017). Proposals to formally change λ to 0.05 also include a redefinition of S (and its directly associated CN value), which partially compensates for the reduced value of λ (USDA 2017a, b).The value of CN is related directly to storage through (2) Values of CN were determined and tabulated (SCS 1973, 1986) by calibration methods that often were poorly documented (Hawkins et al. 2009). These tables present curve numbers as a function of land use and underlying soil type. With the advent of GIS, such tables commonly are used to infer pixel-scale CN values from commonly available sources of land use–land cover and soil maps.The NRCS method for estimating runoff has been criticized (e.g., Ogden et al. 2017). It is empirical, and generally is applied in a coarse or lumped approach to determine runoff. However, this method remains popular because of its ease of application, its ability to quantify effects caused by land-use change, and the fact that it satisfies the basic engineering need to estimate watershed response at ungauged locations for purposes of hydrologic and hydraulic design. Owing to its widespread application and ongoing use as a design tool, an examination of the elements selected when performing a CN calibration is valuable and needed.MethodsThis paper explored four elements that comprise the CN calibration for a specific set of observed rainfall and runoff data: calibration approach, precipitation event threshold, data ordering approach, and initial abstraction value. This paper demonstrated how these different elements influence the CN value that is calibrated and the goodness-of-fit between the observed and predicted runoff data set.Calibration ApproachThree calibration approaches were examined: least-squares optimization, the NEH median, and the asymptotic approach. The essential elements of each approach are described in this section.Least-Squares Calibration of Curve NumberA straightforward approach for calibration of CNs draws simply and directly on Eq. (1). An objective function, Z, is defined as (3) where Qobs,i = observed runoff from precipitation event i; and Qi = modeled runoff for this same event as determined from Eq. (1). The storage, S, in Eq. (1) is the source of variation that is being calibrated; S is varied, creating a modeled set of Qi for each event in a watershed. The objective function is a simple sum of squared errors determined from the analysis of n rainfall-runoff events occurring within the watershed. The value of S that minimizes Z represents the least-squares optimum value of storage. Fig. 1(a) plots the observed rainfall and runoff from a watershed in Hastings, Nebraska. The dotted lines in Fig. 1(a) show the individual errors between Eq. (1) and observed data. Eq. (2) was used to transform S into a value of CN that corresponded to the least-squares optimum curve number for that watershed. This approach hereafter is referred to as LS.The quality of the fit of the least-squares model is quantified by the relative standard error, RSE, defined as (4) where Eq. (4) applies when one predictor is present (precipitation depth, in this case). The quantity σ in the denominator is the standard deviation of the observed runoff depths for the watershed in question. If RSE is greater than 1, the model is poorer at estimating the runoff than the mean of the observations, and therefore the runoff model is not a useful prediction tool. A unique RSE value was produced for individual watershed calibration performed in this study. The distribution of RSE values for each experiment is examined and discussed subsequently.Median Value Approach for Curve Number CalibrationThe National Engineering Handbook (NRCS 2004) describes a graphical approach in which the analyst of a rainfall-runoff data set is directed to “[d]etermine the curve … that divides the plotted points into two equal groups. That is the median curve number.” Alternatively, the NEH-4 procedure: Median CN approach described by Mishra et al. (2007) proposes that for a set of observed rainfall and runoff data, a root-finding exercise is used to determine the value of S that satisfies Eq. (1) for each individual observation in the data set. A set of n estimates of S or CN is produced, assuming n events. The median of these values is used in Eq. (2) to estimate the median CN value for that watershed. This process is repeated for all watersheds.Fig. 2 shows the cumulative distribution functions (CDFs) for the set of CNs determined for the same P-Q data in Fig. 1(a). The value of CN determined for a unique P-Q pair is a function of λ, as indicated in Eq. (1). Fig. 2 shows CDFs for this data set for both λ=0.05 and 0.20. The median is the value at which the CDF equals 0.5. Fig. 2 also shows the 25th and 75th quantiles for both values of λ, providing an estimate of the variability in CN within the watershed for that assumed λ value. As with the least-squares approach, RSE is determined for each data set using Eq. (4). This approach hereafter is referred to as NEH median.Asymptotic Approach for Curve Number CalibrationA third approach for calibrating the watershed curve number takes advantage of an observed declining relationship in event-derived curve numbers as a function of rainfall depth (Hawkins 1993; Sneller 1985; Zevenbergen 1985). In this approach, the function (5) CN^i=CN∞+(100−CN∞)e−kpiis determined for all i observations in a watershed data set. Parameter pi is the precipitation associated with each observed CN value, CN∞ is the asymptotic value for CNi when pi is large, and k is a parameter that controls how quickly CNi reaches its CN∞ asymptote. Eq. (5) is fit to the set of observed CNs, minimizing the sum of squared errors, as was the case in Eq. (3).Fig. 3 plots the set of individual event CNs versus their causal precipitation for a watershed in Safford, Arizona. The fitted curve follows the functional form indicated in Eq. (5). The parameter CN∞=38.7 for the data in Fig. 3. The structure of this method inherently places more weight on the larger precipitation events in the data set. This approach hereafter is referred to as ASYMP. As with the LS and NEH median approaches, RSE is determined for each data set using Eq. (4).Consider the data in Fig. 3. The CN value for each data point was determined using Eqs. (1) and (2) and the known P-Q values for that point. Fig. 3, which shows CN versus P, presents CN (which was originally calculated as a function of P) versus P. Therefore, the CN is not an independent quantity. Using CN and P to fit the characteristic exponential decay structure of these graphs results in a phenomenon called spurious correlation (e.g., Benson 1965; Kenney 1982; Berges 1997; Shivers and Moglen 2008). Hawkins (1993) acknowledged that this method led to the presence of “some spurious correlation.” This calibration approach is presented in this study for completeness because of its widespread use, despite its flawed construction.Precipitation Event ThresholdHistorical rainfall-runoff data, stored as pairs of P and Q, and compiled by the ARS, exclusively were used in this analysis. A total of 31 watersheds from 11 locations across the US were used in this study. Table 1 summarizes these data, and Fig. 4 shows the locations of these watersheds. To be consistent with the original process associated with the development of the CN method, rainfall depths were aggregated into 24-h periods. Effects of antecedent rainfall were minimized by using only the first day’s rainfall and runoff when multiday events were encountered in the observational record. Additionally, days in which rainfall was reported without runoff and in which runoff was reported without rainfall were removed from the data sets. Very rarely, more runoff was reported than rainfall. These events also were deleted. More than 12,700 rainfall-runoff pairs were used, with a median period of record of 23 years and a minimum of 10 years of record. There was a median of 344 rainfall-runoff observations per site after the aforementioned data removals. Data used were derived from the data sets found in the ARS Water Database (Moglen 2017). The previously described process produced a unique rainfall-runoff data set for each of the 31 watersheds, and collectively amounted to a sizeable data set from which to derive general conclusions.Table 1. Summary of study watershed propertiesTable 1. Summary of study watershed propertiesSite and locationWatershed area (ha)Period of recordYearsNumber of observationsNumber of observations with P≥25.4 mmSafford, Arizona (1)2101939–19693111020Safford, Arizona (2)2761940–19693010421Safford, Arizona (3)3091939–1969319016Tifton, Georgia (1)33,4001971–19801029754Tifton, Georgia (2)1,5701970–19801144187Tifton, Georgia (3)1,5901968–198013552105Reynolds, Idaho (1)23,4001963–1981197854Reynolds, Idaho (2)3,1801968–1981144923Monticello, Illinois (1)33.21949–198133222124Monticello, Illinois (2)18.41949–198133344138Treynor, Iowa (1)60.71964–19862360290Treynor, Iowa (2)43.31964–19862386693Treynor, Iowa (3)33.51964–19862376296Hastings, Nebraska (1)1951940–19622329397Hastings, Nebraska (2)1661939–196729332110Hastings, Nebraska (3)8441938–196730293104Hastings, Nebraska (4)1,4101939–196729303111Albuquerque, New Mexico (1)99.61939–19693117517Albuquerque, New Mexico (2)16.21939–19693117616Albuquerque, New Mexico (3)62.71939–19693111013Coshocton, Ohio (1)0.5101939–199254214118Coshocton, Ohio (2)1.061938–199255531180Coshocton, Ohio (3)0.5141942–199251427147Riesel, Texas (1)7.971968–19811421085Riesel, Texas (2)1251968–19811437879Riesel, Texas (3)53.41968–19811427582Danville, Vermont (1)59.11961–19711154622Danville, Vermont (2)8361960–19792084460Blacksburg, Virginia (1)3611957–19721678139Blacksburg, Virginia (2)73.71958–19681149657Blacksburg, Virginia (3)5951958–19721573475There is considerable variability in watershed behavior, especially among smaller precipitation events. Typically, only a subset of the largest events is used in calibration. In this study, the overall set of observations and the subset that exceeded a precipitation threshold, PT, of 25.4 mm were examined. Fig. 5 shows the effect of this precipitation threshold for the data for Hastings, Nebraska (Fig. 1). The left-hand CDF of individually calibrated CNs in Fig. 5 corresponds to CNs derived from those rainfall events in which PT=25.4 mm. The right-hand CDF is for all (PT=0 mm) runoff-producing rainfall events, regardless of size, from this same data set. For the NEH median method, imposing a precipitation threshold produced a substantial difference in the CDFs of event CNs in this watershed; the calibrated CN for events exceeding 25.4 mm was smaller than the CN derived from the full set of rainfall-runoff events (Fig. 5).Data Set Ordering: Natural Order versus Frequency-MatchingFrequency-matching is an approach developed to reduce the inherent noise in rainfall-runoff observations. The reasoning for supporting this approach is the tacit assumption in hydrologic design that the n-year rainfall event will produce the n-year runoff event. Schaake et al. (1967) provided one of the first examples of applying this method, using it in the context of determining runoff coefficients for the rational method. Since then, the frequency-matching method has been employed in the context of CN calibration from observations (Ajmal and Kim 2015; D’Asaro et al. 2014; Galbetti et al. 2022; Hawkins 1993; Hawkins et al. 2009; Hjelmfelt 1980; Lal et al. 2017).Reactions to the frequency-matching method by individual hydrologists vary. Some researchers accept the method as a valid and useful approach to reduce variability or dispersion (Galbetti et al. 2022) in rainfall-runoff response, in essence amplifying the signal in the observational record. Other researchers are considerably skeptical about the validity of the approach, and express concerns about its growing application. This study sides with those expressing skepticism of this method. The act of performing frequency-matching removes causality between the rainfall and runoff observations. In the authors’ view, the scientific basis for this procedure is challenging to defend, although it certainly produces better model fit and more aesthetically appealing figures. Frequency-matching was used in this study only to demonstrate and quantify the effects of this approach relative to naturally ordered data. The implementation of frequency-matching here is not an endorsement of its application.As a first example of demonstrating the effects of frequency-matching, Fig. 1(a) shows the LS calibration of CN for naturally ordered P-Q observations from a watershed in Hastings, Nebraska. These naturally ordered data were sorted independently for P and for Q. These data then were joined back into P-Q pairs such that the largest Q value was associated with the largest P value, the second largest Q with the second largest P, and so on, resulting in a frequency-matched set of P-Q data. The LS approach was repeated for this revised data set. Two things immediately are apparent: (1) the scatter evident in Fig. 1(a) is considerably greater than the scatter in Fig. 1(b); and (2) the calibrated CN value increased from 65.6 to 69.2. Both features are typical consequences of the frequency-matching approach.This study used the 31 study watershed data sets to explore the frequency-matching approach in its specific application to the NRCS rainfall-runoff method and curve number calibration. Results obtained from frequency-matched data were contrasted with those resulting from naturally ordered data to examine bias in calibrated CN values and to demonstrate implications for goodness-of-fit.Initial Abstraction RatioAs described previously, the long-standing value of the initial abstraction parameter, λ=0.20 [Eq. (1)], has come into question. This study examined two values: λ=0.20 and 0.05. This study focused on the impact of this parameter on the calibrated CN value and associated goodness-of-fit statistics.Fig. 2 shows the effect of λ for the data for Hastings, Nebraska (Fig. 1). The left-hand CDF of individually calibrated CNs in Fig. 2 corresponds to CN values derived from λ=0.05. The right-hand CDF is for λ=0.20. As stated previously, different values of λ produced a sizeable difference in the CDFs of CNs in this watershed, with the NEH median calibrated CN for λ=0.05 is considerably smaller than the NEH median calibrated CN for λ=0.20. Regardless of calibration approach, CN values determined for λ=0.05 always were smaller than the corresponding CN value determined for λ=0.20.FindingsA total of 12 (3×2×2) experiments were performed in this study, defined by all permutations of three calibration approaches, two precipitation thresholds, and two initial abstraction fractions (Table 2). Furthermore, these 12 experiments were repeated for both naturally ordered and frequency-matched data. Within an experiment, individual calibrations were performed for each of the 31 study watersheds. For all naturally ordered data experiments, Figs. 6 and 7 present box-and-whisker plots of the distribution of calibrated CNs and RSE values, respectively. Figs. 8 and 9 provide box-and-whisker plots of calibrated CNs and RSE values for the frequency-matched data sets for the same watersheds and experiments. Finally, Figs. 10 and 11 show side-by-side box-and-whisker plots for direct comparison of the LS calibration findings for CNs and RSE, respectively, for naturally ordered and frequency-matched data. The box part of these plots shows the first and third quartiles, with the median represented by the horizontal line within the box. Outliers, defined as being more than 1.5 times the interquartile range, are graphed individually. The whisker part of these plots shows the extremes of the observations that were not determined to be outliers. In Figs. 7 and 9, the box plots for RSE>2 are truncated to show more-important detail for RSE<2.Table 2. Elements of calibration experiments performed in this studyTable 2. Elements of calibration experiments performed in this studyCalibration approachPrecipitation threshold, PT (mm)Initial abstraction fraction, λData ordering approachLeast-squares, NEH median, and asymptotic0 and 25.40.05 and 0.20Naturally ordered and frequency-matchedThe major findings deriving from the naturally ordered data experiments are summarized in Table 3. The largest CNs were calibrated from the NEH median method, and the smallest CNs were calibrated from the ASYMP method. Similarly, the largest RSE values resulted from the NEH median method and the smallest RSE values resulted from the ASYMP method. Although the LS method ranked in the middle for both measures, it produced both CN and RSE values that are much more comparable in magnitude to those of the ASYMP method than to those of the NEH median method. If the quality of a calibration method is determined from the magnitude of the RSE values it produces, the ASYMP method performed best and the NEH median method performed the worst. If robustness of a calibration method is determined from the outlier results it produces, the LS method was the most robust, and the NEH median method was the least robust. Again, whether quality or robustness is the measure, the ASYMP and LS methods were very comparable, and the NEH median method performed much more poorly.Table 3. Summary of calibration experiment findingsTable 3. Summary of calibration experiment findingsCalibration methodCalibrated curve numbersaGoodness of fitaLeast-squaresλ: CN(0.2)>CN(0.05)λ: RSE(0.2)>RSE(0.05)PT: little effectPT: RSE(25.4)>RSE(0)Method is robust, no outliers calibratedMethod is robust, no outliers calibratedSome individual calibrations have RSE>1NEH medianCalibrated CNs are largest of the three methodsLargest RSE values of the three methodsλ: CN(0.2)>CN(0.05)λ:RSE(0.2)>RSE(0.05)PT: CN(0)>CN(25.4)PT: RSE(0)>RSE(25.4)Method produces outliersMethod produces outliersMany individual calibrationshave RSE>1AsymptoticCalibrated CNs are smallest of the three methodsSmallest RSE values of the three methodsλ: CN(0.2)>CN(0.05)λ: little effectPT: CN(0)>CN(25.4)PT: RSE(25.4)>RSE(0)Method is mostly robust, with a few outliersMethod is mostly robust, with a few outliersA few individual calibrations have RSE>1A universal result was that the calibrated CNs were larger for λ=0.20 than for λ=0.05. This was true regardless of calibration method or precipitation threshold. The effect of precipitation threshold varied in strength as a function of calibration method: for LS, PT essentially had no effect; for the NEH median, PT had a strong effect, with larger CNs resulting from PT=0 mm; for ASYMP, PT had a modest impact, with larger CNs resulting from PT=0 mm. Concerning RSE, λ caused modestly larger RSE values for the LS and NEH median methods, but essentially had no effect for the ASYMP method. Precipitation threshold produced mixed results leading to modestly larger RSE for PT=25.4 mm than for PT=0 mm for both the LS and ASYMP methods. However, the opposite was observed for the NEH median method, for which PT produced much larger RSE values for PT=0 mm than for PT=25.4 mm (Fig. 5).An additional point regarding Fig. 7 is merited. Hawkins et al. (2009) identified three different potential behaviors of the ASYMP fitting method: standard, violent, and complacent. The analyses presented in the present paper did not censor watersheds according to these behaviors, although Hawkins et al. (2009) expressly described complacent watersheds as not being “suitable for CN definition.” A rigorous definition of complacent watershed behavior could not be located, but seven watersheds in the data set were identified that did not reach their respective CN∞ values over the range of observed rainfall values. A reduced 24-watershed data set was created that removed these watersheds and repeated all analyses. A uniform reduction of the RSE values in all distributions in Fig. 7 was observed—indicating that the relative standard error for the 24-watershed data set improved by about 0.1 compared with that of the overall 31-watershed data set. All other changes to the reported findings were small and were not specific to any of the four elements of the calibration processes that were investigated. That the changes in RSE were observed across all experiment permutations indicates that the reported findings with regards to calibration methods, λ values, and precipitation thresholds were insensitive to the absence or presence of the seven censored watersheds. The remainder of this paper presents findings for the full 31-watershed data set.The frequency-matched data analyses summarized in Figs. 8 and 9 mirror the preceding statements for the naturally ordered data and the findings summarized in Table 3. Across calibration methods, precipitation thresholds, and λ values, the calibrated CN values resulting from frequency-matched observations consistently were 5–10 points larger, depending on the experiment.Notable findings followed from comparing like experiments for naturally ordered versus frequency-matched data. Figs. 10 and 11 provide a visual, side-by-side comparison of the LS experiment outcomes. Focusing on the data marked (a) (PT=0. mm, λ=0.05) in Fig. 10, the median CN increased from 56 for the naturally ordered results to 61 for the frequency-matched results. The interquartile ranges similarly shifted from 45

Source link