# Mapping twenty years of corn and soybean across the US Midwest using the Landsat archive

Sep 15, 2020

### Pixel-level crop type labels

Creating a crop type map using satellite imagery and supervised machine learning requires field-level ground truth of crop types on which to train a classifier. For CDL, ground truth labels come from the Farm Service Agency’s (FSA) Common Land Unit dataset, which is available to NASS but not to the public due to FSA confidentiality laws34. However, the user’s and producer’s accuracy of CDL on FSA labels are displayed in the CDL metadata, and generally exceed 95% for corn and soy from 2008 to 2018. Since we do not have access to FSA data, we used CDL 2008–2018 as our ground truth labels to train our corn and soybeans classifier, and validated our maps using a combination of a CDL 2008–2018 hold-out set, CDL 1999–2007 where it exists, NASS county-level crop acreage estimates, and crop rotation data from ARMS. The year 2008 was chosen to mark the beginning of our training set, because CDL is complete across the conterminous US (and therefore our study region) beginning in that year. The quality of our classifier and validation analyses thus depends on the quality of CDL, NASS, and ARMS data, which we discuss in further detail in the Technical Validation section.

Our crop classifier was trained to distinguish between corn, soybeans, and all other crops grouped into a third class; we did not classify cropland from non-cropland. Instead, we used the National Land Cover Database (NLCD) product35,36 to mask out non-cropland pixels. NLCD classifies a variety of land cover types — for example, water, developed, and forest — including a single “cultivated crops” class that amalgamates all crop types. The rasters are available at 30 m ground resolution for the years 1992, 2001, 2004, 2006, 2008, 2011, 2013, and 2016. Each year’s CSDL was masked by the last available NLCD product. For example, the 2008–2010 CSDLs are produced with the 2008 NLCD cropland mask. An exception is that the 1999 and 2000 CSDLs use the 2001 NLCD mask, because the previous NLCD product from 1992 was constructed with very different methods.

### Crop acreage statistics

Every year at the end of the harvest season, NASS conducts the County Agricultural Production Survey (CAPS) in cooperation with individual states to estimate acreage and production of selected crops and livestock species at the county level. Each state has its own CAPS sampling strategy, usually involving mail surveys and follow-ups as needed to obtain adequate coverage and response rates37. Responses from operators are used to allocate state totals previously obtained from NASS surveys to counties.

We acquired data on the planted area of corn and soybeans at a county level from the NASS Quick Stats database38 across the 13 states, and used these county-level acreage estimates as a form of validation on our maps from 1999–2018. Noteworthy for our validation is that, starting in 2009, the CDL program began to be incorporated into the NASS crop acreage estimates through a regression model4, so CDL and NASS acreages are not entirely independent datasets. We observed that the number of counties reported in NASS in these 13 states dropped from 1093 in 1999 to 885 in 2018.

### Crop rotation statistics

The Agriculture Resource Management Survey (ARMS), co-sponsored by the USDA’s Economic Research Service and NASS, is a multi-phase series of interviews with farm operators about cropping practices, farm business, and farm households39. Within our study period, it was collected in 2001, 2005, and 2010. We used the survey’s crop rotation statistics for the purpose of validating our corn-soybean maps. In each of these three years and for each of the 13 states, we obtained (1) the number of acres that grew corn in the survey year and soybeans in the previous year (a soybean-corn rotation) and (2) the number of acres that underwent a corn-corn rotation. Data were downloaded from the ARMS Tailored Report on Crop Production Practices40. We compared the fraction of soybean-corn rotation, defined as (frac{{rm{soybean}},{rm{to}},{rm{corn}},{rm{acreage}}}{{rm{soybean}},{rm{to}},{rm{corn}},{rm{acreage}}+{rm{corn}},{rm{to}},{rm{corn}},{rm{acreage}}}), predicted by CSDL against that reported by ARMS.

### Multi-temporal satellite imagery

To perform crop type classification back to 1999, we used annual multi-temporal satellite imagery from the Landsat archive. The Landsat Program is a series of Earth-observing satellites jointly managed by the USGS and NASA, beginning with Landsat-1 in 1972 and continuing with Landsat-7 and -8 in the present day. Its archive was made freely available in 200825, and each satellite offers moderate spatial resolution (30 m) imagery taken every 16 days (every 8 days when two satellites are operating).

We used Google Earth Engine to obtain imagery over our study region in the period 1999–2018 from Landsat 5, 7, and 8 Surface Reflectance Tier 1 collections27. We chose to start in 1999 because Landsat 7 was launched that year. The study region spans 2.2 million km2, which corresponds to 2.5 billion Landsat pixels and a total of 87,648 images from January 1, 1999 to December 31, 2018. We used Landsat imagery from January 1 to December 31 to capture crop phenology; in the Midwest, this time window encompasses a single growing season for most crop types. Clouds and other occlusions were masked out at the pixel level using the pixel_qa band that is provided with the Landsat Surface Reflectance products. The median cloud-free image count at a pixel during key crop growing months June–August was 7, with 2012 a notably lower year with a median of 4 (Fig. 2). The dip in 2012 occurred because Landsat 5 Thematic Mapper ended operations in November 2011 and Landsat 8 was not launched until 2013, combined with data gaps from the failed scan-line corrector aboard Landsat 7.

Although the Landsat 5 TM, Landsat 7 ETM+, and Landsat 8 OLI instruments measure slightly different wavelengths due to differing radiometric resolutions, we consider them to share six surface reflectance bands: blue, green, red, near infrared, shortwave infrared 1, and shortwave infrared 241. From these, we derived the green chlorophyll vegetation index (GCVI)42,

$${rm{GCVI}}={rm{NIR}}/{rm{Green}}-1$$

(1)

Unlike NDVI, GCVI does not saturate at high values of leaf area and has previously been shown to aid in distinguishing corn from soybeans43. We ultimately included NIR, SWIR1, SWIR2, and GCVI features in our classifier; for an explanation of how these were selected for corn-soybean classification among various bands and vegetation indices, please see ref. 43.

### Feature extraction

We summarized the Landsat time series at each pixel using a Fourier transform, or harmonic regression (Fig. 3). Harmonic analysis has previously been shown to successfully capture the different phenology and spectral reflectance across vegetation44,45 and crop types46,47, allowing corn, soybean, and other crops to be distinguished from each other in the US Midwest43. Each spectral band or vegetation index was viewed as a time-dependent function f(t), so that the harmonic regression takes the form

$$f(t)=c+mathop{sum }limits_{k=1}^{n}left[{a}_{k},cos (2pi komega t)+{b}_{k},sin (2pi komega t)right]$$

where ak are cosine coefficients, bk are sine coefficients, c is the intercept term, n is the order of the harmonic series, and ω controls the period of the function. The independent variable t represents the day of year that an image is taken expressed as a fraction between 0 (January 1) and 1 (next January 1). We used a second order harmonic (n = 2) with ω = 1.5, shown in previous work to result in good features for crop type classification in the study area43. This yields a total of 5 features per band or VI (Table 2).

Our feature extraction procedure therefore requires at least 5 points to fit a second-order harmonic regression and obtain coefficients. The fit will be a better summary of crop phenology if more non-cloudy Landsat images are taken in the growing season.

### Ancillary features

Satellite observations of vegetation are a function of crop type along with plant health and stage of development, which are themselves functions of the amount of sunlight, degree days, water, and nutrients available for growth. We hypothesized that, given measurements of plant phenology via Landsat imagery, observations of these other variables might establish constraints that facilitate the deduction of crop type. To see what we mean, consider a simplified example: suppose we know that a GCVI time series with a max value of 0.7 could be either corn grown under a large number of growing degree days (GDD) or soybean grown under a moderate number of GDD. Then knowing the value of GDD would help us deduce the crop type. We added features that capture some of these weather and climate variables to our classifier and tested to see whether they aid in crop type classification.

We used the University of Idaho’s Gridded Surface Meteorological Dataset (gridMET)48 and TerraClimate49 to find the growing degree days (GDD), vapor pressure deficit (VPD), mean precipitation, temperature extrema, aridity, soil moisture, and climate water deficit at each pixel in the study region (Table 2). We found that these features did not significantly improve corn and soybean classification in the study region, so our final map is created using only the Landsat-derived harmonic coefficients. However, we note that these ancillary features may still help in a different setting, such as one where weather varies more dramatically across space and time.

### Training set sampling

To sample a set of points that is geographically representative of our study area, we created a 50km-by-50km grid over the 13 states and sampled 250 points uniformly at random from each grid cell. The study area was covered by 839 grid cells; after filtering for points falling inside the boundaries of the 13 states, we were left with a set of 205,821 points. For every year between 1999 and 2018, we used Google Earth Engine to extract (1) the coefficients from harmonic regression fit to the annual Landsat time series at each point and (2) the CDL label at each point if it existed for that year. We removed points from this dataset whose CDL label was nonexistent or a non-crop class, leaving us with 841,028 samples.

A summary of the number of samples for each crop type can be found in Table 3. Across the 13 states, corn is the most numerous class with 329,375 samples, or 39.2% of the total. Soybeans and other crops have 281,411 (33.5%) and 230,215 (27.4%) samples, respectively. The distribution of the three classes varies by state; Iowa, Illinois, Indiana, and Nebraska grew mostly corn and soybean and very little other crop, while North Dakota and Kansas grew mostly other crops.

Eighty percent of the samples from 2008–2018 were placed in the training set, while the remaining 20% of 2008–2018 and the labeled samples from 1999–2007 were placed in the test set.

### Classification algorithm

We used random forests for classification, as they are well-documented in the field of remote sensing to perform well on land cover and crop type tasks50,51, and usually better than maximum likelihood classifiers, support vector machines, and other methods for crop type mapping52,53,54,55. Random forests are an ensemble machine learning method comprised of many decision trees in aggregate56, and offer ease of use, high performance, and interpretability at the same time. Throughout method development, we used Python’s scikit-learn implementation of a random forest classifier with 500 trees and otherwise default parameters. To create the Corn Soy Data Layer product, we used Google Earth Engine’s ee.Classifier.randomForest, similarly with 500 trees and otherwise default parameters. Figure 4 shows an example decision boundary learned by a random forest to classify corn, soybean, and other crops in Iowa, illustrating the possibility of distinguishing among the three classes.

### Final map creation

To create the Corn-Soy Data Layer (CSDL)57, we first used Google Earth Engine to compute the harmonic features for Landsat NIR, SWIR1, SWIR2, and GCVI across the entire study region for each year in 1999–2018. We then used our training samples from 2008–2018 to train a random forest classifier (ee.Classifier.randomForest) on harmonic features to predict crop type labels derived from CDL (corn, soybean, or other). A separate model was trained for each state. Lastly, we applied each classifier to predict crop type on all pixels in the relevant state from 1999–2018.

### Evaluation metrics

In evaluating our CSDL product against CDL, NASS, and ARMS data, we used the following metrics.

First, CSDL and CDL were compared in each state and year using the overall accuracy metric. Accuracy is a commonly used metric for classification and is defined as the fraction of correct predictions. While CDL itself is not a perfectly accurate ground truth of crop type, the accuracy metric computed between CSDL and CDL tells us how closely the two datasets agree. Note that an interpretation of accuracy must take into account the distribution of crop types in an area, as a more skewed distribution biases accuracy upward. To aid interpretation, crop type distributions by state are provided in Table 3.

Second, we computed the user’s accuracy and producer’s accuracy for each crop type in each state and year, using CDL again as “ground truth” for CSDL. If we let TPc stand for the number of true positives for crop type c, FPc the number of false positives, and FNc the number of false negatives, then the user’s and producer’s accuracies for crop type c are defined as

$$begin{array}{ccc}{rm{usermbox{‘}s}},{{rm{accuracy}}}_{c} & = & frac{{{rm{TP}}}_{c}}{{{rm{TP}}}_{c}+{{rm{FP}}}_{c}}\ {rm{producermbox{‘}s}},{{rm{accuracy}}}_{c} & = & frac{{{rm{TP}}}_{c}}{{{rm{TP}}}_{c}+{{rm{FN}}}_{c}}end{array}$$

User’s accuracy is also known as precision, and producer’s accuracy is also known as recall. We have included these metrics with the CSDL map product to aid users in their assessment of how to use CSDL for their applications, and for congruency with the CDL product, which reports user’s and producer’s accuracy in each state’s metadata.

Lastly, to compare aggregated CSDL or CDL to county-level NASS data or state-level ARMS data, we used the coefficient of determination (R2) metric. The coefficient of determination is defined as

$${R}^{2}=1-frac{{sum }_{i}{({y}_{i}-{widehat{y}}_{i})}^{2}}{{sum }_{i}{({y}_{i}-bar{y})}^{2}}$$

where yi are “ground truth” areas (NASS or ARMS), ({widehat{y}}_{i}) are the area predictions under the classifier (CSDL or CDL), and (bar{y}) is the mean of ground truth areas. In words, R2 measures the proportion of the variance in captured by the predictions.

Because we care about absolute error in CSDL predictions, the R2 between CSDL and NASS is computed not for a simple linear regression between y and (widehat{y}), but on y and (widehat{y}) values themselves. In this case, R2 is bounded in the interval (−∞, 1]. It is possible for an R2 to be negative if the (widehat{y}) are worse predictions for y than the sample mean (bar{y}). In order for R2 to be close to 1, the predictions must be both positively correlated with ground truth and unbiased. R2 equals 1 only if ({sum }_{i}{({y}_{i}-{widehat{y}}_{i})}^{2}=0), which requires ({y}_{i}={widehat{y}}_{i}) for all i.