Cross-biome field survey and soil sample collection
Soil and vegetation data were collected using standardized protocols between 2016 and 2017 from 16 soil chronosequences (also known as substrate age gradients) located in nine countries and six continents (Fig. 1 and Supplementary Table 1). Soil chronosequences are often used to evaluate the changes in ecosystem structure and function over millennia because soil age for these locations is frequently known from geological surveys, models, and isotopic dating techniques (Fig. 1 and Supplementary Table 1). In these soil chronosequences, all other soil-forming factors except substrate age are kept relatively constant (i.e., current climate, vegetation, topography, and parent material), which permits the separation of the effects of time on ecosystem development from other ecosystem development state factors1,2,3.
Field surveys were conducted according to a standardized sampling protocol. We surveyed a 50 m × 50 m plot within each chronosequence stage, and within each quadrat, collected five composite surface soil samples from the surface 10 cm soil under the dominant vegetation types (e.g., trees, shrubs, grasses, etc.). Given the cross-biome nature of our study, we do not expect the timing (season) of sample collection to affect our results. Following field sampling, soils were sieved (<2 mm) and separated into two portions. One portion was air-dried and used for soil biochemical analyses. The other portion of the soil was immediately frozen at −20 °C for molecular analyses. This storage approach is commonly used in global surveys21,22. Our study includes 16 chronosequences, 87 plots, 261 transects, and 435 soil samples (Supplementary Table 3).
State factors of ecosystem development
We used a total of 30 predictors to describe the five state factors of ecosystem development: soil age, climate, organisms, parent material, and topography (Supplementary Table 4). Parent material was meant to represent the geological material at each location, and was characterized using the information on chronosequence origin (e.g., volcanic, sand dunes, or sedimentary), lithology, and soil type (Supplementary Table 4). Climate was represented by annual mean values and seasonality of current temperature and precipitation (Supplementary Table 4). Vegetation was represented by whether a site was grassland, shrubland, coniferous or angiosperm forest (Supplementary Table 4). Topography was represented by elevation, slope, and aspect. Soil age was represented by semi-quantitative and quantitative categories (Supplementary Table 4). A complete list of these predictors, detailed information on their units, and a rationale on their significance for ecosystem structure and function are in Supplementary Table 4.
There is no single accepted way to describe soil age in ecological studies. Because of this, we characterized soil age using three complementary metrics: a quantitative index of soil age (years; log10-transformed), a semi-quantitative index of age (where samples were given three soil age categories: thousands of years, hundreds of thousand years and millions of years) and a qualitative soil age index (standardized soil age range from 0 to 1 calculated for each chronosequences) (see ref. 13 for a similar approach).
The vegetation type (coniferous and angiosperm forest, shrubland, grassland) and the geological substrate origin (glacier, sand dunes, sedimentary, or volcanic) were annotated at each location in situ. We included vegetation type (e.g., Angiosperm forest) as a predictor, and plant functional type composition (e.g., % of grasses; see below) as a response variable because all vegetation types include mixtures of plant functional types (Supplementary Table 1), and vegetation type explains only a minor proportion of the functional type composition of plant communities (Fig. 2).
We used information on substrate origin, lithology, and soil orders to describe the parent material in our soils (i.e., the geological material in which soil develops). We used the three most common substrate origins (volcanic, sand dunes, and sedimentary; Supplementary Table 4) in our global survey for downstream analyses. Lithology information was obtained from the PANGAEA database23. Soil US Department of Agriculture (USDA) class information was retrieved from the SoilGrids250m database24 at 250 m resolution. We used the top five most common lithology and USDA class soil types (Supplementary Table 3) in our global survey for downstream analyses.
We classified all chronosequences included in this study as cold (continental and polar weather), arid, temperate, and tropical using information of the Köppen climate classification25. Climatic variables were collected from the Worldclim database26. Climatic information included maximum and minimum temperature, temperature seasonality, mean diurnal temperature range, mean annual precipitation, precipitation seasonality, and climatic biome type (dryland and mesic ecosystems) (Supplementary Table 3).
Regarding topography, information on topographic elevation, slope, and aspect was retrieved from ref. 27 at 30 m resolution. We retrieved the averaged slope and aspect class for each location using this database (Supplementary Table 3).
Current vs. past climates
Note that we used current climate26 information as our surrogate of climate. However, we also cross-validated this approach using paleoclimatic data (Worldclim database) from 64 soil chronosequences across the globe including the 16 chronosequences from this study (Fig. 1) and 48 other comparable soil chronosequences, using a global assessment of published data (Supplementary Fig. 1). Note that the current climate provided a good representation of the existing climate in multiple globally distributed soil chronosequences during hundreds of thousands of years, suggesting that locations where these chronosequences were developed experienced relatively small changes in precipitation and temperature patterns (Supplementary Fig. 5).
Ecosystem functional and structural properties
We measured 32 functional and structural properties, which we collectively refer to as ecosystem properties. We included a wide range of above- and belowground pools and fluxes that represent both rapidly and slow-changing ecosystem properties. This allowed us to provide a comprehensive collection of variables representing the structure and function of terrestrial ecosystems worldwide. A description of the units, data availability, and data resolution (sample, plot, transect) for functional and structural properties is given in Supplementary Table 3. See Fig. 2 for a list of the considered structural and functional properties.
The concentrations of dissolved inorganic N and P were determined for all soil samples. The concentration of dissolved inorganic N was obtained from 0.5 M K2SO4 extracts using colorimetric analyses21. The concentration of Olsen P was determined from bicarbonate extracts using colorimetric analyses21. Soil Olsen P concentrations were positively correlated with other commonly used methods28 for estimating available P pool sizes (resin-P; ρ = 0.72, P < 0.001, n = 87).
We determined the water-holding capacity and potential infiltration of all soil samples. Soil water-holding capacity was determined in the lab using the funnel method as described in ref. 29. Potential water infiltration was determined in the lab using a similar method to that described in ref. 30.
We determined the relative abundance of soil ectomycorrhizal and arbuscular mycorrhizal fungi via amplicon sequencing using the Illumina MiSeq platform14. Soil DNA was extracted using the Powersoil® DNA Isolation Kit (MoBio Laboratories, Carlsbad, CA, USA) according to the manufacturer’s instructions14. A portion of the eukaryotic 18S (V9 region) rRNA genes31 was sequenced using the Euk1391f ((5′-GTACACACCGCCCGTC-3′)/EukBr (5′-TGATCCTTCTGCAGGTTCACCTAC-3′) primer set14. Bioinformatic processing was performed using a combination of QIIME32, USEARCH33, and UNOISE334. Phylotypes (i.e., Operational taxonomic units; OTUs) were identified at the 100% identity level. Fungal guilds for fungal communities were identified using the FUNGUILD database35 focusing on probable and highly probable matches. The relative abundance (%) of ectomycorrhizal and arbuscular mycorrhizal fungi were calculated, from a 2000 reads/sample rarefied fungal OTU table14, as the proportion of 18S reads classified as mycorrhizal fungi (from unique matches) out of all fungal 18S reads. Taxa with mixed lifestyles were discarded.
We determined the amount of C stocks and C fluxes in all soil samples as surrogates of climate regulation (soil-atmosphere CO2 feedbacks). The concentration of soil total organic C was determined by colorimetry after oxidation with a mixture of potassium dichromate and sulfuric acid36. Bulk density was determined in the field at the plot level as the average from three soil cores. Using this information, we calculated C stocks (kg C m−2) to 10 cm depth. Note that the concentrations of total organic C were strongly correlated with those of total N (ρ = 0.90; P < 0.001, n = 435) across samples, so we kept only soil C for statistical modeling. Soil respiration was determined on composite soil samples per plot by quantifying the CO2 released during 16 days from 1 g of soil sample incubated at 28 °C at 50% of water-holding capacity in 20-mL glass vials in the dark, following a 1-week pre-incubation37.
We measured three extracellular enzyme activities in all soil samples: activity of β-glucosidase (sugar degradation), N-acetylglucosaminidase (chitin degradation), phosphatase (phosphorus mineralization). Extracellular soil enzyme activities were measured using 1 g of soil by fluorometry38. We also determined substrate-induced respiration rates for lignin (lignin degradation) using the Microresp protocol39. In brief, samples with and without (basal respiration) lignin were incubated for 6 h and read at 570 nm. Lignin degradation was calculated as respiration in lignin less the basal respiration. Finally, the respiration of glucose was assayed through the incubation of soils in the above conditions for soil respiration (28 °C and 50% of water-holding capacity) and addition of 240 µg of 13C-glucose (99 atom% U-13C, Cambridge Isotope Laboratories, Tewksbury, Massachusetts, USA) dissolved in water to each vial40. CO2 production and its isotopic composition were then measured as described in ref. 40.
Plant production and vegetation composition
We used the Normalized Difference Vegetation Index (NDVI) as our proxy for net plant primary productivity. This index provides a global measure of the “greenness” of vegetation across Earth’s landscapes for a given composite period, and thus acts as a proxy of photosynthetic activity and large-scale vegetation distribution. NDVI data were obtained from the Moderate Resolution Imaging Spectroradiometer (MODIS) aboard NASA’s Terra satellites (http://neo.sci.gsfc.nasa.gov/). We obtained plant productivity information (averaged monthly values between 2008 and 2017) at a resolution of 250 m. Within each 50 × 50 m plot, three 50-m parallel transects were established, spaced 25 m apart. The relative abundance of woody vs. non-woody plants was calculated using vegetation transect information, and used as a surrogate of the availability of wood resources (Supplementary Table 3). The percentage of cover by trees, shrubs, open areas, forbs, and grasses was measured in each of the three transects located within each plot (see above) using the line-intercept method21. The dominant vegetation in each location is available in Supplementary Table 1.
Microbial biomass of soil was estimated from phospholipid fatty acids (PLFAs) extracted from freeze-dried soil samples and measured using the method described in ref. 41, as modified by ref. 42, to achieve high throughput analysis. The extracted fatty acid methyl esters were analyzed on an Agilent Technologies 7890B gas chromatograph with an Agilent DB-5 ms column (Agilent Technologies, CA, USA). The biomarkers selected to indicate total bacterial biomass are the PLFAs i15:0, a15:0, 15:0, i16:0, 16:1ω7, 17:0, i17:0, a17:0, cy17:0, 18:1ω7 and cy19:0, and the biomarker to indicate total fungal biomass is the PLFA 18:2ω6. Using the selected PLFA biomarkers, the biomass and the ratio of fungal and bacterial communities43,44 were calculated for each soil sample. Total microbial biomass includes the sum of all bacterial and fungal biomarkers plus that of other soil microbial biomarkers such as the eukaryotic C18:1w9. The fungal-to-bacterial ratio was calculated as total fungal PLFAs/total bacterial PLFAs, while the total microbial biomass for each soil sample was expressed as log10 (sum of all analyzed PLFAs).
Soil pH was measured with a pH meter, in a 1:2.5 mass:volume soil and water suspension. Texture (% of fine fractions: clay + silt) was determined on a composite sample for each chronosequence stage45. Total element concentrations (P, Al, Ca, Na, K, Mg) were determined by inductively coupled plasma atomic emission spectroscopy (IPC-AES) after microwave digestion with aqua regia and hydrofluoric acid46. We refer to soil total P determined using this method as soil P-HF. In the case of soil P (in H2SO4), P was obtained using a SKALAR San++ Analyzer (Skalar, Breda, The Netherlands) after digestion with sulfuric acid (3 h at 415 °C)21. We refer to this soil total P as Soil total P- H2SO4. The chemical index of alteration (CIA) [Al/(Al + Ca+ Na + K)] and total base cation reserves (TBR) [Ca+ Na + K+ Mg])47 provide information on level of weathering. The CIA index is also referred to as aluminum saturation. Note that we used two types of weathering indexes (CIA and TBR) and soil total P forms because these measurements could be more meaningful for certain types of soils (e.g., CIA in volcanic and TBR in sedimentary soils) in providing complementary information on similar concepts.
Soil total N concentrations were obtained using a SKALAR San++ Analyzer (Skalar, Breda, The Netherlands) after digestion with sulfuric acid (3 h at 415 °C)21. Soil C:N, N:P and C:P ratios were calculated using the information on total organic C and soil P- H2SO4 for consistency with ref. 14 which were analyzed as explained above.
Meta-analysis of published data
In addition to the surveyed 16 soil chronosequences, we conducted a meta-analysis (i.e., a statistical synthesis of results combining data from multiple separate studies) including information from several other independent, but comparable (centuries to millennia), soil chronosequences and important ecosystem properties (Supplementary Fig. 1; Table 2 and Supplementary Methods 1 and 2) to further validate some of our conclusions. We used the SCOPUS database (accessed August 2016) to extract data from published reports, articles, and reviews on the effects of ecosystem development on soil properties and vegetation. The following keyword combinations were used: “chronosequence” AND (“carbon” OR “nitrogen” OR “phosphorus” OR “biomass” OR “diversity”). Within these references, studies were included in our analyses only if they represented soil chronosequences that spanned longer time periods (i.e. centuries to millennia or longer). We obtained enough information for soil C stocks (calculated using bulk density information when available), total P, pH, texture (clay + silt) and soil C:N ratios. These variables also had the advantage that they are often measured with very similar methods and contain comparable information in terms of units etc. We used comparable surface soil data from the mineral surface horizons for our analyses (top ~10 cm). We excluded the 16 soil chronosequences surveyed in this study, to ensure independence of both databases. A total of 48 soil chronosequences were retained from our literature search (see Supplementary Fig. 1; Table 2 and Supplementary Methods 1 for a complete list of chronosequences studies).
Variation partitioning modeling
We then used Variation Partitioning Modeling48 to quantify the relative importance of soil age, climate, vegetation type, parent material type, and topography (Supplementary Table 4) in regulating 32 ecosystem properties. Specifically, this analysis allowed us to identify the unique and shared portion of the variation in the distribution of multiple ecosystem functional and structural properties explained by the five state factors that determine ecosystem properties. This approach is recommended specifically to deal with among-group multicollinearity, as it partitions the variance in a given response variable (ecosystem property) that is attributed to a particular group of predictors (a group of variables representing a given state factor; e.g., climate) from that variance shared among all predictors (all-state factors)48. Note that adjusted coefficients of determination in multiple regression/canonical analysis can, on occasion, take negative values48. Negative values in the variance explained for a group of predictors on a given response variable are interpreted as zeros, and correspond to cases in which the explanatory variables explain less variation than random normal variables would48.
We used the varpart function from the “vegan” R49 package to run these analyses. The original package is designed to evaluate the unique portions of variations in a given variable (e.g., function 1) explained by four explanatory matrices (groups of statistical predictors). We bypassed this issue by running each model two times. First, we ran our models with two explanatory tables (soil age and the remaining state factors). Using this run, we determined the amount of variation explained by soil age alone and shared with the environment. We then further partition the fraction of variation uniquely explained by the combination of state factors into unique and shared fractions explained by climate, vegetation type, parent material, and topography.
We complemented our Variation Partitioning modeling by conducting further partial (Spearman) correlations to evaluate the associations between 30 single variables within the climate, vegetation type, parent material and topography (Supplementary Table 4) with multiple ecosystem properties (Supplementary Table 4). These analyses were statistically controlled by three complementary soil age metrics (Supplementary Table 4) described above. We used the pcor function from the “ppcor” R50 package to run these analyses.
Changes in ecosystem structure and function within soil chronosequences
We further employed non-parametric Spearman rank correlations to further explore the potential associations between soil age (i.e., chronosequences stage) and 32 ecosystem properties. By using Spearman correlations, we aimed to identify the most important trends in our results. Spearman rank correlations measure the strength and direction of the association between two ranked variables. Spearman rank correlations do not require the normality of data or homogeneity of variances. Moreover, linearity is not strictly an assumption of these correlations (they can be run on a non-monotonic relationship to determine whether there is a monotonic component to the association, and therefore used to identify the most important trends between two variables). In addition, unlike Pearson correlations, Spearman rank correlations can be used to associate two variables regardless of whether they are ordinal, interval, or ratio. Spearman correlations have been used in many studies to identify major associations between soil age and ecosystem properties4,14,51. Again, for consistency with the previous work4,14,51, we used chronosequence stage as our surrogate for soil age in these analyses.
Further information on research design is available in the Nature Research Reporting Summary linked to this article.