Soil Sample Design Model overview

Soil stratification is a fundamental step in designing a sampling strategy for soil carbon projects. Stratification divides a project area into smaller, relatively homogeneous sub-areas, known as strata, based on characteristics that are correlated with soil organic carbon (SOC) and bulk density. By accounting for known sources of variability, stratification increases sampling efficiency and reduces the number of samples required to produce a statistically valid estimate of SOC levels.

What you'll learn: This document outlines the methodology used in Regrow's stratification model, including data inputs, stratification logic, sample size calculations, sample location selection, and protocol considerations.

Project stratification & model inputs

The stratification model starts by discretizing the project fields into a 30-meter by 30-meter grid. For each grid cell, a suite of publicly available auxiliary data is collected, including:

soil properties from SoilGrids (outside CONUS) or SSURGO (within CONUS, the contiguous United States): SOC, bulk density, soil texture (sand, silt, and clay fractions), pH, etc.
IPCC climate zone classifications
elevation data
topographic information
remotely sensed vegetation indices

These auxiliary variables serve as proxies for SOC variability and are used both to define the discrete strata for the project and as inputs to the model to place sample point locations.
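To make the 30 m grid discretization concrete, here is a minimal Python sketch. It assumes field boundaries are already in a projected coordinate system measured in meters and represents each field as a simple polygon (a list of vertices); `discretize_field` and `point_in_polygon` are hypothetical names, not part of Regrow's implementation. A cell is kept when its centroid falls inside the field, which is one common convention among several.

```python
import math

CELL_SIZE_M = 30.0  # 30 m x 30 m grid cells, as described above


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray running from (x, y) toward +x
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def discretize_field(polygon, cell_size=CELL_SIZE_M):
    """Return centroids of grid cells whose centre falls inside the field polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    x0, y0 = min(xs), min(ys)
    ncols = math.ceil((max(xs) - x0) / cell_size)
    nrows = math.ceil((max(ys) - y0) / cell_size)
    centroids = []
    for r in range(nrows):
        for c in range(ncols):
            cx = x0 + (c + 0.5) * cell_size
            cy = y0 + (r + 0.5) * cell_size
            if point_in_polygon(cx, cy, polygon):
                centroids.append((cx, cy))
    return centroids


# Hypothetical 300 m x 150 m rectangular field
print(len(discretize_field([(0, 0), (300, 0), (300, 150), (0, 150)])))  # 50 cells
```

In a full pipeline, each centroid would then be used to look up the auxiliary layers listed above.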

The stratification framework draws on both discrete variables, which define the strata, and continuous auxiliary variables, which guide sample placement within them.

Discrete strata are defined first, from combinations of IPCC climate zone, USDA soil texture class, and country; country serves as a proxy for typical regional land management practices. The area inside the project boundary is then divided into these discrete strata.
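A minimal sketch of the discrete strata definition, assuming each grid cell already carries its IPCC climate zone, USDA texture class, and country as attributes (the dictionary keys and `assign_strata` are hypothetical). Cells that share the same combination fall into the same stratum.

```python
from collections import defaultdict


def assign_strata(cells):
    """Group grid cell IDs by their (climate zone, texture class, country) combination."""
    strata = defaultdict(list)
    for cell in cells:
        key = (cell["ipcc_climate_zone"], cell["usda_texture_class"], cell["country"])
        strata[key].append(cell["cell_id"])
    return dict(strata)


# Hypothetical example cells
cells = [
    {"cell_id": 1, "ipcc_climate_zone": "Warm Temperate Moist",
     "usda_texture_class": "silt loam", "country": "US"},
    {"cell_id": 2, "ipcc_climate_zone": "Warm Temperate Moist",
     "usda_texture_class": "silt loam", "country": "US"},
    {"cell_id": 3, "ipcc_climate_zone": "Warm Temperate Moist",
     "usda_texture_class": "clay loam", "country": "US"},
]
strata = assign_strata(cells)
# Two strata: silt loam cells [1, 2] and clay loam cell [3]
```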

The remaining auxiliary variables provide a more granular representation of variability across the project landscape. Variables such as SOC, bulk density, soil texture, elevation, vegetation indices, and topography guide the distribution of sample points within each stratum, with the aim of ensuring that the selected samples represent the full range of conditions.

Determining sample size for the project

Verra VM0042 and CAR SEP require stratified random sampling, but offer limited direction regarding how strata should be defined and how many should be created. To address this ambiguity, Regrow’s Sample Design Model suggests sample sizes designed to maximize the representativeness of the sampling plan, while balancing practical constraints such as sampling budgets.

While the customer ultimately chooses how many samples to take for the project, Regrow’s Sample Design Model can suggest three sample size options to suit different budgets:

Variance-based estimate for the project. Sample size is determined through a calculation based on the variability observed in auxiliary data layers, particularly those related to SOC. Drawing on methods described by Bettigole et al. (2023), the sample size is estimated to control the margin of error for the mean SOC at a 90% confidence level with a 5% allowable error, considering all project fields. This is not a formal power analysis but a method to constrain the uncertainty of the mean estimate. The sample size is calculated across multiple spatial units, including the entire project area, areas with common soil texture classes, and individual fields.
Stratified variance-based estimate. This uses the same methodology as the project-level variance-based estimate above, but calculates an appropriate sample count for each stratum rather than for the project as a whole.
Area-based estimate. This method follows conservative industry standards, setting a sampling density of one sample per 1.6 hectares (4 acres). This density is based on recommendations from scientific studies and widely accepted protocols, such as those referenced in carbon credit methodologies.
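The three options can be sketched in Python as follows. The variance-based calculation assumes the textbook sample-size formula for estimating a mean, n = (z · CV / E)², where CV is the coefficient of variation of SOC from the auxiliary layers, z ≈ 1.645 for 90% confidence, and E = 0.05 is taken as the allowable error relative to the mean; whether Bettigole et al. (2023) use exactly this form is an assumption. The stratified version is assumed to sum per-stratum estimates, each floored at the VM0042 minimum of three samples per stratum. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def n_from_cv(cv, confidence=0.90, allowable_error=0.05):
    """Sample size bounding the CI half-width at allowable_error * mean (relative error)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided; about 1.645 at 90%
    return math.ceil((z * cv / allowable_error) ** 2)


def variance_based_n(soc_values, **kwargs):
    """Option 1: project-level estimate from auxiliary SOC values (one per grid cell)."""
    return n_from_cv(stdev(soc_values) / mean(soc_values), **kwargs)


def stratified_variance_based_n(soc_by_stratum, min_per_stratum=3, **kwargs):
    """Option 2 (assumed form): sum of per-stratum estimates, each at least min_per_stratum."""
    return sum(max(min_per_stratum, variance_based_n(v, **kwargs))
               for v in soc_by_stratum.values())


def area_based_n(area_ha, ha_per_sample=1.6):
    """Option 3: one sample per 1.6 ha (about 4 acres)."""
    return math.ceil(area_ha / ha_per_sample)


print(n_from_cv(0.20))    # SOC with a 20% CV -> 44 samples
print(area_based_n(160))  # 160 ha -> 100 samples
```

Because n grows with the square of the CV, more variable projects (or strata) need disproportionately more samples under the variance-based options, while the area-based option depends only on acreage.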

For more information about sample size estimation, and how to choose the most appropriate sample count for your project, see Choosing the right number of Soil Samples. In practice, the final sample count for a project is typically set by the available budget.

Sample allocation & point selection

Once the total sample size is defined, sample points are allocated across strata to comply with protocol requirements and to maximize statistical utility. First, each stratum is assigned a minimum of three samples, as required by Verra VM0042. The remaining samples are then distributed proportionally using a weighting metric that incorporates both the standard deviation of SOC within the stratum (as estimated from auxiliary data) and the field area in the stratum.
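The allocation step can be sketched as below. The text states only that the weighting metric incorporates SOC standard deviation and field area; this sketch assumes a Neyman-style weight of SD × area and uses largest-remainder rounding so that counts sum exactly to the total. `allocate_samples` and its inputs are hypothetical.

```python
import math


def allocate_samples(total, strata, minimum=3):
    """strata: dict name -> (soc_sd, area_ha). Returns dict name -> sample count."""
    if total < minimum * len(strata):
        raise ValueError("total is too small to give every stratum the minimum")
    remaining = total - minimum * len(strata)
    # Assumed weight: SOC standard deviation x area (Neyman-style allocation)
    weights = {name: sd * area for name, (sd, area) in strata.items()}
    total_w = sum(weights.values())
    if total_w == 0:  # no variability information: split the remainder evenly
        weights = {name: 1.0 for name in strata}
        total_w = float(len(strata))
    quotas = {name: remaining * w / total_w for name, w in weights.items()}
    extra = {name: math.floor(q) for name, q in quotas.items()}
    # Largest-remainder rounding: hand leftover samples to the largest fractional parts
    leftover = remaining - sum(extra.values())
    order = sorted(strata, key=lambda name: quotas[name] - extra[name], reverse=True)
    for name in order[:leftover]:
        extra[name] += 1
    return {name: minimum + extra[name] for name in strata}


# Hypothetical strata: (SOC standard deviation, area in ha)
print(allocate_samples(20, {"A": (0.4, 50), "B": (0.2, 50), "C": (0.1, 20)}))
# {'A': 10, 'B': 6, 'C': 4}
```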

Within each stratum, specific sample locations are selected using Conditional Latin Hypercube Sampling (cLHS), a method proposed by Minasny and McBratney (2006). This technique optimizes the sample distribution by ensuring that the selected points span the range of SOC percentages and bulk densities observed within each stratum. This approach enhances the robustness of the sampling design and reduces bias in the model inputs.
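For readers unfamiliar with cLHS, the following is a deliberately simplified Python sketch of the idea in Minasny and McBratney (2006): choose n candidate cells so that, for every continuous covariate (for example SOC and bulk density), each of the n equal-probability quantile bins holds exactly one sample, searching with simulated annealing. It keeps only the continuous-coverage term of the published objective, omitting the categorical and correlation terms and the original's targeted replacement of poorly placed samples; all names and parameter values are illustrative assumptions.

```python
import math
import random


def quantile_edges(values, n):
    """n + 1 edges splitting the empirical distribution into n equal-probability bins."""
    s = sorted(values)
    last = len(s) - 1
    return [s[round(i * last / n)] for i in range(n + 1)]


def bin_index(x, edges):
    """Index of the quantile bin containing x (the last bin is closed on the right)."""
    for i in range(len(edges) - 2):
        if x < edges[i + 1]:
            return i
    return len(edges) - 2


def coverage_objective(sample, data, edges_per_var, n):
    """Continuous cLHS term: sum over covariates and bins of |samples in bin - 1|."""
    total = 0
    for j, edges in enumerate(edges_per_var):
        counts = [0] * n
        for idx in sample:
            counts[bin_index(data[idx][j], edges)] += 1
        total += sum(abs(c - 1) for c in counts)
    return total


def clhs(data, n, iterations=5000, t0=1.0, cooling=0.995, seed=0):
    """Pick n row indices from data (rows are tuples of continuous covariates)."""
    if not 0 < n < len(data):
        raise ValueError("n must be between 1 and len(data) - 1")
    rng = random.Random(seed)
    edges_per_var = [quantile_edges([row[j] for row in data], n)
                     for j in range(len(data[0]))]
    sample = rng.sample(range(len(data)), n)
    chosen = set(sample)
    unsampled = [i for i in range(len(data)) if i not in chosen]
    current = coverage_objective(sample, data, edges_per_var, n)
    best, best_obj, t = list(sample), current, t0
    for _ in range(iterations):
        if best_obj == 0:
            break  # every covariate has exactly one sample per quantile bin
        i, k = rng.randrange(n), rng.randrange(len(unsampled))
        sample[i], unsampled[k] = unsampled[k], sample[i]  # propose a swap
        proposed = coverage_objective(sample, data, edges_per_var, n)
        delta = proposed - current
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = proposed  # accept; worse moves pass only while t is high
            if current < best_obj:
                best, best_obj = list(sample), current
        else:
            sample[i], unsampled[k] = unsampled[k], sample[i]  # reject: undo swap
        t *= cooling
    return sorted(best)


# Hypothetical candidate cells in one stratum: (SOC %, bulk density g/cm3)
gen = random.Random(42)
candidates = [(gen.uniform(0.8, 3.0), gen.uniform(1.1, 1.6)) for _ in range(500)]
picked = clhs(candidates, n=6)
```

Because bins are defined by quantiles of the candidate cells, the selected points follow each covariate's observed distribution within the stratum rather than clustering around its mean.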

Model outputs

The outputs of the soil stratification model include a Sample Design Plan that identifies specific locations for soil sampling, and backup sample points to use in the event that a field or suggested point is not accessible at the time of sampling.

Additionally, some fields may need to be sampled for soil pH and clay fraction. These are required inputs for DNDC modeling and are typically sourced from publicly available soil databases such as SoilGrids or SSURGO. Where these databases lack pH or clay percentages for a field, that field must be sampled for these properties in addition to SOC and bulk density. Any fields that require additional sampling are noted in the sampling plan.
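The flagging rule is simple to express in code; a minimal sketch, assuming each field's database lookup is a dictionary in which a missing value is `None` or absent (the keys and `fields_needing_extra_sampling` are hypothetical):

```python
def fields_needing_extra_sampling(fields, required=("ph", "clay_pct")):
    """Map field ID -> DNDC input properties missing from the soil database record."""
    flagged = {}
    for field in fields:
        missing = [p for p in required if field.get(p) is None]
        if missing:
            flagged[field["field_id"]] = missing
    return flagged


# Hypothetical database lookups for three fields
fields = [
    {"field_id": "F1", "ph": 6.5, "clay_pct": 22.0},
    {"field_id": "F2", "ph": None, "clay_pct": 18.0},
    {"field_id": "F3"},
]
print(fields_needing_extra_sampling(fields))
# {'F2': ['ph'], 'F3': ['ph', 'clay_pct']}
```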

Deviations from protocols

Regrow’s modeling approach introduces a key deviation from the standard methods described in Verra VM0042. While VM0042 typically requires that DNDC modeling be restricted to sampled fields only, Regrow’s approach involves modeling every field in the project, regardless of whether it was sampled. This is aligned with the alternative approach described on page 55 of VM0042, which allows for modeling at the level of a homogeneous management unit (i.e., the field).

Additionally, physical soil data collected from a subset of fields is used not only to initialize the model but also to inform an empirical model that extrapolates these values to the entire project area. This approach enables full-field modeling and facilitates the estimation of project-level structural uncertainty, although some aspects of this uncertainty quantification are still in development.