Predictive Soil Model overview

Learn how Regrow extrapolates soil measurements to every field in your project

Collecting soil samples from every field in a large-scale carbon project is rarely practical. Regrow's Predictive Soil Model bridges this gap by taking measurements from a strategically selected subset of fields and extending those values across the entire project area. The result is a field-level digital soil map providing estimates of soil organic carbon (SOC) and bulk density — the key inputs needed to run DNDC simulations and quantify carbon outcomes.

This article explains how the model works, how its outputs are used, and how uncertainty is handled. It covers the third phase in Regrow's soil workflow, following the Soil Sample Design and soil measurements.

What you'll learn:

  • Model inputs

  • Predictive modeling methodology

  • How predicted values initialize DNDC

  • Uncertainty handling

  • Edge cases for non-sampled or out-of-domain fields

Why a predictive model?

Protocols such as Verra VM0042 or CAR SEP require DNDC to be initialized with field-level measurements of SOC and bulk density. Directly sampling every field is often cost-prohibitive at scale. Regrow's approach uses a stratified sampling design to collect measurements from a representative subset of fields, then fits a statistical model to extrapolate those measurements across all fields in the project.

This approach aligns with the alternative method described on page 55 of VM0042, which permits modeling at the level of a homogeneous management unit — in this case, the individual field. It enables full-field modeling and supports project-level uncertainty quantification when required.

Model inputs

The Predictive Soil Model relies on two categories of inputs:

  • Measured soil samples collected at stratified sample locations as part of the Soil Sample Design Plan. Each sample provides values for:

    • SOC percentage (SOC%)
    • Fine Soil Bulk Density (g/cm³)
  • Auxiliary geospatial data recorded at every point on the 30-meter discretized field grid, including:
    • Soil texture and properties from SoilGrids (or SSURGO for CONUS fields)
    • IPCC climate zone classification
    • Elevation and topographic indices
    • Remotely sensed vegetation indices
    • Country or region (as a proxy for land management practices)

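These auxiliary variables are the same data layers used to define strata in the Soil Sample Design Model. They inform the extrapolation indirectly: they determine which stratum each grid point belongs to, and stratum membership is what the model uses to assign predictions.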

Model methodology

Regrow's Predictive Soil Model fits two independent regularized stratum-level mean models: one for SOC% and one for bulk density. The sole predictor in each model is stratum membership: the combination of IPCC climate zone, USDA soil texture class, and country assigned to each field during the sample design phase.

For each stratum, the model estimates a mean SOC% and mean bulk density from the measured samples collected within that stratum. A regularization penalty is applied so that strata with fewer samples are pulled toward the overall project-wide mean, rather than relying entirely on limited local data. This makes estimates for data-sparse strata more stable and conservative.
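The pull toward the project-wide mean can be illustrated with a minimal sketch. It assumes one simple form of regularization, in which a stratum with n samples keeps a share n / (n + k) of its own deviation from the project mean. The function name and the penalty strength `k` are hypothetical and chosen for illustration; this is not Regrow's implementation.

```python
from collections import defaultdict

def shrunken_stratum_means(samples, k=3.0):
    """Estimate one mean per stratum, shrunk toward the project-wide mean.

    samples: list of (stratum, measured_value) pairs.
    k: hypothetical penalty strength; larger k means stronger shrinkage.
    """
    values = [value for _, value in samples]
    project_mean = sum(values) / len(values)

    by_stratum = defaultdict(list)
    for stratum, value in samples:
        by_stratum[stratum].append(value)

    estimates = {}
    for stratum, vals in by_stratum.items():
        n = len(vals)
        raw_mean = sum(vals) / n
        # Strata with few samples get a small weight on their own data.
        weight = n / (n + k)
        estimates[stratum] = project_mean + weight * (raw_mean - project_mean)
    return project_mean, estimates

# Illustrative bulk density values (g/cm3): stratum A is well sampled, B has one sample.
samples = [("A", 1.20), ("A", 1.22), ("A", 1.18), ("A", 1.20),
           ("A", 1.21), ("A", 1.19), ("B", 1.55)]
project_mean, estimates = shrunken_stratum_means(samples, k=3.0)
```

In this example the single-sample stratum B keeps only a quarter of its deviation from the project mean, while the six-sample stratum A keeps two thirds of its deviation.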

Tip: Think of it like estimating the typical house price in a neighborhood where only a few homes have recently sold. Rather than treating each neighborhood in complete isolation, you allow sparse neighborhoods to be informed by the broader market — producing more reliable estimates than if you relied solely on one or two local data points. The model does the same for soil: it learns from all available samples across the project, with stratum membership determining how that information is applied to each field.

SOC% is log-transformed before fitting to account for its right-skewed distribution; predictions are then back-transformed to the original scale.
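As a rough illustration of the transform and back-transform, the sketch below summarizes SOC% values on the log scale and converts the result back to percent. The interval formula is a plain normal approximation in log space, assumed here for illustration only; the article does not specify how the model's intervals are computed, and the function names are hypothetical.

```python
import math

Z_90 = 1.645  # normal quantile for a two-sided 90% interval

def log_scale_summary(soc_percent):
    """Return the mean and sample standard deviation of log(SOC%)."""
    logs = [math.log(v) for v in soc_percent]
    n = len(logs)
    mean_log = sum(logs) / n
    var_log = sum((x - mean_log) ** 2 for x in logs) / (n - 1)
    return mean_log, math.sqrt(var_log)

def back_transform(mean_log, sd_log, z=Z_90):
    """Map a log-scale estimate and interval back to SOC%."""
    return {
        "estimate": math.exp(mean_log),
        "lower": math.exp(mean_log - z * sd_log),
        "upper": math.exp(mean_log + z * sd_log),
    }

# Illustrative right-skewed SOC% values.
soc = [1.1, 1.4, 1.6, 2.0, 4.5]
mean_log, sd_log = log_scale_summary(soc)
result = back_transform(mean_log, sd_log)
```

Two properties are worth noting. The back-transformed estimate is a geometric mean, so it sits at or below the arithmetic mean of the raw values, and the interval is asymmetric on the original scale, with a longer upper arm than lower arm.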

Why this approach is well-suited for this use case:

  • The strata already capture the key sources of variability. Strata are defined using the auxiliary variables that are both widely available and most strongly correlated with SOC and bulk density (climate zone, soil texture, and region), so assigning predictions by stratum membership is an efficient and transparent way to extend measurements across the project.
  • Soil data is expensive to collect. The model is deliberately simple and data-efficient, extracting reliable signal from a modest number of well-placed samples without requiring a large dataset to function.
  • The model produces interpretable uncertainty estimates. The regularization approach yields 90% prediction intervals alongside each estimate, giving auditors and stakeholders a transparent view of predictive uncertainty.
  • Sparse strata are handled conservatively. The regularization penalty means that fields in understudied strata receive estimates informed by the full project dataset, rather than being driven by a single sample or excluded entirely.

Limitations: The model may be less reliable for smaller programs and smaller field areas.

Standard fit metrics like R² can be unreliable indicators of model quality, particularly for small programs or sparse datasets. When a project has fewer than roughly 15–20 observations in total, treat fit metrics with caution and evaluate model outputs in the context of known soil variability in the project area.

Prediction domain

The model is applied to every discretized 30-meter grid point across the entire project area. Its valid domain is limited to points belonging to strata where at least one lab measurement was returned. Points in strata with no usable samples are excluded from model predictions; see the "Model domain and edge cases" section below for how these fields are handled.

Point-level predictions are averaged by field to produce a single mean SOC% and mean bulk density for each field. These field-level values are what get passed to DNDC.
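The domain filter and the field-level averaging described above can be sketched as follows. This is an illustration under stated assumptions: the function name and data layout are hypothetical, each grid point simply takes its stratum's estimate, and a field's mean is taken over its in-domain points only.

```python
from collections import defaultdict

def field_means(grid_points, stratum_estimates):
    """Average point-level predictions into one value per field.

    grid_points: iterable of (field_id, stratum) pairs, one per 30 m grid point.
    stratum_estimates: dict mapping each sampled stratum to its predicted value.
    Points whose stratum has no estimate are outside the model domain and skipped.
    """
    totals = defaultdict(lambda: [0.0, 0])  # field_id -> [sum, count]
    for field_id, stratum in grid_points:
        if stratum not in stratum_estimates:
            continue  # stratum had no usable sample
        acc = totals[field_id]
        acc[0] += stratum_estimates[stratum]
        acc[1] += 1
    return {field_id: s / c for field_id, (s, c) in totals.items()}

# Illustrative bulk density estimates; stratum C has no usable sample.
stratum_estimates = {"A": 1.2, "B": 1.4}
grid_points = [
    ("field_1", "A"), ("field_1", "A"), ("field_1", "B"),
    ("field_2", "A"), ("field_2", "C"),
    ("field_3", "C"),
]
result = field_means(grid_points, stratum_estimates)
```

Here `field_3` lies entirely in an unsampled stratum, so it receives no model prediction and does not appear in the result.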

Model domain and edge cases

The table below summarizes how common scenarios are handled when fields fall outside or at the boundary of the model’s valid domain.

| Scenario | How it's handled |
| --- | --- |
| Field is in a stratum with ≥1 usable sample | Field receives predicted soil properties from the model. |
| Field is in a stratum with 0 usable samples | Field is excluded from the model domain; SoilGrids defaults may be used as a fallback, or the field may be excluded from quantification. |
| Field is dropped from the program after sample design | Samples from the dropped field are excluded; remaining fields in the same stratum are unaffected. |
| New fields added after sample design | If new fields fall within an existing stratum, model predictions apply. If they fall in a new or unsampled stratum, SoilGrids may be used as a fallback. |
| Fields spanning multiple strata | A field can belong to multiple strata; it stays within the model domain as long as at least one of its strata has a usable sample. |
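These rules can be expressed as a small decision sketch. The function names, labels, and data layout are hypothetical, and the choice between a SoilGrids fallback and exclusion is left open because the article does not say how that choice is made.

```python
def usable_strata(samples, active_fields):
    """Return strata that still have at least one sample from an active field.

    samples: iterable of (field_id, stratum) pairs with a returned lab value.
    active_fields: set of field ids currently enrolled in the program.
    """
    return {stratum for field_id, stratum in samples if field_id in active_fields}

def field_source(field_strata, sampled_strata):
    """Decide where a field's soil properties come from."""
    if set(field_strata) & set(sampled_strata):
        return "model"
    return "fallback_or_excluded"

# f9 was dropped after sample design; stratum A is still covered by f1.
samples = [("f1", "A"), ("f9", "A"), ("f2", "B")]
active_fields = {"f1", "f2", "f3", "f4"}
sampled = usable_strata(samples, active_fields)

spanning = field_source({"A", "C"}, sampled)   # one of its strata is sampled
unsampled = field_source({"C"}, sampled)       # no sampled stratum
```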

Model outputs and interpretation

The main outputs of the Predictive Soil Model are:

  • Field-level estimates of SOC% and bulk density for every field in the project
  • A project-specific digital soil map providing spatially continuous predictions across the project area
  • 90% prediction intervals quantifying predictive uncertainty at the field level

These outputs serve two purposes. First, they initialize DNDC, which produces estimates of SOC change (dSOC), nitrous oxide (N₂O) emissions, and in some cases methane (CH₄). Second, they communicate model assumptions and uncertainty to auditors and stakeholders, supporting the credibility of carbon offset claims.

DNDC output uncertainty is combined with the model’s structural uncertainty, which is based on literature-derived estimates, to produce a final uncertainty range for project outcomes.

Using predicted soil properties to initialize DNDC

The predicted SOC% and bulk density values for each field are used as initialization inputs for the DNDC biogeochemical model, which then simulates changes in soil carbon and greenhouse gas emissions over time.

How these inputs are used depends on whether input uncertainty is considered negligible (de minimis):

When input uncertainty is de minimis — In most protocols, the same SOC and bulk density values are used to initialize DNDC for both the baseline and practice-change scenarios. Because the uncertainty in these inputs cancels out when calculating the difference, it is treated as negligible. In this case, each field runs a single deterministic DNDC simulation using the field mean SOC and bulk density from the digital soil map.

When input uncertainty must be propagated — For some protocols, such as Verra VM0042, the de minimis assumption may not be justified. In these cases, input uncertainty is propagated through DNDC using the 90% prediction interval from the model. Two DNDC simulations are run per field: one initialized at the lower bound and one at the upper bound of the prediction interval. The resulting range brackets the potential effect of input uncertainty on key outputs.
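The two cases can be summarized as a run plan per field. The sketch below is illustrative only: the names are hypothetical, no DNDC interface is implied, and pairing the lower SOC bound with the lower bulk density bound (and upper with upper) is an assumption, since the article does not state how the two intervals are combined.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Prediction:
    mean: float
    lower: float  # lower bound of the 90% prediction interval
    upper: float  # upper bound of the 90% prediction interval

def dndc_run_plan(soc, bulk_density, de_minimis):
    """List the soil initializations to simulate for one field."""
    if de_minimis:
        # Single deterministic run at the field means.
        return [{"label": "mean",
                 "soc_percent": soc.mean,
                 "bulk_density": bulk_density.mean}]
    # Two runs bracketing the prediction interval (assumed pairing).
    return [
        {"label": "lower",
         "soc_percent": soc.lower,
         "bulk_density": bulk_density.lower},
        {"label": "upper",
         "soc_percent": soc.upper,
         "bulk_density": bulk_density.upper},
    ]

# Illustrative field-level predictions.
soc = Prediction(mean=1.8, lower=1.1, upper=2.9)
bd = Prediction(mean=1.30, lower=1.15, upper=1.45)

single = dndc_run_plan(soc, bd, de_minimis=True)
bracket = dndc_run_plan(soc, bd, de_minimis=False)
```

Each entry in the plan would then be passed to a separate simulation, and the spread between the lower and upper runs gives the range attributed to input uncertainty.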