Total soil nitrogen (%)
Background
The first national digital soil mapping of total soil nitrogen (expressed as a percentage of fine soil mass) is published and available on the CSIRO Data Access Portal, among other places. The present work sought to update this mapping as part of ongoing efforts to expand and improve Australia’s national mapping and characterisation of its soil resources. Collectively, these national soil mapping efforts constitute the Soil and Landscape Grid of Australia (SLGA). The original work is designated Version 1 (completed 2015), and the new work is Version 2 (completed 2023). This work has been made possible through support and funding from Australia’s National Collaborative Research Infrastructure Strategy (NCRIS) via the Terrestrial Ecosystem Research Network (TERN).
As with the first effort, digital soil mapping is the framework underpinning the creation of the soil maps.
As with the other recent national digital soil mapping efforts, the SoilDataFederator (Searle 2020) has been instrumental in the dynamic collation of disparate soil observational datasets from across the country. These data have been sourced mainly from the State and Territory Government departments tasked with soil survey and collection. There are also data contributions from universities and, to a lesser extent, individual research groups. The SoilDataFederator also taps into the larger CSIRO-developed Natsoil database (CSIRO 2020), which holds data related to research projects and field stations that CSIRO has managed.
The improvement in digital soil mapping has come about via several mechanisms:
1. A substantial expansion of the available library of data corresponding to each of the main soil state factors has been made possible (Searle et al. 2022), through acquisition of new data sets and improvement of others compared with those used for Version 1.
2. Adoption of machine learning (ML) to derive empirical relationships between the target variable (total soil nitrogen content) and data related to the state factors that help determine and control soil variability across landscapes, here the Australian continent and nearshore islands. While the adoption of ML is not an entirely new advancement, coupling it with additional data and integrating it within a pseudo-3D predictive framework permits better spatial and vertical characterisation of soils than Version 1 achieved.
3. Together with a more powerful and streamlined predictive modelling approach, uncertainties are quantified with the UNEEC (Uncertainty Estimation based on Empirical Errors and Clustering; Shrestha and Solomatine 2006) approach rather than bootstrapping, so that prediction interval bounds are better tailored to variations in state factor information. Bootstrapping tends to create uniform prediction interval ranges, whereas UNEEC can distinguish areas of relatively lower and higher uncertainty based on differences in soil and landscape characteristics. For Version 2, the uncertainties are therefore more closely tailored to the environment in which they are quantified.
4. An approach to understand and characterise model extrapolation has been developed. It seeks to highlight areas where there is high confidence that models are going to be unreliable because these areas are outside the range of the underpinning data used in modelling. This issue is addressed via a combination of data geometric and distance-based techniques.
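As a rough illustration of the idea behind UNEEC described in item 3, the sketch below groups calibration cases by their covariate values and then takes empirical quantiles of the model residuals within each group as prediction interval offsets. It is a simplified sketch and not the code used for the SLGA products: the published method uses fuzzy clustering, whereas plain hard k-means is swapped in here for brevity, and every function name and all toy data are hypothetical.

```python
import math

def nearest(point, centres):
    """Index of the cluster centre closest to `point` (Euclidean distance)."""
    return min(range(len(centres)), key=lambda c: math.dist(point, centres[c]))

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with a deterministic start (hypothetical helper)."""
    step = max(1, len(points) // k)
    centres = [list(points[i * step]) for i in range(k)]  # evenly spaced seeds
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [nearest(p, centres) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return centres, labels

def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def fit_uneec(covariates, residuals, k, level=0.90):
    """Cluster the covariates, then store residual quantiles for each cluster.

    Residuals are assumed to be observed minus predicted values.
    """
    centres, labels = kmeans(covariates, k)
    alpha = (1.0 - level) / 2.0
    offsets = []
    for c in range(k):
        errs = [r for r, lab in zip(residuals, labels) if lab == c]
        errs = errs or list(residuals)  # fall back to all residuals if empty
        offsets.append((quantile(errs, alpha), quantile(errs, 1.0 - alpha)))
    return centres, offsets

def predict_interval(prediction, covariate, centres, offsets):
    """Shift the prediction by the error quantiles of its nearest cluster."""
    lower, upper = offsets[nearest(covariate, centres)]
    return prediction + lower, prediction + upper

# Toy data: two "environments", the second with much larger model errors.
covs = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (0.0, 0.0),
        (10.0, 10.1), (10.2, 9.9), (9.9, 10.0), (10.1, 10.2)]
resid = [-0.01, 0.01, -0.02, 0.02, -0.2, 0.2, -0.3, 0.3]
centres, offsets = fit_uneec(covs, resid, k=2)
print(predict_interval(0.10, (0.1, 0.1), centres, offsets))    # narrow interval
print(predict_interval(0.10, (10.0, 10.0), centres, offsets))  # wide interval
```

Because the interval offsets come from the residuals of similar environments only, the same predicted value receives a narrow interval where the model has performed well and a wide one where it has not, which is the contrast with uniform bootstrap ranges drawn in item 3.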
Available data and work steps
Figure. Compiled locations where there are measurements of total soil nitrogen.
After several steps of data processing to make the data suitable for modelling, including harmonisation of the depth intervals of all data, 9928 sites were available. Collectively across all depths, there were 38124 cases with total nitrogen information. A test set of 8000 cases was removed by random selection from the total number of cases. The breakdown of data cases for each depth and in total, for the calibration and test sets, is provided in the table below.
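The random hold-out described above can be sketched as follows. The case counts come from the text; the function name, the use of integer case indices, and the random seed are assumptions made for illustration only.

```python
import random

N_CASES = 38124  # total cases with total nitrogen data (from the text)
N_TEST = 8000    # size of the withheld test set (from the text)

def split_cases(n_cases, n_test, seed=42):
    """Return (calibration_ids, test_ids) as sorted lists of case indices."""
    rng = random.Random(seed)  # the seed is an assumption; the source gives none
    test = set(rng.sample(range(n_cases), n_test))  # sampling without replacement
    calibration = [i for i in range(n_cases) if i not in test]
    return calibration, sorted(test)

calibration_ids, test_ids = split_cases(N_CASES, N_TEST)
print(len(calibration_ids), len(test_ids))  # 30124 8000
```

Note that, as described, the selection is made over cases rather than sites, so in a split of this kind different depth intervals from the same site may fall on both sides.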
The following sequence of steps was carried out to develop the Version 2 products:
1. Preparation of point and covariate data, including filtering, cleansing, and harmonisation.
2. Point data intersection with covariates.
3. Creation of model and test data sets.
4. Ranger model hyperparameter value optimisation.
5. Ranger model fitting with best hyperparameters.
6. Spatialisation of ranger models.
7. Uncertainty analysis with UNEEC method, including rudimentary optimisation of class number size.
8. Spatialisation of model uncertainties.
9. Model extrapolation work with count of observations and boundary method (point data).
10. Ranger model fitting of extrapolation outcomes.
11. Spatialisation of model extrapolation outcomes.
12. Model evaluations with both test data and against SLGA Version 1 products.
13. Delivery of digital soil mapping outputs and computer code to repository.
Evaluation of Version 2 and comparison with Version 1
Map comparisons
First, maps of the 0-5cm depth interval are shown comparing Version 1 and Version 2. There are subtle differences in the overall means, while prediction interval limits for Version 1 appear wider than those for Version 2. Zooming in to the farm scale, here the CSIRO Boorowa Agricultural Research Station, subtle differences are evident: in general, Version 1 estimates may be systematically higher than Version 2 estimates. Neither mapping version should be expected to characterise fine-scale variation across the research station at the resolution at which SLGA maps are produced (~90m). Spatial patterns at the farm scale look similar for both versions.
Figure. Digital soil mapping predictions and associated uncertainties of soil nitrogen for the 0-5cm depth from both Version 1 and 2 SLGA. Both versions zoomed right into CSIRO Boorowa Farm for farm scale visual comparisons.
Model extrapolation risk likelihood
As their name suggests, the maps of model extrapolation risk likelihood highlight areas of concern for the reliability of model estimates, both the best estimate and the associated estimates of uncertainty. The maps below show the extrapolation risk likelihood for the 0-5cm and 30-60cm depth intervals. Predictions in the areas shown in red are probably best not relied upon, simply because these areas lie outside the domain of the data used in the modelling. More importantly, these maps show that the risk increases with soil depth. This reflects a general issue with soil information: much has been done to characterise and understand the soil surface, but far less at appreciable depths below.
Figure. Model extrapolation risk likelihood maps for the 0-5cm and 30-60cm depth intervals. Risk increases with soil depth because relatively fewer data are available at depth than at and near the soil surface.
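To make the notion of extrapolation risk more concrete, the sketch below combines two simple indicators of the kind the text alludes to: a boundary check (does a location's covariate vector fall outside the range of the calibration data?) and a count of calibration observations within a given distance in covariate space. This is one hypothetical reading of the "count of observation and boundary method", not the procedure actually used for the SLGA products. The function names, the radius, the minimum count, and the toy data are all assumptions, and the covariates are assumed to be scaled to comparable units.

```python
import math

def outside_bounds(x, training):
    """True if any covariate of x lies outside the calibration min/max range."""
    for j, value in enumerate(x):
        column = [row[j] for row in training]
        if value < min(column) or value > max(column):
            return True
    return False

def neighbour_count(x, training, radius):
    """Number of calibration cases within `radius` of x in covariate space."""
    return sum(1 for row in training if math.dist(x, row) <= radius)

def extrapolation_flag(x, training, radius=1.0, min_count=2):
    """Flag x when it is out of range or has too few nearby calibration cases."""
    if outside_bounds(x, training):
        return True
    return neighbour_count(x, training, radius) < min_count

# Toy calibration data in a two-covariate space, with a gap in the middle.
training = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (5.0, 5.0), (5.5, 5.5)]
print(extrapolation_flag((0.4, 0.6), training))  # False: well supported
print(extrapolation_flag((3.0, 3.0), training))  # True: in range, no neighbours
print(extrapolation_flag((9.0, 9.0), training))  # True: outside the data range
```

With fewer observations available at depth, more locations would fail the neighbour count in a scheme like this, which is consistent with the increase in risk with depth noted above.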
Metrics for model evaluation include R², Lin’s concordance correlation coefficient, mean error, and root mean square error. For SLGA Version 2, evaluations are done for the whole test set and for each depth interval. For Version 1, only depth-interval-specific evaluations are done. A prediction interval test was also done for Version 1, where the observed cases were compared against the associated prediction envelopes of the mapped data.
While it is guaranteed that all test cases used in this analysis were excluded from all model fitting for the development of Version 2 products, this cannot be guaranteed for Version 1 products. Irrespective of this, the model evaluations point to substantial improvements of Version 2 over Version 1.
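The four evaluation metrics named above can be computed as in the following sketch. Two conventions are assumed because the text does not state them: R² is taken as the squared Pearson correlation, and mean error is taken as predicted minus observed. The function name and toy values are hypothetical.

```python
import math

def evaluate(observed, predicted):
    """Return R2, Lin's concordance, mean error and RMSE for paired values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Population (divide-by-n) variances and covariance.
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    return {
        "r2": cov ** 2 / (var_o * var_p),  # squared Pearson correlation
        "ccc": 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2),
        "me": sum(errors) / n,             # positive means over-prediction
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }

# Toy example: predictions perfectly correlated but biased upward by 0.05.
obs = [0.10, 0.20, 0.30, 0.40]
pred = [0.15, 0.25, 0.35, 0.45]
print(evaluate(obs, pred))
```

In the toy example the correlation is perfect but the concordance is not, because Lin's coefficient also penalises the constant bias. This is why it is a useful companion to R² when comparing map versions that may differ systematically.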
Figure. Model evaluations based on test data set for both Version 1 and 2 SLGA total soil nitrogen maps.
Figure. Accuracy plots for selected depth intervals for both SLGA Version 1 and 2 products.
In terms of uncertainty assessment, the prediction interval coverage probability (PICP) plots show good fidelity to the 1:1 line. At the depth-specific scale there is some deviation from the assessment made for all depths combined. At the soil surface, uncertainties appear slightly underestimated, whereas at depth (here 100-200cm) they appear overestimated. In hindsight, it would appear logical to develop the uncertainty quantification separately for each modelled depth interval. However, this will probably not have any major bearing on the final outcomes or their reliability for field and downstream modelling purposes.
Figure. PICP plots for all data cases combined, and for 0-5cm, 100-200cm depth intervals.
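For reference, the prediction interval coverage probability (PICP) behind plots of this kind is the percentage of test observations that fall inside their prediction intervals, computed at each nominal confidence level and plotted against that level. A minimal sketch for a single level follows; the function name and toy values are hypothetical.

```python
def picp(observed, lower, upper):
    """Percentage of observations lying within [lower, upper] (inclusive)."""
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return 100.0 * inside / len(observed)

# Toy example: five observations with their interval limits.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
low = [0.0, 1.5, 3.5, 3.0, 4.0]
upp = [2.0, 2.5, 4.5, 5.0, 4.5]
print(picp(obs, low, upp))  # 60.0
```

If these were nominally 90% intervals, a coverage of 60% would indicate intervals that are too narrow, that is, underestimated uncertainty. Coverage above the nominal level indicates the opposite.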