The detailed methods papers are available at https://www.publish.csiro.au/sr/SR20283 (Part 1) and https://www.publish.csiro.au/sr/sr20284 (Part 2).
This work focuses on updating the Australian national digital maps of clay, sand and silt content.
The distinguishing aspect of this work is the inclusion of field observations of soil texture classes, in addition to laboratory-measured data, in the spatial modelling framework. An algorithm was developed to convert field observations into continuous vectors of clay, sand and silt percentages.
The approach is based on machine learning, with predictions of each of the three soil texture fractions generated across the country at 90 m grid cell resolution for depths of 0-5cm, 5-15cm, 15-30cm, 30-60cm, 60-100cm and 100-200cm.
The approach was customised to accommodate the uncertainty in converting field observations to continuous vectors.
Existing bootstrap resampling methods were used to estimate prediction uncertainties, which are expressed as 90% prediction intervals about the mean prediction at each grid cell.
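The sketch below illustrates, under stated assumptions, how bootstrap resampling can yield a 90% prediction interval at one grid cell. Following the later note that a single value of the systematic model error is assigned to all uncertainty estimates, it assumes the interval combines the spread of the bootstrap model predictions with one global model error variance, and it treats the total as normal. The function `bootstrap_interval` and the example numbers are hypothetical; the fitted machine learning models are not shown.

```python
from statistics import NormalDist, fmean, pvariance

def bootstrap_interval(boot_preds, model_error_var, level=0.90):
    """Hypothetical helper: prediction interval at one grid cell.

    boot_preds: predictions for this cell from models fitted to bootstrap
                resamples of the calibration data
    model_error_var: a single global model error variance (for example the
                mean squared out-of-bag error), applied to every cell
    Returns (lower, mean, upper) in percent.
    """
    mean = fmean(boot_preds)
    # Total variance: spread among the bootstrap models plus the global error
    total_var = pvariance(boot_preds, mu=mean) + model_error_var
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.645 for a 90% interval
    half_width = z * total_var ** 0.5
    # Clip to the valid 0-100% range (an assumption, not stated in the text)
    return max(0.0, mean - half_width), mean, min(100.0, mean + half_width)

# Illustrative numbers only: five bootstrap predictions of clay (%) at one cell
lower, mean, upper = bootstrap_interval([22.0, 25.5, 24.1, 23.4, 26.0], model_error_var=9.0)
```

Because `model_error_var` is the same everywhere, the interval width in this sketch varies only with the disagreement among the bootstrap models.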
The models and their prediction uncertainties were assessed against a completely independent, external validation dataset.
Our results are compared against the Version 1 Soil and Landscape Grid of Australia (v1.SLGA) described in Viscarra Rossel et al. (2015).
All predictive and functional accuracy diagnostics demonstrate improvements compared with the v1.SLGA products. The improvements were most pronounced for the sand fraction, followed by the clay fraction. Only marginal improvements were made for the silt fraction, which has proven relatively difficult to predict in general. We also made comparisons with the recently released World Soil Grids products (v2.WSG) and reached the same conclusions.
While some penalty is paid in terms of increased predictive uncertainty due to using field data, in many cases, particularly for sand and silt content, the uncertainties from v2.SLGA were in fact smaller than those from v1.SLGA.
This work demonstrates the need to continually revisit and, if necessary, update existing digital soil maps as new methods and efficiencies emerge. This agility is a key feature of digital soil mapping. However, without a companion program of new data acquisition through strategic field campaigns, continued re-modelling of existing data has its limits: an eventual ceiling on model skill will be reached, which may fall short of expectations for accurate national-scale digital soil information.
The Australian soil data federator (Searle et al. 2020) publishes API endpoints that allow one to retrieve and integrate soil data from across Australia. The federator is supported by numerous data custodians, currently including CSIRO, which manages the Australian National Soil Archive and the associated NATSOIL database, and the various State and Territory government soil survey agencies. Programmatic workflows written in R (R Core Team, 2018) were used to compile all available data on measured soil texture fractions. Soil texture fractions are represented as percentage mass of coarse sand (200-2000 µm), fine sand (20-200 µm), silt (2-20 µm) and clay (< 2 µm) particles. Two main search queries were developed: the first extracted laboratory measured data, and the second extracted morphological descriptions of soil texture classes. For the laboratory measured data, site IDs, location, depth increments and laboratory method were also retrieved. Across Australia, particle size analyses have primarily been done by either the hydrometer method (Gee and Bauder 1986) or the pipette methods of Coventry and Fett (1979) and Bowman and Hutka (2002). Relatively fewer samples were measured with the plummet balance method (Marshall 1956). We found virtually no cases where multiple methods were used for the same sample, so all data were compiled as is, irrespective of laboratory method. For the morphological data, in addition to the soil texture class, we retrieved the location and identifying metadata, depth increments and, where available, the observed soil class and texture class modifiers. These last two items were retrieved for the purpose of running the Soil Texture Algorithm (STA) described in Malone and Searle (this issue).
After the initial data compilation, pre-processing steps were undertaken to remove entries with missing or clearly erroneous location data, clearly spurious values, and repeated observations. For the laboratory data, where applicable, we summed the coarse and fine sand fractions to generate a complete dataset of samples with clay, silt and sand fractions. Samples where the sum of the texture fractions was not greater than 90% were removed. For samples where the sum of fractions was between 90% and 100% (exclusive), each fraction was normalised so that the fractions sum to 100%. After all pre-processing, there were 17,367 sites with laboratory measurements and 180,498 sites with soil texture class descriptions to work with in this study.
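The screening rules above can be expressed as a small function. This is a minimal sketch, assuming each sample arrives as separate coarse sand, fine sand, silt and clay percentages; `screen_sample` is a hypothetical name. The text does not say how sums above 100% were treated, so the sketch passes them through unchanged and flags that as an assumption.

```python
def screen_sample(coarse_sand, fine_sand, silt, clay):
    """Hypothetical helper: screen one laboratory sample (all values in %).

    Returns (clay, silt, sand) or None if the sample is rejected.
    """
    sand = coarse_sand + fine_sand          # combine the two sand fractions
    total = clay + silt + sand
    if total <= 90.0:                       # sum not greater than 90%: reject
        return None
    if total < 100.0:                       # 90% < sum < 100%: rescale to 100%
        factor = 100.0 / total
        return clay * factor, silt * factor, sand * factor
    # Sums of exactly 100% pass unchanged; sums above 100% also pass unchanged
    # here because the text does not specify their treatment (an assumption).
    return clay, silt, sand
```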
Soil Texture Algorithm
We then developed an algorithm that generates plausible soil texture profiles and is informed, at its core, by re-calibrated soil texture class centroids of Australian soils. Its unique aspect is that simulations are made by sampling from the empirical distribution of soil texture fraction data that summarises each soil texture class. To respect the compositional nature of soil texture data, the multivariate sampling is done on data transformed via the isometric log-ratio transformation. The algorithm was further customised to use soil contextual information, so that field observations and simulated data are coherent. It can accommodate pedological features such as gradational increases in clay content down a soil profile, as well as adjustments for soil texture qualifiers and subplastic properties. It also accommodates texture-contrast soils, although without a mechanism specific to them.
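A minimal sketch of the compositional sampling idea, assuming a three-part composition ordered as (clay, silt, sand) and one particular orthonormal (pivot) basis for the isometric log-ratio (ilr) transformation. The STA samples from the empirical distribution of each texture class; here that is approximated by a smoothed bootstrap, which resamples the ilr coordinates of a class's observations and adds Gaussian jitter. The function names, the `jitter` value and the example data are hypothetical, and the contextual adjustments described above are not included.

```python
import math
import random

# Orthonormal pivot basis for a 3-part composition (clay, silt, sand);
# each basis vector sums to zero, as an ilr basis requires.
BASIS = [
    (math.sqrt(2 / 3), -math.sqrt(2 / 3) / 2, -math.sqrt(2 / 3) / 2),
    (0.0, math.sqrt(1 / 2), -math.sqrt(1 / 2)),
]

def ilr(comp):
    """ilr coordinates of a strictly positive composition."""
    logs = [math.log(p) for p in comp]
    mean_log = sum(logs) / len(logs)
    clr = [v - mean_log for v in logs]            # centred log-ratio
    return [sum(c * b for c, b in zip(clr, vec)) for vec in BASIS]

def ilr_inv(z, total=100.0):
    """Back-transform ilr coordinates to a composition summing to `total`."""
    clr = [sum(zj * BASIS[j][i] for j, zj in enumerate(z)) for i in range(3)]
    expd = [math.exp(v) for v in clr]
    s = sum(expd)
    return [total * e / s for e in expd]

def simulate_class(observations, n, jitter=0.1, rng=random):
    """Hypothetical sampler: draw n (clay, silt, sand) vectors for one texture
    class by resampling its observations in ilr space with Gaussian jitter."""
    coords = [ilr(obs) for obs in observations]
    draws = []
    for _ in range(n):
        z = rng.choice(coords)
        draws.append(ilr_inv([zi + rng.gauss(0.0, jitter) for zi in z]))
    return draws
```

Working in ilr space guarantees that every back-transformed draw is strictly positive and sums to 100%, which sampling the raw percentages independently would not.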
Left panel (figure above): Simulations derived from the soil profile algorithm for a synthetic soil profile with a gradational increase in clay content down the profile. The soil texture classes for each layer are: Sand, Loamy Sand, Sandy Loam, Loam, Clay Loam, Light Clay, Medium Clay and Heavy Clay. The top plot was generated without contextual information; the other three are separate simulations that use it.
Right panel (figure above): Simulations derived from the soil profile algorithm for a synthetic soil profile with a clear and abrupt texture change. The soil texture classes for each layer are: Sandy Loam over Light Clay over Medium Clay.
We see this algorithm as an important instrument for unlocking the potential of field soil survey information to better understand soil heterogeneity across given spatial extents. In Australia, sites with only field-observed soil texture class (STC) information outnumber sites with laboratory-analysed texture data by more than 100,000, and the situation is likely similar in other parts of the world. The algorithm should therefore be a useful instrument for realising the potential of these under-utilised data in future digital soil mapping efforts, and in other soil science applications that require numerical representations of soil texture, such as pedotransfer functions. Examples include generating realistic inputs for pedotransfer functions that estimate plant available water content from soil texture, and realistic inputs for water balance models that take gridded DSM datasets as input.
Spatial Modelling Framework
There are four main stages involved.
The first is the suite of approaches used to prepare the data for spatial modelling:
Use of the soil texture algorithm
Soil depth increment harmonisation with a mass-preserving spline
Covariate data preparation and intersection with the observed data
The second is model fitting and uncertainty quantification.
The third is using the fitted models to generate maps of clay, silt and sand percentages at specified depth intervals, together with associated prediction intervals.
The fourth is validation of the models and maps with an external dataset.
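The depth harmonisation step listed above uses a mass-preserving (equal-area quadratic) spline (Bishop et al. 1999). The sketch below is a simpler stand-in, not that spline: it treats each horizon value as constant with depth and takes depth-weighted averages over the six standard intervals. It shares the mass-preserving property, in that each horizon's depth-weighted content is retained, but lacks the smoothness of the spline. The function name and the assumption of non-overlapping horizons are hypothetical.

```python
STANDARD_DEPTHS = [(0, 5), (5, 15), (15, 30), (30, 60), (60, 100), (100, 200)]

def harmonise(horizons, targets=STANDARD_DEPTHS):
    """Hypothetical helper: depth-weighted averages over target intervals (cm).

    horizons: list of non-overlapping (top, bottom, value) tuples; each value
              is treated as constant within its horizon (not a spline).
    Returns one value per target, or None where the horizons do not fully
    cover the target interval.
    """
    out = []
    for t_top, t_bot in targets:
        weighted, covered = 0.0, 0.0
        for h_top, h_bot, value in horizons:
            overlap = min(h_bot, t_bot) - max(h_top, t_top)
            if overlap > 0:
                weighted += overlap * value
                covered += overlap
        # Only report intervals fully covered by observed horizons
        out.append(weighted / covered if covered >= t_bot - t_top else None)
    return out
```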
External validation diagnostics for each soil texture variable at each studied soil depth interval. These diagnostics are provided for each of the three digital soil mapping products:
v2.SLGA: Updated digital soil mapping
v1.SLGA: Version 1 SLGA
v2.WSG: Version 2 World Soil Grids
Observed vs. Predicted plots for each of the three soil texture variables at the 0-5cm depth interval for v2.SLGA, v1.SLGA and v2.WSG.
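This section does not list the individual diagnostics, so the set below is an assumption: four diagnostics commonly reported in digital soil mapping validation (mean error, root mean squared error, R² as model efficiency, and Lin's concordance correlation coefficient), computed from paired observed and predicted values at external validation sites. `validation_diagnostics` is a hypothetical helper.

```python
from statistics import fmean

def validation_diagnostics(observed, predicted):
    """Common DSM accuracy diagnostics (an assumed set, not the study's list)."""
    n = len(observed)
    mo, mp = fmean(observed), fmean(predicted)
    resid = [p - o for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in resid)
    sst = sum((o - mo) ** 2 for o in observed)
    so2 = sst / n                                        # population variances
    sp2 = sum((p - mp) ** 2 for p in predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted)) / n
    return {
        "bias": fmean(resid),                            # mean error
        "rmse": (sse / n) ** 0.5,
        "r2": 1 - sse / sst,                             # model efficiency
        "ccc": 2 * cov / (so2 + sp2 + (mo - mp) ** 2),   # Lin's concordance
    }
```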
Efficacy of the quantified uncertainties
The prediction interval coverage probability (PICP) was our chosen tool to assess how well the estimated uncertainties perform under testing. It is the proportion of observed values that fall within the corresponding prediction intervals, evaluated across a range of confidence levels. Plots such as those shown here indicate well-performing uncertainty estimates when the coverage probability tracks the confidence level closely along the 1:1 line. These plots are for the 0-5cm depth interval, but the other depths are much the same. It could be argued that the prediction intervals are slightly too wide, i.e. the uncertainty is overestimated, as shown by the coverage probability lying above the red line in the plots. This warrants further investigation, but it is probably an outcome of the uncertainty method itself, whereby a single value of the systematic model error is assigned to all uncertainty estimates. On average this performs as intended, but it does not account for locally varying estimates of the errors in the way that other methods do, such as that described in Shrestha and Solomatine (2006), where model errors are weighted by membership to fuzzy classes, each of which has an underlying distribution of model errors. Comparative work in Malone et al. (2017, p. 150) shows some merit in this locally varying error approach relative to the approach used in the present study, for which a similar pattern in the coverage probabilities was observed. The main hurdle to adopting a Shrestha and Solomatine type approach is probably implementation, as it requires a slightly more detailed architecture and marginally more compute effort.
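The PICP calculation described above can be sketched as follows, assuming each prediction's uncertainty is normal with a known mean and standard deviation (the study's intervals may be derived differently). For each confidence level, it counts the share of observations falling inside the corresponding interval. `picp` and its default levels are hypothetical.

```python
from statistics import NormalDist

def picp(observed, means, sds, levels=(0.1, 0.3, 0.5, 0.7, 0.9, 0.95)):
    """Prediction interval coverage probability at several confidence levels.

    Assumes each prediction is normal with the given mean and standard
    deviation; returns {level: fraction of observations inside the interval}.
    """
    result = {}
    for level in levels:
        z = NormalDist().inv_cdf(0.5 + level / 2)   # half-width in sd units
        hits = sum(
            1 for y, m, s in zip(observed, means, sds) if m - z * s <= y <= m + z * s
        )
        result[level] = hits / len(observed)
    return result
```

Values close to each level indicate well-calibrated intervals; values consistently above the level point to intervals that are too wide.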
Comparisons between mapped uncertainty estimates
Side-by-side comparisons of the uncertainty estimates are made in a simple manner in this study, to examine where, and to what extent, the prediction interval ranges of v1.SLGA and v2.SLGA are similar or differ. The figure below shows these comparisons for each soil texture fraction for the 0-5cm and 30-60cm depth intervals.
As with other outputs, the maps for the other depth intervals are provided in the supplementary material. Given our methodology, each map shows, for every grid cell, one of three outcomes: no change between prediction interval ranges (differences within a 5% margin were treated as no change), v2.SLGA prediction ranges wider than v1.SLGA, or v1.SLGA prediction ranges wider than v2.SLGA. The maps show clearly that the prediction intervals for clay content are measurably wider for v2.SLGA.
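The three-way map classification can be sketched per grid cell as below. The text does not say whether the 5% margin is relative or in absolute percentage points; this sketch assumes it is relative to the v1.SLGA interval width. The function name and example widths are hypothetical.

```python
def compare_widths(v1_width, v2_width, margin=0.05):
    """Classify one grid cell by comparing prediction interval widths.

    Differences within `margin` of the v1.SLGA width count as no change;
    reading the 5% margin as relative to v1.SLGA is an assumption.
    """
    if abs(v2_width - v1_width) <= margin * v1_width:
        return "no change"
    return "v2.SLGA wider" if v2_width > v1_width else "v1.SLGA wider"

# Hypothetical widths (upper minus lower, in %) for three grid cells
cells = [(18.0, 18.5), (15.0, 21.0), (30.0, 24.0)]
labels = [compare_widths(v1, v2) for v1, v2 in cells]
```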
There are a few exceptions, but this result is to be expected, given that much of the data used in the modelling carried a measurable amount of uncertainty as field-derived estimates. It is encouraging from a modelling perspective that these uncertainties are propagated through to the mapped predictions, but the relatively wider prediction intervals should not detract from the fact that, on average, the v2.SLGA predictions are measurably more accurate than v1.SLGA.
The concept of a prediction interval may be confusing, but it should not be interpreted as a uniform distribution of plausible values between the uncertainty bounds. Rather, uncertainty, at least as expressed in this study, is commonly described by either a normal distribution or some other empirical distribution function, where most of the probability mass is centred about the mean or median of the interval.
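A short calculation makes this concrete, assuming the uncertainty is normal. It computes the share of probability mass in the central third of a 90% prediction interval and compares it with the share that a uniform spread of the same 90% would give. `central_mass` is a hypothetical helper.

```python
from statistics import NormalDist

def central_mass(fraction_of_width, level=0.90):
    """Probability mass in the central `fraction_of_width` of a normal
    `level` prediction interval, as a share of the whole distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # interval is mean +/- z*sd
    inner = fraction_of_width * z               # half-width of central part, in sd
    return 2 * NormalDist().cdf(inner) - 1

normal_share = central_mass(1 / 3)   # about 0.42
uniform_share = 0.90 / 3             # 0.30 if the 90% mass were spread evenly
```

Under the normal assumption, roughly 42% of the mass sits in the central third of the interval, compared with 30% under a uniform reading.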
What v2.SLGA demonstrates is an improved estimate of the prediction average, but because relatively less certain input data were used, the associated prediction intervals are wider in many cases. Encouragingly, this is not always true: the corresponding comparisons for sand and silt show quite distinct areas where the v1.SLGA prediction intervals are wider.
The intention of this study was to update the v1.SLGA soil texture maps for all of Australia and, in doing so, produce v2.SLGA products. These updated maps have the same spatial resolution, support and extent as the v1.SLGA products, but the underlying data used to produce them are significantly different. Most data used in modelling the v2.SLGA soil texture maps are sourced from field observations rather than laboratory measurements. Accordingly, an algorithm, the Soil Texture Algorithm (STA), was developed to transform field-observed soil texture class data into continuous vectors of clay, sand and silt fractions. Uncertainty is built into this transformation, carried over to the spatial modelling and ultimately propagated through to the final maps. Our mixture model, underpinned by machine learning, ultimately yielded:
Measurable predictive improvement for all texture fractions and soil depth intervals, relative to both v1.SLGA and v2.WSG products.
Uncertainties, expressed as 90% prediction intervals, that were, with some exceptions, measurably wider than those derived for v1.SLGA, particularly for clay content.
A spatial dependence in whether the prediction intervals were wider for v1.SLGA or v2.SLGA.
Ultimately, despite using a large number of relatively less precise data, the v2.SLGA soil texture products are more accurate and, in many cases, more certain than the v1.SLGA products. Following this study, we expect more DSM research to investigate the utility of incorporating field-observed data into its workflows. As in our case, these data are often plentiful relative to laboratory data but, in our experience, under-utilised. This study has demonstrated their potential value for successfully updating and improving digital soil maps.
Adhikari, K., Mishra, U., Owens, P.R., Libohova, Z., Wills, S.A., Riley, W.J., Hoffman, F.M., Smith, D.R., 2020. Importance and strength of environmental controllers of soil organic carbon changes with scale. Geoderma 375, 114472.
Aitchison, J., 1986. The Statistical Analysis of Compositional Data. Chapman & Hall, London.
Arrouays, D., McBratney, A., Minasny, B., Hempel, J., Heuvelink, G., Macmillan, R.A., Hartemink, A., Lagacherie, P., McKenzie, N., 2014. The GlobalSoilMap project specifications. In: D. Arrouays, N. McKenzie, J. Hempel, A. Richer-de-Forges, A. McBratney (Eds.), GlobalSoilMap: Basis of the Global Spatial Soil Information System. Proceedings of the 1st GlobalSoilMap Conference. CRC Press/Balkema, The Netherlands, pp. 9-12.
Bishop, T.F.A., McBratney, A.B., Laslett, G.M., 1999. Modelling soil attribute depth functions with equal-area quadratic smoothing splines. Geoderma 91(1), 27-45.
Bivand, R.S., Pebesma, E., Gomez-Rubio, V., 2013. Applied spatial data analysis with R. Springer, New York.
Bowman, G., Hutka, J., 2002. Particle size analysis. In: N.J. McKenzie, K. Coughlan, H.P. Cresswell (Eds.), Soil physical measurement and interpretation for land evaluation. CSIRO Publishing, Melbourne, Vic.
Breiman, L., 2001. Random Forests. Machine Learning 45(1), 5-32.
Carlile, P., Bui, E., Moran, C., Minasny, B., McBratney, A.B., 2001. Estimating soil particle size distributions and percent sand, silt and clay for six texture classes using the Australian Soil Resource Information System point database. CSIRO Land and Water Technical Report 29/01, Canberra.
Chen, S., Mulder, V.L., Heuvelink, G.B.M., Poggio, L., Caubet, M., Román Dobarco, M., Walter, C., Arrouays, D., 2020. Model averaging for mapping topsoil organic carbon in France. Geoderma 366, 114237.
Christensen, W.F., 2011. Filtered Kriging for Spatial Data with Heterogeneous Measurement Error Variances. Biometrics 67(3), 947-957.
Coventry, R.J., Fett, D.E.R., 1979. A pipette and sieve method of particle‐size analysis and some observations on its efficacy. CSIRO Australia, Division of Soils, Divisional Report No. 38, Australia.
Cressie, N.A.C., 1991. Statistics for Spatial Data. Wiley, New York.
Czarnecki, W.M., Podolak, I.T., 2013. Machine Learning with Known Input Data Uncertainty Measure. In: K. Saeed, R. Chaki, A. Cortesi, S. Wierzchoń (Eds.), Computer Information Systems and Industrial Management. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 379-388.
Delhomme, J.P., 1978. Kriging in the hydrosciences. Advances in Water Resources 1(5), 251-266.
Efron, B., Tibshirani, R., 1997. Improvements on Cross-Validation: The .632+ Bootstrap Method. Journal of the American Statistical Association 92(438), 548-560.
Egozcue, J.J., Pawlowsky-Glahn, V., Mateu-Figueras, G., Barceló-Vidal, C., 2003. Isometric Logratio Transformations for Compositional Data Analysis. Mathematical Geology 35(3), 279-300.
Gallant, J., Read, A., Dowling, T., 2012. Building the national one-second digital elevation model for Australia, Water Information Research and Development Alliance: Science Symposium Proceedings, Melbourne, 1-5 August 2011.
Gee, G.W., Bauder, J.W., 1986. Particle-size analysis. In: A. Klute (Ed.), Methods of soil analysis. Part 1. 2nd ed. ASA and SSSA, Madison, WI, pp. 382-411.
Grundy, M.J., Viscarra Rossel, R.A., Searle, R.D., Wilson, P.L., Chen, C., Gregory, L.J., 2015. Soil and Landscape Grid of Australia. Soil Research 53(8), 835-844.
Harwood, T., Ferrier, S., Harman, I., Ota, N., Perry, J., Williams, K., 2014. Gridded continental climate variables for Australia. CSIRO Land and Water, Canberra.
Hijmans, R.J., 2019. raster: Geographic Data Analysis and Modeling. R package version 2.9-5, https://CRAN.R-project.org/package=raster.
Johnston, R.M., Barry, S.J., Bleys, E., Bui, E.N., Moran, C.J., Simon, D.A.P., Carlile, P., McKenzie, N.J., Henderson, B.L., Chapman, G., Imhoff, M., Maschmedt, D., Howe, D., Grose, C., Schoknecht, N., Powell, B., Grundy, M., 2003. ASRIS: the database. Soil Research 41(6), 1021-1036.
Kuhn, M., Wing, J., Weston, S., Williams, A., Keefer, C., Engelhardt, A., Cooper, T., Mayer, Z., Kenkel, B., Benesty, M., Lescarbeau, R., Ziem, A., Scrucca, L., Tang, Y., Candan, C., Hunt, T., 2019. caret: Classification and Regression Training. R package version 6.0-84. CRAN, https://CRAN.R-project.org/package=caret.
Lark, R.M., Bishop, T.F.A., 2007. Cokriging particle size fractions of the soil. European Journal of Soil Science 58(3), 763-774.
Malone, B.P., McBratney, A.B., Minasny, B., Laslett, G.M., 2009. Mapping continuous depth functions of soil carbon storage and available water capacity. Geoderma 154(1), 138-152.
Malone, B.P., Minasny, B., McBratney, A., 2017. Using R for Digital Soil Mapping. Springer, The Netherlands.
Malone, B.P., Minasny, B., Odgers, N.P., McBratney, A.B., 2014. Using model averaging to combine soil property rasters from legacy soil maps and from point data. Geoderma 232-234, 34-44.
Malone, B.P., Searle, R., 2020. Improvements to the Australian national soil thickness map using an integrated data mining approach. Geoderma 377, 114579.
Malone, B.P., Searle, R., 2020. Updating the Australian soil texture digital soil maps: Part 1. Re-calibration of field soil texture class centroids. Soil Research (this issue).
Marshall, T.J., 1956. A plummet balance for measuring the size distribution of soil particles. Australian Journal of Applied Science 7, 142-147.
McBratney, A.B., Mendonça Santos, M.L., Minasny, B., 2003. On digital soil mapping. Geoderma 117(1), 3-52.
McKenzie, N., Jacquier, D., Ashton, L., Cresswell, H., 2000. Estimation of soil properties using the atlas of Australian soils. Technical Report 11/00. CSIRO Land and Water, Canberra, ACT.
Minasny, B., McBratney, A.B., 2001. The Australian soil texture boomerang: a comparison of the Australian and USDA/FAO soil particle-size classification systems. Soil Research 39(6), 1443-1451.
Muzzamal, M., Huang, J., Nielson, R., Sefton, M., Triantafilis, J., 2018. Mapping Soil Particle-Size Fractions Using Additive Log-Ratio (ALR) and Isometric Log-Ratio (ILR) Transformations and Proximally Sensed Ancillary Data. Clays and Clay Minerals 66(1), 9-27.
Northcote, K., Beckmann, G., Bettenay, E., Churchward, H., Van Dijk, D., Dimmock, G., Hubble, G., Isbell, R., McArthur, W., Murtha, G., Nicolls, K., Paton, T., Thompson, C., Webb, A., Wright, M., 1960-1968. Atlas of Australian soils, sheets 1 to 10. Melbourne University Press, Melbourne.
Odeh, I., Todd, A., Triantafilis, J., 2003. Spatial Prediction of Soil Particle-Size Fractions As Compositional Data. Soil Science 168, 501-515.
Odgers, N.P., McBratney, A.B., Minasny, B., 2011. Bottom-up digital soil mapping. I. Soil layer classes. Geoderma 163(1), 38-44.
R Core Team, 2018. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Ruddell, B.L., Drewry, D.T., Nearing, G.S., 2019. Information Theory for Model Diagnostics: Structural Error is Indicated by Trade-Off Between Functional and Predictive Performance. Water Resources Research 55(8), 6534-6554.
Searle, R., 2015. The Australian site data collation to support the GlobalSoilMap. In: D. Arrouays, A.B. McBratney, J. Hempel, A.C. Richer-de-Forges (Eds.), GlobalSoilMap: Basis of the global spatial soil information system. CRC Press, London, UK, pp. 127-133.
Searle, R., Grundy, M.J., McBratney, A.B., Gregory, L.J., Wilson, P.L., Malone, B.P., Stenson, M., 2019. Phased development of digital soil infrastructure for Australia and its contribution to global initiatives, 2019 Joint workshop for Digital Soil Mapping and GlobalSoilMap IUSS Working groups, Santiago, Chile.
Searle, R., Stenson, M., Wilson, P.L., Gregory, L.J., Singh, R., Malone, B.P., 2020. Soil data, united, will never be defeated – The SoilDataFederator, Joint Australian and New Zealand Soil Science Societies Conference, Cairns, QLD.
Shrestha, D.L., Solomatine, D.P., 2006. Machine learning approaches for estimation of prediction interval for the model output. Neural Networks 19(2), 225-235.
Solomatine, D.P., Shrestha, D.L., 2009. A novel method to estimate model uncertainty using machine learning techniques. Water Resources Research 45(12).
Somarathna, P.D.S.N., Minasny, B., Malone, B.P., Stockmann, U., McBratney, A.B., 2018. Accounting for the measurement error of spectroscopically inferred soil carbon data for improved precision of spatial predictions. Science of The Total Environment 631-632, 377-389.
Stockmann, U., Austin, J., Gallant, J., Cocks, B., Glover, M., Thomas, M., ., Verburg, K., 2020. Macquarie-Bogan floodplain Plant Available Water Capacity prediction case study. CSIRO Technical Report, Canberra, Australia.
Taylor, J., Minasny, B., 2006. A protocol for converting qualitative point soil pit survey data into continuous soil property maps. Australian Journal of Soil Research 44, 543-550.
van den Boogaart, K.G., Tolosana-Delgado, R., Bren, M., 2018. compositions: Compositional Data Analysis. R package version 1.40-2, https://CRAN.R-project.org/package=compositions.
Viscarra Rossel, R.A., 2011. Fine-resolution multiscale mapping of clay minerals in Australian soils measured with near infrared spectra. Journal of Geophysical Research: Earth Surface 116(F4).
Viscarra Rossel, R.A., Chen, C., Grundy, M.J., Searle, R., Clifford, D., Campbell, P.H., 2015. The Australian three-dimensional soil grid: Australia’s contribution to the GlobalSoilMap project. Soil Research 53(8), 845-864.
Wadoux, A.M.J.C., Samuel-Rosa, A., Poggio, L., Mulder, V.L., 2020. A note on knowledge discovery and machine learning in digital soil mapping. European Journal of Soil Science 71(2), 133-136.
Wiesmeier, M., Urbanski, L., Hobley, E., Lang, B., von Lützow, M., Marin-Spiotta, E., van Wesemael, B., Rabot, E., Ließ, M., Garcia-Franco, N., Wollschläger, U., Vogel, H.-J., Kögel-Knabner, I., 2019. Soil organic carbon storage as a key function of soils - A review of drivers and indicators at various scales. Geoderma 333, 149-162.
Wilford, J., 2012. A weathering intensity index for the Australian continent using airborne gamma-ray spectrometry and digital terrain analysis. Geoderma 183-184, 124-142.
Wright, M.N., Ziegler, A., 2017. ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software 77(1), 1-17.