Australian Soil Classification Map
Overview
Did you know that, until now, Australia has not had a fine-scale, comprehensive and reliable map of soil types? Long story… but suffice to say, previous national soil maps were either broad scale, inconsistent or inaccurate. We used Digital Soil Mapping (DSM) technologies, combined with the real-time collation of soil attribute data from TERN's recently developed Soil Data Federation System, to produce a 90m resolution map of Australian Soil Classification (ASC) Soil Order classes with quantified estimates of mapping reliability.
Enquiries - ross.searle@csiro.au
1. Methods
The workflow for this modelling was implemented in the R programming language. The scripts for this workflow can be downloaded from the AusSoilDSM GitHub site.
We estimated soil classes at the Order level of the Australian Soil Classification (ASC).
1.1 Soil Classification Data
Soil classification data was extracted from the SoilDataFederator (SDF) using this script. A total of 195,383 observations with an Australian Soil Classification (ASC), a Principal Profile Form (PPF) or a Great Soil Group (GSG) classification were extracted (Figure 1). This data file can be downloaded here. Of these observations, 130,570 had an ASC directly assigned by a pedologist; inconsistencies within these raw ASC codes were rectified using this remap table. The remaining 64,813 observations had either a PPF or a GSG classification assigned by pedologists, and these were transformed to an ASC using the GSGtoASC and PPFtoASC remap tables.
Figure 1. The distribution of observations across the continent used in the DSM. Shading depicts soil orders as per the legend in Figure 2.
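The harmonisation step above boils down to a lookup with a fallback order. The sketch below is a minimal illustration, not the authors' R script: the tiny remap dictionaries are hypothetical placeholders for the downloadable remap tables, the placeholder PPF/GSG keys are invented, and the preference order (pedologist ASC, then PPF, then GSG) is an assumption.

```python
# HYPOTHETICAL remap tables: the real ones are the downloadable remap,
# GSGtoASC and PPFtoASC files. These rows are placeholders for illustration.
ASC_FIXES = {"Ve": "VE", "ve": "VE", "CHR": "CH"}  # raw ASC code -> clean code
PPF_TO_ASC = {"PPF_X": "KU", "PPF_Y": "SO"}        # placeholder PPF keys
GSG_TO_ASC = {"GSG_A": "VE", "GSG_B": "CH"}        # placeholder GSG keys


def harmonise(record):
    """Return an ASC Order code for one observation, or None if unmappable.

    Assumed preference: pedologist-assigned ASC, then PPF, then GSG.
    """
    asc = record.get("asc")
    if asc:
        # Tidy inconsistent raw ASC codes; unknown codes pass through unchanged.
        return ASC_FIXES.get(asc, asc)
    ppf = record.get("ppf")
    if ppf in PPF_TO_ASC:
        return PPF_TO_ASC[ppf]
    gsg = record.get("gsg")
    if gsg in GSG_TO_ASC:
        return GSG_TO_ASC[gsg]
    return None  # no usable classification


print(harmonise({"asc": "Ve"}))     # VE
print(harmonise({"ppf": "PPF_X"}))  # KU
```

Observations returning None would simply be dropped before modelling in this sketch.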
1.2 Raster Covariate Data
The 90m raster covariate data was obtained from TERN's publicly available raster covariate stack. These covariates can be downloaded from here. Metadata for these rasters can be viewed here and a csv metadata file can be downloaded from here. A parsimonious set of these covariates was used in the modelling. The "CovariatesToUse" file as referenced in the scripts can be downloaded here.
1.3 Machine Learning Modelling
We used the R "ranger" Random Forest package to implement a machine learning model following standard Digital Soil Mapping (DSM) methodologies. The model generation script is available here.
The observed geographic locations in the ASC data set were used to extract cell values from the raster covariate stack using the R "raster" package. This drilled covariate data set is available here. The data set was then divided into a 90/10 split of training and external validation sets. The training data was then bootstrap sampled 50 times to create 50 bootstrap training sets, which were used to generate 50 Random Forest model realisations.
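The split-then-bootstrap design can be sketched with row indices alone. This is a minimal Python illustration, not the authors' R code: it assumes a simple random (unstratified) 90/10 split and bootstrap resamples the same size as the training set, and the function name and seed are hypothetical.

```python
import random


def split_and_bootstrap(n_obs, n_boot=50, holdout=0.10, seed=42):
    """Return (bootstrap_sets, validation_idx) as lists of row indices.

    Rows are shuffled and `holdout` of them set aside for external
    validation; the remaining rows are resampled with replacement
    `n_boot` times, giving one training set per model realisation.
    """
    rng = random.Random(seed)
    idx = list(range(n_obs))
    rng.shuffle(idx)
    n_val = round(n_obs * holdout)
    validation_idx = idx[:n_val]
    training_idx = idx[n_val:]
    # Each bootstrap set is drawn with replacement from the training rows only,
    # so validation rows never leak into model fitting.
    bootstrap_sets = [
        rng.choices(training_idx, k=len(training_idx)) for _ in range(n_boot)
    ]
    return bootstrap_sets, validation_idx


boots, val = split_and_bootstrap(1000)
print(len(boots), len(boots[0]), len(val))  # 50 900 100
```

Each index set would then be used to fit one Random Forest model (ranger, in the original R workflow).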
1.4 Mapping the Models
Using the CSIRO Pearcey High Performance Compute (HPC) cluster, the Random Forest models were applied to the input covariate raster data stack. This was done for each 90m raster cell across the nation, for each of the 50 bootstrapped model realisations, using this script. The modal ASC value across the 50 realisations for each cell was determined and assigned as the most probable soil type for that cell in the output raster. The ratio of the probability of the second most probable soil type to that of the most probable soil type was also calculated to generate a model confusion index, an estimate of the structural uncertainty in the Random Forest model.
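For a single cell, the modal class and confusion index can be sketched as below. This is an illustration under an assumption: each class's "probability" is taken as its share of the 50 bootstrap votes, which may differ from how the original scripts derive it (for example, from ranger class probabilities).

```python
from collections import Counter


def modal_class_and_ci(predictions):
    """Return (modal class, confusion index) for one raster cell.

    predictions: the class predicted for this cell by each bootstrap model.
    Confusion index = votes for the second most frequent class divided by
    votes for the most frequent class (0 = unanimous, 1 = tie for first).
    """
    counts = Counter(predictions).most_common(2)
    top_class, top_n = counts[0]
    second_n = counts[1][1] if len(counts) > 1 else 0
    return top_class, second_n / top_n


cell = ["VE"] * 30 + ["CH"] * 15 + ["SO"] * 5
print(modal_class_and_ci(cell))  # ('VE', 0.5)
```

On the HPC cluster the same calculation would be repeated for every 90m cell.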
1.5 Merging the Modelled Map with Existing ASC Mapping
The Australian Soil Resource Information System (ASRIS) contains a product that is a compilation of all existing polygon mapping conducted by state and federal soil survey agencies across all of Australia (Figure 2). This product is made up of a diverse range of field mapping products at a range of mapping scales. From this product we extracted all polygons that were mapped at a scale of 1:100,000 or finer, as defined in the Guidelines for Surveying Soil and Land Resources (the "Blue Book") (Figure 3). Polygons mapped at these scales are high quality spatial estimates of the distribution of soil attributes. We then rasterised these polygon ASC values and merged them into our final estimates of ASC, i.e., wherever an ASRIS polygon mapped at 1:100,000 or finer exists, its ASC value replaces the modelled ASC value. The script for these steps is available here.
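The merge rule ("polygon value wins where it exists") can be sketched on flattened cell arrays. This minimal illustration assumes both rasters already share the same 90m grid and uses None as a hypothetical no-data marker; with NumPy arrays the same rule is a single `numpy.where(has_polygon, polygon, modelled)` call.

```python
NODATA = None  # hypothetical marker for cells with no fine-scale polygon


def merge_maps(modelled, polygon):
    """Overlay rasterised fine-scale polygon ASC values on the modelled map.

    Both inputs are equal-length sequences of cell values on the same grid;
    wherever the polygon raster has a value, it replaces the modelled value.
    """
    if len(modelled) != len(polygon):
        raise ValueError("rasters must share the same grid")
    return [m if p is NODATA else p for m, p in zip(modelled, polygon)]


print(merge_maps(["VE", "CH", "SO"], [None, "KU", None]))  # ['VE', 'KU', 'SO']
```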
Using the held out external validation set, we intersected the locations of these observations with the final map product to generate an overall mapping accuracy assessment along with a range of other accuracy metrics.
Figure 2. ASRIS best scale polygon map.
Figure 3. ASRIS best scale polygon map where the scale of mapping is 1:100,000 or finer.
2. Results
The merged modelled and fine scale polygon map is shown in Figure 4. Figure 5 shows the Confusion Index generated from the Random Forest bootstrap predictions. A value of 1 indicates high confusion, meaning the model is predicting two or more soil types with similar probability, and a value of 0 means the model is predicting one soil type with high probability. Using the held out external validation set, we calculated an overall mapping accuracy of 60.6% and a kappa value of 0.55.
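Overall accuracy and kappa both come straight from the confusion matrix: overall accuracy is the share of validation sites on the diagonal (p_o), and Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_e is the agreement expected by chance from the row and column totals. The sketch below is illustrative only; the matrix orientation (rows = predicted, columns = observed) is an assumption, and the 2 x 2 matrix is made up.

```python
def overall_accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    cm[i][j] = number of validation sites predicted as class i whose
    observed class is j.
    """
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(len(cm))) / n  # p_o
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(col) for col in zip(*cm)]
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / n**2  # p_e
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


oa, k = overall_accuracy_and_kappa([[40, 10], [10, 40]])
print(round(oa, 3), round(k, 3))  # 0.8 0.6
```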
Table 1 shows the values for the modelled confusion matrix. Table 2 shows the user and producer accuracy metrics for the modelling. From a user perspective, Vertosols have the best chance of being located where they are predicted to be, with a User Accuracy of 78%, while Organosols are the least likely to be located where predicted, with a User Accuracy of 16%. A binomial test shows that the classification result is highly significant (p = 1.965693e-167). If the classification were repeated under the same conditions, we can be 95% confident that the Overall Accuracy would lie between 59.86% and 61.35%.
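User's accuracy answers "if the map says class i, how often is it really i?" (diagonal over row total), while producer's accuracy answers "how often is real class i mapped as i?" (diagonal over column total). The sketch below assumes rows = predicted and columns = observed. For the interval it swaps in a simple normal (Wald) approximation rather than the exact binomial interval, so its bounds will differ slightly from an exact test; the example counts are made up.

```python
import math


def user_producer_accuracy(cm):
    """Per-class user's and producer's accuracy (rows = predicted, cols = observed)."""
    k = len(cm)
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(col) for col in zip(*cm)]
    users = [cm[i][i] / row_tot[i] if row_tot[i] else float("nan") for i in range(k)]
    producers = [cm[i][i] / col_tot[i] if col_tot[i] else float("nan") for i in range(k)]
    return users, producers


def accuracy_ci(correct, n, z=1.96):
    """Approximate 95% interval for overall accuracy (normal/Wald approximation)."""
    p = correct / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


users, producers = user_producer_accuracy([[8, 2], [4, 6]])
print(users, producers)          # [0.8, 0.6] [0.666..., 0.75]
print(accuracy_ci(606, 1000))    # roughly (0.576, 0.636)
```

The interval narrows with the square root of the number of validation sites, which is why a large external validation set gives the tight range quoted above.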
Figure 4. The merged modelled and fine scale polygon map.
Figure 5. The Confusion Index generated from the Random Forest bootstrap predictions.
Table 1. Model confusion matrix for modelled ASCs
Table 2. User and Producer scores for the final ASC map product.
3. Comparisons with Previous Maps
3.1 Background
In Australia, the state and territory government agencies are primarily responsible for the collection and management of soils data. For the last 70 years these agencies have been collecting soil site data to meet state requirements. Soil site data and polygon map data are managed in data systems tailored to the requirements of the individual agencies. The systems were typically purpose built and supported the operating requirements of each state agency under state legislation and regulation.
Thus, in Australia, there are eight independent and unique soil data management systems running in a diverse range of software and hardware environments, but fortunately, most comply with a common semantic model as described in the Australian Soil and Land Survey Field Handbook (National Committee on Soil and Terrain, 2009).
Given this situation, it has proved difficult over the years to produce nationally consistent soils information uniformly across the continent that is readily accessible and useful at reasonably fine scale. Several attempts have occurred in the last 60 years to produce a nationally consistent soil type map.
The first attempt, the Atlas of Australian Soils (Northcote et al. 1960-68), was compiled by CSIRO in the 1960s to provide a consistent national description of Australia's soils. It comprises a series of ten maps and associated explanatory notes, compiled by K.H. Northcote and others. The maps are published at a scale of 1:2,000,000, but the original compilation was at scales from 1:250,000 to 1:500,000. Mapped units in the Atlas are soil landscapes, usually comprising several soil types. The explanatory notes include descriptions of soil landscapes and component soils. Soil classification for the Atlas is based on the Factual Key (Northcote 1979). In 1991, a digital version of the Atlas was created by the Bureau of Rural Science from scanned tracings of the published hardcopy maps. The Digital Atlas of Australian Soils is available as a shapefile. Additionally, there is a reliability map available, with a descriptive legend. The source of the reliability data is unknown (ASRIS 2018). In 2002, Ashton and McKenzie interpreted the original Atlas data to generate, for the first time, a nationally consistent soil map classified using the Australian Soil Classification (Isbell 2002) (Figure 6). Whilst this map and its associated descriptive data were useful for many purposes, the map was produced at a very broad scale and was mostly qualitative in nature, being generated from aerial photo interpretation with very little ground-truthing.
The second attempt at producing nationally consistent soils information was the Australian Soil Resource Information System (ASRIS) (Johnston et al. 2003). ASRIS was a collaboration of all the state and territory soil survey agencies designed to bring together all the existing soil site and polygon map data into a nationally consistent format. A range of products was developed within the ASRIS program, including a “best scale” soil type map, once again based on the Australian Soil Classification System (Figure 7). This polygon map was a compilation of all the existing soil mapping data at scales ranging from 1:25,000 to 1:2,000,000. Thus, in some areas the data was very accurate and in others less so. The quality of the attached soil attribute data also ranged from highly quantitative to highly qualitative, and there were numerous inconsistencies across state and territory borders. As can be seen in Figure 2, the compilation map does not completely cover the entire continent.
In 2018, Teng et al. produced the first quantitative update of a national soil map at the continental scale, at a 1 km grid cell resolution (Figure 8). They produced a map of Australian Soil Classification Orders with data derived from a combination of traditional soil profile classifications and soil classifications made with visible–near infrared (vis–NIR) spectroscopy, totaling 38,756 individual observations. Digital soil class mapping using a Random Forest model was performed. The overall error rate of the DSM model, tested on an independent validation set, was 55.6%. A detailed description of the methodology used can be found here and the raster map data can be downloaded from the CSIRO Data Access Portal.
3.2 Accuracy Comparisons
It is difficult to do a truly unbiased comparison of map accuracies for different products, as previous map versions are based on expert knowledge and have a large qualitative component. The Atlas of Australian Soils map also contains unmapped component information within each polygon. There is also no statistically valid external validation set available to apply to the older maps. In this work we did not have access to the independent validation data set used in the Teng et al. (2018) work, so a confirmatory accuracy assessment is not possible. However, if we take the existing observed data set upon which this new map is based and compare the ASC classifications at these locations with those of the previous mapping efforts, we can get a general understanding of how the various maps compare.
Using this approach we calculated that:
the Atlas of Australian Soils map has an overall accuracy of 36%;
the ASRIS "best scale" polygon map has an overall accuracy of 47%;
the Teng et al. (2018) map has an overall accuracy of 42% using our entire observed data set of 180,915 observations, whereas Teng et al. (2018) report a mapping accuracy of 55.6% using an external validation set of 12,919 observations (differences in map resolution, 1 km versus 90m, may explain this discrepancy); and
the updated DSM modelled Australian Soil Classification map produced in this study has an overall accuracy of 61%.
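The comparison above amounts to reading each map's class at every observation site and counting matches. The sketch below is illustrative only: it assumes site values have already been extracted from each map, and it assumes sites outside a map's coverage (for example, gaps in the ASRIS compilation) are skipped rather than counted as errors, which the text does not specify.

```python
def map_agreement(observed, mapped):
    """Share of observation sites where a map's class matches the observed class.

    observed, mapped: parallel sequences of class codes at the same sites;
    None in `mapped` marks sites the map does not cover (assumed: skipped).
    Returns (agreement, number of sites used).
    """
    pairs = [(o, m) for o, m in zip(observed, mapped) if m is not None]
    if not pairs:
        return float("nan"), 0
    hits = sum(o == m for o, m in pairs)
    return hits / len(pairs), len(pairs)


obs = ["VE", "CH", "SO", "KU"]
atlas = ["VE", "SO", None, "KU"]  # made-up values read from one map
print(map_agreement(obs, atlas))  # (0.666..., 3)
```

Because the new map was trained on these same sites, its score under this approach would be optimistic, which is why the external validation figure is quoted for it instead.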
Figure 6. Atlas of Australian Soils (1968) reinterpreted to ASC classes in 2002
Figure 7. ASRIS best scale polygon map.
Figure 8. Modelled Australian Soil Classification Orders (2018)
4. Data Access
A complete metadata record for this dataset is available here
The raster data sets for both the ASC Classification map and the Confusion Index map can be accessed via a range of methods:
OGC Web Map Services (WMS) and Web Coverage Services (WCS) at:
The WMS web service root is at - https://www.asris.csiro.au/arcgis/rest/services/TERN/ASC_ACLEP_AU_NAT_C/MapServer
The WCS web service root is at - https://www.asris.csiro.au/arcgis/services/TERN/ASC_ACLEP_AU_NAT_C/MapServer/WCSServer
or as Cloud Optimised GeoTIFFs at - https://swift.rc.nectar.org.au/v1/AUTH_05bca33fce34447ba7033b9305947f11/landscapes-csiro-slga-public/NationalMaps/SoilClassifications/ASC/90m/
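As a starting point for the WCS endpoint, the sketch below builds standard OGC key-value-pair request URLs against the service root listed above. It assumes the server accepts WCS version 1.0.0 requests and uses a hypothetical coverage identifier in the example; check the GetCapabilities response for the versions and coverage names actually offered.

```python
from urllib.parse import urlencode

WCS_ROOT = ("https://www.asris.csiro.au/arcgis/services/TERN/"
            "ASC_ACLEP_AU_NAT_C/MapServer/WCSServer")


def wcs_request_url(request="GetCapabilities", version="1.0.0", **extra):
    """Build an OGC WCS key-value-pair request URL against the service root.

    `version` (assumed 1.0.0) and any extra parameters such as COVERAGE
    must match what the server advertises in its capabilities document.
    """
    params = {"SERVICE": "WCS", "REQUEST": request, "VERSION": version, **extra}
    return f"{WCS_ROOT}?{urlencode(params)}"


print(wcs_request_url())
# Hypothetical coverage identifier, for illustration only:
print(wcs_request_url("DescribeCoverage", COVERAGE="1"))
```

The resulting URL can be fetched with `urllib.request.urlopen`. The Cloud Optimised GeoTIFFs can instead be opened remotely by GDAL-based tools using GDAL's `/vsicurl/` prefix on the file URL.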
5. Acknowledgements
State and Territory soil survey agencies - for collecting, maintaining and making available the data which makes this DSM modelling possible
CSIRO - for maintaining and making available the data which makes this DSM modelling possible, and for developing and maintaining the Australian Soil Resource Information System
TERN - for supporting this work
Harry Goodman - for developing the initial DSM methodology upon which this modelling is based.
6. References
Isbell, R. F. (2002) The Australian Soil Classification. Revised Edition. CSIRO Publishing, Melbourne.
Johnston, R. M., S. J. Barry, E. Bleys, E. N. Bui, C. J. Moran, D. A. P. Simon, P. Carlile, N. J. McKenzie, B. L. Henderson, G. Chapman, M. Imhoff, D. Maschmedt, D. Howe, C. Grose, N. Schoknecht, B. Powell and M. Grundy. (2003) "ASRIS: the database." Australian Journal of Soil Research 41(6):1021-1036.
National Committee on Soil and Terrain (2009) Australian Soil and Land Survey Field Handbook. CSIRO Publishing, Melbourne.
Northcote, K.H. (1979) A Factual Key for the Recognition of Australian Soils. 4th edn. Rellim Technical Publishers, Glenside, SA.
Northcote, K. H. with Beckmann, G. G., Bettenay, E., Churchward, H. M., Van Dijk, D. C., Dimmock, G. M., Hubble, G. D., Isbell, R. F., McArthur, W. M., Murtha, G. G., Nicolls, K. D., Paton, T. R., Thompson, C. H., Webb, A. A. and Wright, M. J. (1960-1968) Atlas of Australian Soils, Sheets 1 to 10. With explanatory data (CSIRO Aust. and Melbourne University Press: Melbourne).
Teng, H., Viscarra Rossel, R. A., Shi, Z. and Behrens, T. (2018) Updating a national soil classification with spectroscopic predictions and digital soil mapping. Catena, 164:125-134.