Mercedes Román Dobarco, Jose Padarian Campusano

Introduction and objectives

This document reports the methodology followed to map 1370 pedogenon classes for Australia and provides a list of the scripts (in R and python) used for modelling.

Pedogenon classes are a conceptual taxa that aim to define groups of homogeneous environmental variables. These groups are created applying unsupervised classification to a set of state variables, proxies of the soil-forming factors for a given reference time. The assumption is that the soil-forming processes within these classes (i.e., pedogenons) have been relatively similar over pedogenetic time and thus have developed soils with similar properties. Pedogenon classes can afterwards be divided into subclasses along a gradient from less (i.e., remnant pedogenons) to more anthropogenic pressure on soils (i.e., pedophenons), in an analogous way to the concept of genoform and phenoform (Rossiter and Bouma, 2018) (Figure 1). The assessment of changes in soil condition can be done with a space for time substitution within and across pedogenon classes (Figure 1). The conceptualization and methodology for pedogenon mapping and using the classes as basis to assess changes in soil condition are explained with more detail in two publications (Román Dobarco et al., 2021a; Román Dobarco et al., 2021b)

Figure 1: Simplified methodology for pedogenon and pedophenon mapping for identifying reference soils (remnant pedogenons) to assess changes in soil condition caused by contemporary anthropedogenesis.

The goals of pedogenon mapping are:

  • Stratify the landscape into classes of homogeneous soil-forming factors (pedogenons) under the assumption that they delineate areas of similar pedogenesis up to a given reference time, that can be

  • Overlayed with information of contemporary drivers of soil change to define subclasses across a hypothetical gradient of contemporary anthropogenic pressure, ranging from least affected and representative of natural and historic anthropedogenesis (genosoils) to variants by soil management that have seen modified their functions (phenosoils).

  • These subclasses can be used to assess changes in soil condition across chronosequences.

  • The reference soils can be used to set targets for restoration, if contemporary land use has caused degradation of overall soil functions, assuming the soil is moderately resilient and has not reached a point of no return in degradation.

  • Pedogenon classes, genosoils and phenosoils can be used as strata for soil monitoring.

Overview of the product

The number of pedogenon classes for Australia was chosen as 1370 to compare with a map of pedogenon classes created by Brendan Malone and Senani Karunaratne with a slightly different method. The choice of the optimal number of pedogenon classes is not definite, as we do not have sufficient validation data. The optimal number of classes is hard to estimate and changes with the purpose of the mapping, the spatial scale in which the classes are defined, etc. The classes can be aggregated with different methods, e.g., based on distances between centroids. The resulting pedogenon map is shown in Figure 2, where random colours have been assigned to each class.

Figure 2: Pedogenon map of Australia (1370 classes). The colours are assigned randomly.

An alternative to visualize the map is to assign similar colours to groups of pedogenon classes closer in the feature space of the environmental covariates. Similarly as in Román Dobarco et al. (2021b), a hierarchical clustering was applied to the centroids as individuals, to visualize the organization of pedogenon classes into groups. The choice of the number of “families” or branches to divide the dendrogram can be supported by objective indices of clustering quality. In this case, we compared the Dunn and Silhouette indices (Figure 3) but finally decided the colour scheme by trial and error.

Figure 3: Optimal number of clusters were estimated with the Dunn and Silhouette indices in the hierarchical clustering. The centroids of the 1370 classes were treated as individuals.

In figure 5, we assigned 25 colour palettes to the dendrogram of pedogenon classes after hierarchical clustering. The dendrogram indicated that there are few, very dissimilar classes (Figure 4). The purpose of assigning a colour palette to classes of the same branch in the dendrogram rather than assigning random colours is to indicate which classes are related to each other. However, this does not allow to differentiate well individual classes. In figure 5 we can observe larger areas are covered by groups of pedogenon classes across Australia, and these are spatially compact, except in south-western Australia, where there is higher heterogeneity of pedogenon classes.

Figure 4: Hierarchical clustering of the pedogenon centroids generated with Ward’s method.

Figure 5: Pedogenon classes for Australia (1370 classes). The colour palette is assigned following a dendrogram where similarities among classes are estimated with Ward’s method.

The hierarchical dendrogram indicated that there are some classes very dissimilar from the rest, based on the height of the dendrogram (Figure 4). Therefore, the hierarchical dendrogram was repeated excluding seven classes considered outliers. A different colour palette was assigned to each of 21 branches in the dendrogram (Figure 6). These classes were afterwards coloured in grey in the maps but were hardly visible (Figure 7).

Figure 6: Hierarchical dendrogram of pedogenons centroids after excluding seven very dissimilar classes from the analysis.

Figure 7: Pedogenon maps of Australia (1370 classes). Seven classes considered very dissimilar from the rest are coloured in grey.

Another way of checking the proximity of the classes is to perform a dimension reduction using the centroids as individuals and plot the projected scores. In this case, we applied the Umap algorithm (McInnes et al., 2018). The centroids projected into the two umap dimensions show that many classes that are separated into different branches in the dendrogram are relatively close in this space. In figure 8, each of the points represents a centroid and the colours follow the same pattern as in figures 6 and 7. In some cases, the proximity of the centroids in the umap space is similar to the pattern of proximities in the geographical space (Figure 7). For example, the area near Broome (in lilac-bluish) and the areas around Prince Regent River (darker blue) are closer together in the top-left of figure 8.

Figure 8: Centroids of pedogenon classes projected in two dimensions of the umap dimension reduction.

Even when using 21 different colour palettes, it is not easy to differentiate these groups of pedogenon classes. In figure 9 each group has a different colour, and we can differentiate general geographic regions (e.g., coastal southeast, south, central southeast, etc.), although in the umap dimension reduction there are different groupings that emerge (figure 10).

Figure 9: Groups of pedogenon classes.

Figure 10:Centroids of pedogenon classes projected in two dimensions of the umap dimension reduction coloured by branch in the dendrogram.

Constraints, appropriate use and general uncertainty of the product

This product has high uncertainty because there is not enough data to assess the differences in soil properties between pedogenon classes, and hence that they represent individual soil entities. This assessment could be done for some classes in al local scale in the Namoi-Edgeroi district, New South Wales (Jang et al., under review), where we could evaluate the soil properties of pedogenon classes and differences between land uses subclasses (genosoils and phenosoils). The concept of pedogenon is still at its initial stage of development and future versions will have refined the appropriate number of classes that describe the pedodiversity of Australia. Another aspect that requires improvement is the selection of environmental covariates with an objective method and not based on expert knowledge. The use of the pedogenon classes, depending on the analysis, would require that the users include the information of the classes centroids in their analyses (e.g., to identify which classes are more similar). Otherwise, the maps can be used as a stratification of the landscape, which at this level is more similar to what this is, since the classes have not been populated with soil data yet.

Future pathways and opportunities for the product

Pedogenon mapping is a method for stratifying the landscape (similar to soil-landscape units), which can be used to assess past soil change with a space-for-time substitution approach. This research line has several methodological aspects and applications that need to be further investigated and improved in the short and mid-term. The current maps are the first element for this assessment. The classes have to be divided into subclasses depending on the type, degree of human pressure, and time since land use change.

Some elements of the framework must be redesigned according to the land use history and local pedogenetic context. For example, a question arises with respect to the choice of a reference time and characterization of human forcings: How can we identify reference soils when these ecosystems are the result of long-term intensive human influence? If the reference is the mid XXth century, cadastral maps, historic aerial photographs, and maps of estimated natural vegetation can be included as proxies for organisms. But all these elements are not available or recorded for Australia. In this context, human-shaped ecosystems are assumed to define the soil system in quasi-steady state for that reference time (Kuzyakov and Zamanian, 2019).

Further work can consist of developing an index of cumulative human pressures on soil, that will be used to assign the degree of modification of soils across the landscape and for identifying a reference state for each pedogenon class (i.e., soils with least change). These soils can be used to assess changes in soil condition and soil functions caused by contemporary human activities.

Stakeholder/industry/engagement/user uptake.

The pedogenon mapping approach had a good reception in the field of soil monitoring. The next version of pedogenon maps of Australia will be used to design a soil monitoring program for Australia using pedogenon classes as strata. This project (starting July 1st, 2022) is funded through the Soil Science Challenge and will be done in collaboration with researchers from CSIRO. This gives a measure of the scientific impact this methodology can have for improving the knowledge on soil condition at national scale.

Pedogenon mapping also interested private environmental consulting companies that aim to incorporate pedogenon maps for optimizing sampling strategies for measuring SOC over large rangelands (> 40,000 ha) in Australia. Thus, this framework can have a positive economic impact on farmers, by reducing the costs for soil sampling, quantifying carbon credits and setting targets for soil organic carbon sequestration.

Sequence of scripts

The scripts for R, python, and Google Earth Engine (GEE) used to generate the pedogenon maps can be found at -

In the attached directory are stored. The scripts make reference to the paths in the local desktop at the University of Sydney for accessing the input data but can be adapted for reproducibility. The steps are described in the scripts.

Ÿ 1_PedogenonsAustralia.R: In these script the 27 selected covariates (Table 1) are processed to the same extent, coordinate reference system and alignment. A regular sample is taken from the covariates (the sample is set to 20,000,000 pixels of which 9,854,024 have data). The inverse Cholesky transformation is applied to the covariates to decorrelate them (Román Dobarco et al., 2021b). The sampled pixels are saved into the file PedogenonsAustraliadata.feather that can be opened with python. The values of the covariates in the sampled > 9M pixels in the original scale, decorrelated, their coordinates and the triangular matrix (Cholesky decomposition of the variance-covariance matrix) are saved into the file CLORPT.27_vl.RData.

Ÿ This script was run with python in the supercomputer GADI. The script runs a parallelised version of k-means (scalable k-means by Bahmani et al. (2012)) to reduce the computation time. This function is called with the module dask_ml.cluster from the package Dask ( The algorithm was set for defining 1370 classes, with a maximum of 30,000 iterations, convergence tolerance set to 0.000001, and initialization method k-means ++. We run 50 different initializations and retained the result with lower inertia. In this case, this corresponded to the initialization 21, with the centroids saved in the file centroids_21.npy. All the results from GADI are saved into the folder “Output/ Output_GADI”.

Ÿ 3_GEE script: Mapping the pedogenon classes was done with the Google Earth Engine platform, assigning each pixel to the closest centroid after rescaling with the inverse Cholesky transformation. The map was afterwards downloaded and processed with the next script.

Ÿ 4_GEE_Pedogenons_vizCentroidsAustralia.R: The pedogenon map downloaded from GEE was mosaicked and processed: nodata values (-9999 and 0) were masked, projected to EPSG:3857 and resampled to 800 m resolutions for visualizing with leaflet. The TIFF files PdgnAus_k1370.tif (WGS84) and PdgnAus_k1370.3857.tif is the resulting pedogenon map. This script also predicts the cluster membership on the +9M pixels, in the original scale and calculates the summary statistics of the covariates by pedogenon class. The script also: 1) generates a hierarchical dendrogram using the cluster centroids as individuals to explore the similarities between pedogenon classes, 2) calculates the optimal number of branches in which the dendrogram can be divided using the Silhouette and Dunn indices. Once the optimal number of classes is determined, a colour palette is assigned with the aim of facilitating the visualization of the 1370 classes. Another way of visualizing how close the pedogenon classes are amongst themselves, is to perform a dimension reduction, e.g., umap.

Ÿ FunctionsMapsStudyAreas.R: Additional functions that can be used and modified to explore characteristics of the classes (e.g., area occupied, closest pedogenon in the feature space with Mahalanobis distance, etc.) and visualize them with leaflet.

Table 1: Covariates used to describe clorpt (Jenny, 1941) (Jenny, 1941) or scorpan (McBratney et al., 2003) factors and generate pedogenon classes. P: parent material; S: soil; T: time; R: relief; Cl: climate; O: organisms

Table 2: Summary of RData and python files produced with the modelling scripts


  • Bahmani, B., Moseley, B., Vattani, A., Kumar, R., Vassilvitskii, S., 2012. Scalable k-means++. arXiv preprint arXiv:1203.6402.

  • Gallant, J., Austin, J., Van Niel, T., 2014. Mean monthly total shortwave radiation on a sloping surface modelled using the 1" DEM-S - 1" mosaic. v2. In: C.D. Collection (Ed.).

  • Gallant, J.C., Dowling, T.I., 2003. A multiresolution index of valley bottom flatness for mapping depositional areas. Water Resour Res 39(12).

  • Geoscience Australia, 2019. Total Magnetic Intensity (TMI) Grid of Australia 2019 - seventh edition - 80 m cell size

  • Harwood, T., Donohue, R., Harman, I., McVicar, T., Ota, N., Perry, J., Williams, K., 2016. 9s climatology for continental Australia 1976-2005: Summary variables with elevation and radiative adjustment. v3. In: C.D. Collection (Ed.).

  • Jang, H.J., Román Dobarco, M., Minasny, B., McBratney, A., Jones, E., Developing and testing of pedogenons in the lower Namoi valley, NSW, Australia. Geoderma.

  • Jenny, H., 1941. Factors of Soil Formation: A System of Quantitative Pedology. Dover Publications, New York.

  • Kuzyakov, Y., Zamanian, K., 2019. Reviews and syntheses: Agropedogenesis - humankind as the sixth soil-forming factor and attractors of agricultural soil degradation. Biogeosciences 16(24), 4783-4803.

  • Malone, B., Searle, R., 2021. Updating the Australian digital soil texture mapping (Part 2*): spatial modelling of merged field and lab measurements. Soil Research 59(5), 435-451.

  • McInnes, L., Healy, J., Melville, J., 2018. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.

  • Minty, B., Franklin, R., Milligan, P., Richardson, L.M., Wilford, J.R., 2009. Radiometric Map of Australia. Geoscience Australia, Canberra dataset/ga/68851.

  • Minty, B.R.S., 2019a. Radiometric Grid of Australia (Radmap) v4 2019 filtered pct potassium grid. Geoscience Australia, Canberra 5dd48d628f4f6.

  • Minty, B.R.S., 2019b. Radiometric Grid of Australia (Radmap) v4 2019 filtered ppm thorium. Geoscience Australia, Canberra 5dd48e3eb6367.

  • Minty, B.R.S., 2019c. Radiometric Grid of Australia (Radmap) v4 2019 ratio uranium over thorium. Geoscience Australia, Canberra 5dd4a63603704.

  • Quinn, P., Beven, K., Chevallier, P., Planchon, O., 1991. The Prediction of Hillslope Flow Paths for Distributed Hydrological Modeling Using Digital Terrain Models. Hydrol Process 5(1), 59-79.

  • Román Dobarco, M., McBratney, A., Minasny, B., Malone, B., 2021a. A framework to assess changes in soil condition and capability over large areas. Soil Security 4, 100011.

  • Román Dobarco, M., McBratney, A., Minasny, B., Malone, B., 2021b. A modelling framework for pedogenon mapping. Geoderma 393, 115012.

  • Wilford, J., 2012. A weathering intensity index for the Australian continent using airborne gamma-ray spectrometry and digital terrain analysis. Geoderma 183, 124-142.

  • Xu, T., Hutchinson, M.F., 2013. New developments and applications in the ANUCLIM spatial climatic and bioclimatic modelling package. Environmental Modelling & Software 40, 267-279.