Data on total organic carbon (TOC) concentration (%) were extracted with the SoilDataFederator managed by TERN. The SoilDataFederator is a web API that compiles soil data from different institutions and government agencies throughout Australia. The laboratory methods for total organic carbon included in the study are 6A1, 6A1_UC, 6B2, 6B2b, 6B3, and 6B3a. We selected TOC data from the period 1970-2020 as a compromise between representativeness of current TOC concentration and spatial coverage. The data were cleaned and processed to harmonize units and to exclude duplicates and potentially erroneous entries (e.g. missing upper or lower horizon depths, extreme TOC values, unknown sampling date). Additional TOC measurements from the Biomes of Australian Soil Environments (BASE) contextual data (Bissett et al., 2016) were also included in the analyses. TOC concentration for BASE samples was determined by the Walkley-Black method (method 6A1). Upper limits for TOC concentration by biome and land cover class were set according to published literature, consistent datasets (the Australian national Soil Carbon Research Program (SCaRP) and BASE), and data exploration, to exclude unrealistic TOC values (e.g. maximum TOC = 30% in temperate forests, maximum TOC = 14% in temperate rainfed pasture). Since TOC concentration in Australian ecosystems has been underestimated by previous SOC maps, we did not set conservative TOC upper limits, knowing that the machine learning model would likely underestimate high SOC values.
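The cleaning rules described above can be illustrated with a minimal pandas sketch. It is not the code used in the study: the column names (`site_id`, `date`, `upper_cm`, `lower_cm`, `toc_pct`, `method`, `biome`, `land_cover`), the duplicate key, and the checks that the lower depth exceeds the upper depth and that TOC is non-negative are all assumptions made for illustration. Only the two upper limits quoted in the text are included.

```python
import pandas as pd

# Laboratory method codes listed in the text.
TOC_METHODS = {"6A1", "6A1_UC", "6B2", "6B2b", "6B3", "6B3a"}

# Upper TOC limits (%) by (biome, land cover). Only the two examples given in
# the text are filled in; the key labels themselves are hypothetical.
TOC_UPPER_LIMIT = {
    ("temperate", "forest"): 30.0,
    ("temperate", "rainfed pasture"): 14.0,
}


def clean_toc(df):
    """Filter a table of TOC observations (hypothetical column names)."""
    out = df.copy()
    # Unparseable or missing sampling dates become NaT and are dropped below.
    out["date"] = pd.to_datetime(out["date"], errors="coerce")

    keep = (
        out["method"].isin(TOC_METHODS)
        & out["date"].notna()
        & out["date"].dt.year.between(1970, 2020)
        & out["upper_cm"].notna()
        & out["lower_cm"].notna()
        & (out["lower_cm"] > out["upper_cm"])  # assumed sanity check
        & (out["toc_pct"] >= 0)                # assumed sanity check
    )
    out = out[keep]

    # Look up the class-specific upper limit; classes without a limit pass.
    limits = pd.Series(
        [
            TOC_UPPER_LIMIT.get((biome, cover), float("inf"))
            for biome, cover in zip(out["biome"], out["land_cover"])
        ],
        index=out.index,
        dtype=float,
    )
    out = out[out["toc_pct"] <= limits]

    # Assumed definition of a duplicate: same site, date, depths and method.
    key = ["site_id", "date", "upper_cm", "lower_cm", "method"]
    return out.drop_duplicates(subset=key).reset_index(drop=True)
```

Unit harmonization is omitted because the text does not state which units occur in the source data.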
Equal-area quadratic spline functions were fitted to the whole collection of pre-processed TOC data, and values were then extracted for the 0-5 cm, 5-15 cm, 15-30 cm, 30-60 cm, 60-100 cm, and 100-200 cm depth intervals, following GlobalSoilMap specifications (Arrouays et al., 2014). Boxplots of TOC values by biome and land cover after data cleaning and depth standardization are shown in Figure 1.
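To make the depth standardization step concrete, the sketch below resamples one profile onto the six GlobalSoilMap intervals. It deliberately swaps in a simpler technique: a piecewise-constant, thickness-weighted average of the horizon values, not the equal-area quadratic spline used in the study. The two approaches share the idea of converting horizon means to fixed depth intervals, but the spline additionally fits a smooth curve through the profile. The function name and the input layout are hypothetical.

```python
# Standard depth intervals (cm) listed in the text.
GSM_INTERVALS = [(0, 5), (5, 15), (15, 30), (30, 60), (60, 100), (100, 200)]


def resample_profile(horizons, intervals=GSM_INTERVALS):
    """Thickness-weighted mean TOC per standard interval.

    horizons: list of (upper_cm, lower_cm, toc_pct) tuples for one profile.
    Returns one value per interval, or None where no horizon overlaps it.
    Assumption: an interval that is only partly sampled is averaged over
    the sampled part alone.
    """
    result = []
    for top, bottom in intervals:
        weighted_sum = 0.0
        covered = 0.0
        for upper, lower, toc in horizons:
            # Thickness of the horizon that falls inside this interval.
            overlap = min(bottom, lower) - max(top, upper)
            if overlap > 0:
                weighted_sum += toc * overlap
                covered += overlap
        result.append(weighted_sum / covered if covered > 0 else None)
    return result


profile = [(0, 10, 4.0), (10, 30, 2.0), (30, 80, 1.0)]
standardized = resample_profile(profile)
# standardized == [4.0, 3.0, 2.0, 1.0, 1.0, None]
```

In this example the 5-15 cm interval spans two horizons, so its value (3.0) is the mean of both weighted by overlap thickness, and the 100-200 cm interval is left empty because the profile stops at 80 cm.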