Bacteria and Fungi Beta Diversity

Continental-scale mapping of soil bacteria and fungi beta diversity

Mercedes Roman Dobarco, Alexandre Wadoux, PeiPei Xue

Key Points

  • Soil microorganisms mediate a wide range of key processes and ecosystem services on which humans depend. In this study, we report on the biogeography and spatial pattern of soil biota for the Australian continent.

  • Our analysis is based on DNA sequences from the Biome of Australia Soil Environments (BASE), which were collected at a range of sites across Australia. We calculated the beta diversity of abundant taxa of soil bacteria and fungi, treating representative sequence data (OTUs) as individual taxa.

  • Two ordination methods were applied to investigate dissimilarities in microbial community composition: non-metric multidimensional scaling (NMDS) and Uniform Manifold Approximation and Projection (UMAP) for dimension reduction.

  • The NMDS and UMAP used the weighted UniFrac distance for bacteria and Bray-Curtis dissimilarity for fungi on taxa relative abundance. The results of the NMDS for bacteria indicated that the structure of the data was captured fairly well, with a stress of 0.09. However, the stress of the fungi NMDS was 0.16, indicating that the fungi community composition was moderately well explained.

  • We further collected a large set of environmental covariates that control the biogeography of soil biota, such as soil properties, terrain attributes and vegetation indices, and for which maps are available.

  • We fitted a quantile regression forest machine learning model to exploit the quantitative relationship between point-estimated values of beta diversity and environmental covariates, and used the model to predict beta diversity across Australia along with an estimate of uncertainty.

  • Soil properties and vegetation are the dominant controls of soil biota.

  • The resulting maps also reveal the spatial pattern of soil biota. They can be used for regional assessment of soil biodiversity and for monitoring degradation induced by global change.

Soil Bacteria and Fungi – Abundant taxa

We transformed the abundance data into relative abundance and calculated the average relative abundance of each OTU across all sites. The abundant taxa were defined as those OTUs in the top 1%, after ranking by relative abundance, that occurred in at least 10% of sites, or those OTUs that occurred in 50% of sites. The dominant taxa mostly accounted for between 10% and 30% of the OTUs present in the samples. These criteria resulted in 2063 taxa for bacteria and 184 taxa for fungi across 1373 samples.
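The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name, the toy counts, and the reading of the rule as "(top 1% by mean relative abundance AND present in at least 10% of sites) OR present in at least 50% of sites" are all assumptions.

```python
import math

def select_abundant_taxa(counts, top_frac=0.01, min_prev_top=0.10, min_prev_any=0.50):
    """Return the set of OTU ids flagged as abundant (hypothetical helper).

    counts: dict mapping OTU id -> list of raw counts, one entry per site.
    """
    otus = list(counts)
    n_sites = len(counts[otus[0]])
    # Per-site totals, used to convert raw counts to relative abundance.
    totals = [sum(counts[o][s] for o in otus) for s in range(n_sites)]
    mean_rel = {
        o: sum(counts[o][s] / totals[s] if totals[s] else 0.0
               for s in range(n_sites)) / n_sites
        for o in otus
    }
    # Prevalence: fraction of sites where the OTU was detected.
    prevalence = {o: sum(1 for c in counts[o] if c > 0) / n_sites for o in otus}
    # Top-ranked OTUs by mean relative abundance (at least one OTU).
    n_top = max(1, math.ceil(top_frac * len(otus)))
    top = set(sorted(otus, key=lambda o: mean_rel[o], reverse=True)[:n_top])
    return {
        o for o in otus
        if (o in top and prevalence[o] >= min_prev_top)
        or prevalence[o] >= min_prev_any
    }

# Made-up counts for four OTUs at four sites (illustration only).
toy_counts = {
    "otu_a": [90, 80, 0, 0],
    "otu_b": [5, 10, 5, 5],
    "otu_c": [5, 0, 0, 0],
    "otu_d": [0, 10, 95, 95],
}
abundant = select_abundant_taxa(toy_counts)
```

In this toy table, otu_c is dropped because it is neither top-ranked nor present in half of the sites.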

We tested three methods for analyzing the biodiversity: non-metric multidimensional scaling (NMDS), Copula ordination, and UMAP. Hereafter, NMDS is reported as it is the most common approach. The NMDS is performed using the Bray-Curtis dissimilarity metric on abundant taxa.
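As an illustration of the input to the ordination, the Bray-Curtis dissimilarity between two samples is the sum of absolute abundance differences divided by the sum of all abundances. The sketch below uses plain Python and hypothetical function names; the source does not state which software was used, and the resulting matrix could be passed to any NMDS routine.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    # Two empty samples are treated as identical (an arbitrary convention).
    return num / den if den else 0.0

def dissimilarity_matrix(samples):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = bray_curtis(samples[i], samples[j])
    return d

# Made-up relative abundances of three taxa in three samples.
toy_samples = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
toy_matrix = dissimilarity_matrix(toy_samples)
```

With relative abundances that sum to one per sample, the denominator is always 2, so the dissimilarity reduces to half the sum of absolute differences.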

Figure 3: First two axes of NMDS for a) bacteria and b) fungi – abundant taxa. The colour represents the phylum.

The stress of the NMDS is 0.11 for bacteria and 0.16 for fungi, suggesting that the structure of the data has been captured relatively well and moderately well, respectively.
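For reference, and assuming the reported value is Kruskal's stress-1 (the source does not specify the stress formula), stress compares the distances between samples in the ordination space, $d_{ij}$, with the disparities $\hat{d}_{ij}$ obtained from a monotone regression on the original dissimilarities:

```latex
S = \sqrt{\frac{\sum_{i<j} \left( d_{ij} - \hat{d}_{ij} \right)^{2}}{\sum_{i<j} d_{ij}^{2}}}
```

A value of zero means the rank order of the original dissimilarities is reproduced exactly in the ordination; larger values indicate a poorer fit.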

Figure 4: Maps of the NMDS axes value at point location for bacteria and fungi.

Mapping of NMDS axes

Covariates: We collected a set of 59 spatially exhaustive environmental covariates covering Australia and representing proxies for factors influencing the spatial distribution of bacteria and fungi: soil properties, climate, organisms/vegetation, relief and parent material/age. The covariates were reprojected to the WGS84 (EPSG:4326) coordinate reference system and cropped to the same spatial extent. All covariates were resampled using bilinear interpolation or aggregated to conform to a common spatial resolution with grid cells of 90 m x 90 m.

Mapping: The spatial distribution of NMDS axes for bacteria and fungi is driven by the combined influence of climate, vegetation, relief and parent materials. We thus modelled the NMDS axes for bacteria and fungi as a function of environmental covariates representing biotic and abiotic controls of bacteria and fungi. The point values of the NMDS axes for bacteria and fungi and the corresponding environmental covariate values at the same measurement locations were used to fit the mapping model. For the mapping we used a machine learning model called quantile regression forest.

Quantile regression forest (QRF) is similar to the popular random forest algorithm for mapping. Instead of reporting a single statistic, namely the mean prediction from the decision trees in the random forest, QRF retains all the target values in the leaf nodes of the decision trees. With QRF, the prediction is thus not a single value but a cumulative distribution of the NMDS axes prediction at each location, which can be used to compute empirical quantile estimates.
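The idea of turning leaf-node values into quantiles can be sketched as follows. This is a simplified illustration with hypothetical function names and made-up numbers: it pools the leaf values from all trees with equal weight per observation, whereas a full QRF implementation weights observations by leaf membership across trees.

```python
def empirical_quantile(values, q):
    """Quantile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def qrf_predict(leaf_values_per_tree, quantiles=(0.05, 0.5, 0.95)):
    """Pool the training targets found in the leaf reached in each tree
    and return the requested empirical quantiles (simplified sketch)."""
    pooled = [y for leaf in leaf_values_per_tree for y in leaf]
    return {q: empirical_quantile(pooled, q) for q in quantiles}

# Made-up NMDS axis values held in the leaves reached by one prediction
# location in a forest of two trees.
toy_leaves = [[0.1, 0.2], [0.2, 0.3, 0.4]]
toy_prediction = qrf_predict(toy_leaves, quantiles=(0.0, 0.5, 1.0))
```

The spread between a lower and an upper quantile, for example 0.05 and 0.95, gives a prediction interval that can be mapped alongside the median as an estimate of uncertainty.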


Validation of predictions: Each depth-specific model of the NMDS axes for fungi and bacteria was validated based on the results of a K-fold cross-validation. The whole dataset was randomly split into K = 10 folds of approximately equal size. Each fold in turn was held out for validation, and the remaining K-1 folds were used as the calibration dataset. Models were compared using the mean error (ME), the root mean square error (RMSE), the squared Pearson correlation coefficient (r2), and the modelling efficiency coefficient (MEC).
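The four validation statistics and the random fold assignment can be written compactly in plain Python. The sketch below rests on two assumptions not stated in the source: ME is taken as predicted minus observed, and MEC is computed as one minus the ratio of the residual sum of squares to the total sum of squares of the observations. Function names and data are illustrative.

```python
import math
import random

def validation_metrics(obs, pred):
    """ME, RMSE, squared Pearson r, and MEC for paired observations and predictions."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    errors = [p - o for o, p in zip(obs, pred)]  # assumed sign: predicted - observed
    sse = sum(e * e for e in errors)
    sst = sum((o - mean_obs) ** 2 for o in obs)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sse / n),
        "r2": cov * cov / (sst * var_pred),
        "MEC": 1.0 - sse / sst,
    }

def kfold_indices(n, k=10, seed=0):
    """Randomly split indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Made-up example: predictions with a constant bias of +1.
toy_metrics = validation_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
toy_folds = kfold_indices(25, k=10)
```

In the toy example the predictions are perfectly correlated with the observations, so r2 equals 1, yet the constant bias lowers the MEC to about 0.2. This is why r2 and MEC are reported together.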