Skip to main content

Estimating topsoil texture fractions by digital soil mapping - a response to the long outdated soil map in the Philippines

Abstract

Digital soil mapping for soil texture is mostly an understanding of how soil texture fractions vary in space as influenced by environmental variables mainly derived from the digital elevation model (DEM). In this study, topsoil texture models were generated and evaluated by multiple linear regression (MLR), ordinary kriging (OK), simple kriging (SK) and universal kriging (UK) using free and open-source R, System for Automated Geoscientific Analyses, and QGIS software. Comparing these models is the main objective of the study. The study site covers an area of 124 km2 of the Municipality of Barili, Cebu. A total of 177 soil samples were gathered and analyzed from irregular sample points. DEM derivatives and remote sensing data (Landsat 8) were used as environmental variables. Exploratory analyses revealed no outlier in the data. Skewness and kurtosis values of the untransformed data vary greatly between –3.85 to 7.20 and 1.8 to 70.7, respectively; an indication that variables are highly skewed with heavy tails. Thus, Tukey’s ladder of powers transformation was applied that resulted to normal or nearly normal distribution having skewness values close to zero and kurtosis values have lighter tails. All data analysis from MLR modeling, variography, kriging, and cross-validations of models were implemented using the transformed data. Forward selection, backward elimination, and stepwise selection methods were adapted for predictors selection in MLR. The MLR, OK, SK, and UK were applied and cross validated for topsoil texture prediction. Likewise, exponential, Gaussian, and spherical models were fitted for the experimental variograms. Backward elimination method for clay, sand, and silt have the lowest MAE and highest R2 in MLR. The UK fitted with exponential variogram model has the highest R2 of 0.878, 0.821, and 0.893 for clay, sand, and silt, respectively. These models can be adapted as a decision support for agricultural land use planning and crop suitability development in the area.

Introduction

Dealing with global and regional challenges in land degradation, food security, water scarcity, and climate change, an accurate and updated geospatial soil information is imperatively needed [1, 2]. These problems are directly related to soil functions particularly to agricultural productivity, loss of biodiversity, and provision of water [3]. Producing accurate and reliable soil maps is indispensable in watershed management [4, 5], rangeland management, and landscape ecology [6].

The traditional soil survey process is tediously difficult to update rapidly and accurately. This process has associated significant limitations. First, significant changes in environmental conditions are not readily observed, especially when processing several variables simultaneously; secondly, the entire process must be repeated for each update that makes soil survey updates very inefficient [7]. Conventional soil survey adapts the manual process of producing a polygon-based soil map, whereby, without the computer-based approach, the map cannot be updated rapidly and accurately as the entire production procedure must be repeated [7]. Such method is time-consuming, requires numerous soil samples, and expensive [8]. Geographic information systems (GIS) can overcome this problem with the application of digital soil mapping (DSM). The DSM estimates soil properties by establishing interrelationships between soil properties and the environmental variables derived mainly from the digital elevation model (DEM) and remotely sensed images [8, 9]. Thus, the direction of DSM is toward the generation of dynamic and replicable geospatial soil information [10].

GIS algorithms have been adapted for an efficient spatial interpolation technique in land resources inventories [11, 12] in addressing the limitation of the traditional soil survey. GIS is a tool for data input, handling, analyzing and output process. It plays a significant role in spatial decision-making that involves information collection for DSM. GIS can perform several tasks using both spatial and attribute data and can integrate a variety of geographic technologies like Global Positioning System (GPS) and Remote Sensing (RS). GIS integrates spatial and geostatistical analysis, and the efficient management, storage, and retrieval of geographic data [8, 9, 13]. Thus, GIS as a tool plays a significant role in the implementation of a computer-based spatial decision making support system.

The slow progress in agricultural production and the steadily increasing population in the Philippines require the applications of this computer-based decision support system. The Philippine Department of Agriculture is still using over 40-year-old soil information in its programs for climate change mitigation and land use plan [14]. Bureau of Soils and Water Management is still in traditional soil survey method and is yet to implement DSM [15]. With these enormous challenges, there is an urgent necessity that soil maps in agricultural areas in the country be updated applying the modern technology of DSM.

Like most of the parametric test, multiple linear regression (MLR) requires normally distributed and homoscedastic variables [16, 17]. It is helpful for data analysis and inferences that both dependent and independent variables that are highly skewed [17, 18], and where standard deviations among variables significantly differ [16] should be transformed to normal or nearly normal distribution [19]. Thus, in this study, all variables were transformed to meet the conditions for normal or nearly normal distribution, minimum error and unbiased estimate [20].

Hence, this study aimed in establishing relationships between topsoil texture fractions (clay, sand, and silt) and the environmental variables by applying and comparing MLR, ordinary kriging (OK), simple kriging (SK) and universal kriging (UK) using transformed data integrating the use of free and open source software (FOSS) of R, System for Automated Geoscientific analyses (SAGA) GIS, and QGIS.

This study presents a successful approach in DSM using FOSS that are of best cost advantage to be adapted by any GIS users from a developing country like the Philippines. The achieved methodology can lead to valuable outcome in achieving a more comprehensive land use plan since the generated results are useful for watershed management particularly for ecological, hydrological, and crop suitability modeling.

Materials and methods

Materials

Synthetic Aperture Radar (SAR) DEM was acquired from MacDonald, Dettwiler and Associates, British Columbia, Canada and post-processed by the UP Training Center for Applied Geodesy and Photogrammetry, through the DOST-GIA funded Disaster Risk and Exposure Assessment for Mitigation Program which is downloadable through https://lipad.dream.upd.edu.ph/. This SAR DEM has 10 m resolution and a projection of WGS84 UTM Zone 51. A cloud-free Landsat 8 image in the study area was selected and downloaded from the United States Geological Survey data archive (https://earthexplorer.usgs.gov) for RS data. The study makes use of an adequate hardware setup with 32 GB multi-core processor, 64-bit operating system, and solid-state drive to avoid hanging or crashing [21] necessary for GIS and RS processing.

The study area

Barili, the study area, is a second income class municipality, with an annual income of ₱45 M (0.87 M USD) or more but less than ₱55 M (1.06 M USD), in the southern part of Cebu Province, Philippines. The municipality has a population of 73,862 based on 2015 census. Barili is located 60 km southwest of Cebu City (Fig. 1). It has an area of about 124 km2 (about 9.5 × 13 km) with mostly agricultural lands. The elevation ranges from sea level to 540 m. It is bordered to the northeast with Carcar City and Sibonga; to the southwest with Dumanjug and the west lies the Tañon Strait; and to the northwest with Aloguinsan. Mountain ridgeline separates the watersheds of Barili in the west and Carcar City and Sibonga in the east.

Fig. 1
figure 1

The location of the Municipality of Barili (natural color composite) as the study area in the Province of Cebu. The distribution of samples (magenta), and the mountain ridgeline (red)

Soil sampling, preparation, and analysis

A total of 177 soil samples were collected within the Municipality of Barili, Cebu, Philippines. A pit measuring approximately 1 × 1 × 1 m was dug manually in each physiographic position to examine and sample the soil profile. About 2 kg of composite soil sample were obtained from every horizon of each soil profile. Soil sampling was done randomly as shown in Fig. 1. Soil samples were air dried, freed of rocks and plant materials, ground using a wooden mallet and allowed to pass through a 2-mm sieve. Soil texture analyses were done at the Regional Soils Laboratory, Department of Agriculture, Regional Field Office VII, Cebu City, Cebu, Philippines.

Environmental predictors

A total of 37 parameters were generated and analyzed to understand their relationship with soil texture properties. Environmental variables or predictors were the preprocessed sink-filled or depressionless DEM and its derivatives, geographic coordinates, and RS data as shown in Table 1. Majority of the predictors were generated from the preprocessed sink-filled DEM using SAGA GIS modules for basic terrain analysis, terrain analysis–channels, terrain analysis–hydrology, terrain analysis–morphometry, and terrain analysis–lighting. The predictors include (i) 7 parameters by basic terrain analysis module such as analytical hillshading, aspect, channel network base level, slope length and steepness (LS) factor, profile curvature, slope, and valley depth; (ii) 4 parameters by terrain analysis–channels such as overland flow distance, horizontal overland flow distance, vertical overland flow distance, and vertical distance to channel network; (iii) 3 parameters by terrain analysis–hydrology such as catchment area, flow accumulation, and SAGA wetness index; (iv) 7 parameters by terrain analysis–morphometry such as curvature classification, mass balance index, mid slope position, multiresolution index of valley bottom flatness (MRVBF), normalized height, slope height, and terrain ruggedness index; (v) 1 parameter by terrain analysis–lighting such as sky view factor. Furthermore, slope steepness (S) factor for universal soil loss equation (USLE) was generated applying r.watershed module of geographic resources analysis support system in QGIS. Mountain ridge proximity and stream proximity were generated using proximity grid module of SAGA GIS. Location of samples by their latitude and longitude, determined by GPS, were also included as predictors.

Table 1 Environmental variables, description, descriptive statistics, and references

Remotely sensed image of operational land imager (OLI) and thermal infrared sensor (TIRS) instruments of Landsat 8 on the study area dated July 27, 2017 were also considered as additional predictors and was pre-processed using the semi-automatic classification plugin (SCP) using QGIS. The remotely sensed predictors include 7 bands (bands 1 to 7) of OLI and 2 bands (band 10 and 11) of TIRS.

Preprocessing of remotely sensed data

The SCP for QGIS was applied for Landsat 8 conversion to top of atmospheric reflectance and brightness temperature, and pansharpening [27]. Multispectral image analysis requires conversion of its “quantized and calibrated scaled digital numbers” [23] to top of atmosphere reflectance in order to achieve clear Landsat scenes [28] which is packaged in SCP. Pre-processing of Landsat 8 image was discussed in details by Congedo [27] and in the study by Mondejar and Tongco [29].

Exploratory data analysis

The exploratory data analysis consisted of three steps and was performed in R with contributing packages ‘rcompanion’ for plotting histogram and performing Tukey’s ladder of power transformations; ‘moments’ for determining skewness and kurtosis; ‘nortest’ for determining different normality tests. Step 1: statistical data distribution was checked by plotting its histogram. Step 2: determination of skewness and kurtosis before data transformation. And Step 3: perform Tukey’s ladder of power transformation.

Transformation of data

Normal distribution of data is an assumption in geostatistical analyses similar to several statistical techniques [16, 17, 30, 31]. In instances of non-normality, especially for strongly skewed data, data transformation to normality or at least symmetric distribution is needed [30, 31]. Normality test of the data can be implemented by simple evaluation of its skewness (close to 0) and kurtosis (close to the range of 1 to 3), normal q-q plot, and inferential test for normality such as Shapiro-Wilk or Kolmogorov-Smirnov tests [32].

Soil texture fractions and all of the predictors were transformed applying Tukey’s ladder of powers transformation in R to ensure normality or near normality of dependent and independent variables. In this study, skewness and kurtosis evaluation were applied. The Tukey’s ladder of powers transformation is defined as:

$$ y=\left\{\begin{array}{c}{x}^{\lambda}\kern0.1em \mathrm{if}\kern0.1em \lambda >0\\ {} logx\kern0.1em \mathrm{if}\kern0.1em \lambda =0\\ {}-\left({x}^{\lambda}\right)\mathrm{if}\lambda <0\end{array}\right\} $$
(1)

where λ is the power coefficient parameter in transforming values of a parameter to follow a normal distribution as close as possible [17, 33]. In the rcompanion package, Shapiro-Wilk tests were performed iteratively and determine lambda value “that maximizes the W statistic” [17]. Furthermore, loop function in R was adapted to determine a particular transformation that had a distribution with skewness as close to zero and kurtosis as close to the range between 1 to 3.

Predictive models

The procedures tested to predict soil properties (sand, silt, and clay) were MLR, OK, SK and UK. The geostatistical procedures were implemented in SAGA GIS software [34]. MLR has been widely used to predict the response of a dependent variable from a set of independent variables, as a function of the correlations between them. The MLR analyses were executed in SAGA GIS applying forward selection, backward elimination, and stepwise selection methods, fitting the model by identifying variables which have most significance according to 95% confidence level.

MLR for prediction

Multiple regression seeks to determine the equation that best predicts the dependent variable Y as a linear function of a set of independent variables X where the linear relationship is expressed as:

$$ \hat{Y}=a+{b}_1{X}_1+{b}_2{X}_2+{b}_3{X}_3\dots .+\epsilon $$
(2)

where \( \hat{Y} \) is the estimated value of Y for a given set of X values; a is the intercept; b1 is the estimated slope (partial regression coefficient) of a regression of Y on X1, considering the rest of X variables to be held constant, likewise for b2, b3, and so on; and ϵ is the error term [16]. In this study, the coefficient of multiple determination (R2) is considered in the goodness-of-fit of a linear model for cross validation of models determined by the following equation as:

$$ {R}^2=\frac{\sum_{i=1}^n{\left({\hat{y}}_i-\overset{\hbox{'}}{y}\right)}^2}{\sum_{i=1}^n{\left({y}_i-\overset{\hbox{'}}{y}\right)}^2} $$
(3)

where \( {\hat{y}}_i \) as the predicted or fitted values of the dependent variable, \( \overset{\hbox{'}}{y} \) is the mean of the observed values of the dependent variable, yi as the observed values of the dependent variable, and n is the number of observed values of the data set.

Relative importance of environmental predictors

Relative importance is the “quantification of an individual regressor’s contribution to a multiple regression model” [35]. In this study, relative importance metrics are forced to percentages or summed to 100%. Gromping [35] calls this “lmg” as proposed by Lindeman et al. [36]. This is achieved using package “relaimpo” in R by the following code: calc.relimp (linmod, type = c(“lmg”), rela = TRUE).

Geostatistical analyses

Geostatistical analyses in this study were done making use of SAGA GIS 7 adapting the spatial and geostatistics-kriging module for grid interpolation from irregular sample points with variogram dialog. DSM is one of the many useful applications in SAGA GIS for soil science [34]. Variogram models were first generated before kriging techniques for the predictive models of soil fractions.

Semivariogram

Kriging is based on the idea of spatial correlation or dependence that the degree of influence of observed points to unobserved points is inversely related to distance [20, 37, 38]. The first essential step in kriging is the calculation of empirical variogram before any geostatistical interpolation. In this study, isotropic semivariogram is determined considering only on the effect of h [39]. A semivariogram model estimates the relationship between semivariances or the differences between neighboring values and separation distance [40], such as the spherical or exponential models. The empirical semivariogram formula is as follows:

$$ \gamma (h)=\frac{1}{2N(h)}{\sum}_{i=1}^{N(h)}{\left[Z\left({x}_i+h\right)-Z\left({x}_i\right)\right]}^2 $$
(4)

where γ(h) is the semivariance of the lag distance between two sample points; N(h) is the number of observation pairs of random variables separated by distance h; and Z(xi) and Z(xi + h) are the values of variables located at xi and xi + h, respectively [39, 41, 42]. The empirical semivariogram for each soil texture fraction was fitted with theoretical semivariogram models in SAGA GIS using its variogram module. In this study, the spherical, exponential and Gaussian models were fitted into the empirical variograms for the dependent soil fraction variables (clay, sand, silt).

Spherical:

$$ \gamma (h)=0\ \mathrm{if}\ h=0 $$
(5)
$$ \gamma (h)=n+\left(s-n\right)\left[\frac{3h}{2r}-\frac{1}{2}{\left(\frac{h}{r}\right)}^3\right]\mathrm{if}\ 0<h<r $$
(6)
$$ \gamma (h)=n+\left(s-n\right)\ \mathrm{if}\ h>r $$
(7)

Exponential:

$$ \gamma (h)=n+\left(s-n\right)\left[1-\mathit{\exp}\left(-\frac{h}{r}\right)\right] $$
(8)

Gaussian:

$$ \gamma (h)=n+\left(s-n\right)\left[1-\mathit{\exp}\left(-{\left(\frac{h}{r}\right)}^2\right)\right] $$
(9)

where n is the nugget effect at zero distance h, s is the sill, and r is the range [39, 43,44,45].

SK

Generally, weighted average of observed data points is an approach of spatial interpolation that is expressed as:

$$ \gamma (h)=\frac{1}{2N(h)}{\sum \limits}_{i=1}^{N(h)}{\left[Z\left({x}_i+h\right)-Z\left({x}_i\right)\right]}^2 $$
(10)

The empirical semivariogram for each soil texture fraction was fitted with theoretical semivariogram models in SAGA GIS using its variogram module. In this study, the spherical, exponential and Gaussian models were fitted into the empirical variograms for the dependent soil fraction variables (clay, sand, silt). The approximation of SK is based on the equation:

$$ {\hat{z}}_{sk}\left({x}_o\right)={\sum \limits}_{i=1}^n{\lambda}_iz\left({x}_i\right)+\left[1-{\sum \limits}_{i=1}^n{\lambda}_i\right]\mu $$
(11)

where μ is a given stationary mean of the observed values that is assumed to remain constant throughout the domain, λi is the kriging weight assigned at sampled locations [31, 46, 47] wherein the kriging weights are estimated by minimizing the variance [46].

OK

The OK is analogous to SK but OK involves that the summation of weights equals to one, such that \( \left[1-{\sum}_{i=1}^n{\lambda}_i\right]=0 \), with an accompanying Lagrange parameter ψ. Additionally, μ is not constant but it is recalculated within the search window across the modeled area of interest [46, 47]. Eventually, OK is estimated as:

$$ {\hat{z}}_{ok}\left({x}_o\right)={\sum \limits}_{i=1}^n{\lambda}_iz\left({x}_i\right) $$
(12)

In order to obtain unbiased estimations for OK, the following equations are solved simultaneously:

$$ {\sum \limits}_{j=1}^n{\lambda}_j\gamma \left({x}_i,{x}_j\right)+\psi =\gamma \left({x}_i,{x}_0\right) $$
(13)
$$ {\sum \limits}_{j=1}^n{\lambda}_j=1 $$
(14)

where γ(xi, xj) is the value of the variogram between two points xi and xj, ψ and is the Lagrange parameter for the minimization of kriging variance [31, 37, 41].

UK

OK and SK assume stationary process considering the mean. However, in other cases, spatial data are not stationary in the mean [31, 48] which is being considered in UK. Geostatistical literatures have used the terms UK, kriging with external drift, and regression kriging that are essentially adapting similar techniques [40, 49]. In UK, prediction at unvisited location is estimated by combining the predicted drift and residuals as:

$$ \hat{z}\left({x}_0\right)=u\left({x}_0\right)+\hat{e}\left({x}_0\right) $$
(15)

where drift or trend u(x0), a linear regression mean, is fitted by linear regression analysis, and \( \hat{e}\left({x}_0\right) \) is the interpolated residuals [40, 44, 49]. The trend can be expressed as a functional form:

$$ u(x)={\sum}_{k=0}^K{\beta}_k{f}_k(x) $$
(16)

where βk, k = 0, 1, …, K are unknown coefficients while fk(x) are known functions of x [31, 48]. In this study, the best model resulted from MLR analyses for each soil fraction was applied and predictions at unobserved locations are estimated by the following equation:

$$ {\hat{z}}_{uk}\left({x}_o\right)={\sum}_{i=1}^n{\lambda}_i\left({\sum}_{k=0}^K{\beta}_k{f}_k\left({x}_i\right)+\hat{e}\left({x}_i\right)\right) $$
(17)

Cross validation

Cross validation technique was adopted for evaluating and comparing the performance of models. The models were evaluated applying k-fold cross validation due to the absence of validation data set and small size of observed samples for model validation [50, 51]. In this study, 5-fold cross validation was adapted. There is no strict rule on the choice of k, but 5 or 10 is usually applied [51]. Cross validation of models was done in R applying caret package. The mean absolute error (MAE), root mean squared error (RMSE), and R2 were determined to evaluate the accuracy of models [8, 52, 53].

$$ MAE=\frac{1}{n}{\sum}_{i=n}^1\left|{y}_i-{\hat{y}}_i\right| $$
(18)
$$ RMSE=\sqrt{\frac{\sum_{i=1}^n{\left({y}_i-{\hat{y}}_i\right)}^2}{n}} $$
(19)

where yi is observed value, \( {\hat{y}}_i \) is the predicted value, and n is the number of samples.

Additionally, MLR models require homoscedasticity and normality of residuals [6, 54]. Thus, a scatter plot of residuals and predicted values were visually evaluated for homoscedasticity using plot function where a horizontal line at zero was added using abline function in R. Additionally, its normality test (skewness, kurtosis, Lilliefors test) along with its quantile-quantile (QQ) plot was determined in R applying plot diagnostics of plot function. Hence, homoscedasticity and normality of residuals were critical validation criteria as being meet by each model in this study.

Results and discussion

Exploratory data analyses

In this study, geostatistical analysis of data started with summary statistics, data transformation (Table 2), MLR modeling, variography, kriging, and mapping. The exploratory analysis revealed no outlier in the soil texture fractions (clay, sand, silt) and environmental variables. Thus, all 177 samples were used for the spatial interpolation. A statistical summary of the untransformed environmental variables (Table 1) shows high coefficient of variation between −3609 to 224.7%. Skewness and kurtosis values vary greatly between −3.85 to 7.20 and 1.77 to 70.7, respectively. Indicating that the variables are highly skewed with heavy tails, thus, do not fit the normality assumption. The Tukey’s ladder of powers transformation was applied in transforming the data that resulted in normal or nearly normal distribution having skewness values close to zero and kurtosis values have thinner tails (Table 2). Consequently, all data analyses from MLR modeling, variography, kriging, and cross validations of models were implemented using the transformed data.

Table 2 Descriptive statistics of transformed variables and the corresponding transformation

MLR modeling

The MLR analyzed the relationships between the transformed topsoil texture fractions and the transformed environmental variables. Table 3 shows the MLR results where forward selection, backward elimination, and stepwise selection methods were applied and cross-validated. At this stage, MLR established correlation between transformed dependent and independent variables. The dependent variables were the transformed values of soil texture fractions and the independent variables were the transformed values of environmental predictors. All predictors selection methods in MLR, in this study, revealed Lilliefors test values greater than 0.05. These were indications that the transformation of values both for dependent and independent variables resulted in an improvement of normality and variance stability [19], minimum error and unbiased estimate [20]. The predictors of selection methods with most homogeneity of variance (homoscedasticity) in terms of residuals were adapted as predictor grids, respectively, for clay, sand and silt fractions for UK as implemented in SAGA. The cross validation of MLR models are shown in Table 3 and their relative importance will be subsequently discussed.

Table 3 Cross validation of models by MLR applying different predictors selection methods

Cross validation of MLR models

Prior to kriging, evaluation of regression models was implemented. Table 3 shows the cross validation of models applying different regression selection methods and the normality statistics of residuals. The MAE, RMSE, homoscedasticity and normality (skewness, kurtosis and Lilliefors test) test of residuals were considered in assessing the accuracy performance of linear predictive models. Homoscedasticity and normality test of residuals were applied in order to verify and satisfy the assumption of homogeneity of variance [54] towards choosing the best linear unbiased estimator. All results of the regression models were coupled with kriging for the implementation of UK to further evaluate their performance and be able to determine the best unbiased models. Stepwise selection method has the advantage of having the least number of significant predictors, while forward selection method had the lesser number of significant predictor compared to backward elimination method.

Based on error metrics, performance of forward and stepwise selection methods for clay fraction were the same because they revealed the same significant predictors; this was also observed for sand fraction. Backward elimination method had the least MAE and RMSE for each of the soil texture fractions, and, consequently, it has the highest R2 (Table 3). Figure 2 shows scatter plots of residuals and predicted values for homoscedasticity examination. As can be observed in any of the residuals plots, homoscedasticity was not violated by any of the regression models applying forward selection, backward elimination, and stepwise selection methods for clay, sand, and silt texture fractions as linearity (red lines) assumption was met having no obvious pattern wherein the residuals scattered randomly and symmetrically clustering towards the middle of the plot. For a graphic examination of the normality of residuals, Fig. 3 shows QQ-plots of residuals of the MLR models applying forward selection, backward elimination, and stepwise selection methods. Graphically, different selection methods revealed similar distribution of residuals where, generally noticeable, most of the points fall closely either below or above the reference line (red), and no obvious presence of outliers. The regression models for sand fraction have the most points that lie very close to the reference (red) line. This was also statistically confirmed in cross validation statistics (Table 3) where sand fraction regression models have the least skewness and kurtosis values. Nevertheless, none of the residuals distribution deviates greatly from normality and that residuals of all regression models are statistically normally distributed based on Lilliefors normality test (Table 3).

Fig. 2
figure 2

Residuals plots of multiple linear regression models applying forward (F) selection, backward (B) elimination, and stepwise (S) selection methods

Fig. 3
figure 3

QQ plots of residuals of the multiple linear regression models applying forward (F) selection, backward (B) elimination, and stepwise (S) selection methods

Relative importance of predictors

Figure 4 shows that stream proximity, elevation, mountain ridge proximity, channel network base level, flow accumulation, band 4, band 11, catchment area, vertical overland flow distance, overland flow distance, slope, and curvature classification were the significant predictors in this study.

Fig. 4
figure 4

Relative importance of predictors

Stream proximity was found to be the most important predictor in this study. Pahlavan-Rad and Akbarimoghaddam [8] found stream proximity or distance from the river as the most important variable in the spatial variability of soil texture fractions in a flood plain. For clay fraction, elevation, mountain ridge proximity, and channel network base level were about equally second most important predictors. Likewise, slope and stream proximity were about equally important for silt. Mountain ridge proximity or distance to the mountain ridge is a predictor probably first introduced by this study for DSM for soil texture fractions prediction. Mountain ridgeline is a peak elevation line that delineates watersheds at opposite sides of a mountain (leeward and wayward). The results suggest that stream proximity, mountain ridge proximity, and slope are the most important predictors that influence topsoil texture spatial variation, particularly for clay and sand fractions, in the study area. Remotely sensed data, such as Landsat 8, where band 4 and band 11 were found important, pointed out that optical imagery are potential auxiliary variables in enhancing prediction of topsoil texture fractions [37]. Flow accumulation, as in the study conducted by Castro-Franco et al. [22], was also the least important in the spatial variability of silt fraction at the southern Argentine Pampas.

Geospatial interpolation

After knowing significant predictors for each of the soil texture fractions (clay, sand, and silt), predictions of soil texture fractions were further evaluated applying kriging techniques. Likewise, linear regression was enhanced by combining regression with kriging. Prior to kriging, for every soil fraction, the experimental semivariogram was determined to examine the spatial dependence of dependent variables within the observed data points [6]. As a variogram model was required for geo-statistical interpolation, the variogram models that related to the best results in cross validation were the best fitted variogram models correspondingly for clay, sand, and silt. Hence, predefined models such as exponential, Gaussian, and spherical were fitted to the empirical semivariograms (Table 4 and Fig. 5). The mentioned variography was done in SAGA applying its variogram module.

Table 4 The variogram parameters of soil texture fractions applying exponential, Gaussian and exponential models
Fig. 5
figure 5

Best fitted variogram models (red) for clay, sand, and silt

Semivariogram

The nugget to sill (N:S, C0/[C0 + C]) ratio or its complementary structured part to sill ratio is considered as a spatial dependency criterion of a semivariogram [6]. The smaller the N:S ratio or the higher the structured part to sill ratio, the stronger the spatial dependency or autocorrelation [6, 55]. The N:S ratios for clay, sand, and silt varies within 0.11–0.22, 0.14–0.24, and 0.08–0.17, respectively, for the three variogram models (Table 4). These indicated that topsoil texture fractions have a strong spatial dependency, especially for silt, considering that N:S ratios are lesser than 0.25 [6, 55, 56]. Nugget effects of exponential and spherical models were lesser than the Gaussian model. Effective range of exponential, Gaussian, and spherical were ascending in order. Effective ranges of exponential and Gaussian models do not differ greatly, while effective range of spherical model is more or less twice the Gaussian model.

Performance of kriging techniques

The variogram parameters, as shown in Table 4, were adapted for OK, SK, and UK. All linear regression models considered in this study were coupled with UK. In the same manner, the performance of OK, SK, and UK applying exponential, Gausian, and spherical variaogram models were cross validated in terms of MAE, RMSE, and R2 separately for clay, sand, and silt texture fractions (Tables 5, 6 and 7). Since forward and stepwise selection methods revealed the same significant predictors for clay and, likewise, for sand for the MLR (Table 3), UK-Stepwise in Tables 5 and 6 also represent UK-Forward for clay and sand, respectively.

Table 5 Cross validation of OK, SK, and UK applying exponential, Gaussian, and spherical variogram models applied to Clay texture fraction
Table 6 Cross validation of OK, SK, and UK applying exponential, Gaussian, and spherical variogram models applied to Sand texture fraction
Table 7 Cross validation of OK, SK, and UK applying exponential, Gaussian, and spherical variogram models applied to Silt texture fraction

Generally, kriging techniques (Tables 5, 6 and 7) have better prediction accuracy than the linear regression models (Table 3) in terms of MAE, RMSE, and R2. The performance of kriging techniques fitted with exponential or spherical variogram models was comparable, while kriging techniques fitted with Gaussian models were likewise at par (Tables 5, 6 and 7). As observed, kriging techniques fitted with Gaussian models have low prediction accuracy compared to kriging techniques fitted with exponential or spherical variogram models. Comparing MLR with kriging techniques, the respective MAE of clayMLR, sandMLR, and siltMLR were about 2.5, 2, and 2.8 times as high compared to kriging techniques fitted with exponential or spherical variogram models. Similarly, MAEs of clayMLR, sandMLR, and siltMLR were about 1.4, 1.2, and 1.6 times as high compared to kriging techniques fitted with Gaussian variogram model. In other words, R2 values of kriging techniques with exponential or spherical variogram model were about 3, 3.8, and 3.8 times as high compared to clayMLR, sandMLR, and siltMLR, respectively, while, R2 results of kriging techniques with Gaussian variogram model were about 2.1, 2, and 3.1 times as high compared to clayMLR, sandMLR, and siltMLR. Kriging technique with Gaussian variogram model had the least performance among the different kriging techniques applied in this study.

For clay texture fraction, UK with predictors by backward elimination method and exponential variogram model (ClayUK-back-expo) revealed the lowest MAE and RMSE of 0.126 and 0.178, respectively. Consequently, Clay-UK-back-expo exhibited the highest R2 of 0.878 among the models presented in (Table 5). Moreover, graphical evaluation of residuals revealed significant improvement from MLR to UK. Residuals by UK reduced significantly closer towards the red line (Fig. 6). Thus, ClayUK-back-expo was considered the final model for transformed clay texture fraction as shown in Fig. 7, while Fig. 8 represents the final model of clay after back transformation.

Fig. 6
figure 6

Homoscedasticity and improvement of residuals from multiple linear regression (MLR) with backward selection method to universal kriging (UK) with exponential variogram applied to Clay, Sand, and Silt texture fractions

Fig. 7
figure 7

Final Model of Transformed Clay by UK with predictors by backward elimination method and exponential variogram

Fig. 8
figure 8

Final Model of Clay after back transformation

Likewise, for sand texture fraction, UK with predictors by backward elimination method and exponential variogram model (SandUK-back-expo) revealed the lowest MAE and RMSE of 0.138 and 0.183, respectively. Consequently, SandUK-back-expo exhibited the highest R2 of 0.821 among the models presented in (Table 6). Furthermore, graphical visualization of residuals has shown significant improvement from MLR to UK. Residuals by UK reduced significantly closer towards the red line (Fig. 6). Thus, SandUK-back-expo was considered the final model for sand texture fraction as shown in Fig. 9, while Fig. 10 represents the final model of sand after back transformation.

Fig. 9
figure 9

Final Model of Transformed Sand by UK with predictors by backward elimination method and exponential variogram

Fig. 10
figure 10

Final Model of Sand after back transformation

Similarly, for silt texture fraction, UK with predictors by backward elimination method and exponential variogram model (SiltUK-back-expo) revealed the lowest MAE and RMSE of 0.141 and 0.221, respectively. Consequently, SiltUK-back-expo exhibited the highest R2 of 0.8929 among the models presented in (Table 7). Moreover, graphical evaluation of residuals revealed significant improvement from MLR to UK. Residuals by UK reduced significantly closer towards the red line (Fig. 6). Thus, SiltUK-back-expo was considered the final model for silt texture fraction as shown in Fig. 11, while Fig. 12 represents the final model of silt after back transformation.

Fig. 11
figure 11

Final Model of Transformed Silt by UK with predictors by backward elimination method and exponential variogram

Fig. 12
figure 12

Final model of Silt after back transformation

For the three soil texture fractions of clay, sand, and silt, UK with predictors by backward elimination method and exponential variogram model performed the highest predictive accuracy. This is an indication that the datasets are characterized by a linear trend, thus, UK is recommended to attain more fitting mapping results [44]. Nevertheless, performance of UK with predictors by stepwise selection method and exponential variogram model for clay, sand, and silt texture fractions was at par as shown from Tables 5, 6 and 7. Thus, UK can be considered a hybridized geospatial interpolation technique that coupled significant predictors from regression model and variography of regression residuals [57, 58]. However, in general, performance of OK and SK was comparable to UK (Tables 5, 6 and 7). Hence, for a straightforward and immediate estimate, OK and SK can be adapted [49]. Furthermore, it is observed that the stronger the spatial dependency or autocorrelation, the higher the R2. Hence, high R2 values of OK, SK, and UK were strongly attributed to high spatial dependency or autocorrelation.

Conclusions

Comprehending geographical distribution and accurate predictive mapping of topsoil texture fractions at a municipal scale are essential for watershed management, soil and water conservation, hydrological and crop suitability modeling. This fundamental soil texture geo-information processing approach is among the vital steps towards attaining a comprehensive land use plan. Spatial and geostatistics modeling, specifically MLR and kriging techniques (OK, SK, and UK), were fitted to predict topsoil texture fractions particularly for clay, sand, and silt. Data from sink-filled DEM derived environmental variables combined with remotely sensed images of Landsat 8 generated satisfactory results. Mountain ridge proximity, not considered among the predictors in DSM before but now, was found to be a significant predictor in this study. Further investigation should be conducted on the other side of the mountain ridge to understand better its predictive importance. Geospatial interpolations by kriging techniques were more accurate than MLR. The OK, SK, and UK fitted with Gaussian variogram model were less suitable for predicting soil texture fractions whereas, UK coupled with predictors by backward elimination method and fitted with exponential variogram portrayed the most accurate in predicting topsoil texture fractions for clay, sand, and silt compared to any of the spatial and geostatistical modeling applied in this study. The results show that the transformation of dependent and independent variables provides homoscedasticity and normality of residuals to ensure unbiased estimation. With the significant cost advantage of this study on the use of FOSS, the models obtained by DSM revealed several benefits particularly in storing, retrieving and reproducing reliable geospatial information and ease of updating data and analysis. This study presents a successful approach in DSM with the use of SAGA GIS, QGIS, and R that are of best cost advantage to be adapted by any GIS users from a developing country like the Philippines. Moreover, additional soil samples and field validations of results are recommended for future research with the aim of improving accuracy and reliability of the models. Likewise, application of DSM should be extended in other municipalities throughout the central Visayas region and the rest of the country. The achieved methodology in this study is of great importance to decision makers at municipal, provincial, and regional scale for a very valuable outcome in achieving a more comprehensive land use plan since the generated results are useful for watershed management particularly for ecological, hydrological, and crop suitability modeling.

Availability of data and materials

The datasets generated and analyzed in this study are submitted together with the manuscript. Such datasets are available from the corresponding author on reasonable request.

References

  1. McBratney A, Field DJ, Koch A. The dimensions of soil security. Geoderma. 2014;213:203–13.

    Google Scholar 

  2. Zhang GL, Liu F, Song XD. Recent progress and future prospect of digital soil mapping: a review. J Integr Agr. 2017;16:2871–85.

    Google Scholar 

  3. Koch A, McBratney A, Adams M, Field D, Hill R, Crawford J, et al. Soil security: solving the global soil crisis. Glob Policy. 2013;4:434–41.

    Google Scholar 

  4. Pinheiro HSK, de Carvalho W, Chagas CD, dos Anjos LHC, Owens PR. Prediction of topsoil texture through regression trees and multiple linear regressions. Rev Bras Cienc Solo. 2018;42:1–21.

    Google Scholar 

  5. Khalil RZ, Khalid W, Akram M. Estimating of soil texture using landsat imagery: a case study of Thatta tehsil, Sindh. In: IEEE International Geoscience and Remote Sensing Symposium. Beijing; 2016 Jul 10–15.

    Google Scholar 

  6. Hosseini SZ, Kappas M, Bodaghabadi MB, Chahouki MAZ, Khojasteh ER. Comparison of different geostatistical methods for soil mapping using remote sensing and environmental variables in Poshtkouh rangelands. Iran Pol J Environ Stud. 2014;23:737–51.

    Google Scholar 

  7. Zhu AX, Hudson B, Burt J, Lubich K, Simonson D. Soil mapping using GIS, expert knowledge, and fuzzy logic. Soil Sci Soc Am J. 2001;65:1463–72.

    Google Scholar 

  8. Pahlavan-Rad MR, Akbarimoghaddam A. Spatial variability of soil texture fractions and pH in a flood plain (case study from eastern Iran). Catena. 2018;160:275–81.

    Google Scholar 

  9. de Carvalho W, Lagacherie P, Chagas CD, Calderano B, Bhering SB. A regional-scale assessment of digital mapping of soil attributes in a tropical hillslope environment. Geoderma. 2014;232–4:479–86.

    Google Scholar 

  10. Moller M, Volk M. Effective map scales for soil transport processes and related process domains - statistical and spatial characterization of their scale-specific inaccuracies. Geoderma. 2015;247–8:151–60.

    Google Scholar 

  11. Bhunia GS, Shit PK, Maiti R. Comparison of GIS-based interpolation methods for spatial distribution of soil organic carbon (SOC). J Saudi Soc Agric Sci. 2018;17:114–26.

    Google Scholar 

  12. Shit PK, Bhunia GS, Maiti R. Spatial analysis of soil properties using GIS based geostatistics models. Model Earth Syst Env. 2016;2:1–6.

    Google Scholar 

  13. Ließ M, Glaser B, Huwe B. Uncertainty in the spatial prediction of soil texture comparison of regression tree and random Forest models. Geoderma. 2012;170:70–9.

    Google Scholar 

  14. Galvez JK. DA to produce new soils map in 45 days. The Manila Times; 2016.

    Google Scholar 

  15. Tejada SQ, Carating RB. Status of digital soil mapping in the Bureau of Soils and Water Management. In: Advancing the Science and Technology of Soil Information in Asia — Launch of the Global Soil Partnership’s Asia Soil Science Network and GlobalSoilMap.net East Asia Node. Nanjing; 2012 Feb 8–11.

  16. McDonald JH. Handbook of biological statistics. 3rd ed. Baltimore: Sparky House Publishing; 2014.

    Google Scholar 

  17. Mangiafico SS. Summary and analysis of extension program evaluation in R, version 1.18.1. New Brunswick: Rutgers Cooperative Extension; 2016.

  18. Schmider E, Ziegler M, Danay E, Beyer L, Bühner M. Is it really robust? Reinvestigating the robustness of ANOVA against violations of the normal distribution assumption. Methodology Eur. 2010;6:147–51.

    Google Scholar 

  19. Tsai AC, Liou M, Simak M, Cheng PE. On hyperbolic transformations to normality. Comput Stat Data An. 2017;115:250–66.

    MathSciNet  MATH  Google Scholar 

  20. Zhang SW, Shen CY, Chen XY, Ye HC, Huang YF, Lai S. Spatial interpolation of soil texture using compositional kriging and regression kriging with consideration of the characteristics of compositional data and environment variables. J Integr Agr. 2013;12:1673–83.

    Google Scholar 

  21. Dempsey C. Computer specs for GIS work. Santa Clara: GIS Lounge; 2013.

  22. Castro-Franco M, Domenech MB, Borda MR, Costa JL. A spatial dataset of topsoil texture for the southern argentine pampas. Geoderma Reg. 2018;12:18–27.

    Google Scholar 

  23. USGS. Landsat 8 (L8) data users handbook. Version 4.0. Reston: US Geological Survey; 2019.

  24. Duarte L, Teodoro AC, Goncalves JA, Soares D, Cunha M. Assessing soil erosion risk using RUSLE through a GIS open source desktop and web application. Environ Monit Assess. 2016;188:351.

    Google Scholar 

  25. Dobarco MR, Orton TG, Arrouays D, Lemercier B, Paroissien JB, Walter C, et al. Prediction of soil texture using descriptive statistics and area-to-point kriging in region Centre (France). Geoderma Reg. 2016;7:279–92.

    Google Scholar 

  26. de Smith MJ, Goodchild MF, Longley PA. Geospatial analysis: a comprehensive guide to principles, techniques and software tools. 6th ed. Edinburgh: The Winchelsea Press; 2018.

    Google Scholar 

  27. Congedo L. Semi-automatic classification plugin documentation. Release 5.0.0.1. 2016.

  28. USGS. Landsat 7 (L7) data users handbook. Reston: US Geological Survey; 2018.

  29. Mondejar JP, Tongco AF. Near infrared band of Landsat 8 as water index: a case study around Cordova and Lapu Lapu City, Cebu, Philippines. Sustain Environ Res. 2019;29:16.

  30. Samira I, Ahmed D, Lhoussaine M. Soil fertility mapping: comparison of three spatial interpolation techniques. Int J Eng Res Technol. 2014;3:1635–43.

    Google Scholar 

  31. Webster R, Oliver MA. Geostatistics for environmental scientists. Chichester: John Wiley & Sons, Ltd; 2007.

  32. Osborne JW. Best practices in data cleaning: a complete guide to everything you need to do before and after collecting your data. Los Angeles: SAGE Publications, Inc.; 2013.

    Google Scholar 

  33. Scott DW. Tukey ladder of powers. In: lane DM, editor. Introduction to statistics. Online Statistics Education: A Multimedia Course of Study. http://onlinestatbook.com/Online_Statistics_Education.pdf. Accessed 21 Dec 2018.

  34. Conrad O, Bechtel B, Bock M, Dietrich H, Fischer E, Gerlitz L, et al. System for automated Geoscientific analyses (SAGA) v. 2.1.4. Geosci Model Dev. 2015;8:1991–2007.

    Google Scholar 

  35. Gromping U. Relative importance for linear regression in R: the package relaimpo. J Stat Softw. 2006;17:1–27.

    Google Scholar 

  36. Lindeman RH, Merenda PF, Gold RZ. Introduction to bivariate and multivariate analysis. Glenview: Scott Foresman & Co; 1980.

  37. Liao KH, Xu SH, Wu JC, Zhu Q. Spatial estimation of surface soil texture using remote sensing data. Soil Sci Plant Nutr. 2013;59:488–500.

    Google Scholar 

  38. Wang Z, Shi WJ. Mapping soil particle-size fractions: a comparison of compositional kriging and log-ratio kriging. J Hydrol. 2017;546:526–41.

    Google Scholar 

  39. Sari RKN, Pasaribu US. The comparison of isotropic and anisotropic semivariogram for gauss model. AIP Conf Proc. 2014;1589:508–11.

    Google Scholar 

  40. Hengl T. A practical guide to geostatistical mapping of environmental variables. Luxembourg: European Communities; 2007.

  41. Delbari M, Afrasiab P, Loiskandl W. Geostatistical analysis of soil texture fractions on the field scale. Soil Water Res. 2011;6:173–89.

    Google Scholar 

  42. de Menezes MD, Silva SHG, de Mello CR, Owens PR, Curi N. Spatial prediction of soil properties in two contrasting physiographic regions in Brazil. Sci Agr. 2016;73:274–85.

    Google Scholar 

  43. Shaffer JM. The effects of spatial resolution on digital soil attribute mapping [Master’s thesis]. Columbus: The Ohio State Univ; 2013.

    Google Scholar 

  44. Kiš IM. Comparison of ordinary and universal Kriging interpolation techniques on a depth variable (a case of linear spatial trend), case study of the Šandrovac Field. Min Geol Petrol Eng Bull. 2016;31:41–58.

    Google Scholar 

  45. Haldar SK. Statistical and geostatistical applications in geology. In: Haldar SK, editor. In: mineral exploration: principles and applications. Waltham: Elsevier; 2013. p. 157–82.

    Google Scholar 

  46. Daya AA, Bejari H. A comparative study between simple kriging and ordinary kriging for estimating and modeling the cu concentration in Chehlkureh deposit. SE Iran Arab J Geosci. 2015;8:6003–20.

    Google Scholar 

  47. Mpanza M. A comparison of ordinary and simple Kriging on a PGE resource in the eastern limb of the Bushveld complex [Master’s thesis]. Johannesburg: Univ of the Witwatersrand; 2015.

    Google Scholar 

  48. Lichtenstern A. Kriging methods in spatial statistics [Bachelor’s thesis]. Munich: Technical Univ of Munich; 2013.

    Google Scholar 

  49. Hengl T, Heuvelink GBM, Stein A. A generic framework for spatial prediction of soil variables based on regression-kriging. Geoderma. 2004;120:75–93.

    Google Scholar 

  50. Ballabio C, Panagos P, Monatanarella L. Mapping topsoil physical properties at European scale using the LUCAS database. Geoderma. 2016;261:110–23.

    Google Scholar 

  51. Kuhn M, Johnson K. Applied predictive modeling. 1st ed. New York: Springer; 2013.

    MATH  Google Scholar 

  52. Yang RM, Liu F, Zhang GL, Zhao YG, Li DC, Yang JL, et al. Mapping soil texture based on field soil moisture observations at a high temporal resolution in an oasis agricultural area. Pedosphere. 2016;26:699–708.

    Google Scholar 

  53. Zhang LM, Liu YL, Li XD, Huang LB, Yu DS, Shi XZ, et al. Effects of soil map scales on simulating soil organic carbon changes of upland soils in eastern China. Geoderma. 2018;312:159–69.

    Google Scholar 

  54. Heil K, Schmidhalter U. Improved evaluation of field experiments by accounting for inherent soil variability. Eur J Agron. 2017;89:1–15.

    Google Scholar 

  55. Omran ELE. Improving the prediction accuracy of soil mapping through geostatistics. Int J Geosci. 2012;3:574–90.

    Google Scholar 

  56. Hu XS, Xu HQ. A new remote sensing index for assessing the spatial heterogeneity in urban ecological quality: a case from Fuzhou City. China Ecol Indic. 2018;89:11–21.

    Google Scholar 

  57. Ma YX, Minasny B, Wu CF. Mapping key soil properties to support agricultural production in eastern China. Geoderma Reg. 2017;10:144–53.

    Google Scholar 

  58. Wackernagel H. Multivariate geostatistics: an introduction with applications. 3rd ed. Berlin: Springer; 2003.

    MATH  Google Scholar 

Download references

Acknowledgements

The authors are grateful to the U.S. Geological Survey Earth Resources Observation and Science Center (http://earthexplorer.usgs.gov/) for providing free images; to the UP Training Center for Applied Geodesy and Photogrammetry for providing free access of SAR DEM which is downloadable through https://lipad.dream.upd.edu.ph/. The authors also acknowledge the graduate scholarship funding from DOST-ERDT, Philippines that have made this research endeavour possible; to the University of San Carlos – Talamban Campus as the delivering institution and to the Cebu Technological University – Barili Campus as the sending institution.

Funding

The Department of Science and Technology – Engineering Research and Development Technology (DOST-ERDT) provided the graduate scholarship fund that is course through the University of San Carlos – Talamban Campus as the delivering institution. The Cebu Technological University through its General Appropriations Act provided the funds for data collection and soil analysis. Interpretation of data, modeling, and writing of the manuscript were done by the authors.

Author information

Authors and Affiliations

Authors

Contributions

JPM perceived and implemented the idea of applying digital soil mapping using open source software. JPM supervised soil sampling and facilitated soil analysis. AFT initiated image pre-processing. All authors performed the exploratory data analyses, data transformation, predictive modeling, and writing and revising of manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jeremy P. Mondejar.

Ethics declarations

Competing interests

The authors declare they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Mondejar, J.P., Tongco, A.F. Estimating topsoil texture fractions by digital soil mapping - a response to the long outdated soil map in the Philippines. Sustain Environ Res 29, 31 (2019). https://doi.org/10.1186/s42834-019-0032-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s42834-019-0032-5

Keywords