TY - CONF
T1 - Comparing Different Approaches - Data Mining, Geostatistic, and Deterministic Pedology - to Assess the Frequency of WRB Reference Soil Groups in the Italian Soil Regions
AU - Fantappie', Maria
PY - 2014
Y1 - 2014
N2 - The assessment of class frequency in soil map legends is affected by uncertainty, especially at small scales, where generalization is larger. The aim of this study was to test the hypothesis that data mining or geostatistical techniques provide better estimates of class frequency than traditional deterministic pedology in a national soil map. In the map of Italian soil regions, compiled at 1:5,000,000 reference scale, the soil classes were the WRB Reference Soil Groups (RSGs). Different data mining techniques, namely neural networks, random forests, boosted trees, classification and regression trees, and support vector machines (SVM), were tested; the last one gave the best RSG predictions, using selected auxiliary variables and 22,015 classified soil profiles. Given the categorical target variable, multi-collocated indicator cokriging was the algorithm chosen for the geostatistical approach. The five most frequent RSGs resulting from the three methods were compared. The outcomes were validated with a Bayesian approach on a subset of 10% of geographically representative profiles, set aside before the analyses. The most frequent classes were predicted consistently by the three methods, which instead differed notably for the classes with lower occurrence. The Bayesian validation indicated that the SVM method was as reliable as multi-collocated indicator cokriging, and that both were more reliable than the deterministic pedological approach. An advantage of the SVM was the possibility of using numeric and categorical variables in the same analysis without any prior transformation, which notably reduced processing time.
AB - The assessment of class frequency in soil map legends is affected by uncertainty, especially at small scales, where generalization is larger. The aim of this study was to test the hypothesis that data mining or geostatistical techniques provide better estimates of class frequency than traditional deterministic pedology in a national soil map. In the map of Italian soil regions, compiled at 1:5,000,000 reference scale, the soil classes were the WRB Reference Soil Groups (RSGs). Different data mining techniques, namely neural networks, random forests, boosted trees, classification and regression trees, and support vector machines (SVM), were tested; the last one gave the best RSG predictions, using selected auxiliary variables and 22,015 classified soil profiles. Given the categorical target variable, multi-collocated indicator cokriging was the algorithm chosen for the geostatistical approach. The five most frequent RSGs resulting from the three methods were compared. The outcomes were validated with a Bayesian approach on a subset of 10% of geographically representative profiles, set aside before the analyses. The most frequent classes were predicted consistently by the three methods, which instead differed notably for the classes with lower occurrence. The Bayesian validation indicated that the SVM method was as reliable as multi-collocated indicator cokriging, and that both were more reliable than the deterministic pedological approach. An advantage of the SVM was the possibility of using numeric and categorical variables in the same analysis without any prior transformation, which notably reduced processing time.
UR - http://hdl.handle.net/10447/106041
UR - https://www.researchgate.net/publication/271132326_Comparing_Different_approaches_-_Data_mining_Geostatistic_and_Deterministic_pedology_-_to_assess_the_Frequency_of_WRB_reference_soil_groups_in_the_Italian_Soil_Regions
M3 - Other
ER -