Flood probability zonation using a comparative study of two well-known random forest and support vector machine models in northern Iran

Document Type: Research Paper

Authors

1 Ph.D. Candidate, Department of Water Engineering, College of Agriculture, Kermanshah Branch, Islamic Azad University, Kermanshah, Iran.

2 Associate Professor, Department of Water Engineering, College of Agriculture, Kermanshah Branch, Islamic Azad University, Kermanshah, Iran.

3 Assistant Professor, Department of Water Engineering, College of Agriculture, Kermanshah Branch, Islamic Azad University, Kermanshah, Iran.

Abstract

This study aims to produce a flood probability zonation map for the Saliantapeh catchment in Golestan Province. To this end, two well-known data mining models, Random Forest (RF) and Support Vector Machine (SVM), were applied because of their robust computational algorithms. A flood inventory was compiled from several field surveys, local information, and available organizational records, and the corresponding map was created in a geographic information system (GIS). Based on a review of studies worldwide, 13 predisposing variables were selected, including proximity to stream, soil texture, lithological units, land use/cover, slope percent, elevation (DEM), slope aspect, plan curvature, profile curvature, stream power index, and topographic wetness index, and the corresponding maps were generated in GIS. After the predictor maps were prepared, SPSS software was used to analyze the data and test for multicollinearity. Model performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). To assess the robustness of the models, three sample data sets (s1, s2, s3) were randomly drawn, each split into 70% for training and 30% for validation. The results showed that the RF model, with an AUC of 0.96 and a robustness value of 0.001 in the validation step, outperformed SVM in flood probability zonation over the study area.
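The abstract reports that the predictors were tested for multicollinearity in SPSS. To show what such a check computes, the following is a minimal pure-Python sketch of the variance inflation factor (VIF) and tolerance (1/VIF), the two statistics SPSS lists in its collinearity diagnostics. It is not the authors' procedure: the function names are hypothetical, and the cut-off of VIF > 10 (tolerance < 0.1) mentioned in the comments is a common rule of thumb, not a threshold stated in the source. It uses the identity VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal element of the inverse of the predictors' correlation matrix.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length numeric predictor columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    norms = [sqrt(sum(x * x for x in c)) for c in centered]
    p = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(p)]
        for i in range(p)
    ]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting; raises if singular."""
    p = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfectly collinear predictors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse
    correlation matrix. Common rule of thumb (an assumption, not from the
    source): VIF > 10 signals severe multicollinearity."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


def tolerance(columns):
    """Tolerance = 1 / VIF; values below about 0.1 are a common warning sign."""
    return [1.0 / v for v in vif(columns)]
```

For example, calling `vif` on hypothetical lists of per-cell slope, elevation, and topographic wetness index values returns one VIF per predictor; two uncorrelated predictors each give a VIF of 1, and a pair with correlation 0.8 gives 1 / (1 - 0.64), about 2.78.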

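The validation design described in the abstract (AUC on three random 70/30 splits, plus a robustness value) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code. AUC is computed with the Mann-Whitney formulation (the probability that a randomly chosen flood sample scores higher than a randomly chosen non-flood sample). The `fit_predict` argument is a hypothetical hook where a trained RF or SVM would return flood probabilities for the validation samples (for instance, scikit-learn's `RandomForestClassifier` or `SVC(probability=True)` with `predict_proba`); `placeholder_fit_predict` only stands in for such a model. Because the abstract does not define the statistic behind its robustness value of 0.001, the sketch assumes it is the standard deviation of the validation AUCs across the three splits.

```python
import random
from statistics import pstdev


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic.

    labels: 1 = flood location, 0 = non-flood location. Ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both flood and non-flood samples")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def split_70_30(n, seed):
    """Shuffle indices 0..n-1 with a fixed seed; first 70% train, rest validate."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * n)
    return idx[:cut], idx[cut:]


def repeated_validation_auc(X, y, fit_predict, seeds=(1, 2, 3)):
    """One validation AUC per random split (seeds stand in for s1, s2, s3)."""
    aucs = []
    for seed in seeds:
        train, valid = split_70_30(len(y), seed)
        scores = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in valid]
        )
        aucs.append(auc([y[i] for i in valid], scores))
    return aucs


def robustness(aucs):
    """Assumed robustness measure: population std of the validation AUCs."""
    return pstdev(aucs)


def placeholder_fit_predict(X_train, y_train, X_valid):
    """Hypothetical stand-in for a trained RF or SVM: ignores the training
    data and returns the first predictor as the 'flood probability'."""
    return [row[0] for row in X_valid]
```

With a placeholder scorer whose first predictor perfectly separates flood from non-flood samples, every split yields an AUC of 1.0 and the spread is 0.0. With real RF or SVM scores, a smaller spread indicates that the model's accuracy depends less on which samples happened to be drawn for training.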