Evaluation of the Accuracy of Different Machine Learning Algorithms in Predicting Greenhouse Cucumber Crop Evapotranspiration

Document Type: Research Paper

Authors

1 Department of Irrigation and Reclamation Engineering, Faculty of Agriculture, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran.

2 Department of Irrigation and Reclamation Engineering, Faculty of Agriculture, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran.

3 Department of Horticultural Science and Landscape Architecture, Faculty of Agriculture, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran.

4 Department of Soil Science, Faculty of Agriculture, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran.

10.22059/jwim.2025.399994.1251

Abstract

In this study, the crop evapotranspiration (ETc) of greenhouse cucumber was modeled over two growing seasons, autumn–winter 2022–2023 and spring–summer 2023, in a controlled greenhouse at the College of Agriculture and Natural Resources, University of Tehran. ETc was estimated from soil water balance equations using data from Time Domain Reflectometry (TDR) sensors installed at a soil depth of 0–30 cm. Reference evapotranspiration (ETo) was measured with a micro-lysimeter containing turfgrass. The models used ten input features, including air temperature, relative humidity, solar radiation, ETo, and days after transplanting (DAT). Correlation analysis showed that solar radiation, mean temperature, and DAT had the strongest positive relationships with ETc. To predict ETc, six machine learning algorithms were implemented in Python: Principal Component Regression (PCR), Partial Least Squares (PLS), Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting (GB), and Extreme Gradient Boosting (XGB). The hyperparameters of each model were optimized with the Tree-structured Parzen Estimator (TPE) algorithm from the Optuna library. Model performance was evaluated by five-fold cross-validation using R², RMSE, MAE, and NSE as metrics. The GB algorithm achieved the highest predictive accuracy, with average R², RMSE, MAE, and NSE values of 0.90, 0.59 mm/day, 0.41 mm/day, and 0.89, respectively. XGB, RF, and SVM followed closely, with no statistically significant difference from GB. SHAP analysis identified DAT, ETo, and solar radiation as the most influential features across models. This study demonstrates that tree-based machine learning models are robust and accurate tools for managing the water requirements of greenhouse cucumber cultivation.
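
For readers unfamiliar with the four evaluation metrics named in the abstract, the following is a minimal sketch of how they are commonly computed for paired observed and predicted ETc values (mm/day). It is not the authors' code. The function name `regression_metrics` and the sample values are hypothetical, and R² is assumed here to be the squared Pearson correlation, which is one common convention; the abstract does not state which definition was used.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, MAE and NSE for paired observed/predicted values.

    Assumption: R2 is the squared Pearson correlation coefficient.
    NSE is the Nash-Sutcliffe efficiency, 1 - SSE / SS_obs.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Residuals between predictions and observations
    errors = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)

    rmse = math.sqrt(sse / n)                 # root mean squared error
    mae = sum(abs(e) for e in errors) / n     # mean absolute error

    # Spread of observations and predictions around their own means
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if ss_obs == 0 or ss_pred == 0:
        raise ValueError("observed and predicted values must not be constant")

    nse = 1.0 - sse / ss_obs                  # Nash-Sutcliffe efficiency

    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    r2 = cov ** 2 / (ss_obs * ss_pred)        # squared Pearson correlation

    return {"R2": r2, "RMSE": rmse, "MAE": mae, "NSE": nse}


# Hypothetical ETc values in mm/day, for illustration only
observed_etc = [2.0, 3.0, 4.0, 5.0]
predicted_etc = [2.5, 3.0, 3.5, 5.0]
print(regression_metrics(observed_etc, predicted_etc))
```

In a five-fold cross-validation such as the one described above, a function like this would typically be applied to the held-out fold in each split and the five results averaged.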
