Publication Type

Journal Article

Journal Name

Ecological Informatics

Publication Date

7-1-2024

Abstract

Vector-borne diseases, like those transmitted by tsetse flies, pose a significant global public health threat. Reducing vector populations is a promising strategy for disease control, especially in the case of tsetse-transmitted African trypanosomiasis. However, the cost-effective implementation of large-scale vector surveillance and control measures faces challenges due to the lack of spatially explicit and reliable maps identifying vector hotspots. In this study, we assessed the accuracy of predicting Glossina pallidipes relative densities across Kenya by linking constrained in-situ tsetse catch data from 660 traps across three Kenyan regions with readily available gridded satellite information (human population, land cover, soil properties, elevation, precipitation, and land surface temperature) using a classical random forest algorithm. To enhance predictive performance, we employed two feature elimination techniques specifically designed for machine learning algorithms, i.e., Recursive Feature Elimination (RFE) and Variable Selection Using Random Forests (VSURF). For each set of retained variables, we trained a Random Forest model using a spatial cross-validation technique. Our findings showed that tsetse fly relative densities decreased with mean annual precipitation and soil moisture and, conversely, increased with tree cover. Based on the cross-validated R², 41% of the spatial variability in relative densities of tsetse flies could be explained. For spatial extrapolation, only the set of predictors retained by VSURF closely matched known tsetse fly distributions in Kenya. This more accurate performance of VSURF may be attributed to its approach of assessing variables for both importance and their contribution to reducing prediction error. Our study demonstrates the potential of using a random forest method to upscale tsetse relative abundance predictions to the national level. However, the reliability of the current extrapolated map remains uncertain.
We recommend: 1) increasing tsetse fly sampling efforts, particularly in the data-limited northern and eastern regions of Kenya, and 2) developing a more precise and accurate land cover map with classes that directly associate with known habitat characteristics of the target tsetse species.
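
The workflow summarized in the abstract (feature elimination, then a random forest scored with spatial cross-validation) can be illustrated with a minimal sketch. This is not the authors' code: it assumes Python with scikit-learn, uses purely synthetic data, and every name in it (predictor labels, coordinate ranges, grid size, coefficients) is hypothetical. Only RFE is shown; VSURF is an R package and is not reproduced here. Spatial cross-validation is approximated by grouping traps into grid cells and holding out whole cells with GroupKFold.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for the trap data (hypothetical values, not the study's).
n_traps = 660
feature_names = ["population", "tree_cover", "soil_moisture",
                 "elevation", "precipitation", "land_surface_temp"]
X = rng.normal(size=(n_traps, len(feature_names)))

# Made-up response: rises with tree cover, falls with precipitation and
# soil moisture, plus noise. The coefficients are arbitrary.
y = (0.8 * X[:, 1] - 0.6 * X[:, 4] - 0.4 * X[:, 2]
     + rng.normal(scale=0.5, size=n_traps))

# Hypothetical trap coordinates, binned into a 3 x 3 grid of spatial blocks
# so that each cross-validation fold holds out entire blocks.
lon = rng.uniform(34.0, 42.0, size=n_traps)
lat = rng.uniform(-4.5, 5.0, size=n_traps)
lon_bin = np.digitize(lon, np.linspace(34.0, 42.0, 4)[1:-1])
lat_bin = np.digitize(lat, np.linspace(-4.5, 5.0, 4)[1:-1])
blocks = lon_bin * 3 + lat_bin

# Recursive feature elimination driven by random forest importances.
rfe = RFE(RandomForestRegressor(n_estimators=100, random_state=0),
          n_features_to_select=3)
rfe.fit(X, y)
selected = [name for name, keep in zip(feature_names, rfe.support_) if keep]

# Score selection + model together so selection is redone inside each fold.
pipeline = Pipeline([
    ("select", RFE(RandomForestRegressor(n_estimators=100, random_state=0),
                   n_features_to_select=3)),
    ("model", RandomForestRegressor(n_estimators=100, random_state=0)),
])
scores = cross_val_score(pipeline, X, y, groups=blocks,
                         cv=GroupKFold(n_splits=5), scoring="r2")

print("Retained predictors:", selected)
print("Spatially cross-validated R^2 per fold:", np.round(scores, 2))
```

Placing the selection step inside the pipeline keeps held-out blocks from influencing which predictors are retained, which would otherwise inflate the cross-validated R². Real trap data are spatially autocorrelated, so block-wise scores on real data would typically be lower than with random folds; the synthetic data here do not show that effect.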

Keywords

Machine learning, Random forest, Satellite data, Spatial extrapolations, Tsetse abundance, Vector borne diseases

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
