This section gives more information on the models that were created: how they were trained and assessed, and how they performed.
For this project I considered two types of machine learning model: an ANN and an RFR. Below are a few reasons why these were chosen for consideration.
I created both models and ran each through a grid search to determine the best hyperparameters.
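As a minimal sketch of the grid-search step, assuming a scikit-learn workflow: the parameter names, grid values, and synthetic data below are illustrative, not the ones actually used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the occurrence dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Illustrative grid only; the project's actual hyperparameter grid is not shown here.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print(search.best_params_)           # best hyperparameter combination found
print(round(search.best_score_, 4))  # mean cross-validated accuracy of that combination
```

The same pattern works for the ANN by swapping in a different estimator and its own parameter grid.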
The grid search revealed that the following resulted in the best model performance:
Overall accuracy: 0.9611

| Class | Precision | Recall | f1-score | Support |
|---|---|---|---|---|
| 0 | 0.97 | 0.93 | 0.95 | 622 |
| 1 | 0.96 | 0.95 | 0.97 | 1051 |
Overall accuracy: 0.9647

| Class | Precision | Recall | f1-score | Support |
|---|---|---|---|---|
| 0 | 0.96 | 0.94 | 0.95 | 622 |
| 1 | 0.97 | 0.98 | 0.97 | 1051 |
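Per-class tables like the ones above can be produced directly from a fitted model. The sketch below assumes scikit-learn and uses synthetic data, so the printed numbers will not match the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data as a stand-in for the real dataset.
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Overall accuracy plus per-class precision, recall, f1-score, and support.
print("Overall accuracy:", round(accuracy_score(y_test, y_pred), 4))
print(classification_report(y_test, y_pred))
```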
Because the two models performed similarly, I chose to make my future occurrence predictions using the RFR model.
This was mainly because the future data had a resolution of 30s, leading to incredibly large datasets. The feature importances provided by the RFR let me reduce the number of climatic variables I had to extract, cutting the time needed to build my future datasets.
After examining the feature importances (graph shown below), I decided to include every feature with an importance above 0.05, as below this point there is a substantial drop in importance per feature. For more information on what each of the chosen features is, see the data page.