All about the data.

This section explains where the data for this project came from, covering both the species occurrence records and the climate data.

Data on species occurrence

The species occurrence data is a set of coordinates where the species has been reported to be present. Together with the pseudoabsence points, it acts as the training dataset for the models.

Presence data
  • Obtained from the Global Biodiversity Information Facility (GBIF)
  • Used R package CoordinateCleaner to extract the data using the taxon key for S. frugiperda
  • Pulled 11 120 occurrence points
Absence data
  • Made use of pseudoabsences
  • Used R package sp to create these points
  • Generated 20 000 points at a resolution of 30 arc-seconds, deliberately more than needed, because the set is later cleaned to contain only points on land masses
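As a rough illustration of the idea (not the sp code used in this project), random pseudoabsence points could be drawn and snapped to a 30 arc-second grid as sketched below in Python. The function names, the fixed seed, and the assumption that the grid starts at -180°, -90° are illustrative only. Sampling longitude and latitude uniformly also over-represents high latitudes, which a real workflow may want to correct.

```python
import random

CELL_DEG = 30 / 3600      # 30 arc-seconds expressed in degrees
N_LON_CELLS = 360 * 120   # cells around the globe (assumed global grid)
N_LAT_CELLS = 180 * 120   # cells from pole to pole (assumed global grid)


def snap_to_cell_centre(value, origin, n_cells):
    """Return the centre of the 30 arc-second cell that contains `value`."""
    # Clamp so a value exactly on the upper edge stays inside the grid.
    index = min(int((value - origin) / CELL_DEG), n_cells - 1)
    return origin + (index + 0.5) * CELL_DEG


def random_pseudoabsences(n, seed=0):
    """Draw n random (lon, lat) points, each snapped to a grid cell centre."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    points = []
    for _ in range(n):
        lon = rng.uniform(-180.0, 180.0)
        lat = rng.uniform(-90.0, 90.0)
        points.append((
            snap_to_cell_centre(lon, -180.0, N_LON_CELLS),
            snap_to_cell_centre(lat, -90.0, N_LAT_CELLS),
        ))
    return points


points = random_pseudoabsences(20_000)
```

Because the points are drawn over the whole globe, many of them fall in the ocean, which is why the set shrinks substantially once sea points are removed.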
Presence cleaning
  • Filtered to contain only points with both latitude and longitude present
  • CoordinateCleaner was then used to run the centroids, equal, zeros, gbif, institutions, seas, and duplicates tests
  • The final presence dataset had 5379 records
Absence cleaning
  • The cc_sea function from CoordinateCleaner was then used to test for seas. The other tests were omitted because the pseudoabsences were randomly generated.
  • The final absence dataset was reduced to 3032 records
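The project ran these checks with CoordinateCleaner in R. Purely to illustrate the kind of logic involved, the Python sketch below applies simplified stand-ins for four of them (missing coordinates, zeros, equal, duplicates). The function name and the exact rules are assumptions rather than the package's implementation, and the tests that need reference data (centroids, gbif, institutions, seas) are left out.

```python
def clean_presences(records):
    """Filter (lon, lat) tuples with simplified coordinate checks.

    Either value in a tuple may be None when the source record lacks it.
    """
    seen = set()
    kept = []
    for lon, lat in records:
        # Keep only records with both longitude and latitude present.
        if lon is None or lat is None:
            continue
        # "zeros": a plain zero coordinate usually signals a data-entry default.
        if lon == 0 or lat == 0:
            continue
        # "equal": identical absolute longitude and latitude is suspicious.
        if abs(lon) == abs(lat):
            continue
        # "duplicates": keep only the first record at each coordinate pair.
        if (lon, lat) in seen:
            continue
        seen.add((lon, lat))
        kept.append((lon, lat))
    return kept


# Made-up example records, not project data.
raw = [
    (28.2, -25.7),
    (None, -25.7),   # missing longitude
    (0.0, 0.0),      # zero coordinates
    (30.0, -30.0),   # equal absolute values
    (28.2, -25.7),   # duplicate of the first record
    (-47.9, -15.8),
]
cleaned = clean_presences(raw)
```

Only the first and last records survive in this toy example, which mirrors how the real presence set shrank from 11 120 to 5379 records.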

Data on current and future climate

I chose the GFDL-ESM4 model from CMIP6 because it covers Africa, Asia, and (whatever it was in that paper) better than the other models currently available in the CHELSA-BIOCLIM+ database. This model was created by the National Oceanic and Atmospheric Administration, Geophysical Fluid Dynamics Laboratory, Princeton, NJ 08540, USA, and has a native resolution of 288 x 180 grid cells. Point values were extracted at a resolution of 30 arc-seconds.
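To give a feel for the gap between the two resolutions, the short Python calculation below compares the native grid with the 30 arc-second extraction grid. It assumes that 288 is the number of longitude cells and 180 the number of latitude cells of a global grid, and it uses an approximate figure of 111.32 km per degree at the equator; both are assumptions made for illustration.

```python
ARCSEC_PER_DEGREE = 3600
TARGET_ARCSEC = 30                 # extraction resolution
KM_PER_DEGREE_EQUATOR = 111.32     # approximate, at the equator

# Native cell size in arc-seconds (assumes a global 288 x 180 lon/lat grid).
native_lon_arcsec = 360 * ARCSEC_PER_DEGREE / 288
native_lat_arcsec = 180 * ARCSEC_PER_DEGREE / 180

# How many 30 arc-second cells fit along each side of one native cell.
lon_factor = native_lon_arcsec / TARGET_ARCSEC
lat_factor = native_lat_arcsec / TARGET_ARCSEC

# Approximate width of one 30 arc-second cell at the equator, in km.
cell_km = KM_PER_DEGREE_EQUATOR * TARGET_ARCSEC / ARCSEC_PER_DEGREE

print(f"native cell: {native_lon_arcsec / 3600}° x {native_lat_arcsec / 3600}°")
print(f"30 arc-second cells per native cell: {lon_factor:.0f} x {lat_factor:.0f}")
print(f"30 arc-second cell width at the equator: about {cell_km:.2f} km")
```

Under these assumptions each native cell spans many thousands of 30 arc-second cells, so the fine-scale values come from the downscaling in the CHELSA product rather than from the climate model itself.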

All climate data used in this project were obtained from the Climatologies at High resolution for the Earth’s Land Surface Areas (CHELSA) database.

Current data

The current data provides comprehensive information on historical climate conditions across various regions. It includes datasets derived from observations and reanalysis data, offering insights into parameters such as temperature, precipitation, and other climatic variables over past time periods.

Researchers and practitioners use this data to analyze past climate trends, understand regional climate variability, and assess the impacts of climate change on ecosystems, agriculture, water resources, and human societies.

Future data

The future data consists of projections and simulations of climate conditions under different greenhouse gas emission scenarios and climate models. These projections offer insights into potential future climate scenarios, including changes in temperature, precipitation patterns, and extreme weather events.

Future data allows researchers and policymakers to anticipate and plan for potential climate impacts, develop adaptation strategies, and evaluate the effectiveness of mitigation measures to address climate change challenges.

Variables

The CHELSA Climate Database provides a wide range of climatic variables that are essential for understanding and analyzing climate patterns and trends.

Because CHELSA offers a large number of variables, only the following were used in the final model training, as explained further in the Modelling section.

Variable | Name (unit) | Explanation
bio1 | Mean annual air temperature (°C) | Mean annual daily mean air temperatures averaged over 1 year
bio6 | Mean daily minimum air temperature of the coldest month (°C) | The lowest temperature of any monthly daily mean minimum temperature
bio9 | Mean daily mean air temperatures of the driest quarter (°C) | The driest quarter of the year is determined (to the nearest month)
bio11 | Mean daily mean air temperatures of the coldest quarter (°C) | The coldest quarter of the year is determined (to the nearest month)
bio12 | Annual precipitation amount (kg m⁻² year⁻¹) | Accumulated precipitation amount over 1 year
gdd10 | Growing degree days heat sum above 10°C (°C) | Heat sum of all days above the 10°C temperature accumulated over 1 year
ngd10 | Number of growing degree days (number of days) | Number of days on which mean daily air temperature > 10°C
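To make these definitions more concrete, the Python sketch below derives the same kinds of quantities from made-up monthly and daily values. It is an illustration under stated assumptions, not CHELSA's processing: quarters are taken as any three consecutive months with December wrapping into January, the heat sum uses the conventional "degrees above the base temperature" definition, and all function names and input values are hypothetical.

```python
def quarter_windows():
    """All 3-month windows as month indices, wrapping December into January."""
    return [[m % 12, (m + 1) % 12, (m + 2) % 12] for m in range(12)]


def bioclim_subset(tas, tasmin, pr):
    """Compute bio1, bio6, bio9, bio11 and bio12 from 12 monthly values.

    tas, tasmin: monthly mean and mean daily minimum temperature (°C).
    pr: monthly precipitation totals (kg m-2).
    """
    windows = quarter_windows()
    driest = min(windows, key=lambda w: sum(pr[i] for i in w))
    coldest = min(windows, key=lambda w: sum(tas[i] for i in w))
    return {
        "bio1": sum(tas) / 12,                     # annual mean temperature
        "bio6": min(tasmin),                       # coldest month's daily minimum
        "bio9": sum(tas[i] for i in driest) / 3,   # mean temp, driest quarter
        "bio11": sum(tas[i] for i in coldest) / 3, # mean temp, coldest quarter
        "bio12": sum(pr),                          # annual precipitation
    }


def growing_degree_stats(daily_mean_temps, base=10.0):
    """Return (heat sum above base, number of days above base)."""
    excess = [t - base for t in daily_mean_temps if t > base]
    return sum(excess), len(excess)


# Made-up monthly values for one location (January to December).
tas = [24, 24, 22, 18, 14, 12, 10, 11, 15, 18, 20, 28]
tasmin = [t - 6 for t in tas]
pr = [120, 100, 80, 40, 20, 10, 5, 0, 0, 30, 80, 115]
bio = bioclim_subset(tas, tasmin, pr)

# Made-up daily mean temperatures for a handful of days.
gdd10, ngd10 = growing_degree_stats([8.0, 10.0, 12.5, 15.0, 9.5, 11.0])
```

In this toy example the driest quarter (July to September) and the coldest quarter (June to August) differ, which is why bio9 and bio11 can carry different information even though both are quarterly mean temperatures.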

Why these variables?

These variables were selected by analysing the feature importances obtained after training a model on the presence and absence data points.

For more information on this, see the Modelling section.

References for this project