NicheToolBox third stage functions

In this tutorial I show how to use the third stage functions of nichetoolbox. The work was done for GSoC 2016.

Summary

The third stage functions model species niches and estimate species distributions. To do this I developed methods to run algorithms that predict species niches and estimate species potential distributions (ellipsoid models, Bioclim and MaxEnt). There are also methods that convert the potential distribution map into a binary map, which attempts to show where the species is distributed. This last part includes methods to evaluate the species distribution maps.

6. Ecological niche modeling

Species Distribution Modeling (SDM), also known as Ecological Niche Modeling (ENM), is a growing field of ecology that aims to estimate the geographical distribution of species. ENM uses a set of mathematical and statistical tools to study the relationship between environmental variables and species occurrences, in order to estimate species niches and predict potential areas where the species can survive. These models have had a large impact on ecology and conservation planning because they are used to find geographic localities suitable for relocating endangered species, to study the impacts of climate change on biodiversity, to find biodiversity hotspots and, in other contexts, to find localities that are vulnerable to invasive species and pathogens (Peterson 2003; Peterson & Vieglais 2001).

In nichetoolbox you can model ecological niches using one of the following modeling algorithms:

  1. Ellipsoid models
  2. Bioclim models
  3. MaxEnt models

6.1 Ellipsoid models

Ellipsoid models use the multivariate normal probability density function (PDF; equation 1) to compute the habitat suitability index; the PDF is rescaled so that the suitability index is defined on the interval \([0,1]\).

\[f\,(x_{1},x_{2},x_{3},\ldots,x_{k})=\frac{1}{\sqrt{\left(2\pi\right)^{k}\left|\boldsymbol{\Sigma}\right|}}\exp\left(-\frac{1}{2}\left(\mathbf{x}-\boldsymbol{\mu}\right)^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}\left(\mathbf{x}-\boldsymbol{\mu}\right)\right)\,\,(1)\]

Rescaled so that its maximum value (reached at \(\mathbf{x}=\boldsymbol{\mu}\)) is 1, the suitability index becomes:

\[f\,(x_{1},x_{2},x_{3},\ldots,x_{k})=\exp\left(-\frac{1}{2}\left(\mathbf{x}-\boldsymbol{\mu}\right)^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}\left(\mathbf{x}-\boldsymbol{\mu}\right)\right)\]

where \(\mathbf{x}\) is the vector of environmental variables, such that each \(x_i\) represents an observation of environmental variable \(i\); \(\boldsymbol\Sigma\) is the covariance matrix of the occurrence data; and \(\boldsymbol\mu\) is the vector of means (the centroid).

The term \(({\mathbf x}-{\boldsymbol\mu})^\mathrm{T}{\boldsymbol\Sigma}^{-1}({\mathbf x}-{\boldsymbol\mu})\) is the squared Mahalanobis distance between \(\mathbf{x}\) and the centroid.
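As a minimal sketch of the rescaled suitability index, the following R code uses the base mahalanobis function, which returns the squared Mahalanobis distance. The data values, the variable names bio1 and bio12, and the helper ellipsoid_suitability are made up for illustration; they are not part of nichetoolbox.

```r
# Hypothetical environmental values at the occurrence points
# (rows = points, columns = niche variables); numbers are invented.
occ_env <- data.frame(
  bio1  = c(12.1, 13.4, 11.8, 14.0, 12.9, 13.1),
  bio12 = c(850, 910, 790, 980, 870, 900)
)

# Centroid (vector of means) and covariance matrix of the occurrence data
mu    <- colMeans(occ_env)
sigma <- cov(occ_env)

# Suitability = exp(-0.5 * squared Mahalanobis distance to the centroid)
ellipsoid_suitability <- function(env, mu, sigma) {
  d2 <- mahalanobis(as.matrix(env), center = mu, cov = sigma)
  exp(-0.5 * d2)
}

# Two hypothetical sites: one at the centroid, one far from it
sites <- rbind(centroid = mu, far_site = c(bio1 = 20, bio12 = 300))
suit  <- ellipsoid_suitability(sites, mu, sigma)
print(suit)
```

The site located at the centroid gets a suitability of 1, and suitability decays towards 0 as the Mahalanobis distance to the centroid grows.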

In nichetoolbox, to build an ellipsoid model you just need the environmental information of your occurrence points and to select which layers you are going to use to model the niche.

The model can be trained either with all the occurrence data or with only the occurrence points that lie inside your polygon of M.

Similarly, you can project the model onto either the full raster extent or the extent of the polygon of M.

Using all extent

Select your niche variables and run your model…

Using the polygon of M extent

Download ellipsoid metadata

Download ellipsoid raster model

Download distance to the centroid table

6.2 Bioclim models

Bioclim models are implemented in nichetoolbox in the same way as ellipsoid models:

  • The model can be trained either with all the occurrence data or with only the occurrence points that lie inside your polygon of M.
  • Similarly, you can project the model onto either the full raster extent or the extent of the polygon of M.

6.3 MaxEnt models

You can run MaxEnt within nichetoolbox, which calls the maxent function from the dismo package. To use MaxEnt within nichetoolbox you need to install rJava and paste the MaxEnt .jar file into the java folder of dismo. To test whether MaxEnt is available, run the following commands:

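# Path where the MaxEnt jar is expected: the java folder of dismo
jar <- file.path(system.file(package = "dismo"), "java", "maxent.jar")
# Check that the jar file is in the java folder of dismo
file.exists(jar)
## [1] TRUE
# Test whether rJava is installed (match against package names only)
"rJava" %in% rownames(installed.packages())
## [1] TRUE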

If everything is OK, you can build MaxEnt models within nichetoolbox using your own data or the data that you have downloaded from GBIF, choosing between the layers with the full raster extent and the M layers.

Most MaxEnt features and settings are implemented in the app.

Main features
Basic settings
Advanced settings
Experimental settings

Running maxent

Once you have configured your MaxEnt settings, press the run button. A window with the basic MaxEnt statistics will be displayed.

Download maxent results

To download the MaxEnt results, click on the Download complete results link.

Download maxent raster model

ENM projection in Geographic space

Once you have modeled your species niche using one or all of the modeling algorithms, you can explore the models in geographic space using the model visualizer. The visualizer is interactive (you can zoom in on the map) and uses the leaflet library.

7. Species distribution model performance

The last part of the project deals with species distribution model evaluation and performance. nichetoolbox has two ways to evaluate models:

  1. Partial ROC: uses the function implemented in the ENMGadgets package that performs Partial ROC analysis (Peterson et al. 2008).
  2. Confusion matrix metrics: You can compute the prevalence, specificity, sensitivity, TSS, Kappa, correct classification rate, misclassification rate, negative predictive power, positive predictive power, omission error fraction, commission error fraction, false negative rate, false positive rate (Fielding and Bell, 1997).
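To make the confusion matrix metrics concrete, here is a minimal base R sketch that computes a few of them from vectors of observed and predicted presences (1) and absences (0). The function confusion_metrics and the example vectors are hypothetical and only illustrate the standard definitions; they are not the nichetoolbox implementation.

```r
# observed / predicted: vectors of 1 (presence) and 0 (absence)
confusion_metrics <- function(observed, predicted) {
  tp <- sum(observed == 1 & predicted == 1)  # true positives
  fp <- sum(observed == 0 & predicted == 1)  # false positives
  fn <- sum(observed == 1 & predicted == 0)  # false negatives
  tn <- sum(observed == 0 & predicted == 0)  # true negatives
  n  <- tp + fp + fn + tn

  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)

  # Observed agreement and agreement expected by chance (for Kappa)
  p_obs <- (tp + tn) / n
  p_exp <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n^2

  c(prevalence                  = (tp + fn) / n,
    sensitivity                 = sensitivity,
    specificity                 = specificity,
    TSS                         = sensitivity + specificity - 1,
    kappa                       = (p_obs - p_exp) / (1 - p_exp),
    correct_classification_rate = p_obs,
    misclassification_rate      = 1 - p_obs)
}

# Invented example: 5 presences and 5 absences
observed  <- c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0)
predicted <- c(1, 1, 1, 0, 0, 0, 1, 0, 1, 0)
m <- confusion_metrics(observed, predicted)
print(round(m, 3))
```

In this example one presence and one absence are misclassified, so sensitivity and specificity are both 0.8 and TSS is 0.6.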

7.1 Partial ROC

To run a Partial ROC analysis in nichetoolbox, upload your continuous model map and your validation data.

The validation data must be in the following format:

sp_name longitude latitude
Ambystoma tigrinum -107.08333 51.08333
Ambystoma tigrinum -102.41667 44.41667
Ambystoma tigrinum -99.75000 45.91667
Ambystoma tigrinum -85.75000 45.25000
Ambystoma tigrinum -91.75000 45.75000
Ambystoma tigrinum -91.41667 39.75000

Partial ROC output

7.2 Binary maps

The Binary maps section has functions to transform continuous models into binary maps of presences and absences. The conversion can be done using one of the following methods:

  1. Confusion matrix optimization: using true presences and absences, the algorithm searches for the cut-off threshold that optimizes the value of the Kappa and/or TSS statistic.
  2. Minimum training presence: uses the lowest suitability value at which a presence has occurred as the cut-off threshold.
  3. User defined threshold: the user specifies the cut-off threshold.

7.2.1 Confusion matrix optimization

The user uploads both the continuous map (.asc) and the presence/absence data (.csv). The presence/absence data must be in the following format:

longitude latitude presence_absence
-111.25000 36.91667 0
-106.20000 35.30000 1
-98.08000 47.74000 1
-93.27306 45.21076 1
-112.64406 36.58329 1
-101.85097 35.18559 1
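With data in this format, the threshold search can be sketched in base R as follows. The example assumes that the suitability values have already been extracted from the continuous map at each point; the suitability numbers, the helper tss_at and the threshold range are invented for illustration, and only TSS is optimized here.

```r
# Hypothetical suitability values at the presence/absence points
suitability      <- c(0.05, 0.62, 0.71, 0.48, 0.93, 0.33, 0.12, 0.25)
presence_absence <- c(0,    1,    1,    1,    1,    0,    0,    1)

# TSS obtained when the map is cut at a given threshold
tss_at <- function(threshold, suitability, presence_absence) {
  predicted   <- as.integer(suitability >= threshold)
  sensitivity <- sum(predicted == 1 & presence_absence == 1) /
    sum(presence_absence == 1)
  specificity <- sum(predicted == 0 & presence_absence == 0) /
    sum(presence_absence == 0)
  sensitivity + specificity - 1
}

# Range of thresholds to search over
thresholds <- seq(0.1, 0.9, by = 0.1)
tss <- vapply(thresholds, tss_at, numeric(1),
              suitability = suitability,
              presence_absence = presence_absence)

# Threshold with the highest TSS (first one in case of ties)
best_threshold <- thresholds[which.max(tss)]
print(data.frame(threshold = thresholds, TSS = round(tss, 3)))
print(best_threshold)
```

The same loop could optimize Kappa instead by swapping the statistic computed inside the helper.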

Once the files are uploaded, specify the range of thresholds to search and press the Search threshold button. The output looks like this

7.2.2 Minimum training presence

Just upload your continuous model (.asc) and your training data (.csv).

The training data must be in the following format:

sp_name longitude latitude
85 Ambystoma tigrinum -100.58333 31.91667
86 Ambystoma tigrinum -91.08333 38.91667
87 Ambystoma tigrinum -113.41667 42.75000
88 Ambystoma tigrinum -121.41667 39.75000
89 Ambystoma tigrinum -114.58333 42.91667
90 Ambystoma tigrinum -94.41667 45.41667
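The minimum training presence rule can be sketched in a few lines of base R. The sketch assumes that the suitability of the continuous model has already been extracted at the training points (for example with the extract function of the raster package); all numeric values below are invented for illustration.

```r
# Hypothetical suitability values of the model at the training points
train_suitability <- c(0.41, 0.77, 0.35, 0.90, 0.58, 0.66)

# Minimum training presence: lowest suitability at a known presence
mtp <- min(train_suitability, na.rm = TRUE)

# Hypothetical suitability values for a few cells of the continuous map
cells <- c(0.10, 0.35, 0.50, 0.34, 0.95)

# Cells at or above the threshold are classified as presence (1)
binary <- as.integer(cells >= mtp)
print(mtp)
print(binary)
```

A user defined threshold works the same way, with mtp replaced by the value chosen by the user.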

7.2.3 User defined threshold

Specify a cut-off threshold

References