NicheToolBox

In this tutorial I show how the application works. The application was developed for Google Summer of Code (GSOC) 2016.

Installation

Go to the GitHub repository of the project (nichetoolbox repo).

Then copy and run the installation instructions in R:

if (!require('devtools')) install.packages('devtools')
devtools::install_github('luismurao/nichetoolbox')

Launching the app

library(nichetoolbox)
run_nichetoolbox()

First look

The AppSettings section

In this section you need to specify the folder that contains the niche layers that you will use for the modeling process, as well as the folder where you will save your workflow.

Loading niche layers

On the left panel go to the Niche layers section and select the folder where your niche raster layers are. Remember that they need to have the same spatial extent and resolution (raster formats accepted: .asc, .bil, .sdat, .rst, .nc, .tif, .envi, .img).
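Behind the button, loading a folder of layers amounts to stacking the rasters. A rough sketch with the raster package (the folder path is a made-up example, not the app's actual code):

```r
library(raster)

# Hypothetical folder containing the niche layers
layers_dir <- "~/niche_layers"
layer_files <- list.files(layers_dir,
                          pattern = "\\.(asc|bil|tif|img)$",
                          full.names = TRUE)
# Stacking only works if every layer shares the same extent and resolution
niche_stack <- stack(layer_files)
plot(niche_stack[[1]])  # preview one layer, as the app does
```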

Press the Load niche layers button and wait. In a few seconds a plot will appear showing one of the layers contained in the folder.

Workflow

To keep track of your work in NicheToolBox, you need to specify the folder where you want to save your analyses, data, maps, etc. Go to the Workflow section and select the folder.

The Data section

Now we are ready to work with NicheToolBox. First, we need some georeferenced records of the species we want to model. NicheToolBox can work with two sources of longitude/latitude data: a) GBIF records, which you can search for, download and clean; b) your own occurrence data, which you can upload from a local file and clean.

Searching GBIF records

Go to Data -> GBIF data. Enter the genus and species name in the corresponding fields and optionally specify the number of records that you want to search for (occ search limit). Press the Search GBIF button and wait. If the species is in the GBIF portal a data table will be displayed; if the species is not in GBIF, it will display the following message: No ocurrences found

In the example we searched for the species Ambystoma tigrinum, which returned 480 records.

GBIF data cleaning

You can remove duplicate records using a separation distance in decimal degrees (the default is 0). For Ambystoma tigrinum I had 480 records before cleaning; after clicking Clean duplicates with a XX distance, 154 records remained, so there were 326 duplicate records!
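Conceptually, cleaning with a distance of 0 just drops records that share exact coordinates. A minimal base-R sketch with toy data (not the app's code):

```r
# Toy occurrence table; with a separation distance of 0, records that share
# identical coordinates count as duplicates
occ <- data.frame(
  species   = "Ambystoma tigrinum",
  longitude = c(-99.1, -99.1, -102.4, -102.4, -85.7),
  latitude  = c(19.4, 19.4, 44.4, 44.4, 45.2)
)
occ_clean <- occ[!duplicated(occ[, c("longitude", "latitude")]), ]
nrow(occ_clean)  # 3 unique records remain
```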

Clean duplicates by group

Suppose that your species has a huge geographic range and you want to work only with the records that match certain criteria, for example records that lie within Canada. You can curate duplicate records using a grouping variable; in this example the grouping variable must be country. Go to the Clean duplicates by group section, select the grouping variable (in this case country), then select the country (Canada) and click Clean duplicates by group.

From 154 records only 2 are in Canada.

GBIF visualizations

The GBIF dataset has some fields that can be used to build some exciting visualizations, particularly fields related to observation date (year, month, day) and country. In the Data -> GBIF data -> GBIF visualizations tab you can play with interactive plots, create animated visualizations and display a calendar of the reported records by year.

User data

You can use and clean your own latitude and longitude data for the modeling process. Go to Data -> User data and upload your data. The data cleaning process is exactly the same as for the GBIF data.

Geographic explorations using Dynamic Maps

We have seen how to curate data using threshold distances and grouping variables in NicheToolBox. Now let’s see how to use leaflet maps: 1) to display longitude/latitude data; 2) to clean data; 3) to define our study area (the M polygon, referring to the M concept, which in the niche modeling world is the accessible area that the species has been able to reach even if it has not established there); and 4) to clean data using the M polygon. All of the above can be done for either the GBIF dataset or the User dataset.

Display longitude and latitude data

Go to Data -> Dynamic Map and, on the right panel, select the dataset that you want to work with; in this case I will work with the GBIF data.

Data curation using dynamic map

On the right side panel there is an option where you can specify the ID of a data point to remove it from the dataset. Click on a point’s pop-up to see its ID, select it in the select input form on the right panel and press the Clean data points button.

Define an ‘M’ map

You can use NicheToolBox to define your study area. Go to Data -> Dynamic Map and in the right-side panel turn on the Define and work with polygon M button. When it is activated you can either draw a polygon using the drawing tools (top-right corner) of NicheToolBox or select a local shapefile. If you prefer to define the M polygon using NicheToolBox, press the polygon tool and draw it:

Once defined, the polygon can be saved. In the right panel there is a form where you can give your polygon a name.

Data curation using the M polygon

We can filter the data points that lie inside the polygon: in the right panel just press the Points in polygon button.
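Filtering points by the M polygon is a point-in-polygon test. A minimal sketch with sp::point.in.polygon (the square polygon and points below are made up):

```r
library(sp)

# Hypothetical square 'M' polygon and three occurrence points
poly_lon <- c(-105, -95, -95, -105)
poly_lat <- c(40, 40, 50, 50)
pts_lon  <- c(-100, -110, -98)
pts_lat  <- c(45, 45, 42)
# point.in.polygon() returns 0 for points that fall outside the polygon
inside <- point.in.polygon(pts_lon, pts_lat, poly_lon, poly_lat) > 0
sum(inside)  # 2 of the 3 points fall inside M
```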

Saving the workflow

Once you have specified the workflow directory (see the AppSettings section at the beginning of this tutorial), which is the directory where all the information generated in the app is stored, you just need to press the Save state button to save everything! (At this stage, this covers only the geographic data-related work.)

The workflow report

One of the files generated when you press Save state is data_report.html, an html file with a summary of the geographic data-related work that you have done with NicheToolBox.

Niche space (steps to make a cluster analysis)

To work in niche space we need to have loaded our niche raster layers (see the AppSettings section at the beginning of this tutorial) and to have a longitude/latitude dataset (GBIF data or User data).

1. Extracting niche values from raster layers

Go to Niche space -> Niche data extraction and select a longitude/latitude dataset. In the example I selected the GBIF dataset. If the dataset is not empty and we have loaded the raster layers, the app will not show any message:

On the contrary, if we have not loaded either the raster layers or the longitude/latitude data, a message indicating what to do will be displayed.

When the dataset and the layers are in the app’s memory we can proceed to the next step. Here you just need to press the Run button; a data table with the niche values of our longitude/latitude data will then be displayed.
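The extraction step is essentially raster::extract applied to your coordinates. A self-contained sketch with one synthetic layer standing in for a real bioclimatic variable:

```r
library(raster)

# One synthetic niche layer instead of a real bioclimatic variable
r <- raster(nrows = 10, ncols = 10,
            xmn = -110, xmx = -90, ymn = 30, ymx = 50)
values(r) <- runif(ncell(r))
occ <- data.frame(longitude = c(-100, -95), latitude = c(40, 35))
# Niche values at each occurrence point, as shown in the app's table
extract(r, occ)
```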

2. Niche explorations

We can explore our niche data using some exciting 3-Dimensional plots. Go to Niche space -> Known niche and play with \(x\), \(y\) and \(z\) variables of the ellipsoid plot.

3. Niche clustering

When studying species niches and distributions, one of the great questions that comes to my mind is whether or not species are adapting to different niche conditions. One way to explore this question is with clustering algorithms (statistical tools that aim to detect whether multivariate data have a cluster structure, such that data belonging to the same cluster are very similar to each other and different from those in other groups). If the clusters are very different, we can suspect that populations of the same species are responding in different ways to the same set of niche variables (they are adapting to local conditions). Note that this is just an exploratory tool.

Go to Niche clustering -> K-means and select at least 3 niche variables for the cluster analysis. In my case, as I selected the bios of the WorldClim database as my niche layers, I used 19 niche variables, but if you want to work with fewer variables just remove some of them (Select at least 3 niche variables section).

Here it is necessary to indicate the number of clusters; the default value is 3 (in the future the app will include algorithms to help you make this decision). Press the Go!!! button and you will see a 3-dimensional plot with ellipsoids representing the number of clusters you suggested. Below this plot you will see a leaflet map with the geographic projection of the points that fall inside each ellipsoid (colors help to identify the cluster to which each data point belongs).
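The clustering itself is ordinary k-means on the extracted niche values. A base-R sketch with synthetic data (not the app's code):

```r
set.seed(1)
# Synthetic niche values standing in for the extracted layer values:
# two well-separated groups of 20 records in 3 "variables"
niche_vals <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 3),
  matrix(rnorm(60, mean = 5), ncol = 3)
)
km <- kmeans(niche_vals, centers = 3)
table(km$cluster)  # how many records fall in each cluster
```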

Let’s play with the number of clusters (now 4) and see how the results change…

4. Niche correlations

One popular way to select the niche variables for modeling species niches and distributions is to study correlations among niche variables and filter out those that are highly correlated. In Nichetoolbox you can filter the variables that summarize the environmental information of your presence data according to a correlation threshold; this algorithm suggests which variables to use in the modeling part.
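The filtering logic amounts to flagging variable pairs whose absolute correlation exceeds the threshold. A base-R sketch with synthetic variables:

```r
set.seed(1)
# Five synthetic niche variables; x2 is almost a copy of x1
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.05)
env <- cbind(x1 = x1, x2 = x2,
             x3 = rnorm(100), x4 = rnorm(100), x5 = rnorm(100))
cmat <- cor(env)
# Pairs above a 0.9 threshold (upper triangle only, to skip the diagonal)
high <- which(abs(cmat) > 0.9 & upper.tri(cmat), arr.ind = TRUE)
high  # flags the x1-x2 pair; drop one of the two before modeling
```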

Also, you can explore the correlation matrix and download it in .csv format.

Another thing the user can do is plot a correlogram…

6. Ecological niche modeling

Species Distribution Modeling (SDM), also known as Ecological Niche Modeling (ENM), is a growing field of ecology and biogeography which aims to reconstruct the multidimensional ecological niche of species and, from there, to approximate their geographical distribution. ENM uses a set of mathematical and statistical tools to study the relationship between environmental variables and species occurrences, in order to estimate species niches and predict potential areas where the species can survive. These models have proved useful in ecology and conservation biology: they have been used to identify geographic localities suitable for relocating endangered species, to study the impacts of climate change on biodiversity, to find biodiversity hotspots, and to assess vulnerability to invasive species and pathogens, among other applications (Peterson & Vieglais 2001; Peterson et al. 2011).

In Nichetoolbox you can model ecological niches using one of the following algorithms:

  1. Minimum volume ellipsoid
  2. Bioclim
  3. MaxEnt

6.1 Minimum volume ellipsoid model

Ellipsoid models use the multivariate normal probability density function (PDF; equation 1) to compute the niche suitability index; the PDF is rescaled so that the suitability index is defined on the interval \([0,1]\).

\[f\,(x_{1},x_{2},x_{3},\ldots,x_{k})=\frac{1}{\sqrt{\left(2\pi\right)^{k}\mid\boldsymbol{\Sigma}\mid}}\exp\left(-\frac{1}{2}\left(\mathbf{x}-\boldsymbol{\mu}\right)^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}\left(\mathbf{x}-\boldsymbol{\mu}\right)\right)\,\,(1)\]

which, rescaled so that the value at the centroid is 1, becomes

\[f\,(x_{1},x_{2},x_{3},\ldots,x_{k})=\exp\left(-\frac{1}{2}\left(\mathbf{x}-\boldsymbol{\mu}\right)^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}\left(\mathbf{x}-\boldsymbol{\mu}\right)\right)\]

where \(\mathbf{x}\) is the vector of environmental variables, such that each \(x_i\) represents an observation of environmental variable \(i\); \(\boldsymbol{\Sigma}\) is the covariance matrix of the occurrence data; and \(\boldsymbol{\mu}\) is the vector of means (the centroid).

The quantity \(({\mathbf x}-{\boldsymbol\mu})^\mathrm{T}{\boldsymbol\Sigma}^{-1}({\mathbf x}-{\boldsymbol\mu})\) is the squared Mahalanobis distance.
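The rescaled suitability can be computed directly in base R with mahalanobis(). A minimal sketch with made-up niche values:

```r
# Hypothetical niche values (rows = occurrences, columns = variables)
env <- rbind(c(1.0, 2.0), c(1.2, 1.8), c(0.8, 2.2), c(1.1, 2.1))
mu <- colMeans(env)                           # centroid
S  <- cov(env)                                # covariance matrix
d2 <- mahalanobis(env, center = mu, cov = S)  # squared Mahalanobis distance
suit <- exp(-0.5 * d2)  # rescaled PDF: 1 at the centroid, < 1 elsewhere
```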

In Nichetoolbox, to fit an ellipsoid model you just need the environmental information of your occurrence points and a selection of the layers with which you want to model the niche.

The model can be trained either with all occurrence data or with only the occurrence points that lie inside your M polygon.

Similarly, you can project the model to geography using either the full extent of the rasters or the extent of the ‘M’ polygon.

Using full extent

Select the niche variables and run your model…

Using the extent of the ‘M’ polygon

Download ellipsoid metadata

Download ellipsoid raster model

Download distance to the centroid table

6.2 Bioclim model

The Bioclim model is implemented in Nichetoolbox in the same way as the ellipsoid model:

  • The model can be trained either with all occurrence data or with the occurrence points that lie inside your ‘M’ polygon.
  • Similarly, you can project the model to geography using either the full extent of the rasters or the extent of the ‘M’ polygon.
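For reference, a Bioclim fit with the dismo package looks roughly like this (synthetic layers and random occurrences, purely illustrative of the calls, not the app's code):

```r
library(dismo)
library(raster)

set.seed(1)
# Synthetic two-layer stack and random occurrence points
r1 <- raster(nrows = 20, ncols = 20,
             xmn = -110, xmx = -90, ymn = 30, ymx = 50)
values(r1) <- runif(ncell(r1))
env <- stack(r1, r1 * 2)
occ <- data.frame(longitude = runif(30, -110, -90),
                  latitude  = runif(30, 30, 50))
bc <- bioclim(env, occ)        # train the model on the occurrences
suit_map <- predict(env, bc)   # project it back to geography
```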

6.3 MaxEnt model

You can run MaxEnt within Nichetoolbox. Nichetoolbox calls the maxent function from the dismo package. In order to use MaxEnt within Nichetoolbox you need to install rJava and paste the maxent.jar file into the java folder of dismo. To test whether MaxEnt is available, run the following commands:

jar <- paste(system.file(package="dismo"), "/java/maxent.jar", sep='')
# Check that the necessary file is in the java folder of dismo
file.exists(jar)
## [1] TRUE
# Test whether rJava is installed
"rJava" %in% rownames(installed.packages())
## [1] TRUE

If everything is fine, you can build MaxEnt models within Nichetoolbox with your own data or with the data that you have downloaded from GBIF, choosing between the full raster extent and the ‘M’ extent.

Most of MaxEnt’s features and settings are implemented in the app:

Main features
Basic settings
Advanced settings
Experimental settings

Running MaxEnt

Once you have configured your MaxEnt settings, press the run button. A window with the basic MaxEnt statistics will be displayed.

Download Maxent results

To download the MaxEnt results, click on the Download complete results link.

Download Maxent raster model

ENM projection in geographic space

Once you have modeled your species’ niche using one or all modeling algorithms, you can explore them in geographic space by using the model visualizer. The visualizer is interactive (you can zoom in/out a map) and uses the leaflet library.

7. Species distribution model performance

The last part of the project deals with species distribution model evaluation and performance. Nichetoolbox has two ways to evaluate models:

  1. Partial ROC: uses the Partial ROC function implemented in the ENMGadgets package (Peterson et al. 2008).

  2. Confusion matrix metrics: you can compute prevalence, specificity, sensitivity, TSS, Kappa, correct classification rate, misclassification rate, negative predictive power, positive predictive power, omission error fraction, commission error fraction, false negative rate, and false positive rate from the confusion matrix (Fielding and Bell, 1997).
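A few of these metrics computed from a worked confusion matrix in base R (the counts below are made up):

```r
# Made-up confusion matrix: a = true positives, b = false positives,
# c = false negatives, d = true negatives
a <- 40; b <- 10; c <- 5; d <- 45
n <- a + b + c + d

sensitivity <- a / (a + c)            # fraction of presences predicted present
specificity <- d / (b + d)            # fraction of absences predicted absent
TSS <- sensitivity + specificity - 1  # true skill statistic
ccr <- (a + d) / n                    # correct classification rate
# Cohen's Kappa: observed agreement corrected for chance agreement
p_exp <- ((a + b) * (a + c) + (c + d) * (b + d)) / n^2
kappa <- (ccr - p_exp) / (1 - p_exp)
```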

7.1 Partial ROC

To do a Partial ROC analysis in Nichetoolbox, upload your continuous niche model output map (e.g., from MaxEnt) and your validation dataset.

Validation data must be in the following format:

sp_name longitude latitude
Ambystoma tigrinum -107.08333 51.08333
Ambystoma tigrinum -102.41667 44.41667
Ambystoma tigrinum -99.75000 45.91667
Ambystoma tigrinum -85.75000 45.25000
Ambystoma tigrinum -91.75000 45.75000
Ambystoma tigrinum -91.41667 39.75000

Partial ROC output

7.2 Binary maps

The ‘Binary maps’ section has functions to transform continuous models into binary maps (i.e., presence and absence). The conversion can be done using one of the following methods:

  1. Confusion matrix optimization: using true presences and absences, the algorithm searches for the cut-off threshold that optimizes the value of the Kappa and/or TSS statistic.

  2. Minimum training presence: uses the lowest suitability value at which a presence occurs as the cut-off threshold.

  3. User defined threshold: the user specifies the cut-off threshold.

7.2.1 Confusion matrix optimization

The user uploads both the continuous map (.asc) and the presences/absences data file (.csv). The presences/absences data have to be in the following format:

longitude latitude presence_absence
-111.25000 36.91667 0
-106.20000 35.30000 1
-98.08000 47.74000 1
-93.27306 45.21076 1
-112.64406 36.58329 1
-101.85097 35.18559 1

Once the files are uploaded, specify the range of thresholds to search and press the Search threshold button. The output looks like this:

7.2.2 Minimum training presence

Just upload your continuous model (.asc) and your training data file (.csv).

Training data must be in the following format:

sp_name longitude latitude
Ambystoma tigrinum -100.58333 31.91667
Ambystoma tigrinum -91.08333 38.91667
Ambystoma tigrinum -113.41667 42.75000
Ambystoma tigrinum -121.41667 39.75000
Ambystoma tigrinum -114.58333 42.91667
Ambystoma tigrinum -94.41667 45.41667
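The minimum training presence rule is simple enough to state in a few lines of base R (the suitability values below are hypothetical):

```r
# Hypothetical model suitability values at the training presences
suit_at_presences <- c(0.62, 0.35, 0.48, 0.71, 0.29)
threshold <- min(suit_at_presences)  # lowest value at a presence

# Hypothetical map cell values; cells at or above the threshold become 1
map_vals <- c(0.10, 0.30, 0.29, 0.55, 0.80)
binary <- as.integer(map_vals >= threshold)
binary  # 0 1 1 1 1
```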

7.2.3 User defined threshold

Specify a cut-off threshold

References