NicheToolBox: An example of the model selection protocol for ellipsoid models

1 The example
2 Get the data for the example
3 Split the data in train and testing
4 Remove strongly correlated variables
5 Ellipsoid calibration and selection
6 Project the best model
7 Binarize maps
References

1 The example

We demonstrate some of the main functions of ntbox by modeling the potential distribution of Leopardus wiedii, a near-threatened small cat that lives in the Neotropics (Fig. 1).

We modeled the ecological niche of L. wieddi using ellipsoids and show the performance and speed of ntbox model calibration and selection protocol for MVEs by using environmental information from North America at three different spatial resolutions (10’, 5’ and 2.5’).

Figure 1. Leopardus wiedii. The image is taken from (Sanchez et al. 1998)

First, we set the random seed to make our the example reproducible.

set.seed(1111)
library(ntbox)

2 Get the data for the example

2.1 Environmental data

ntbox has 4 different native functions to get environmental data, each one is related to the following databases:

CHELSA (?ntbox::get_chelsa)
ENVIREM (?ntbox::get_envirem_clim and ?ntbox::get_envirem_elev for climatic and elevation data respectevly)
Bio-Oracle (?ntbox::get_bio_oracle).

Here, we use the function getData from the package raster (Hijmans 2017) to download the WorldClim at 10, 5, and 2.5 ArcMinutes of resolution.

wc10 <- raster::getData('worldclim', var='bio', res=10)
wc5 <- raster::getData('worldclim', var='bio', res=5)
wc2.5 <- raster::getData('worldclim', var='bio', res=2.5)
plot(wc10[["bio1"]])

2.2 Crop and mask environmental data

Reading a shapefile for America

amp <- normalizePath("../america_sin_islas")
am <- rgdal::readOGR(dsn =amp,layer="america_sn_islas")

Cut the layers using America as a mask

am10 <- raster::crop(wc10,am)
am10 <- raster::mask(am10,am)
am5 <- raster::crop(wc5,am)
am5 <- raster::mask(am5,am)
am2.5 <- raster::crop(wc2.5,am)
am2.5 <- raster::mask(am2.5,am)
plot(am10[["bio1"]])

plot(am10[["bio1"]])

2.3 Geographic data

From ntbox, we downloaded available occurrences for L. wiedii from the Global Biodiversity Information Facility (https://www.gbif.org) and explore what is the provenance and date of collecting these points.

dAll_a <- ntbox::searh_gbif_data(genus = "Leopardus",
                                 species = "wiedii",
                                 occlim = 5000,
                                 leafletplot = TRUE)

cat("Total number of occurrence data found:",nrow(dAll_a))

## Total number of occurrence data found: 859

We select those records starting in 1950 as we will use the bioclimatic layers from WorldClim for the modeling process.

dAll_b <- ntbox::clean_dup(dAll_a,longitude = "longitude","latitude",
                           threshold = 0)
cat("Total number of occurrence data affter cleanining spatial duplicates:",
    nrow(dAll_b))

## Total number of occurrence data affter cleanining spatial duplicates: 526

dAll_c <- dAll_b %>% dplyr::filter(year>=1950)
cat("Total number of occurrence data for periods >=1950:",
    nrow(dAll_c))

## Total number of occurrence data for periods >=1950: 384

m <- leaflet::leaflet(dAll_c)
m <- m %>% leaflet::addTiles()
m <- m %>% leaflet::addCircleMarkers(lng = ~longitude, 
                                     lat = ~latitude, 
                                     popup = ~leaflet_info, 
                                     fillOpacity = 0.25, 
                                     radius = 7,col="green")
m

Remove wired occurrences. Click on the pop-up to display gbif information (available when the downloaded data comes from search_gbif function), the points that are outside the distribution are the one on San Francisco (this comes from a collection; rowID=632), the record on Florida (rowID=508,489), and the ones that in the sea (540,591).

# Indixes of the wired data (can change depending the date of the gbif query)
to_rmIDs <- c(632,489,508,540,591)
to_rm <- which(dAll_c$ntboxID %in% to_rmIDs)
dAll <- dAll_c[-to_rm,]
m <- leaflet::leaflet(dAll)
m <- m %>% leaflet::addTiles()
m <- m %>% leaflet::addCircleMarkers(lng = ~longitude, 
                                     lat = ~latitude, 
                                     popup = ~leaflet_info, 
                                     fillOpacity = 0.25, 
                                     radius = 7,col="green")
m

2.4 Remove environmental duplicates

First, we extract environmental information from occurrences

dAll_e10 <- raster::extract(am10,dAll[,2:3])
dAll_ge10 <- data.frame(dAll[,c(2:3,ncol(dAll))],
                        dAll_e10)
dAll_e5 <- raster::extract(am5,dAll[,2:3])
dAll_ge5 <- data.frame(dAll[,c(2:3,ncol(dAll))],
                        dAll_e5)
dAll_e2.5 <- raster::extract(am2.5,dAll[,2:3])
dAll_ge2.5 <- data.frame(dAll[,c(2:3,ncol(dAll))],
                        dAll_e2.5)

We remove duplicated data

dAll_ge10c <- ntbox::clean_dup(dAll_ge10,
                               longitude ="longitude",
                               latitude = "latitude",
                               threshold = res(am10)[1])
# remove NA's
dAll_ge10c <- unique(na.omit(dAll_ge10c))


dAll_ge5c <- ntbox::clean_dup(dAll_ge5,
                              longitude ="longitude",
                              latitude = "latitude",
                              threshold = res(am5)[1])
# remove NA's
dAll_ge5c <- unique(na.omit(dAll_ge5c))

dAll_ge2.5c <- ntbox::clean_dup(dAll_ge2.5,
                                longitude ="longitude",
                                latitude = "latitude",
                                threshold = res(am2.5)[1])
# remove NA's
dAll_ge2.5c <- unique(na.omit(dAll_ge2.5c))

Explore the curated data on a leaflet map for the three spatial resolutions (2.5=blue,5=green,10=red)

mc <- leaflet::leaflet(dAll_ge2.5c)
mc <- mc %>% leaflet::addTiles()
mc <- mc %>% leaflet::addCircleMarkers(lng = ~longitude, 
                                     lat = ~latitude, 
                                     popup = ~leaflet_info, 
                                     fillOpacity = 0.25, 
                                     radius = 7,col="blue") %>%
  leaflet::addCircleMarkers(lng = dAll_ge5c$longitude, 
                                     lat = dAll_ge5c$latitude, 
                                     popup = dAll_ge5c$leaflet_info, 
                                     fillOpacity = 0.25, 
                                     radius = 7,col="green") %>%
  leaflet::addCircleMarkers(lng = dAll_ge10c$longitude, 
                                     lat = dAll_ge10c$latitude, 
                                     popup = dAll_ge10c$leaflet_info, 
                                     fillOpacity = 0.25, 
                                     radius = 7,col="red")
mc

The following table shows the number of records before and after data cleaning process for each spatial resolution

Resolution	GBIF_records_found	Records_affter_claening
10 min	859	231
5 min	859	274
2.5 min	859	306

3 Split the data in train and testing

Now we will create train and testing data using a proportion of 70:30 respectively

trainID10 <- sample(nrow(dAll_ge10c),
                    size =ceiling(nrow(dAll_ge10c)*0.7))

trainID5 <- sample(nrow(dAll_ge5c),
                    size =ceiling(nrow(dAll_ge5c)*0.7))
trainID2.5 <- sample(nrow(dAll_ge2.5c),
                    size =ceiling(nrow(dAll_ge2.5c)*0.7))

Geographic train and test data

dtrain10 <- dAll_ge10c[trainID10,1:2]
dtest10 <-  dAll_ge10c[-trainID10,1:2]

dtrain5 <- dAll_ge5c[trainID5,1:2]
dtest5 <-  dAll_ge5c[-trainID5,1:2]

dtrain2.5 <- dAll_ge5c[trainID2.5,1:2]
dtest2.5 <-  dAll_ge5c[-trainID2.5,1:2]

Environmental train and test

dtrain_e10 <- dAll_ge10c[trainID10,-(1:3)]
dtest_e10 <- dAll_ge10c[-trainID10,-c(1:3)]

dtrain_e5 <- dAll_ge5c[trainID5,-(1:3)]
dtest_e5 <- dAll_ge5c[-trainID5,-c(1:3)]

dtrain_e2.5 <- dAll_ge2.5c[trainID2.5,-(1:3)]
dtest_e2.5 <- dAll_ge2.5c[-trainID2.5,-c(1:3)]

4 Remove strongly correlated variables

First estimate correlation matrix

corsMat10 <- cor(dAll_ge10c[,-(1:3)])
corsMat5 <- cor(dAll_ge5c[,-(1:3)])
corsMat2.5 <- cor(dAll_ge2.5c[,-(1:3)])

Select environmental variables using a correlation filter of 0.95

env_vars10 <- ntbox::correlation_finder(corsMat10,
                          threshold = 0.95,
                          verbose = FALSE)$descriptors

env_vars5 <- ntbox::correlation_finder(corsMat5,
                          threshold = 0.95,
                          verbose = FALSE)$descriptors

env_vars2.5 <- ntbox::correlation_finder(corsMat2.5,
                          threshold = 0.95,
                          verbose = FALSE)$descriptors
env_vars <- intersect(env_vars10,env_vars2.5)
env_vars <- intersect(env_vars5,env_vars)
#env_vars <- union(c(env_vars10,env_vars5),env_vars2.5)

5 Ellipsoid calibration and selection

To calibrate the models ntbox estimates all combinations (\(C^n_k\)) of \(n\) variables, taken \(k\) at a time for each \(k= 2,3,\dots, m\), where \(m\) is lesser than \(n\). It is known that

\[\displaystyle C^n_k =\frac {n!}{k!(n-k)!}.\]

In this example, after selecting 16 of the less correlated variables (see above section) we fit Minimum Volume Ellipsoid Models (Van Aelst and Rousseeuw 2009) to each combination of 3, 4 and 5 variables of these 16 variables; thus, the total number of models is 4823:

\[C^{14}_{3} +C^{14}_{4}+C^{14}_{5}=\frac{14!}{3!(14-3)!}+\frac {14!}{4!(14-4)!}+\frac {14!}{5!(14-5)!}=364+1001+2002=3367\]

5.1 Generate environmental background data

We will generate the random environmental points that will be used to estimate the Partial ROC test (see (Owens et al. 2012; Cobos et al. 2019)) of the calibrated models.

env_bg10 <- ntbox::sample_envbg(envlayers = am10,
                                nbg = 10000,
                                coordinates = TRUE,
                                rseed = 1111)
env_bg5 <- ntbox::sample_envbg(envlayers = am5,
                               nbg = 15000,
                               coordinates = TRUE,
                               rseed = 1111)
env_bg2.5 <- ntbox::sample_envbg(envlayers = am2.5,
                                 nbg = 30000,
                                 coordinates = TRUE,
                                 rseed = 1111)
bg10 <- env_bg10[,1:2]
bg5 <- env_bg5[,1:2]
bg2.5 <- env_bg2.5[,1:2]

5.2 Calibrate and select models

We will use a proportion of 0.95 of the training data; the omission rate criteria is 0.05 (5%). To get good speed performance, models will be calibrated and selected in parallel; each job will process 100 models.

nvarstest <- c(3,4,5)
t10 <- system.time({
  e_selct10 <- ntbox::ellipsoid_selection(env_train = dtrain_e10,
                                          env_test = dtest_e10,
                                          env_vars = env_vars,
                                          level = 0.95,
                                          nvarstest = nvarstest,
                                          env_bg = env_bg10,
                                          omr_criteria=0.05,
                                          parallel = TRUE,
                                          comp_each = 100,
                                          proc = TRUE,
                                          rseed = TRUE)
})

## -----------------------------------------------------------------------------------------
##      **** Starting model selection process ****
## -----------------------------------------------------------------------------------------
## 
## A total number of 364 models will be created for combinations of 14 variables taken by 3 
## 
## A total number of 1001 models will be created for combinations of 14 variables taken by 4 
## 
## A total number of 2002 models will be created for combinations of 14 variables taken by 5 
## 
## -----------------------------------------------------------------------------------------
##   **A total number of 3367 models will be tested **
## 
## -----------------------------------------------------------------------------------------
## Doing calibration from model  1 to  100 in process  1 
## 
## Doing calibration from model  101 to  200 in process  2 
## 
## Doing calibration from model  201 to  300 in process  3 
## 
## Doing calibration from model  301 to  400 in process  4 
## 
## Doing calibration from model  401 to  500 in process  5 
## 
## Doing calibration from model  501 to  600 in process  6 
## 
## Doing calibration from model  601 to  700 in process  7 
## 
## Doing calibration from model  701 to  800 in process  8 
## 
## Doing calibration from model  801 to  900 in process  9 
## 
## Doing calibration from model  901 to  1000 in process  10 
## 
## Doing calibration from model  1001 to  1100 in process  11 
## 
## Doing calibration from model  1101 to  1200 in process  12 
## 
## Doing calibration from model  1201 to  1300 in process  13 
## 
## Doing calibration from model  1301 to  1400 in process  14 
## 
## Doing calibration from model  1401 to  1500 in process  15 
## 
## Doing calibration from model  1501 to  1600 in process  16 
## 
## Doing calibration from model  1601 to  1700 in process  17 
## 
## Doing calibration from model  1701 to  1800 in process  18 
## 
## Doing calibration from model  1801 to  1900 in process  19 
## 
## Doing calibration from model  1901 to  2000 in process  20 
## 
## Doing calibration from model  2001 to  2100 in process  21 
## 
## Doing calibration from model  2101 to  2200 in process  22 
## 
## Doing calibration from model  2201 to  2300 in process  23 
## 
## Doing calibration from model  2301 to  2400 in process  24 
## 
## Doing calibration from model  2401 to  2500 in process  25 
## 
## Doing calibration from model  2501 to  2600 in process  26 
## 
## Doing calibration from model  2601 to  2700 in process  27 
## 
## Doing calibration from model  2701 to  2800 in process  28 
## 
## Doing calibration from model  2801 to  2900 in process  29 
## 
## Doing calibration from model  2901 to  3000 in process  30 
## 
## Doing calibration from model  3001 to  3100 in process  31 
## 
## Doing calibration from model  3101 to  3200 in process  32 
## 
## Doing calibration from model  3201 to  3300 in process  33 
## 
## Doing calibration from model  3301 to  3367 in process  34 
## 
## Finishing calibration of models  901 to  1000 
## 
## Finishing calibration of models  1001 to  1100 
## 
## Finishing calibration of models  1101 to  1200 
## 
## Finishing calibration of models  1201 to  1300 
## 
## Finishing calibration of models  1301 to  1400 
## 
## Finishing calibration of models  1401 to  1500 
## 
## Finishing calibration of models  1501 to  1600 
## 
## Finishing calibration of models  1601 to  1700 
## 
## Finishing calibration of models  1701 to  1800 
## 
## Finishing calibration of models  1801 to  1900 
## 
## Finishing calibration of models  1901 to  2000 
## 
## Finishing calibration of models  2001 to  2100 
## 
## Finishing calibration of models  2101 to  2200 
## 
## Finishing calibration of models  2201 to  2300 
## 
## Finishing calibration of models  2301 to  2400 
## 
## Finishing calibration of models  2401 to  2500 
## 
## Finishing calibration of models  2501 to  2600 
## 
## Finishing calibration of models  2601 to  2700 
## 
## Finishing calibration of models  2701 to  2800 
## 
## Finishing calibration of models  2801 to  2900 
## 
## Finishing calibration of models  2901 to  3000 
## 
## Finishing calibration of models  1 to  100 
## 
## Finishing calibration of models  3001 to  3100 
## 
## Finishing calibration of models  101 to  200 
## 
## Finishing calibration of models  3101 to  3200 
## 
## Finishing calibration of models  201 to  300 
## 
## Finishing calibration of models  3201 to  3300 
## 
## Finishing calibration of models  3301 to  3367 
## 
## Finishing calibration of models  301 to  400 
## 
## Finishing calibration of models  401 to  500 
## 
## Finishing calibration of models  501 to  600 
## 
## Finishing calibration of models  601 to  700 
## 
## Finishing calibration of models  701 to  800 
## 
## Finishing calibration of models  801 to  900 
## 
## Finishing...
## 
## -----------------------------------------------------------------------------------------
##   39 models passed omr_criteria for train data
##   6 models passed omr_criteria for test data
##   1 models passed omr_criteria for train and test data

t5 <- system.time({
  e_selct5 <- ntbox::ellipsoid_selection(env_train = dtrain_e5,
                                         env_test = dtest_e5,
                                         env_vars = env_vars,
                                         level = 0.95,
                                         nvarstest = nvarstest,
                                         env_bg = env_bg5,
                                         omr_criteria=0.05,
                                         parallel = TRUE,
                                         comp_each = 100,
                                         proc = TRUE,
                                         rseed = TRUE)
})

## -----------------------------------------------------------------------------------------
##      **** Starting model selection process ****
## -----------------------------------------------------------------------------------------
## 
## A total number of 364 models will be created for combinations of 14 variables taken by 3 
## 
## A total number of 1001 models will be created for combinations of 14 variables taken by 4 
## 
## A total number of 2002 models will be created for combinations of 14 variables taken by 5 
## 
## -----------------------------------------------------------------------------------------
##   **A total number of 3367 models will be tested **
## 
## -----------------------------------------------------------------------------------------
## Doing calibration from model  1 to  100 in process  1 
## 
## Doing calibration from model  101 to  200 in process  2 
## 
## Doing calibration from model  201 to  300 in process  3 
## 
## Doing calibration from model  301 to  400 in process  4 
## 
## Doing calibration from model  401 to  500 in process  5 
## 
## Doing calibration from model  501 to  600 in process  6 
## 
## Doing calibration from model  601 to  700 in process  7 
## 
## Doing calibration from model  701 to  800 in process  8 
## 
## Doing calibration from model  801 to  900 in process  9 
## 
## Doing calibration from model  901 to  1000 in process  10 
## 
## Doing calibration from model  1001 to  1100 in process  11 
## 
## Doing calibration from model  1101 to  1200 in process  12 
## 
## Doing calibration from model  1201 to  1300 in process  13 
## 
## Doing calibration from model  1301 to  1400 in process  14 
## 
## Doing calibration from model  1401 to  1500 in process  15 
## 
## Doing calibration from model  1501 to  1600 in process  16 
## 
## Doing calibration from model  1601 to  1700 in process  17 
## 
## Doing calibration from model  1701 to  1800 in process  18 
## 
## Doing calibration from model  1801 to  1900 in process  19 
## 
## Doing calibration from model  1901 to  2000 in process  20 
## 
## Doing calibration from model  2001 to  2100 in process  21 
## 
## Doing calibration from model  2101 to  2200 in process  22 
## 
## Doing calibration from model  2201 to  2300 in process  23 
## 
## Doing calibration from model  2301 to  2400 in process  24 
## 
## Doing calibration from model  2401 to  2500 in process  25 
## 
## Doing calibration from model  2501 to  2600 in process  26 
## 
## Doing calibration from model  2601 to  2700 in process  27 
## 
## Doing calibration from model  2701 to  2800 in process  28 
## 
## Doing calibration from model  2801 to  2900 in process  29 
## 
## Doing calibration from model  2901 to  3000 in process  30 
## 
## Doing calibration from model  3001 to  3100 in process  31 
## 
## Doing calibration from model  3101 to  3200 in process  32 
## 
## Doing calibration from model  3201 to  3300 in process  33 
## 
## Doing calibration from model  3301 to  3367 in process  34 
## 
## Finishing calibration of models  901 to  1000 
## 
## Finishing calibration of models  1001 to  1100 
## 
## Finishing calibration of models  1101 to  1200 
## 
## Finishing calibration of models  1201 to  1300 
## 
## Finishing calibration of models  1301 to  1400 
## 
## Finishing calibration of models  1401 to  1500 
## 
## Finishing calibration of models  1501 to  1600 
## 
## Finishing calibration of models  1601 to  1700 
## 
## Finishing calibration of models  1701 to  1800 
## 
## Finishing calibration of models  1801 to  1900 
## 
## Finishing calibration of models  1901 to  2000 
## 
## Finishing calibration of models  2001 to  2100 
## 
## Finishing calibration of models  2101 to  2200 
## 
## Finishing calibration of models  2201 to  2300 
## 
## Finishing calibration of models  2301 to  2400 
## 
## Finishing calibration of models  2401 to  2500 
## 
## Finishing calibration of models  2501 to  2600 
## 
## Finishing calibration of models  2601 to  2700 
## 
## Finishing calibration of models  2701 to  2800 
## 
## Finishing calibration of models  2801 to  2900 
## 
## Finishing calibration of models  2901 to  3000 
## 
## Finishing calibration of models  1 to  100 
## 
## Finishing calibration of models  3001 to  3100 
## 
## Finishing calibration of models  101 to  200 
## 
## Finishing calibration of models  3101 to  3200 
## 
## Finishing calibration of models  201 to  300 
## 
## Finishing calibration of models  3201 to  3300 
## 
## Finishing calibration of models  3301 to  3367 
## 
## Finishing calibration of models  301 to  400 
## 
## Finishing calibration of models  401 to  500 
## 
## Finishing calibration of models  501 to  600 
## 
## Finishing calibration of models  601 to  700 
## 
## Finishing calibration of models  701 to  800 
## 
## Finishing calibration of models  801 to  900 
## 
## Finishing...
## 
## -----------------------------------------------------------------------------------------
##   16 models passed omr_criteria for train data
##   19 models passed omr_criteria for test data
##   3 models passed omr_criteria for train and test data

t2.5 <- system.time({
  e_selct2.5 <- ntbox::ellipsoid_selection(env_train = dtrain_e2.5,
                                           env_test = dtest_e2.5,
                                           env_vars = env_vars,
                                           level = 0.95,
                                           nvarstest = nvarstest,
                                           env_bg = env_bg2.5,
                                           omr_criteria=0.05,
                                           parallel = TRUE,
                                           comp_each = 100,
                                           proc = TRUE,
                                           rseed = TRUE)
})

## -----------------------------------------------------------------------------------------
##      **** Starting model selection process ****
## -----------------------------------------------------------------------------------------
## 
## A total number of 364 models will be created for combinations of 14 variables taken by 3 
## 
## A total number of 1001 models will be created for combinations of 14 variables taken by 4 
## 
## A total number of 2002 models will be created for combinations of 14 variables taken by 5 
## 
## -----------------------------------------------------------------------------------------
##   **A total number of 3367 models will be tested **
## 
## -----------------------------------------------------------------------------------------
## Doing calibration from model  1 to  100 in process  1 
## 
## Doing calibration from model  101 to  200 in process  2 
## 
## Doing calibration from model  201 to  300 in process  3 
## 
## Doing calibration from model  301 to  400 in process  4 
## 
## Doing calibration from model  401 to  500 in process  5 
## 
## Doing calibration from model  501 to  600 in process  6 
## 
## Doing calibration from model  601 to  700 in process  7 
## 
## Doing calibration from model  701 to  800 in process  8 
## 
## Doing calibration from model  801 to  900 in process  9 
## 
## Doing calibration from model  901 to  1000 in process  10 
## 
## Doing calibration from model  1001 to  1100 in process  11 
## 
## Doing calibration from model  1101 to  1200 in process  12 
## 
## Doing calibration from model  1201 to  1300 in process  13 
## 
## Doing calibration from model  1301 to  1400 in process  14 
## 
## Doing calibration from model  1401 to  1500 in process  15 
## 
## Doing calibration from model  1501 to  1600 in process  16 
## 
## Doing calibration from model  1601 to  1700 in process  17 
## 
## Doing calibration from model  1701 to  1800 in process  18 
## 
## Doing calibration from model  1801 to  1900 in process  19 
## 
## Doing calibration from model  1901 to  2000 in process  20 
## 
## Doing calibration from model  2001 to  2100 in process  21 
## 
## Doing calibration from model  2101 to  2200 in process  22 
## 
## Doing calibration from model  2201 to  2300 in process  23 
## 
## Doing calibration from model  2301 to  2400 in process  24 
## 
## Doing calibration from model  2401 to  2500 in process  25 
## 
## Doing calibration from model  2501 to  2600 in process  26 
## 
## Doing calibration from model  2601 to  2700 in process  27 
## 
## Doing calibration from model  2701 to  2800 in process  28 
## 
## Doing calibration from model  2801 to  2900 in process  29 
## 
## Doing calibration from model  2901 to  3000 in process  30 
## 
## Doing calibration from model  3001 to  3100 in process  31 
## 
## Doing calibration from model  3101 to  3200 in process  32 
## 
## Doing calibration from model  3201 to  3300 in process  33 
## 
## Doing calibration from model  3301 to  3367 in process  34 
## 
## Finishing calibration of models  901 to  1000 
## 
## Finishing calibration of models  1001 to  1100 
## 
## Finishing calibration of models  1101 to  1200 
## 
## Finishing calibration of models  1201 to  1300 
## 
## Finishing calibration of models  1301 to  1400 
## 
## Finishing calibration of models  1401 to  1500 
## 
## Finishing calibration of models  1501 to  1600 
## 
## Finishing calibration of models  1601 to  1700 
## 
## Finishing calibration of models  1701 to  1800 
## 
## Finishing calibration of models  1801 to  1900 
## 
## Finishing calibration of models  1901 to  2000 
## 
## Finishing calibration of models  2001 to  2100 
## 
## Finishing calibration of models  2101 to  2200 
## 
## Finishing calibration of models  2201 to  2300 
## 
## Finishing calibration of models  2301 to  2400 
## 
## Finishing calibration of models  2401 to  2500 
## 
## Finishing calibration of models  2501 to  2600 
## 
## Finishing calibration of models  2601 to  2700 
## 
## Finishing calibration of models  2701 to  2800 
## 
## Finishing calibration of models  2801 to  2900 
## 
## Finishing calibration of models  2901 to  3000 
## 
## Finishing calibration of models  1 to  100 
## 
## Finishing calibration of models  3001 to  3100 
## 
## Finishing calibration of models  101 to  200 
## 
## Finishing calibration of models  3101 to  3200 
## 
## Finishing calibration of models  201 to  300 
## 
## Finishing calibration of models  3201 to  3300 
## 
## Finishing calibration of models  3301 to  3367 
## 
## Finishing calibration of models  301 to  400 
## 
## Finishing calibration of models  401 to  500 
## 
## Finishing calibration of models  501 to  600 
## 
## Finishing calibration of models  601 to  700 
## 
## Finishing calibration of models  701 to  800 
## 
## Finishing calibration of models  801 to  900 
## 
## Finishing...
## 
## -----------------------------------------------------------------------------------------
##   1 models passed omr_criteria for train data
##   425 models passed omr_criteria for test data
##   1 models passed omr_criteria for train and test data

# Save the results
#write.csv(e_selct,"margay_model_selection_results.csv",
#          row.names = FALSE)

The elapsed time in minutes

t10/60

##        user      system     elapsed 
## 0.016500000 0.004666667 1.770833333

t5/60

##       user     system    elapsed 
## 0.02216667 0.00700000 2.80416667

t2.5/60

##        user      system     elapsed 
## 0.027333333 0.005666667 3.976000000

names(e_selct2.5)

##  [1] "fitted_vars"            "nvars"                  "om_rate_train"         
##  [4] "om_rate_test"           "bg_prevalence"          "pval_bin"              
##  [7] "pval_proc"              "env_bg_paucratio"       "env_bg_auc"            
## [10] "mean_omr_train_test"    "rank_by_omr_train_test" "rank_omr_aucratio"

Now we show the results for the best models by resolution; here, the criteria for selecting models is filtering those models that have a mean omission rate of training and testing data less o equal than 5% percent and then ordered by maximum AUC. The table contains eleven fields:

fitted_vars: The fitted variables
nvars: Number of variables used to fit the ellipsoid model
om_rate_train: Omission rate of training data
om_rate_test: Omission rate of testing data
bg_prevalence: The estimated prevalence of the species in background data
pval_bin: The p-value of the binomial test (see (Peterson, Papes, and Soberon 2008)) performed in environmental space
pval_proc: The p-value of the partial ROC test performed in environmental space
env_bg_paucratio: Environmental background AUC ratio for partial ROC test
env_bg_auc: Environmental background AUC.
mean_omr_train_test: Mean omission rate of testing and training data
rank_by_omr_train_test: The rank of the models given testing and training data

best10m <- e_selct10 %>% 
  filter(mean_omr_train_test<=0.07) %>% 
  #arrange(desc(env_bg_auc))  %>% 
  mutate(Resolution="10 min",
         N_models = nrow(e_selct10),
         Time_to_run = paste(round(t10[3]/60,2),"mins"))
best5m <- e_selct5 %>% 
  filter(mean_omr_train_test<=0.07) %>% 
  #arrange(desc(env_bg_auc))  %>% 
  mutate(Resolution="5 min",
            N_models = nrow(e_selct5),
            Time_to_run = paste(round(t5[3]/60,2),"mins"))
                                    
best2.5m <- e_selct2.5 %>% 
  filter(mean_omr_train_test<=0.07) %>% 
  #arrange(desc(env_bg_auc)) %>% 
  mutate(Resolution="2.5 min",
         N_models = nrow(e_selct2.5),
         Time_to_run = paste(round(t2.5[3]/60,2),"mins"))


rshow <- c("fitted_vars",
  "om_rate_train",
  "om_rate_test",
  "pval_proc",
  "env_bg_paucratio",
  "env_bg_auc",
  "mean_omr_train_test",
  "Resolution","N_models",
  "Time_to_run")

all_res <- rbind(rbind(best10m[1,rshow],best5m[1,rshow],best2.5m[1,rshow]))
knitr::kable(all_res)

fitted_vars	om_rate_train	om_rate_test	env_bg_paucratio	env_bg_auc	mean_omr_train_test	Resolution	N_models	Time_to_run
bio12,bio19,bio4	0.0493827	0.0434783	1.514281	0.8634392	0.0464305	10 min	3367	1.77 mins
bio12,bio19,bio3,bio4	0.0468750	0.0487805	1.541765	0.8649882	0.0478277	5 min	3367	2.8 mins
bio13,bio15,bio18,bio4,bio9	0.0465116	0.0329670	1.529130	0.8489375	0.0397393	2.5 min	3367	3.98 mins

6 Project the best model

We will project the best models according to the above table. For another complete example see the help of ?ntbox::ellipsoid_selection.

# Select the model number  one in table e_select
modelvars10 <- as.character(all_res$fitted_vars[1]) %>%
  stringr::str_split(pattern = ",",string = .) %>% unlist(.)
modelvars5 <- as.character(all_res$fitted_vars[2]) %>%
  stringr::str_split(pattern = ",",string = .) %>% unlist(.)
modelvars2.5 <- as.character(all_res$fitted_vars[3]) %>%
  stringr::str_split(pattern = ",",string = .) %>% unlist(.)

Fit the models in environmental space.

eall10mins <- rbind(dtrain_e10,dtest_e10)
eall5mins <- rbind(dtrain_e5,dtest_e5)
eall2.5mins <- rbind(dtrain_e2.5,dtest_e2.5)

best_mod10 <- ntbox::cov_center(eall10mins[,modelvars10],
                              mve = T,
                              level = 0.99,
                              vars = 1:length(modelvars10))

best_mod5 <- ntbox::cov_center(eall10mins[,modelvars5],
                               mve = T,
                               level = 0.99,
                               vars = 1:length(modelvars5))

best_mod2.5 <- ntbox::cov_center(eall2.5mins[,modelvars2.5],
                              mve = T,
                              level = 0.99,
                              vars = 1:length(modelvars2.5))

mProj10 <- ntbox::ellipsoidfit(am10[[modelvars10]],
                             centroid = best_mod10$centroid,
                             covar = best_mod10$covariance,
                             level = 0.99,size = 3)
if(length(modelvars10)==3){
  rgl::rglwidget(reuse = FALSE)
}

mProj5 <- ntbox::ellipsoidfit(am5[[modelvars5]],
                             centroid = best_mod5$centroid,
                             covar = best_mod5$covariance,
                             level = 0.99,size = 3)
if(length(modelvars5)==3){
  rgl::rglwidget(reuse = FALSE)
}
  

mProj2.5 <- ntbox::ellipsoidfit(am2.5[[modelvars2.5]],
                             centroid = best_mod2.5$centroid,
                             covar = best_mod2.5$covariance,
                             level = 0.99,size = 3)
if(length(modelvars2.5)==3){
  rgl::rglwidget(reuse = FALSE)
}

Project models in geographic space

#install.packages("wesanderson")
library(wesanderson)
raster::plot(mProj10$suitRaster,
             col= wes_palette("Zissou1", 500, type = "continuous"),
             main="Best model 10 minutes",)

raster::plot(mProj5$suitRaster,
             col= wes_palette("Zissou1", 500, type = "continuous"),
             main="Best model 5 minutes")

raster::plot(mProj2.5$suitRaster,
             col= wes_palette("Zissou1", 500, type = "continuous"),
             main="Best model 2.5 minutes")

7 Binarize maps

We will use a threshold of 10 percent of the training data to binarize the models

bin10 <- ntbox::bin_model(model = mProj10$suitRaster,
                          occs = dAll_ge10c[,1:2],
                          percent = 10)
raster::plot(bin10,col=c("#d8d6d6","#03a0ff"))

bin5 <-  ntbox::bin_model(model = mProj5$suitRaster,
                          occs = dAll_ge10c[,1:2],
                          percent = 10)
raster::plot(bin5,col=c("#d8d6d6","#03a0ff"))

bin2.5 <- ntbox::bin_model(model = mProj2.5$suitRaster,
                            occs = dAll_ge10c[,1:2],
                            percent = 10)
raster::plot(bin2.5,col=c("#d8d6d6","#03a0ff"))

7.1 PCA transformation

In this last section, we show how to use ntbox for transforming environmental layers to PCs. The function that does the transformation and projects it in the geographical space is ntbox::spca. The main argument of the function is the raster stack of environmental layers to be transformed.

am_pc10 <- ntbox::spca(layers_stack=am10)

Let’s see the scree plot

am_pc10$pca_plot

## Warning: Removed 6 rows containing missing values (position_stack).

## Warning: Removed 6 rows containing missing values (position_stack).

## Warning: Removed 6 rows containing missing values (geom_text).

The first four principal components explain ~91% of the total environmental variance.

plot(am_pc10$pc_layers[[1:4]])

References

Cobos, Marlon E., A. Townsend Peterson, Narayani Barve, and Luis Osorio-Olvera. 2019. “kuenm: an R package for detailed development of ecological niche models using Maxent.” PeerJ 7 (February): e6281. https://doi.org/10.7717/peerj.6281.

Hijmans, Robert J. 2017. “raster: Geographic analysis and modeling with raster data.” http://cran.r-project.org/package=raster.

Owens, H, L Campbell, L Dornak, E Saupe, N Barve, J Soberon, K Ingenloff, A Lira-Noriega, C Hensz, and C Myers. 2012. “Constraints on Interpretation of Ecological Niche Models by Limited Environmental Ranges on Calibration Areas: Software Script Appendix.”

Peterson, A. Townsend, Monica Papes, and Jorge Soberon. 2008. “Rethinking receiver operating characteristic analysis applications in ecological niche modeling.” Ecol. Modell. 213 (1): 63–72.

Sanchez, O., M. A. Pineda, H. Benitez., B. González, and H. Berlanga. 1998. Guía de identificación para las aves y mamíferos silvestres de mayor comercio en México protegidos por la C.I.T.E.S. Edited by SEMARNAP and CONABIO. México.

Van Aelst, Stefan, and Peter Rousseeuw. 2009. “Minimum volume ellipsoid.” Wiley Interdiscip. Rev. Comput. Stat. 1 (1): 71–82. https://doi.org/10.1002/wics.19.