NicheToolBox: An example of the model selection protocol for ellipsoid models
1 The example
We demonstrate some of the main functions of ntbox by modeling the potential distribution of Leopardus wiedii, a near-threatened small cat that lives in the Neotropics (Fig. 1).
We modeled the ecological niche of L. wieddi using ellipsoids and show the performance and speed of ntbox model calibration and selection protocol for MVEs by using environmental information from North America at three different spatial resolutions (10’, 5’ and 2.5’).
First, we set the random seed to make our the example reproducible.
2 Get the data for the example
2.1 Environmental data
ntbox
has 4 different native functions to get environmental data, each one is related to the following databases:
- CHELSA (
?ntbox::get_chelsa
) - ENVIREM (
?ntbox::get_envirem_clim
and?ntbox::get_envirem_elev
for climatic and elevation data respectevly) - Bio-Oracle (
?ntbox::get_bio_oracle
).
Here, we use the function getData
from the package raster (Hijmans 2017) to download the WorldClim at 10, 5, and 2.5 ArcMinutes of resolution.
2.2 Crop and mask environmental data
Reading a shapefile for America
amp <- normalizePath("../america_sin_islas")
am <- rgdal::readOGR(dsn =amp,layer="america_sn_islas")
Cut the layers using America as a mask
2.3 Geographic data
From ntbox, we downloaded available occurrences for L. wiedii from the Global Biodiversity Information Facility (https://www.gbif.org) and explore what is the provenance and date of collecting these points.
dAll_a <- ntbox::searh_gbif_data(genus = "Leopardus",
species = "wiedii",
occlim = 5000,
leafletplot = TRUE)
## Total number of occurrence data found: 859
We select those records starting in 1950 as we will use the bioclimatic layers from WorldClim for the modeling process.
dAll_b <- ntbox::clean_dup(dAll_a,longitude = "longitude","latitude",
threshold = 0)
cat("Total number of occurrence data affter cleanining spatial duplicates:",
nrow(dAll_b))
## Total number of occurrence data affter cleanining spatial duplicates: 526
dAll_c <- dAll_b %>% dplyr::filter(year>=1950)
cat("Total number of occurrence data for periods >=1950:",
nrow(dAll_c))
## Total number of occurrence data for periods >=1950: 384
m <- leaflet::leaflet(dAll_c)
m <- m %>% leaflet::addTiles()
m <- m %>% leaflet::addCircleMarkers(lng = ~longitude,
lat = ~latitude,
popup = ~leaflet_info,
fillOpacity = 0.25,
radius = 7,col="green")
m
Remove wired occurrences. Click on the pop-up to display gbif information (available when the downloaded data comes from search_gbif
function), the points that are outside the distribution are the one on San Francisco (this comes from a collection; rowID=632), the record on Florida (rowID=508,489), and the ones that in the sea (540,591).
# Indixes of the wired data (can change depending the date of the gbif query)
to_rmIDs <- c(632,489,508,540,591)
to_rm <- which(dAll_c$ntboxID %in% to_rmIDs)
dAll <- dAll_c[-to_rm,]
m <- leaflet::leaflet(dAll)
m <- m %>% leaflet::addTiles()
m <- m %>% leaflet::addCircleMarkers(lng = ~longitude,
lat = ~latitude,
popup = ~leaflet_info,
fillOpacity = 0.25,
radius = 7,col="green")
m
2.4 Remove environmental duplicates
First, we extract environmental information from occurrences
dAll_e10 <- raster::extract(am10,dAll[,2:3])
dAll_ge10 <- data.frame(dAll[,c(2:3,ncol(dAll))],
dAll_e10)
dAll_e5 <- raster::extract(am5,dAll[,2:3])
dAll_ge5 <- data.frame(dAll[,c(2:3,ncol(dAll))],
dAll_e5)
dAll_e2.5 <- raster::extract(am2.5,dAll[,2:3])
dAll_ge2.5 <- data.frame(dAll[,c(2:3,ncol(dAll))],
dAll_e2.5)
We remove duplicated data
dAll_ge10c <- ntbox::clean_dup(dAll_ge10,
longitude ="longitude",
latitude = "latitude",
threshold = res(am10)[1])
# remove NA's
dAll_ge10c <- unique(na.omit(dAll_ge10c))
dAll_ge5c <- ntbox::clean_dup(dAll_ge5,
longitude ="longitude",
latitude = "latitude",
threshold = res(am5)[1])
# remove NA's
dAll_ge5c <- unique(na.omit(dAll_ge5c))
dAll_ge2.5c <- ntbox::clean_dup(dAll_ge2.5,
longitude ="longitude",
latitude = "latitude",
threshold = res(am2.5)[1])
# remove NA's
dAll_ge2.5c <- unique(na.omit(dAll_ge2.5c))
Explore the curated data on a leaflet map for the three spatial resolutions (2.5=blue,5=green,10=red)
mc <- leaflet::leaflet(dAll_ge2.5c)
mc <- mc %>% leaflet::addTiles()
mc <- mc %>% leaflet::addCircleMarkers(lng = ~longitude,
lat = ~latitude,
popup = ~leaflet_info,
fillOpacity = 0.25,
radius = 7,col="blue") %>%
leaflet::addCircleMarkers(lng = dAll_ge5c$longitude,
lat = dAll_ge5c$latitude,
popup = dAll_ge5c$leaflet_info,
fillOpacity = 0.25,
radius = 7,col="green") %>%
leaflet::addCircleMarkers(lng = dAll_ge10c$longitude,
lat = dAll_ge10c$latitude,
popup = dAll_ge10c$leaflet_info,
fillOpacity = 0.25,
radius = 7,col="red")
mc
The following table shows the number of records before and after data cleaning process for each spatial resolution
Resolution | GBIF_records_found | Records_affter_claening |
---|---|---|
10 min | 859 | 231 |
5 min | 859 | 274 |
2.5 min | 859 | 306 |
3 Split the data in train and testing
Now we will create train and testing data using a proportion of 70:30 respectively
trainID10 <- sample(nrow(dAll_ge10c),
size =ceiling(nrow(dAll_ge10c)*0.7))
trainID5 <- sample(nrow(dAll_ge5c),
size =ceiling(nrow(dAll_ge5c)*0.7))
trainID2.5 <- sample(nrow(dAll_ge2.5c),
size =ceiling(nrow(dAll_ge2.5c)*0.7))
Geographic train and test data
dtrain10 <- dAll_ge10c[trainID10,1:2]
dtest10 <- dAll_ge10c[-trainID10,1:2]
dtrain5 <- dAll_ge5c[trainID5,1:2]
dtest5 <- dAll_ge5c[-trainID5,1:2]
dtrain2.5 <- dAll_ge5c[trainID2.5,1:2]
dtest2.5 <- dAll_ge5c[-trainID2.5,1:2]
Environmental train and test
5 Ellipsoid calibration and selection
To calibrate the models ntbox
estimates all combinations (\(C^n_k\)) of \(n\) variables, taken \(k\) at a time for each \(k= 2,3,\dots, m\), where \(m\) is lesser than \(n\). It is known that
\[\displaystyle C^n_k =\frac {n!}{k!(n-k)!}.\]
In this example, after selecting 16 of the less correlated variables (see above section) we fit Minimum Volume Ellipsoid Models (Van Aelst and Rousseeuw 2009) to each combination of 3, 4 and 5 variables of these 16 variables; thus, the total number of models is 4823:
\[C^{14}_{3} +C^{14}_{4}+C^{14}_{5}=\frac{14!}{3!(14-3)!}+\frac {14!}{4!(14-4)!}+\frac {14!}{5!(14-5)!}=364+1001+2002=3367\]
5.1 Generate environmental background data
We will generate the random environmental points that will be used to estimate the Partial ROC test (see (Owens et al. 2012; Cobos et al. 2019)) of the calibrated models.
env_bg10 <- ntbox::sample_envbg(envlayers = am10,
nbg = 10000,
coordinates = TRUE,
rseed = 1111)
env_bg5 <- ntbox::sample_envbg(envlayers = am5,
nbg = 15000,
coordinates = TRUE,
rseed = 1111)
env_bg2.5 <- ntbox::sample_envbg(envlayers = am2.5,
nbg = 30000,
coordinates = TRUE,
rseed = 1111)
bg10 <- env_bg10[,1:2]
bg5 <- env_bg5[,1:2]
bg2.5 <- env_bg2.5[,1:2]
5.2 Calibrate and select models
We will use a proportion of 0.95 of the training data; the omission rate criteria is 0.05 (5%). To get good speed performance, models will be calibrated and selected in parallel; each job will process 100 models.
nvarstest <- c(3,4,5)
t10 <- system.time({
e_selct10 <- ntbox::ellipsoid_selection(env_train = dtrain_e10,
env_test = dtest_e10,
env_vars = env_vars,
level = 0.95,
nvarstest = nvarstest,
env_bg = env_bg10,
omr_criteria=0.05,
parallel = TRUE,
comp_each = 100,
proc = TRUE,
rseed = TRUE)
})
## -----------------------------------------------------------------------------------------
## **** Starting model selection process ****
## -----------------------------------------------------------------------------------------
##
## A total number of 364 models will be created for combinations of 14 variables taken by 3
##
## A total number of 1001 models will be created for combinations of 14 variables taken by 4
##
## A total number of 2002 models will be created for combinations of 14 variables taken by 5
##
## -----------------------------------------------------------------------------------------
## **A total number of 3367 models will be tested **
##
## -----------------------------------------------------------------------------------------
## Doing calibration from model 1 to 100 in process 1
##
## Doing calibration from model 101 to 200 in process 2
##
## Doing calibration from model 201 to 300 in process 3
##
## Doing calibration from model 301 to 400 in process 4
##
## Doing calibration from model 401 to 500 in process 5
##
## Doing calibration from model 501 to 600 in process 6
##
## Doing calibration from model 601 to 700 in process 7
##
## Doing calibration from model 701 to 800 in process 8
##
## Doing calibration from model 801 to 900 in process 9
##
## Doing calibration from model 901 to 1000 in process 10
##
## Doing calibration from model 1001 to 1100 in process 11
##
## Doing calibration from model 1101 to 1200 in process 12
##
## Doing calibration from model 1201 to 1300 in process 13
##
## Doing calibration from model 1301 to 1400 in process 14
##
## Doing calibration from model 1401 to 1500 in process 15
##
## Doing calibration from model 1501 to 1600 in process 16
##
## Doing calibration from model 1601 to 1700 in process 17
##
## Doing calibration from model 1701 to 1800 in process 18
##
## Doing calibration from model 1801 to 1900 in process 19
##
## Doing calibration from model 1901 to 2000 in process 20
##
## Doing calibration from model 2001 to 2100 in process 21
##
## Doing calibration from model 2101 to 2200 in process 22
##
## Doing calibration from model 2201 to 2300 in process 23
##
## Doing calibration from model 2301 to 2400 in process 24
##
## Doing calibration from model 2401 to 2500 in process 25
##
## Doing calibration from model 2501 to 2600 in process 26
##
## Doing calibration from model 2601 to 2700 in process 27
##
## Doing calibration from model 2701 to 2800 in process 28
##
## Doing calibration from model 2801 to 2900 in process 29
##
## Doing calibration from model 2901 to 3000 in process 30
##
## Doing calibration from model 3001 to 3100 in process 31
##
## Doing calibration from model 3101 to 3200 in process 32
##
## Doing calibration from model 3201 to 3300 in process 33
##
## Doing calibration from model 3301 to 3367 in process 34
##
## Finishing calibration of models 901 to 1000
##
## Finishing calibration of models 1001 to 1100
##
## Finishing calibration of models 1101 to 1200
##
## Finishing calibration of models 1201 to 1300
##
## Finishing calibration of models 1301 to 1400
##
## Finishing calibration of models 1401 to 1500
##
## Finishing calibration of models 1501 to 1600
##
## Finishing calibration of models 1601 to 1700
##
## Finishing calibration of models 1701 to 1800
##
## Finishing calibration of models 1801 to 1900
##
## Finishing calibration of models 1901 to 2000
##
## Finishing calibration of models 2001 to 2100
##
## Finishing calibration of models 2101 to 2200
##
## Finishing calibration of models 2201 to 2300
##
## Finishing calibration of models 2301 to 2400
##
## Finishing calibration of models 2401 to 2500
##
## Finishing calibration of models 2501 to 2600
##
## Finishing calibration of models 2601 to 2700
##
## Finishing calibration of models 2701 to 2800
##
## Finishing calibration of models 2801 to 2900
##
## Finishing calibration of models 2901 to 3000
##
## Finishing calibration of models 1 to 100
##
## Finishing calibration of models 3001 to 3100
##
## Finishing calibration of models 101 to 200
##
## Finishing calibration of models 3101 to 3200
##
## Finishing calibration of models 201 to 300
##
## Finishing calibration of models 3201 to 3300
##
## Finishing calibration of models 3301 to 3367
##
## Finishing calibration of models 301 to 400
##
## Finishing calibration of models 401 to 500
##
## Finishing calibration of models 501 to 600
##
## Finishing calibration of models 601 to 700
##
## Finishing calibration of models 701 to 800
##
## Finishing calibration of models 801 to 900
##
## Finishing...
##
## -----------------------------------------------------------------------------------------
## 39 models passed omr_criteria for train data
## 6 models passed omr_criteria for test data
## 1 models passed omr_criteria for train and test data
t5 <- system.time({
e_selct5 <- ntbox::ellipsoid_selection(env_train = dtrain_e5,
env_test = dtest_e5,
env_vars = env_vars,
level = 0.95,
nvarstest = nvarstest,
env_bg = env_bg5,
omr_criteria=0.05,
parallel = TRUE,
comp_each = 100,
proc = TRUE,
rseed = TRUE)
})
## -----------------------------------------------------------------------------------------
## **** Starting model selection process ****
## -----------------------------------------------------------------------------------------
##
## A total number of 364 models will be created for combinations of 14 variables taken by 3
##
## A total number of 1001 models will be created for combinations of 14 variables taken by 4
##
## A total number of 2002 models will be created for combinations of 14 variables taken by 5
##
## -----------------------------------------------------------------------------------------
## **A total number of 3367 models will be tested **
##
## -----------------------------------------------------------------------------------------
## Doing calibration from model 1 to 100 in process 1
##
## Doing calibration from model 101 to 200 in process 2
##
## Doing calibration from model 201 to 300 in process 3
##
## Doing calibration from model 301 to 400 in process 4
##
## Doing calibration from model 401 to 500 in process 5
##
## Doing calibration from model 501 to 600 in process 6
##
## Doing calibration from model 601 to 700 in process 7
##
## Doing calibration from model 701 to 800 in process 8
##
## Doing calibration from model 801 to 900 in process 9
##
## Doing calibration from model 901 to 1000 in process 10
##
## Doing calibration from model 1001 to 1100 in process 11
##
## Doing calibration from model 1101 to 1200 in process 12
##
## Doing calibration from model 1201 to 1300 in process 13
##
## Doing calibration from model 1301 to 1400 in process 14
##
## Doing calibration from model 1401 to 1500 in process 15
##
## Doing calibration from model 1501 to 1600 in process 16
##
## Doing calibration from model 1601 to 1700 in process 17
##
## Doing calibration from model 1701 to 1800 in process 18
##
## Doing calibration from model 1801 to 1900 in process 19
##
## Doing calibration from model 1901 to 2000 in process 20
##
## Doing calibration from model 2001 to 2100 in process 21
##
## Doing calibration from model 2101 to 2200 in process 22
##
## Doing calibration from model 2201 to 2300 in process 23
##
## Doing calibration from model 2301 to 2400 in process 24
##
## Doing calibration from model 2401 to 2500 in process 25
##
## Doing calibration from model 2501 to 2600 in process 26
##
## Doing calibration from model 2601 to 2700 in process 27
##
## Doing calibration from model 2701 to 2800 in process 28
##
## Doing calibration from model 2801 to 2900 in process 29
##
## Doing calibration from model 2901 to 3000 in process 30
##
## Doing calibration from model 3001 to 3100 in process 31
##
## Doing calibration from model 3101 to 3200 in process 32
##
## Doing calibration from model 3201 to 3300 in process 33
##
## Doing calibration from model 3301 to 3367 in process 34
##
## Finishing calibration of models 901 to 1000
##
## Finishing calibration of models 1001 to 1100
##
## Finishing calibration of models 1101 to 1200
##
## Finishing calibration of models 1201 to 1300
##
## Finishing calibration of models 1301 to 1400
##
## Finishing calibration of models 1401 to 1500
##
## Finishing calibration of models 1501 to 1600
##
## Finishing calibration of models 1601 to 1700
##
## Finishing calibration of models 1701 to 1800
##
## Finishing calibration of models 1801 to 1900
##
## Finishing calibration of models 1901 to 2000
##
## Finishing calibration of models 2001 to 2100
##
## Finishing calibration of models 2101 to 2200
##
## Finishing calibration of models 2201 to 2300
##
## Finishing calibration of models 2301 to 2400
##
## Finishing calibration of models 2401 to 2500
##
## Finishing calibration of models 2501 to 2600
##
## Finishing calibration of models 2601 to 2700
##
## Finishing calibration of models 2701 to 2800
##
## Finishing calibration of models 2801 to 2900
##
## Finishing calibration of models 2901 to 3000
##
## Finishing calibration of models 1 to 100
##
## Finishing calibration of models 3001 to 3100
##
## Finishing calibration of models 101 to 200
##
## Finishing calibration of models 3101 to 3200
##
## Finishing calibration of models 201 to 300
##
## Finishing calibration of models 3201 to 3300
##
## Finishing calibration of models 3301 to 3367
##
## Finishing calibration of models 301 to 400
##
## Finishing calibration of models 401 to 500
##
## Finishing calibration of models 501 to 600
##
## Finishing calibration of models 601 to 700
##
## Finishing calibration of models 701 to 800
##
## Finishing calibration of models 801 to 900
##
## Finishing...
##
## -----------------------------------------------------------------------------------------
## 16 models passed omr_criteria for train data
## 19 models passed omr_criteria for test data
## 3 models passed omr_criteria for train and test data
t2.5 <- system.time({
e_selct2.5 <- ntbox::ellipsoid_selection(env_train = dtrain_e2.5,
env_test = dtest_e2.5,
env_vars = env_vars,
level = 0.95,
nvarstest = nvarstest,
env_bg = env_bg2.5,
omr_criteria=0.05,
parallel = TRUE,
comp_each = 100,
proc = TRUE,
rseed = TRUE)
})
## -----------------------------------------------------------------------------------------
## **** Starting model selection process ****
## -----------------------------------------------------------------------------------------
##
## A total number of 364 models will be created for combinations of 14 variables taken by 3
##
## A total number of 1001 models will be created for combinations of 14 variables taken by 4
##
## A total number of 2002 models will be created for combinations of 14 variables taken by 5
##
## -----------------------------------------------------------------------------------------
## **A total number of 3367 models will be tested **
##
## -----------------------------------------------------------------------------------------
## Doing calibration from model 1 to 100 in process 1
##
## Doing calibration from model 101 to 200 in process 2
##
## Doing calibration from model 201 to 300 in process 3
##
## Doing calibration from model 301 to 400 in process 4
##
## Doing calibration from model 401 to 500 in process 5
##
## Doing calibration from model 501 to 600 in process 6
##
## Doing calibration from model 601 to 700 in process 7
##
## Doing calibration from model 701 to 800 in process 8
##
## Doing calibration from model 801 to 900 in process 9
##
## Doing calibration from model 901 to 1000 in process 10
##
## Doing calibration from model 1001 to 1100 in process 11
##
## Doing calibration from model 1101 to 1200 in process 12
##
## Doing calibration from model 1201 to 1300 in process 13
##
## Doing calibration from model 1301 to 1400 in process 14
##
## Doing calibration from model 1401 to 1500 in process 15
##
## Doing calibration from model 1501 to 1600 in process 16
##
## Doing calibration from model 1601 to 1700 in process 17
##
## Doing calibration from model 1701 to 1800 in process 18
##
## Doing calibration from model 1801 to 1900 in process 19
##
## Doing calibration from model 1901 to 2000 in process 20
##
## Doing calibration from model 2001 to 2100 in process 21
##
## Doing calibration from model 2101 to 2200 in process 22
##
## Doing calibration from model 2201 to 2300 in process 23
##
## Doing calibration from model 2301 to 2400 in process 24
##
## Doing calibration from model 2401 to 2500 in process 25
##
## Doing calibration from model 2501 to 2600 in process 26
##
## Doing calibration from model 2601 to 2700 in process 27
##
## Doing calibration from model 2701 to 2800 in process 28
##
## Doing calibration from model 2801 to 2900 in process 29
##
## Doing calibration from model 2901 to 3000 in process 30
##
## Doing calibration from model 3001 to 3100 in process 31
##
## Doing calibration from model 3101 to 3200 in process 32
##
## Doing calibration from model 3201 to 3300 in process 33
##
## Doing calibration from model 3301 to 3367 in process 34
##
## Finishing calibration of models 901 to 1000
##
## Finishing calibration of models 1001 to 1100
##
## Finishing calibration of models 1101 to 1200
##
## Finishing calibration of models 1201 to 1300
##
## Finishing calibration of models 1301 to 1400
##
## Finishing calibration of models 1401 to 1500
##
## Finishing calibration of models 1501 to 1600
##
## Finishing calibration of models 1601 to 1700
##
## Finishing calibration of models 1701 to 1800
##
## Finishing calibration of models 1801 to 1900
##
## Finishing calibration of models 1901 to 2000
##
## Finishing calibration of models 2001 to 2100
##
## Finishing calibration of models 2101 to 2200
##
## Finishing calibration of models 2201 to 2300
##
## Finishing calibration of models 2301 to 2400
##
## Finishing calibration of models 2401 to 2500
##
## Finishing calibration of models 2501 to 2600
##
## Finishing calibration of models 2601 to 2700
##
## Finishing calibration of models 2701 to 2800
##
## Finishing calibration of models 2801 to 2900
##
## Finishing calibration of models 2901 to 3000
##
## Finishing calibration of models 1 to 100
##
## Finishing calibration of models 3001 to 3100
##
## Finishing calibration of models 101 to 200
##
## Finishing calibration of models 3101 to 3200
##
## Finishing calibration of models 201 to 300
##
## Finishing calibration of models 3201 to 3300
##
## Finishing calibration of models 3301 to 3367
##
## Finishing calibration of models 301 to 400
##
## Finishing calibration of models 401 to 500
##
## Finishing calibration of models 501 to 600
##
## Finishing calibration of models 601 to 700
##
## Finishing calibration of models 701 to 800
##
## Finishing calibration of models 801 to 900
##
## Finishing...
##
## -----------------------------------------------------------------------------------------
## 1 models passed omr_criteria for train data
## 425 models passed omr_criteria for test data
## 1 models passed omr_criteria for train and test data
The elapsed time in minutes
## user system elapsed
## 0.016500000 0.004666667 1.770833333
## user system elapsed
## 0.02216667 0.00700000 2.80416667
## user system elapsed
## 0.027333333 0.005666667 3.976000000
## [1] "fitted_vars" "nvars" "om_rate_train"
## [4] "om_rate_test" "bg_prevalence" "pval_bin"
## [7] "pval_proc" "env_bg_paucratio" "env_bg_auc"
## [10] "mean_omr_train_test" "rank_by_omr_train_test" "rank_omr_aucratio"
Now we show the results for the best models by resolution; here, the criteria for selecting models is filtering those models that have a mean omission rate of training and testing data less o equal than 5% percent and then ordered by maximum AUC. The table contains eleven fields:
- fitted_vars: The fitted variables
- nvars: Number of variables used to fit the ellipsoid model
- om_rate_train: Omission rate of training data
- om_rate_test: Omission rate of testing data
- bg_prevalence: The estimated prevalence of the species in background data
- pval_bin: The p-value of the binomial test (see (Peterson, Papes, and Soberon 2008)) performed in environmental space
- pval_proc: The p-value of the partial ROC test performed in environmental space
- env_bg_paucratio: Environmental background AUC ratio for partial ROC test
- env_bg_auc: Environmental background AUC.
- mean_omr_train_test: Mean omission rate of testing and training data
- rank_by_omr_train_test: The rank of the models given testing and training data
best10m <- e_selct10 %>%
filter(mean_omr_train_test<=0.07) %>%
#arrange(desc(env_bg_auc)) %>%
mutate(Resolution="10 min",
N_models = nrow(e_selct10),
Time_to_run = paste(round(t10[3]/60,2),"mins"))
best5m <- e_selct5 %>%
filter(mean_omr_train_test<=0.07) %>%
#arrange(desc(env_bg_auc)) %>%
mutate(Resolution="5 min",
N_models = nrow(e_selct5),
Time_to_run = paste(round(t5[3]/60,2),"mins"))
best2.5m <- e_selct2.5 %>%
filter(mean_omr_train_test<=0.07) %>%
#arrange(desc(env_bg_auc)) %>%
mutate(Resolution="2.5 min",
N_models = nrow(e_selct2.5),
Time_to_run = paste(round(t2.5[3]/60,2),"mins"))
rshow <- c("fitted_vars",
"om_rate_train",
"om_rate_test",
"pval_proc",
"env_bg_paucratio",
"env_bg_auc",
"mean_omr_train_test",
"Resolution","N_models",
"Time_to_run")
all_res <- rbind(rbind(best10m[1,rshow],best5m[1,rshow],best2.5m[1,rshow]))
knitr::kable(all_res)
fitted_vars | om_rate_train | om_rate_test | pval_proc | env_bg_paucratio | env_bg_auc | mean_omr_train_test | Resolution | N_models | Time_to_run |
---|---|---|---|---|---|---|---|---|---|
bio12,bio19,bio4 | 0.0493827 | 0.0434783 | 0 | 1.514281 | 0.8634392 | 0.0464305 | 10 min | 3367 | 1.77 mins |
bio12,bio19,bio3,bio4 | 0.0468750 | 0.0487805 | 0 | 1.541765 | 0.8649882 | 0.0478277 | 5 min | 3367 | 2.8 mins |
bio13,bio15,bio18,bio4,bio9 | 0.0465116 | 0.0329670 | 0 | 1.529130 | 0.8489375 | 0.0397393 | 2.5 min | 3367 | 3.98 mins |
6 Project the best model
We will project the best models according to the above table. For another complete example see the help of ?ntbox::ellipsoid_selection
.
# Select the model number one in table e_select
modelvars10 <- as.character(all_res$fitted_vars[1]) %>%
stringr::str_split(pattern = ",",string = .) %>% unlist(.)
modelvars5 <- as.character(all_res$fitted_vars[2]) %>%
stringr::str_split(pattern = ",",string = .) %>% unlist(.)
modelvars2.5 <- as.character(all_res$fitted_vars[3]) %>%
stringr::str_split(pattern = ",",string = .) %>% unlist(.)
Fit the models in environmental space.
eall10mins <- rbind(dtrain_e10,dtest_e10)
eall5mins <- rbind(dtrain_e5,dtest_e5)
eall2.5mins <- rbind(dtrain_e2.5,dtest_e2.5)
best_mod10 <- ntbox::cov_center(eall10mins[,modelvars10],
mve = T,
level = 0.99,
vars = 1:length(modelvars10))
best_mod5 <- ntbox::cov_center(eall10mins[,modelvars5],
mve = T,
level = 0.99,
vars = 1:length(modelvars5))
best_mod2.5 <- ntbox::cov_center(eall2.5mins[,modelvars2.5],
mve = T,
level = 0.99,
vars = 1:length(modelvars2.5))
mProj10 <- ntbox::ellipsoidfit(am10[[modelvars10]],
centroid = best_mod10$centroid,
covar = best_mod10$covariance,
level = 0.99,size = 3)
if(length(modelvars10)==3){
rgl::rglwidget(reuse = FALSE)
}
mProj5 <- ntbox::ellipsoidfit(am5[[modelvars5]],
centroid = best_mod5$centroid,
covar = best_mod5$covariance,
level = 0.99,size = 3)
if(length(modelvars5)==3){
rgl::rglwidget(reuse = FALSE)
}
mProj2.5 <- ntbox::ellipsoidfit(am2.5[[modelvars2.5]],
centroid = best_mod2.5$centroid,
covar = best_mod2.5$covariance,
level = 0.99,size = 3)
if(length(modelvars2.5)==3){
rgl::rglwidget(reuse = FALSE)
}
Project models in geographic space
#install.packages("wesanderson")
library(wesanderson)
raster::plot(mProj10$suitRaster,
col= wes_palette("Zissou1", 500, type = "continuous"),
main="Best model 10 minutes",)
7 Binarize maps
We will use a threshold of 10 percent of the training data to binarize the models
bin10 <- ntbox::bin_model(model = mProj10$suitRaster,
occs = dAll_ge10c[,1:2],
percent = 10)
raster::plot(bin10,col=c("#d8d6d6","#03a0ff"))
bin5 <- ntbox::bin_model(model = mProj5$suitRaster,
occs = dAll_ge10c[,1:2],
percent = 10)
raster::plot(bin5,col=c("#d8d6d6","#03a0ff"))
bin2.5 <- ntbox::bin_model(model = mProj2.5$suitRaster,
occs = dAll_ge10c[,1:2],
percent = 10)
raster::plot(bin2.5,col=c("#d8d6d6","#03a0ff"))
7.1 PCA transformation
In this last section, we show how to use ntbox
for transforming environmental layers to PCs. The function that does the transformation and projects it in the geographical space is ntbox::spca
. The main argument of the function is the raster stack of environmental layers to be transformed.
Let’s see the scree plot
## Warning: Removed 6 rows containing missing values (position_stack).
## Warning: Removed 6 rows containing missing values (position_stack).
## Warning: Removed 6 rows containing missing values (geom_text).
The first four principal components explain ~91% of the total environmental variance.
References
Cobos, Marlon E., A. Townsend Peterson, Narayani Barve, and Luis Osorio-Olvera. 2019. “kuenm: an R package for detailed development of ecological niche models using Maxent.” PeerJ 7 (February): e6281. https://doi.org/10.7717/peerj.6281.
Hijmans, Robert J. 2017. “raster: Geographic analysis and modeling with raster data.” http://cran.r-project.org/package=raster.
Owens, H, L Campbell, L Dornak, E Saupe, N Barve, J Soberon, K Ingenloff, A Lira-Noriega, C Hensz, and C Myers. 2012. “Constraints on Interpretation of Ecological Niche Models by Limited Environmental Ranges on Calibration Areas: Software Script Appendix.”
Peterson, A. Townsend, Monica Papes, and Jorge Soberon. 2008. “Rethinking receiver operating characteristic analysis applications in ecological niche modeling.” Ecol. Modell. 213 (1): 63–72.
Sanchez, O., M. A. Pineda, H. Benitez., B. González, and H. Berlanga. 1998. Guía de identificación para las aves y mamíferos silvestres de mayor comercio en México protegidos por la C.I.T.E.S. Edited by SEMARNAP and CONABIO. México.
Van Aelst, Stefan, and Peter Rousseeuw. 2009. “Minimum volume ellipsoid.” Wiley Interdiscip. Rev. Comput. Stat. 1 (1): 71–82. https://doi.org/10.1002/wics.19.