R/ellipsoid_selection.R
ellipsoid_selection.Rd
Performs variable selection for ellipsoid models according to omission rates in the environmental space.
ellipsoid_selection(
  env_train,
  env_test = NULL,
  env_vars,
  nvarstest,
  level = 0.95,
  mve = TRUE,
  env_bg = NULL,
  omr_criteria,
  parallel = F,
  comp_each = 100,
  proc = FALSE,
  proc_iter = 100,
  rseed = TRUE
)
Argument | Description |
---|---|
env_train | A data frame with the environmental training data. |
env_test | A data frame with the environmental testing data. The default is NULL. If given, the selection process will show the p-value of a binomial test. |
env_vars | A vector with the names of environmental variables to be used in the selection process. |
nvarstest | A vector indicating the number of variables used to fit the ellipsoids during model selection. Models with different numbers of variables can be tested in the same run (e.g. nvarstest=c(3,6)). |
level | Proportion of points to be included in the ellipsoids. This parameter is equivalent to the error (E) proposed by Peterson et al. (2008). |
mve | A logical value. If TRUE, a minimum volume ellipsoid will be computed. |
env_bg | Environmental data to compute the approximated prevalence of the model. The data should be a sample of the environmental layers of the calibration area. |
omr_criteria | Omission rate criteria. Value of the omission rate allowed for the selection process. Default NULL; see details. |
parallel | Logical. If TRUE, the computations will be run in parallel. Default FALSE. |
comp_each | Number of models to run in each job in the parallel computation. Default 100 |
proc | Logical. If TRUE, a partial ROC test will be run. |
proc_iter | Numeric. The total number of iterations for the partial ROC bootstrap. |
rseed | Logical. Whether or not to set a random seed for the partial ROC bootstrap. Default TRUE. |
A data.frame with 5 columns: i) "fitted_vars", the names of the variables that were fitted; ii) "om_rate", the omission rates of the model; iii) "bg_prevalence", the approximated prevalence of the model (see the details section); iv) the rank value of importance in model selection by omission rate; v) the rank value by prevalence, computed when a value of omr_criteria is passed.
Model selection occurs in environmental space (E-space). For each variable combination, the omission rate (omr) in E-space is computed using the function inEllipsoid. The results are ordered by omr; if the user specifies the environmental background "env_bg", an estimated prevalence is computed and the results are also ordered by "bg_prevalence".
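The omission-rate and prevalence steps can be illustrated with a minimal base-R sketch. It assumes that ellipsoid membership is decided by comparing the squared Mahalanobis distance with a chi-square quantile; this is an assumption about how a function such as inEllipsoid could work, not ntbox's actual code. The helper `in_ellipsoid_sketch`, the variable names, and the simulated data are all hypothetical.

```r
# Sketch only: membership test based on the Mahalanobis distance.
# This is an assumed approach, not the ntbox implementation.
set.seed(1)

# Simulated training data for two hypothetical environmental variables
train <- data.frame(bio1 = rnorm(200, 20, 2),
                    bio12 = rnorm(200, 800, 100))
level <- 0.95

# Ellipsoid metadata: centroid and covariance of the training data
centroid <- colMeans(train)
covar <- stats::cov(train)

# Hypothetical helper: TRUE for points inside the `level` ellipsoid
in_ellipsoid_sketch <- function(x, centroid, covar, level) {
  d2 <- stats::mahalanobis(x, center = centroid, cov = covar)
  d2 <= stats::qchisq(level, df = length(centroid))
}

# Omission rate: proportion of occurrence points outside the ellipsoid
inside_train <- in_ellipsoid_sketch(train, centroid, covar, level)
om_rate <- 1 - mean(inside_train)

# Approximated prevalence: proportion of background points inside it
bg <- data.frame(bio1 = runif(1000, 5, 35),
                 bio12 = runif(1000, 0, 2000))
bg_prevalence <- mean(in_ellipsoid_sketch(bg, centroid, covar, level))
```

In this sketch a lower `om_rate` means fewer occurrences are left out, and a lower `bg_prevalence` means the ellipsoid covers a smaller share of the sampled background.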
The number of variables used to construct candidate models can be specified by the user in the parameter "nvarstest". Model selection will be run in parallel if the user specifies more than one set of combinations and the total number of models to be tested is greater than 500.
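To get a feel for how many candidate models a given "nvarstest" implies, the combinations can be counted in base R. The sketch below is illustrative only: the variable names are hypothetical, and splitting the combinations into jobs of `comp_each` models is an assumption about how the work could be divided, not a description of ntbox internals. A small `comp_each` is used so the split is visible.

```r
# Hypothetical set of seven candidate variables
env_vars <- c("bio1", "bio4", "bio12", "bio15", "bio17", "bio18", "bio19")
nvarstest <- c(3, 5, 6)

# Total number of candidate models across all requested sizes
n_models <- sum(choose(length(env_vars), nvarstest))

# Enumerate every combination as a comma-separated string
combos <- unlist(lapply(nvarstest, function(k) {
  utils::combn(env_vars, k, FUN = paste, collapse = ",")
}))

# Assumed chunking: group models into jobs of `comp_each` models each
comp_each <- 25
jobs <- split(combos, ceiling(seq_along(combos) / comp_each))
```

Because the count grows combinatorially with the number of variables, checking `n_models` before a run helps anticipate whether the 500-model threshold will be exceeded.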
If "omr_criteria" and "bg_prevalence" are given, the results are shown weighting the models that met the "omr_criteria" by their value of "bg_prevalence".
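One way to picture this ordering is the sketch below. It assumes that models meeting "omr_criteria" come first and that, among them, a lower "bg_prevalence" ranks higher; both the ordering rule and the toy results table are assumptions made for illustration, not output from ntbox.

```r
# Toy results table with the column names described above (values invented)
res <- data.frame(
  fitted_vars = c("bio1,bio4,bio12", "bio1,bio12,bio15",
                  "bio4,bio12,bio15", "bio1,bio4,bio15"),
  om_rate = c(0.05, 0.02, 0.10, 0.06),
  bg_prevalence = c(0.30, 0.45, 0.10, 0.20),
  stringsAsFactors = FALSE
)
omr_criteria <- 0.07

# Which models meet the omission rate criterion?
meets <- res$om_rate <= omr_criteria

# Assumed rule: models meeting the criterion first, then ascending
# prevalence, then ascending omission rate as a tie-breaker
ord <- order(!meets, res$bg_prevalence, res$om_rate)
ranked <- res[ord, ]
best_vars <- strsplit(ranked$fitted_vars[1], ",")[[1]]
```

Here the model with the lowest prevalence overall (0.10) is ranked last because its omission rate exceeds the criterion.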
For more details and examples, see the help page of ellipsoid_omr.
Peterson, A.T. et al. (2008) Rethinking receiver operating characteristic analysis applications in ecological niche modeling. Ecol. Modell., 213, 63–72.
if (FALSE) {
# Bioclimatic layers path
wcpath <- list.files(system.file("extdata/bios", package = "ntbox"),
                     pattern = ".tif$", full.names = TRUE)
# Bioclimatic layers
wc <- raster::stack(wcpath)
# Occurrence data for the giant hummingbird (Patagona gigas)
pg <- utils::read.csv(system.file("extdata/p_gigas.csv", package = "ntbox"))
# Split occs in train and test
pgL <- base::split(pg, pg$type)
pg_train <- pgL$train
pg_test <- pgL$test
# Environmental data for training and testing
pg_etrain <- raster::extract(wc, pg_train[, c("longitude", "latitude")],
                             df = TRUE)
pg_etrain <- pg_etrain[, -1]
pg_etest <- raster::extract(wc, pg_test[, c("longitude", "latitude")],
                            df = TRUE)
pg_etest <- pg_etest[, -1]
# Non-correlated variables
env_varsL <- ntbox::correlation_finder(cor(pg_etrain), threshold = 0.8,
                                       verbose = FALSE)
env_vars <- env_varsL$descriptors
# Number of variables to fit ellipsoids (3, 5, 6)
nvarstest <- c(3, 5, 6)
# Level
level <- 0.95
# Environmental background to compute the approximated
# prevalence in the prediction
env_bg <- raster::sampleRandom(wc, 10000)
# Selection process
e_selct <- ntbox::ellipsoid_selection(env_train = pg_etrain,
                                      env_test = pg_etest,
                                      env_vars = env_vars,
                                      level = level,
                                      nvarstest = nvarstest,
                                      env_bg = env_bg,
                                      omr_criteria = 0.07)
# Best ellipsoid model for "omr_criteria" and prevalence
bestvarcomb <- stringr::str_split(e_selct$fitted_vars, ",")[[1]]
# Ellipsoid model projection
best_mod <- ntbox::cov_center(pg_etrain[, bestvarcomb],
                              mve = TRUE, level = 0.99,
                              vars = 1:length(bestvarcomb))
# Projection model in geographic space
mProj <- ntbox::ellipsoidfit(wc[[bestvarcomb]],
                             centroid = best_mod$centroid,
                             covar = best_mod$covariance,
                             level = 0.99, size = 3)
raster::plot(mProj$suitRaster)
points(pg[, c("longitude", "latitude")], pch = 20, cex = 0.5)
pg_proc <- ntbox::pROC(continuous_mod = mProj$suitRaster,
                       test_data = pg_test[, c("longitude", "latitude")],
                       n_iter = 1000, E_percent = 5,
                       boost_percent = 50, parallel = FALSE)
print(pg_proc$pROC_summary)
}