Performs variable selection for ellipsoid models according to omission rates in the environmental space.

ellipsoid_selection(
  env_train,
  env_test = NULL,
  env_vars,
  nvarstest,
  level = 0.95,
  mve = TRUE,
  env_bg = NULL,
  omr_criteria,
  parallel = FALSE,
  comp_each = 100,
  proc = FALSE,
  proc_iter = 100,
  rseed = TRUE
)

Arguments

env_train

A data frame with the environmental training data.

env_test

A data frame with the environmental testing data. The default is NULL. If given, the selection process will also show the p-value of a binomial test.

env_vars

A vector with the names of environmental variables to be used in the selection process.

nvarstest

A vector indicating the number of variables used to fit the ellipsoids during model selection. Models with different numbers of variables can be tested in the same run (e.g. nvarstest = c(3, 6)).

level

Proportion of points to be included in the ellipsoids. This parameter is equivalent to the error (E) proposed by Peterson et al. (2008).

mve

A logical value. If TRUE, a minimum volume ellipsoid will be computed using the function cov.rob of the MASS package. If FALSE, the covariance matrix of the input data will be used.

env_bg

Environmental data to compute the approximated prevalence of the model. The data should be a sample of the environmental layers of the calibration area.

omr_criteria

Omission rate criterion: the maximum omission rate allowed in the selection process. Default NULL (see Details).

parallel

Logical. If TRUE, the computations will be run in parallel. Default FALSE.

comp_each

Number of models to run in each job of the parallel computation. Default 100.

proc

Logical. If TRUE, a partial ROC test will be run.

proc_iter

Numeric. The total number of iterations for the partial ROC bootstrap.

rseed

Logical. Whether or not to set a random seed for the partial ROC bootstrap. Default TRUE.

Value

A data.frame with 5 columns: i) "fitted_vars", the names of the variables that were fitted; ii) "om_rate", the omission rate of the model; iii) "bg_prevalence", the approximated prevalence of the model (see the Details section); iv) the rank of the model by omission rate; v) the rank of the model by prevalence, computed for models that pass the value of omr_criteria.
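To make the column descriptions concrete, the sketch below builds a small mock table with the three named columns and picks a variable set from it. The values are invented, the rank columns are left out because their names are not given here, and preferring lower prevalence among models that meet the criterion is an assumption for the example, not a statement of how the function ranks.

```r
# Mock results table; all values are invented for illustration only
res <- data.frame(
  fitted_vars = c("bio1,bio12,bio15", "bio2,bio5,bio6", "bio1,bio5,bio17"),
  om_rate = c(0.05, 0.02, 0.12),
  bg_prevalence = c(0.30, 0.45, 0.10),
  stringsAsFactors = FALSE
)
omr_criteria <- 0.07

# Keep the models that meet the omission-rate criterion
passed <- res[res$om_rate <= omr_criteria, ]

# Assumption: among those, prefer lower background prevalence
passed <- passed[order(passed$bg_prevalence, passed$om_rate), ]

# Split the comma-separated names of the first-ranked model
best_vars <- strsplit(passed$fitted_vars[1], ",")[[1]]
best_vars
```

The same splitting step appears in the Examples section, where stringr::str_split is used on the first row of the returned table.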

Details

Model selection occurs in environmental space (E-space). For each variable combination, the omission rate (omr) in E-space is computed using the function inEllipsoid. The results are ordered by omr. If the user specifies the environmental background "env_bg", an estimated prevalence is computed and the results are also ordered by "bg_prevalence".
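As a rough illustration of the omission-rate step, the base-R sketch below fits an ellipsoid to simulated training data and measures the share of test points that fall outside it. It does not call inEllipsoid. The simulated data, the helper name omission_rate, and the chi-square cutoff on Mahalanobis distances are assumptions made for the example and may differ from what the package does internally.

```r
# Hypothetical helper: proportion of test points outside the ellipsoid
# defined by the training centroid and covariance at a given level
omission_rate <- function(train, test, level = 0.95) {
  centroid <- colMeans(train)
  covar <- stats::cov(train)
  # Squared Mahalanobis distance of each test point to the centroid
  d2 <- stats::mahalanobis(test, center = centroid, cov = covar)
  # Assumed cutoff: chi-square quantile with df = number of variables
  cutoff <- stats::qchisq(level, df = ncol(train))
  mean(d2 > cutoff)
}

# Simulated environmental values (variable names are illustrative)
set.seed(1)
train <- data.frame(bio1 = rnorm(200, 15, 2), bio12 = rnorm(200, 800, 100))
test  <- data.frame(bio1 = rnorm(50, 15, 2),  bio12 = rnorm(50, 800, 100))

omr <- omission_rate(train, test, level = 0.95)
omr
```

The sketch uses the classical covariance matrix, which corresponds to mve = FALSE; a robust estimate from MASS::cov.rob could be substituted for the centroid and covariance.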

The number of variables used to construct candidate models can be specified by the user in the parameter "nvarstest". Model selection will be run in parallel if the user specifies more than one set of combinations and the total number of models to be tested is greater than 500. If "omr_criteria" and "bg_prevalence" are both available, the results are shown giving priority to the models that met the "omr_criteria", weighted by their value of "bg_prevalence". For more details and examples, see the help page of ellipsoid_omr.
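The number of candidate models grows quickly with the number of variables, which is why the 500-model threshold matters. The sketch below counts the combinations for a hypothetical set of seven variables using base R. The variable names are illustrative, and the job count at the end assumes that "comp_each" simply caps the number of models per job.

```r
# Hypothetical set of candidate variables (names are illustrative only)
env_vars <- c("bio1", "bio2", "bio5", "bio6", "bio12", "bio15", "bio17")
nvarstest <- c(3, 5, 6)

# Number of candidate models for each subset size
n_models <- choose(length(env_vars), nvarstest)
names(n_models) <- nvarstest
n_models
sum(n_models)

# Enumerate the combinations for subsets of three variables:
# one column per candidate model, one row per variable
combs3 <- utils::combn(env_vars, 3)
dim(combs3)

# Assumed batching: jobs needed if each job runs up to comp_each models
comp_each <- 100
n_jobs <- ceiling(sum(n_models) / comp_each)
n_jobs
```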

References

Peterson, A.T. et al. (2008) Rethinking receiver operating characteristic analysis applications in ecological niche modeling. Ecol. Modell., 213, 63–72.

Examples

if (FALSE) {
# Bioclimatic layers path
wcpath <- list.files(system.file("extdata/bios", package = "ntbox"),
                     pattern = ".tif$", full.names = TRUE)
# Bioclimatic layers
wc <- raster::stack(wcpath)
# Occurrence data for the giant hummingbird (Patagona gigas)
pg <- utils::read.csv(system.file("extdata/p_gigas.csv", package = "ntbox"))
# Split occs in train and test
pgL <- base::split(pg, pg$type)
pg_train <- pgL$train
pg_test <- pgL$test
# Environmental data for training and testing
pg_etrain <- raster::extract(wc, pg_train[, c("longitude", "latitude")],
                             df = TRUE)
pg_etrain <- pg_etrain[, -1]
pg_etest <- raster::extract(wc, pg_test[, c("longitude", "latitude")],
                            df = TRUE)
pg_etest <- pg_etest[, -1]
# Non-correlated variables
env_varsL <- ntbox::correlation_finder(cor(pg_etrain), threshold = 0.8,
                                       verbose = FALSE)
env_vars <- env_varsL$descriptors
# Number of variables to fit ellipsoids (3, 5, 6)
nvarstest <- c(3, 5, 6)
# Level
level <- 0.95
# Environmental background to compute the approximated
# prevalence in the prediction
env_bg <- raster::sampleRandom(wc, 10000)
# Selection process
e_selct <- ntbox::ellipsoid_selection(env_train = pg_etrain,
                                      env_test = pg_etest,
                                      env_vars = env_vars,
                                      level = level,
                                      nvarstest = nvarstest,
                                      env_bg = env_bg,
                                      omr_criteria = 0.07)
# Best ellipsoid model for "omr_criteria" and prevalence
bestvarcomb <- stringr::str_split(e_selct$fitted_vars, ",")[[1]]
# Ellipsoid model projection
best_mod <- ntbox::cov_center(pg_etrain[, bestvarcomb],
                              mve = TRUE, level = 0.99,
                              vars = 1:length(bestvarcomb))
# Projection model in geographic space
mProj <- ntbox::ellipsoidfit(wc[[bestvarcomb]],
                             centroid = best_mod$centroid,
                             covar = best_mod$covariance,
                             level = 0.99, size = 3)
raster::plot(mProj$suitRaster)
points(pg[, c("longitude", "latitude")], pch = 20, cex = 0.5)
# Partial ROC test on the projected model
pg_proc <- ntbox::pROC(continuous_mod = mProj$suitRaster,
                       test_data = pg_test[, c("longitude", "latitude")],
                       n_iter = 1000, E_percent = 5,
                       boost_percent = 50, parallel = FALSE)
print(pg_proc$pROC_summary)
}