NicheToolBox: An example of the model selection protocol for ellipsoid models

0.1 The example

We demonstrate the use of the model selection protocol implemented in ntbox (an article with more information about the method is in (Osorio-Olvera et al. 2020)) by using occurrence data for the giant hummingbird (Patagona gigas). The occurrence records and the modeling layers are available in the ntbox package (if you haven’t installed it go to this link where you will find the installation notes for each of the operating systems).

Fig. 1 Patagona gigas taken from wikimedia

Fig. 1 Patagona gigas taken from wikimedia

0.2 The R packages

Let’s load the R packages that we are going to use in our session

0.3 Loading the occurrence data

With the following commands, you can load into your R session the geographic records

species longitude latitude type
Patagona gigas -65.21600 -22.790906 train
Patagona gigas -78.32320 -0.459242 test
Patagona gigas -68.06772 -16.538205 train
Patagona gigas -70.51691 -33.385099 train
Patagona gigas -70.31293 -33.352039 train
Patagona gigas -70.58427 -34.274945 test

As you can see in the table there are 4 columns: “species”, “longitude” and “latitude” and “type” (with train or test as factors). The data points with the train labels are going to be used for model calibration and the test points will be used for selection.

0.4 Bioclimatic layers

Let’s load the environmental information and display it

The environmental information is for South America and here we will project our niche model …

0.5 Split occurrence

Now we split the occurrence data into train and test using the type variable; the choice of the data is normally done using different partition methods, but in our case, the data previously was split using a random partition.

0.6 Extract the environmental information

The following code extracts the environmental information for both train and test data

bio01 bio02 bio03 bio04 bio05 bio06 bio07 bio08 bio09 bio10 bio11 bio12 bio13 bio14 bio15 bio16 bio17 bio18 bio19
82.80 193.68 62.16 4088.88 219.60 -90.00 309.60 123.96 25.80 123.96 23.56 249.52 65.28 0.00 108.68 172.56 2.96 172.56 3.92
50.20 145.80 69.60 1523.60 145.40 -62.80 208.20 61.20 28.40 65.20 27.40 605.20 129.00 6.00 77.00 325.20 29.20 191.40 29.20
94.76 147.56 56.80 4035.48 238.52 -17.76 256.28 51.08 146.36 146.36 43.08 571.20 128.80 3.64 91.88 337.48 16.64 16.64 327.20
54.08 139.56 56.84 3963.12 189.68 -53.04 242.72 7.12 104.52 104.52 4.04 469.68 108.20 3.76 88.68 267.28 18.48 18.48 261.48
149.24 135.32 53.64 4122.48 290.08 39.68 250.40 101.72 200.44 202.60 96.08 424.84 109.32 1.36 105.16 278.00 9.80 9.80 269.96
111.20 140.00 54.00 4233.40 266.20 8.80 257.40 64.60 161.20 168.60 58.20 1340.80 279.80 18.20 82.80 766.80 68.20 71.80 701.60
bio01 bio02 bio03 bio04 bio05 bio06 bio07 bio08 bio09 bio10 bio11 bio12 bio13 bio14 bio15 bio16 bio17 bio18 bio19
56.96 84.56 83.36 500.52 112.08 11.48 100.60 56.80 53.80 60.96 49.48 1223.80 128.04 70.00 16.56 364.88 243.44 307.56 286.68
108.80 140.20 53.20 4359.80 258.80 -2.00 260.80 59.40 160.20 165.80 53.60 840.20 194.00 3.80 95.80 522.00 20.20 21.80 483.20
138.00 106.00 63.00 2240.00 231.00 63.00 168.00 109.00 164.00 168.00 109.00 349.00 95.00 0.00 111.00 239.00 3.00 3.00 239.00
135.36 128.16 59.24 3250.52 255.56 40.28 215.28 93.84 176.84 176.92 92.84 498.08 136.56 1.36 108.64 331.80 7.32 7.32 331.52
161.80 106.60 53.20 3360.20 273.60 73.80 199.80 123.80 204.80 204.80 118.80 504.40 130.60 2.40 103.40 331.20 9.60 9.60 312.80
136.88 119.04 53.20 3559.80 265.56 43.24 222.32 98.64 178.72 183.88 91.04 835.56 201.76 3.16 99.88 533.76 15.96 21.60 508.16

0.7 Non-correlated variables

Select the non-correlated variables using a threshold correlation value of 0.8

## [1] "bio01" "bio02" "bio03" "bio07" "bio12" "bio14" "bio18" "bio19"

0.8 Specify the number of variables to fit the model

Now we specify the number of variables to fit the ellipsoid models; in the example, we will fit for 3,5, and 6 dimensions

0.9 The level parameter

This parameter is to specify the proportion of training points that will be used to fit the minimum volume ellipsoid (Van Aelst and Rousseeuw 2009).

0.10 Environmental background to compute the approximated AUC ratio

This background data is just to compute the partial ROC test

0.11 Set the omission rate criteria

For selecting the model we will use an arbitrary value of 6 percent of omission; it is not a rule but accepted omission rates are those bellow 10%. We will ask the function to return the partial ROC value (Peterson, Papes, and Soberon 2008)

0.12 Model calibration and selection protocol

Now we just need to use the function ellipsoid_selection to run the model calibration and selection protocol

## -----------------------------------------------------------------------------------------
##      **** Starting model selection process ****
## -----------------------------------------------------------------------------------------
## 
## A total number of 56 models will be created for combinations of 8 variables taken by 3 
## 
## A total number of 56 models will be created for combinations of 8 variables taken by 5 
## 
## A total number of 28 models will be created for combinations of 8 variables taken by 6 
## 
## -----------------------------------------------------------------------------------------
##   **A total number of 140 models will be tested **
## 
## -----------------------------------------------------------------------------------------
##   6 models passed omr_criteria for train data
##   18 models passed omr_criteria for test data
##   2 models passed omr_criteria for train and test data

Let’s see the first 20 rows of the results

fitted_vars nvars om_rate_train om_rate_test bg_prevalence pval_bin pval_proc env_bg_aucratio mean_omr_train_test rank_by_omr_train_test rank_omr_aucratio
bio01,bio02,bio07 3 0.0420168 0.0392157 0.1708667 0 0 1.675877 0.0406162 1 1
bio01,bio02,bio12 3 0.0588235 0.0588235 0.2062457 0 0 1.657898 0.0588235 4 2
bio01,bio03,bio12 3 0.0756303 0.0392157 0.1730408 0 0 1.719728 0.0574230 2 3
bio07,bio14,bio18 3 0.0756303 0.0392157 0.5028165 0 0 1.372321 0.0574230 3 4
bio01,bio07,bio12 3 0.0672269 0.0588235 0.1707679 0 0 1.738229 0.0630252 5 5
bio01,bio02,bio03 3 0.0672269 0.0588235 0.2378694 0 0 1.625587 0.0630252 6 6
bio01,bio03,bio14 3 0.0924370 0.0392157 0.1452713 0 0 1.787052 0.0658263 7 7
bio01,bio07,bio14 3 0.0756303 0.0588235 0.1571302 0 0 1.730370 0.0672269 8 8
bio01,bio12,bio18 3 0.0756303 0.0588235 0.2091116 0 0 1.659776 0.0672269 9 9
bio01,bio02,bio18 3 0.0588235 0.0784314 0.2051586 0 0 1.666558 0.0686275 10 10
bio01,bio03,bio18 3 0.0840336 0.0588235 0.1554501 0 0 1.761996 0.0714286 11 11
bio02,bio14,bio18 3 0.0840336 0.0588235 0.4999506 0 0 1.334296 0.0714286 12 12
bio01,bio03,bio07 3 0.0672269 0.0784314 0.2144481 0 0 1.623151 0.0728291 13 13
bio01,bio12,bio14 3 0.1092437 0.0392157 0.1656290 0 0 1.784394 0.0742297 14 14
bio02,bio03,bio14 3 0.0924370 0.0588235 0.4443127 0 0 1.382246 0.0756303 15 15
bio01,bio02,bio07,bio12,bio14 5 0.1092437 0.0588235 0.1018875 0 0 1.801916 0.0840336 16 16
bio01,bio02,bio03,bio12,bio14 5 0.1092437 0.0588235 0.1163158 0 0 1.788790 0.0840336 17 17
bio07,bio12,bio14 3 0.1092437 0.0588235 0.2198834 0 0 1.640416 0.0840336 18 18
bio01,bio02,bio14 3 0.0924370 0.0784314 0.1987351 0 0 1.655143 0.0854342 19 19
bio02,bio12,bio14 3 0.0924370 0.0784314 0.2652436 0 0 1.564441 0.0854342 20 20

With the following lines of code, I am going to display the model in the first row of the table

References

Osorio-Olvera, Luis, C. Yañez-Arenas, E. Martínez-Meyer, and A. T. Peterson. 2020. “Relationships between population densities and niche?centroid distances in North American birds.” Ecol. Lett. 23: 555–64. https://doi.org/10.1111/ele.13453.

Peterson, A. Townsend, Monica Papes, and Jorge Soberon. 2008. “Rethinking receiver operating characteristic analysis applications in ecological niche modeling.” Ecol. Modell. 213 (1): 63–72.

Van Aelst, Stefan, and Peter Rousseeuw. 2009. “Minimum volume ellipsoid.” Wiley Interdiscip. Rev. Comput. Stat. 1 (1): 71–82. https://doi.org/10.1002/wics.19.

Luis Osorio-Olvera & Rusby Contreras-Díaz

2020-02-13