Selecting an Appropriate Method

Choosing a particular algorithm for niche modeling can be challenging, because each published study seems to have its own preferred method and to report a clear advantage for it. As a result, no clear signal emerges as to which platform is best for such analyses. What is more, the many comparative analyses of niche modeling algorithms (Elith et al. 2006, Stockman et al. 2006, Guisan et al. 2007, Ortega-Huerta and Peterson 2008, Wisz et al. 2008) present a similar diversity of results. Of these evaluations, some are simply erroneous (Stockman et al. 2006) and some are based on inappropriate assumptions (Elith et al. 2006). Some of the complications associated with comparing and evaluating models are treated in Peterson et al. (2008) and McNyset and Blackburn (2006). Still other researchers argue for seeking consensus among the results of many algorithms (Araújo et al. 2005, Marmion et al. 2009), although even these ideas are not fully agreed upon. The end result is that no solid guide is available to help us choose among the many alternatives.

What is clear, however, is that many options exist. The requirements for a useful algorithm include the following:

1.       If the available data are presence-only, use either envelope methods (Bioclim, Mahalanobis distance) or machine-learning methods (GARP, Maxent, neural networks). If true-absence data are available, use regression methods (GLIM, GAM, regression trees).

2.       The method must be able to respond to complex structures in E-space. That is, if a species’ responses to environmental variables are nonlinear, a linear model will not suffice. In this sense, extremely simple approaches such as Bioclim (Nix 1986) or DOMAIN (Carpenter et al. 1993) are generally not recommended unless the data have been explored first and the presences in E-space show box-like or ellipsoidal structures. Approaches capable of replicating complex responses from presence-only data include the maximum entropy method Maxent (Phillips et al. 2006) and genetic algorithms such as GARP (Stockwell and Peters 1999).

3.       The algorithm should not be overly “data hungry.” Some very powerful algorithms require large amounts of input occurrence data to function effectively (Wisz et al. 2008), and such quantities are rarely available in niche modeling applications.

4.       Using too few data risks producing a very narrow model, whereas too many data tend to create “overfitting” problems. What is too few? That depends on how well the occurrence points represent the (unknown) range of preferences of the species. GARP has been shown to be rather tolerant of small datasets, so a rule of thumb is that more than 10 occurrence points at distinct localities is probably the lowest safe number. What is too many? Maxent is sensitive to large numbers of occurrences, in the sense that modeling becomes a fitting exercise with low capacity for extrapolation. This is not a problem if Go is being modeled and the points are known to come from the extreme localities of a distribution. If extrapolation is required, using more than a few hundred data points is inadvisable.
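The rules of thumb in items 1 and 4 can be written down as a small pre-modeling check. The following Python sketch is only illustrative: the `Occurrence` structure, the function names, the rounding used to decide that two records share a locality, and the value 300 standing in for "a few hundred" are all assumptions made here, not part of any niche modeling package.

```python
from dataclasses import dataclass

# Thresholds read from the rules of thumb above. The value 300 is an
# assumption standing in for "a few hundred"; adjust it to your own case.
MIN_DISTINCT_LOCALITIES = 10
MAX_POINTS_FOR_EXTRAPOLATION = 300


@dataclass(frozen=True)
class Occurrence:
    """One occurrence record (hypothetical structure for this sketch)."""
    longitude: float
    latitude: float


def count_distinct_localities(points, decimals=3):
    """Count unique coordinate pairs after rounding.

    Rounding to 3 decimals is an assumed way to decide that two records
    share a locality; it is not prescribed by the text.
    """
    return len({(round(p.longitude, decimals), round(p.latitude, decimals))
                for p in points})


def candidate_method_families(has_true_absences):
    """Return the method families suggested in item 1 of the list."""
    if has_true_absences:
        return ["regression (GLIM, GAM, regression trees)"]
    return [
        "envelope (Bioclim, Mahalanobis distance)",
        "machine learning (GARP, Maxent, neural networks)",
    ]


def sample_size_warnings(points, needs_extrapolation):
    """Flag datasets that fall outside the rules of thumb in item 4."""
    warnings = []
    n_distinct = count_distinct_localities(points)
    if n_distinct <= MIN_DISTINCT_LOCALITIES:
        warnings.append(
            f"only {n_distinct} distinct localities; the model may be too narrow"
        )
    if needs_extrapolation and len(points) > MAX_POINTS_FOR_EXTRAPOLATION:
        warnings.append(
            f"{len(points)} points; the fit may extrapolate poorly"
        )
    return warnings
```

Calling `sample_size_warnings(points, needs_extrapolation=True)` on a list of `Occurrence` records returns an empty list when neither rule of thumb is violated. Such a check is a prompt to look harder at the data, not a substitute for judging how well the points represent the species' preferences.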

In this guide, we present three niche modeling algorithms in detail: Maxent, GARP and Bioclim. GARP and Maxent perform similarly (Peterson et al. 2008, contra Elith et al. 2006), are both powerful in characterizing ecological niches, have both been applied widely, and often yield complementary results. This last point is apparent in that GARP may err on the side of producing overly broad results, whereas Maxent often errs on the side of overfitting, so the two together provide a quite useful counterpoint (Papeş and Gaubert 2007). In the following modules, then, we treat requirements regarding data input, model calibration strategies, model evaluation strategies, and many other topics in niche modeling.
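Of the three algorithms, the envelope approach is simple enough to sketch in a few lines, which also shows why it suits only box-like structures in E-space. The Python below is a minimal illustration of a min/max rectangular envelope, not the Bioclim implementation itself (which, among other things, typically works with percentile limits). The function names and the dictionary-based data layout are assumptions made for this example.

```python
def fit_envelope(presences):
    """Fit a rectangular envelope in E-space.

    presences: non-empty list of dicts mapping an environmental variable
    name to its value at one presence point (assumed layout).
    Returns a dict mapping each variable to its (min, max) over the points.
    """
    variables = presences[0].keys()
    return {
        v: (min(p[v] for p in presences), max(p[v] for p in presences))
        for v in variables
    }


def inside_envelope(envelope, site):
    """Return True if the site lies within the limits on every variable."""
    return all(lo <= site[v] <= hi for v, (lo, hi) in envelope.items())


# Illustrative values only; these are not real occurrence data.
presences = [
    {"temperature": 10.0, "precipitation": 500.0},
    {"temperature": 20.0, "precipitation": 900.0},
    {"temperature": 15.0, "precipitation": 700.0},
]
envelope = fit_envelope(presences)
suitable = inside_envelope(envelope, {"temperature": 12.0, "precipitation": 600.0})
```

Because each variable is bounded independently, the envelope is always a box: any combination of values inside the limits is predicted suitable, even combinations where no presence was recorded. This is the behavior that makes such methods a poor fit when responses in E-space are more complex.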