As part of our recent overhaul of the predictive modeling tools, we added support for random forests as a new modeling type.
Random forest (Breiman, 2001) is a machine learning algorithm that fits many classification or regression tree (CART) models to random subsets of the input data and uses the combined result (the forest) for prediction. A principal feature of random forests is their ability to estimate the importance of each predictor variable in modeling the response variable. MGET’s Fit Random Forest Model tool outputs the importance estimate as text and a plot:
The tool can also output partial dependence plots that show the marginal effect of each predictor on the response variable, averaged over the values of the other predictors:
MGET’s tools can fit both classification and regression forests using either the R randomForest package (Liaw and Wiener, 2002), which implements Breiman’s classic algorithm, or the cforest function from the R party package (Hothorn et al., 2006; Strobl et al., 2007; Strobl et al., 2008).
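To make the ideas above concrete, here is a minimal sketch in plain Python of a regression forest, a permutation-based importance estimate, and a partial dependence calculation. It is not MGET's code and not the randomForest or cforest implementation: the function names, the synthetic data, and the settings (25 trees, depth 3, one candidate predictor per split) are all assumptions chosen for illustration. It also measures importance on the training data for brevity, whereas Breiman's algorithm uses the out-of-bag samples.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (regression split criterion)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_tree(rows, targets, mtry, depth, rng):
    """Fit a small regression tree. A leaf is a float; a node is a tuple."""
    if depth == 0 or len(rows) < 4:
        return sum(targets) / len(targets)
    best = None
    for j in rng.sample(range(len(rows[0])), mtry):  # random subset of predictors
        column = [r[j] for r in rows]
        for threshold in rng.sample(column, min(10, len(column))):
            left = [t for v, t in zip(column, targets) if v <= threshold]
            right = [t for v, t in zip(column, targets) if v > threshold]
            if left and right:
                score = sse(left) + sse(right)
                if best is None or score < best[0]:
                    best = (score, j, threshold)
    if best is None:
        return sum(targets) / len(targets)
    _, j, threshold = best
    left = [i for i, r in enumerate(rows) if r[j] <= threshold]
    right = [i for i, r in enumerate(rows) if r[j] > threshold]
    subtrees = [fit_tree([rows[i] for i in side], [targets[i] for i in side],
                         mtry, depth - 1, rng) for side in (left, right)]
    return (j, threshold, subtrees[0], subtrees[1])

def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        j, threshold, left, right = tree
        tree = left if row[j] <= threshold else right
    return tree

def fit_forest(rows, targets, n_trees=25, mtry=1, depth=3, seed=0):
    """Fit each tree to a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_tree([rows[i] for i in idx],
                               [targets[i] for i in idx], mtry, depth, rng))
    return forest

def predict(forest, row):
    """Forest prediction: the average of the individual tree predictions."""
    return sum(predict_tree(t, row) for t in forest) / len(forest)

def mse(forest, rows, targets):
    return sum((predict(forest, r) - t) ** 2
               for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(forest, rows, targets, j, seed=1):
    """Increase in MSE after shuffling the values of predictor j."""
    shuffled = [r[j] for r in rows]
    random.Random(seed).shuffle(shuffled)
    permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, shuffled)]
    return mse(forest, permuted, targets) - mse(forest, rows, targets)

def partial_dependence(forest, rows, j, value):
    """Average prediction with predictor j fixed at `value` in every row."""
    return sum(predict(forest, r[:j] + [value] + r[j + 1:])
               for r in rows) / len(rows)

# Synthetic data (an assumption): x0 drives the response, x1 is pure noise.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(150)]
targets = [10 * x0 + data_rng.gauss(0, 0.5) for x0, _ in rows]

forest = fit_forest(rows, targets)
importance = [permutation_importance(forest, rows, targets, j) for j in range(2)]
pd_low = partial_dependence(forest, rows, 0, 0.1)
pd_high = partial_dependence(forest, rows, 0, 0.9)
print("importance (increase in MSE):", importance)
print("partial dependence on x0 at 0.1 and 0.9:", pd_low, pd_high)
```

In this toy setup, shuffling x0 should degrade the predictions far more than shuffling x1, and the partial dependence on x0 should rise as x0 increases. Evaluating partial_dependence over a grid of values for one predictor gives the curve that a partial dependence plot draws.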
For a detailed description of random forests and practical advice on their application in ecology, see Cutler et al. (2007).
Breiman L (2001) Random forests. Machine Learning 45: 5-32.
Cutler DR, Edwards Jr. TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ (2007) Random Forests for Classification in Ecology. Ecology 88: 2783-2792.
Hothorn T, Buehlmann P, Dudoit S, Molinaro A, Van Der Laan M (2006) Survival Ensembles. Biostatistics 7: 355-373.
Liaw A and Wiener M (2002) Classification and Regression by randomForest. R News 2: 18-22.
Strobl C, Boulesteix A-L, Kneib T, Augustin T, Zeileis A (2008) Conditional Variable Importance for Random Forests. BMC Bioinformatics 9: 307.
Strobl C, Boulesteix A-L, Zeileis A, Hothorn T (2007) Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution. BMC Bioinformatics 8: 25.