Week 12 Tree-Based Models in R

This week, we focus on tree-based models and their implementation in R. For more advanced readers, recommended resources on tree-based modeling are Prasad, Iverson, and Liaw (2006) and Gries (2021). Strobl, Malley, and Tutz (2009) and Breiman (2001) are very good papers that deal with many critical issues related to tree-based models. The aim of this week is to show how to implement basic tree-based modeling and classification in R.

Preparation and session setup

For this week, we need to install a number of R packages so that the scripts shown below run without errors. Before turning to the rest of the code, please install the packages by running the code below this paragraph. Installing all of the packages may take between 1 and 5 minutes, so you do not need to worry if it takes some time.

# install packages
install.packages("Boruta")
install.packages("tree")
install.packages("caret")
install.packages("cowplot")
install.packages("tidyverse")
install.packages("ggparty")
install.packages("Gmisc")
# grid ships with base R, so it does not need to be installed
install.packages("Hmisc")
install.packages("party")
install.packages("partykit")
install.packages("randomForest")
#install.packages("Rling")
install.packages("pdp")
install.packages("tidyr")
install.packages("RCurl")
install.packages("vip")
install.packages("flextable")
# install klippy for copy-to-clipboard button in code chunks
install.packages("remotes")
remotes::install_github("rlesur/klippy")
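
If some of these packages are already installed, reinstalling all of them is unnecessary. The following is a minimal sketch of one way to install only what is missing. The helper name missing_packages and the shortened package vector are assumptions made for illustration and are not part of the original script; the install step is deliberately restricted to interactive sessions so that non-interactive runs never trigger downloads.

```r
# Hypothetical helper (not part of the original tutorial):
# return the packages in `pkgs` that are not found in any library path
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Shortened example vector; extend it with the remaining packages listed above
pkgs <- c("Boruta", "tree", "caret", "party", "partykit", "randomForest")

to_install <- missing_packages(pkgs)
message("Packages still to install: ",
        if (length(to_install) > 0) paste(to_install, collapse = ", ") else "none")

# Assumption: install only in an interactive session (e.g. RStudio)
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Because installed.packages() also lists the packages that ship with R, base packages such as grid are never reported as missing by this check.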

Now that we have installed the packages, we can activate them as shown below.

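# set options
options(stringsAsFactors = FALSE)  # do not convert strings to factors (already the default since R 4.0.0)
options(scipen = 999)              # strongly prefer fixed over scientific notation
options(max.print = 10000)         # print up to 10,000 entries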
# load packages
library(Boruta)
library(tree)
library(caret)
library(cowplot)
library(tidyverse)
library(ggparty)
library(Gmisc)
library(grid)
library(Hmisc)
library(party)
library(partykit)
library(randomForest)
#library(Rling)
library(pdp)
library(RCurl)
library(tidyr)
library(vip)
library(flextable)
# activate klippy for copy-to-clipboard button
klippy::klippy()

NOTE

In some cases, installing the caret package can be a bit more complicated. In my case, it was necessary to execute the code chunk shown below. However, once the caret package is installed, you do not need to go through these steps again and can simply activate it by calling library(caret).

# install caret library
# note: biocLite() has been retired; Bioconductor packages such as Biobase
# are now installed via the BiocManager package
install.packages("BiocManager")
BiocManager::install("Biobase")
install.packages("dimRed", dependencies = TRUE)
install.packages("caret", dependencies = TRUE)
# activate caret library
library(caret)


Once you have installed R and RStudio and have initiated the session by executing the code shown above, you are good to go.

References

Breiman, Leo. 2001. “Statistical Modeling: The Two Cultures.” Statistical Science 16: 199–231. https://projecteuclid.org/euclid.ss/1009213726.
Gries, Stefan Th. 2021. Statistics for Linguistics Using R: A Practical Introduction. Berlin & New York: Mouton de Gruyter.
Prasad, Anantha M, Louis R Iverson, and Andy Liaw. 2006. “Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction.” Ecosystems 9 (2): 181–99.
Strobl, Carolin, James Malley, and Gerhard Tutz. 2009. “An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests.” Psychological Methods 14 (4): 323–48. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2927982/.