R ships with considerable functionality. It also comes with a set of pre-loaded packages
e.g.
“base”
“graphics”
“stats”
R also comes with a set of packages that are installed but not loaded on launch
e.g.
“boot”
“MASS”
“Matrix”
Pre-loaded packages operate “out of the box”. For example, plot is part of the graphics package, which ships with R.
plot(x = 1:10, y = 1:10)
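By contrast, the installed-but-not-loaded packages listed earlier need a call to library() before their functions can be used. A minimal sketch, assuming a standard R installation in which the recommended package MASS is present:

```r
# search() lists the packages currently attached to the session.
"package:graphics" %in% search()  # graphics is pre-loaded, so plot() just works

# MASS is installed alongside R but is not attached until you ask for it.
mass_before <- "package:MASS" %in% search()
library(MASS)
mass_after <- "package:MASS" %in% search()

# Compare the two flags to see what library() changed.
c(before = mass_before, after = mass_after)
```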
On CRAN
Any of these can be installed with install.packages("pkg_name"). You will then have access to all the functionality of the package.
Notice this plot only goes to mid-2014. As of this writing (11/22/17), there are 11,892 packages available on CRAN! See https://cran.r-project.org/web/packages/
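If a script may run on a machine where a package is not yet installed, one common pattern is to install it only when it is missing. The helper below is a hypothetical sketch (install_if_missing is not part of R or any package named in these slides); it relies only on requireNamespace() and install.packages().

```r
# Hypothetical helper: install a CRAN package only if it is not already installed.
install_if_missing <- function(pkg) {
  # requireNamespace() returns FALSE when the package cannot be found
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
    return(invisible(TRUE))   # TRUE: an install was attempted
  }
  invisible(FALSE)            # FALSE: already installed, nothing done
}

# "stats" ships with R, so nothing is downloaded here.
installed_now <- install_if_missing("stats")
installed_now
```

Installing a package does not load it; you still need library(pkg_name) in each new session.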
Other packages
On GitHub
Installing from GitHub
First, install the devtools package from CRAN
install.packages("devtools")
Next, load the devtools package to access the install_github function. For example, to install my esvis package:
library(devtools)
install_github("DJAnderson07/esvis")
You then have access to all the functionality of that package once you load it. Let’s look at these data:
Before fitting a model, you’ll generally need to import some data. Let’s do so now.
Make sure your data file is stored in the same folder as your script.
The setclass argument in the import call is not actually required, but it makes the data a bit easier to work with.
library(rio)
d <- import("synthetic_data.csv", setclass = "tbl_df")
d
## # A tibble: 11,218 x 6
##         SID grade clock cohort LD33     SS
##       <int> <int> <int>  <int> <chr> <int>
##  1  1243667     7     1      5 Never   238
##  2 12961647     6     0      7 Never   221
##  3  5477581     7     1      5 Never   224
##  4  4177568     8     2      5 Never   248
##  5  9368752     7     1      6 Never   239
##  6  7736290     7     1      7 Never   239
##  7  9486143     6     0      5 Never   220
##  8  6181953     7     1      5 Never   237
##  9  7966652     7     1      7 Never   234
## 10  7776640     7     1      6 Never   234
## # ... with 11,208 more rows
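If the import fails with a "file not found" style error, the usual cause is that R is looking in a different folder than the one holding the file. The sketch below shows one way to check, using base R only. The toy file, its name (toy_data.csv), and its values are made up for illustration; they stand in for synthetic_data.csv.

```r
# Toy stand-in for the real data file, written to a temporary folder.
# Column names mirror part of the printed data; the values are invented.
toy <- data.frame(SID = 1:3, grade = c(6L, 7L, 8L), SS = c(220L, 234L, 248L))
path <- file.path(tempdir(), "toy_data.csv")
write.csv(toy, path, row.names = FALSE)

getwd()            # the folder R searches when given a relative file name
file.exists(path)  # confirm the file is where you think it is

# Base R alternative to rio's import() for a CSV file.
toy_in <- read.csv(path)
str(toy_in)
```

With the real file, file.exists("synthetic_data.csv") should return TRUE before you call import().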
Research Questions
What is the average growth from Grades 6-8 in math (SS)?
Does the average initial achievement or rate of growth depend upon cohort?
Does the average initial achievement or rate of growth depend upon LD33, the students’ pattern of SLD classification?
I don’t remember why the variable has the name it does
NOTE: Multiple regression is NOT the best way to approach this. A multilevel model would be preferable. But, at the end, I’ll show you how simple it is to extend what we do here to the multilevel modeling approach.
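To make the multiple regression framing concrete, here is a rough sketch of how the first two research questions could map onto lm() formulas. This is an illustration under assumptions, not necessarily the models fit later: the data are simulated (toy, with made-up values), and clock is assumed to count years since Grade 6, as the printed data suggest.

```r
# Simulated stand-in for the real data; every value here is invented.
set.seed(1)
toy <- data.frame(
  clock  = rep(0:2, times = 100),          # 0 = Grade 6, 1 = Grade 7, 2 = Grade 8
  cohort = factor(rep(5:7, each = 100))
)
toy$SS <- 220 + 9 * toy$clock + rnorm(nrow(toy), sd = 5)

# RQ1: the clock coefficient estimates the average change per grade.
m1 <- lm(SS ~ clock, data = toy)

# RQ2: the interaction lets both the intercept (initial achievement)
# and the slope (rate of growth) differ by cohort.
m2 <- lm(SS ~ clock * cohort, data = toy)

coef(m1)
coef(m2)
```

Because each student contributes several rows, these models ignore the dependence among a student's scores, which is why a multilevel model is preferable.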
Step 1: Look at your data!
Always best to visualize your data first. Let’s produce plots addressing each of our research questions.
What does the average growth look like? (plot on next slide)
library(tidyverse)
theme_set(theme_light()) # Not necessary, but I like it
ggplot(d, aes(x = grade, y = SS)) +
  geom_point() +
  geom_smooth(method = "lm")
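A numeric summary can complement the plot. The sketch below computes the mean score at each grade and the change between adjacent grades with base R's aggregate(). The toy data frame and its values are made up; with the real data you would pass d instead.

```r
# Toy data; in the slides this would be the imported data frame d.
toy <- data.frame(
  grade = rep(6:8, each = 4),
  SS    = c(218, 220, 222, 224,   # Grade 6
            230, 232, 234, 236,   # Grade 7
            244, 246, 248, 250)   # Grade 8
)

# Mean scale score at each grade.
grade_means <- aggregate(SS ~ grade, data = toy, FUN = mean)
grade_means

# Average change between adjacent grades.
grade_diffs <- diff(grade_means$SS)
grade_diffs
```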
Does initial achievement or average growth depend upon cohort?
ggplot(d, aes(grade, SS)) +
  geom_point() +
  geom_smooth(method = "lm",
              aes(color = factor(cohort)))
Does initial achievement or average growth depend upon LD status?