Advanced R Markdown functionality (10:15 - 11:00)
Use of git/GitHub for version control and collaboration (11:00 - 11:45)
.gitignore
Wrap-up/ideas for next steps/staying in touch (11:45 - 12:00)
Introduce yourself
What's your prior experience with R/R Markdown and git/GitHub?
Why are you here?
Realize this is all going to be an intro
Covering a lot of content - that's purposeful
Idea is to give you exposure and some basic familiarity
Resources to learn more
(plenty of others out there too)
Much of what we're going to be discussing represents an ideal that we have only recently begun working towards.
None of what we will talk about should be taken as a referendum on you or your current practices. However, we hope to convince you that you should be working toward the reproducible research ideal, and that, as a field, we should be moving toward reproducible research being the minimal standard.
We will be focusing on reproducible research with R (obviously). Other options are available but, in our view, none are as clear, comprehensive, and easy to implement as the tools at your disposal through R.
Replicability is the gold standard for research. Ideally, most research would be verified through replication.
Reproducibility represents a minimal standard, which can itself aid replication tremendously: the research is conducted and documented thoroughly enough that an independent researcher could reproduce all the results from a study, provided the data were available.
Turns out this is a more difficult standard than we would generally like to admit.
Reproducibility as an ethical standard
If your work is not reproducible, it is often not truly replicable.
If your work is reproducible, then others have a "recipe" for replication.
Initially, we may think of journal articles as research, but really the research is everything that went into the article, not the article itself.
Some (Buckheit & Donoho, 1995) conceive of the article as the "advertisement".
Striving toward reproducible research will:
Make your own code more efficient/easily interpretable
Reduce errors
Increase efficiency by not having to redo tables and figures with each tweak to a model.
Start with a basic text document (not Word, text)
Use the text document to write your article
Embed code within the text document that corresponds to your analysis. Note this is not just copying the code in. The code should be live and what you're working with while conducting your research.
Render the document into a different format (pdf, html, etc.).
Readers can then read the "advertisement", but if they are interested in reproducing your results they can access the text file that contains the analysis code.
Single product that has the advertisement and the research process embedded.
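The workflow above can be sketched in a few lines of R. This is only a minimal sketch, assuming the knitr package is installed. The vector x is made-up data for illustration, and knit() is called on in-memory text rather than a file just to keep the example self-contained.

```r
library(knitr)

fence <- strrep("`", 3)  # three back ticks, built here so the example nests cleanly

# A tiny plain-text source: prose plus one live code chunk
# (x is hypothetical data, not from any real study)
src <- c(
  "The mean of x is computed below.",
  "",
  paste0(fence, "{r}"),
  "x <- c(1.5, 2.0, 2.5)",
  "mean(x)",
  fence
)

# knit() runs the chunk and weaves its output into the text
out <- knit(text = src, quiet = TRUE)
cat(out)
```

In practice the source would live in an .Rmd file and be rendered to PDF, HTML, or Word, but the idea is the same: the prose and the live code travel together.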
Outside of reproducibility, you may want to use R Markdown to:
Produce slides
Keep track of your analysis (notes, essentially), even if you end up using something like Word
Share code with others
Quickly share results with others
Produce professional looking data products
Offers a platform in which you can store private data
Has an Application Programming Interface (API) to connect with R
Example: https://osf.io/9ex7k/
From within your RStudio Project:
---
title: Example Markdown document
author:
  - Daniel Anderson
  -
date: "2019-05-07"
---
The output: argument sets the format (pdf_document, html_document, word_document). If it is not supplied in the YAML, it must be specified when the document is rendered.
The YAML will control a lot of how a document looks. For example, if you wanted to render with a different syntax highlighter:
---
title: "Doc Title"
output: pdf_document
---

---
title: "Doc Title"
output:
  pdf_document:
    highlight: kate
---
Start a code chunk with ```{r}, then produce some R code, then close the chunk with three additional back ticks ```.
a <- 3
b <- 5
a + b * (exp(a)/b)
## [1] 23.08554
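The printed value can be checked by hand. Multiplication and division are evaluated before addition, so the two b's cancel and the expression reduces to a + exp(a). A small sketch of that check (the names result and by_hand are introduced here for illustration only):

```r
a <- 3
b <- 5

# b * (exp(a) / b) is evaluated first, and the b's cancel
result <- a + b * (exp(a) / b)

# Same quantity, simplified by hand
by_hand <- a + exp(a)

print(result)
all.equal(result, by_hand)
```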
# Level 1
## Level 2
### Level 3 (etc.)

* Unordered list
    - inset
        + inset more
    - etc.

1. Ordered list
    a. blah blah
2. More stuff
You can show code without evaluating it, using eval = FALSE.
a + b * (exp(a)/b)
Alternatively, you can evaluate the code without displaying it, using echo = FALSE.
warning = FALSE
ggplot(msleep, aes(sleep_rem, sleep_total)) + geom_point()
Warning is printed to the console when rendering.
warning = TRUE
ggplot(msleep, aes(sleep_rem, sleep_total)) + geom_point()
## Warning: Removed 22 rows containing missing values (geom_point).
error = TRUE
ggplot(msleep, aes(sleep, sleep_total)) + geom_point()
## Error: Aesthetics must be either length 1 or the same as the data (83): x
If error = FALSE, the document won't render if it encounters an error.
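A minimal sketch of the error = TRUE behavior, assuming the knitr package is installed. The failing chunk and its message are made up for illustration. The chunk fails on purpose, yet the text after it still appears in the knitted output.

```r
library(knitr)

fence <- strrep("`", 3)  # three back ticks

# One chunk that fails on purpose, with errors preserved in the output
src <- c(
  paste0(fence, "{r, error = TRUE}"),
  "stop('something went wrong')",
  fence,
  "",
  "Text after the failing chunk still renders."
)

out <- knit(text = src, quiet = TRUE)
cat(out)
```

With error = FALSE in the chunk header, the same call to knit() would stop at the failing chunk instead of returning output.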
Some functions will return messages. You may want to suppress these.
message = FALSE
ggplot(msleep, aes(sleep_total)) + geom_histogram()
message = TRUE
ggplot(msleep, aes(sleep_total)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The include argument is used to evaluate code that is not included in the document at all. For example, when setting up your global options.
Change the default behavior
knitr::opts_chunk$set(...) # insert options here
For example, you can set echo = FALSE and fig.width = 6.5 and fig.height = 8 with the following code.
knitr::opts_chunk$set(echo = FALSE, fig.width = 6.5, fig.height = 8)
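To confirm the global options took effect, they can be read back with opts_chunk$get(). A short sketch, assuming the knitr package is installed (the name current is introduced here for illustration). In a real document this would typically sit in a setup chunk with include = FALSE.

```r
library(knitr)

# Set the global defaults for every chunk that follows
opts_chunk$set(echo = FALSE, fig.width = 6.5, fig.height = 8)

# Read the values back to check them
current <- opts_chunk$get(c("echo", "fig.width", "fig.height"))
str(current)
```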
This is most useful when producing a report for somebody who doesn't use R and has no use for or knowledge of the code.
You can always override the global options within a particular chunk, e.g.
```{r, chunkName, echo = TRUE}
```
warning = FALSE
message = FALSE
error = TRUE
echo = FALSE
Options | Arguments | Default | Result |
---|---|---|---|
eval | logical | TRUE | Evaluate the code? |
echo | logical | TRUE | Show the code? |
results | markup, asis, hold, hide | markup | Render the results |
warning | logical | TRUE | Print warnings? |
error | logical | TRUE | Preserve errors? (if FALSE, quit) |
message | logical | TRUE | Print any messages? |
include | logical | TRUE | Include the code and its output in the document? |
tidy | logical | FALSE | Tidy code? (see formatR package) |
Options | Arguments | Default | Result |
---|---|---|---|
cache | logical, 0:3 | FALSE | Cache code chunks? |
cache.comments | logical | NULL | Cache invalidated by comment changes? |
dependson | char, num | NULL | Current chunk depends on prior cached chunks? |
autodep | logical | FALSE | Dependencies determined automatically? |
fig.height/fig.width | numeric | 7, 7 | Height and width of figure |
fig.show | asis, hold, animate, hide | asis | How the figure should be displayed |
interval | numeric | 1 | Animation speed |
For complete documentation, see http://yihui.name/knitr/options/
A single back tick followed by r produces inline code to be evaluated.
This is an example of inline code, where I want to refer to the sum of a and b, which is 8.
This is extremely useful in writing reports. Never have to update any numbers in text, regardless of changes to your models or data (if you are careful about it).
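A small sketch of inline code in action, assuming the knitr package is installed. The sentence is knitted from in-memory text here only to keep the example self-contained. In a real document it would simply be a line of prose in the .Rmd file.

```r
library(knitr)

a <- 3
b <- 5

# The expression between the back ticks is evaluated at knit time
# and replaced by its value in the rendered text
src <- "The sum of a and b is `r a + b`."
out <- knit(text = src, quiet = TRUE)
out
```

If a or b changes, the number in the sentence changes with it the next time the document is knitted.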
Get the same document to render to different formats by modifying the YAML to output HTML, PDF, and .docx
---
title: "My Document"
author: "Stephanie Lawson"
output: html_document
---

---
title: "My Document"
author: "Stephanie Lawson"
output: pdf_document
---

---
title: "My Document"
author: "Stephanie Lawson"
output:
  html_document:
    toc: true
    toc_depth: 2
    toc_float: true
    number_sections: true
    highlight: kate
---
install.packages("tinytex")
tinytex::install_tinytex()
This is another amazing package by Yihui Xie. See more about it here
Rendering to PDF
---
title: "My Document"
author: "Stephanie Lawson"
output:
  pdf_document:
    toc: true
    number_sections: true
    highlight: kate
---
A key feature is the ability to use a reference document; see here
---
title: "My Document"
author: "Stephanie Lawson"
output: word_document
---