Transparent and Reproducible Research with R (Part 1)

And a bit of git/GitHub

Daniel Anderson

Joshua M Rosenberg

April 7, 2019

Agenda

First two hours

Introduction (8:00 - 8:45)
- Who we are, who participants are and why they're here (15 min.)
- Reproducible research and literate programming (20 min.)
- Conducting science in the public and in an open way & OSF (10 min.)
R Markdown (8:45 - 10:00)
- Delineating code chunks from plain text (15 min.)
- Creating headers (5 min.)
- Creating lists and using other features of markdown (10 min.)
- Whole-document and code chunk-specific options (15 min.)
- Rendering and sharing documents in different formats (15 min.)
- Lab (practice) (15 min.)

Break (10:00 - 10:15)

Last two hours

Advanced R Markdown functionality (10:15 - 11:00)
- Formatting tables (20 min.)
- Creating manuscripts to submit for publication (via {papaja}; 25 min.)
Use of git/GitHub for version control and collaboration (11:00 - 11:45)
- Introduction to GitHub, RStudio interface, and GitKraken GUI (20 min.)
- Making changes, committing them, and pushing them to the repository (15 min.)
- Use of GitHub (and ignoring specific files via .gitignore; 10 min.)
Wrap-up/ideas for next steps/staying in touch (11:45 - 12:00)

#whoami

Research Assistant Professor: Behavioral Research and Teaching, University of Oregon (#goducks)
Dad (two daughters: 6 (almost 7) and 4)
Primary areas of interest
- 💗💗R💗💗 and computational research
- Open data, open science, and reproducible workflows
- Growth modeling, achievement gaps, and variance between educational institutions (particularly spatially)

#whoami 2

Assistant Professor: STEM Education, University of Tennessee, Knoxville
Also a Dad (one-year-old toddler!)
Primary areas of interest
- Data science in education (network analytic methods, experience sampling method, computational grounded theory)
- Data science education (integrating data science and science education)

#whoyouis

Introduce yourself
- What's your prior experience with R/R Markdown and git/GitHub?
- Why are you here?

This slide means we're transitioning to Daniel

This slide means we're transitioning to Josh

Before we really get started

Realize this is all going to be an intro
- Covering a lot of content - that's purposeful
- Idea is to give you exposure and some basic familiarity
Resources to learn more
- Daniel's class(es)
  - Intro course
  - Data viz course
  - Functional programming
- r4ds
- R Markdown book
- (Developing) data science in education book
- (plenty of others out there too)

Reproducible Research and Literate Programming

8:15 - 8:30am

A couple caveats

Much of what we're going to be discussing represents an ideal that we have only recently begun working towards.
None of what we will talk about should be taken as a referendum on you or your current practices. However, we hope to convince you that you should be working toward the reproducible research ideal, and that, as a field, we should be moving toward reproducible research being the minimal standard.
We will be focusing on reproducible research with R (obviously). Other options are available but, in our view, none are as clear, comprehensive, and easy to implement as the tools at your disposal through R.

What is reproducible research?

Replicability is the gold standard for research. Ideally, most research would be verified through replication.
Reproducibility represents a minimal standard, which itself can aid replication (tremendously): the research is conducted and documented thoroughly enough that an independent researcher could reproduce all the results from a study, provided the data were available.
Turns out this is a more difficult standard than we would generally like to admit.

Why should we care?

Reproducibility as an ethical standard
- More transparency
- More potential for results to be verified (and errors found/corrected)
If your work is not reproducible, it is often not truly replicable.
If your work is reproducible, then others have a "recipe" for replication.

Are journal articles research?

Initially, we may think of journal articles as research, but really the research is everything that went into the article, not the article itself.
Some (Buckheit & Donoho, 1995) conceive of the article as the "advertisement".
If all we have is the advertisement, can we really fully understand the steps and decisions made during the research?
- In large-scale data analysis, the answer is generally "no".

Tangential benefits

Striving toward reproducible research will:

Make your own code more efficient/easily interpretable
- Can help with collaboration on a project
Reduce errors
Increase efficiency by not having to redo tables and figures with each tweak to a model.

What does literate programming look like?

What this workshop is about!

Start with a basic text document (not Word, text)
Use the text document to write your article
Embed code within the text document that corresponds to your analysis. Note this is not just copying the code in. The code should be live and what you're working with while conducting your research.
Render the document into a different format (pdf, html, etc.).
- Select which code (if any) will be displayed
- Build tables of results and plots to be produced

End result

Readers can then read the "advertisement", but if they are interested in reproducing your results they can access the text file that contains the analysis code.
Single product that has the advertisement and the research process embedded.

Open Science Framework

Offers a platform in which you can store private data
Has an Application Programming Interface (API) to connect with R
Example: https://osf.io/9ex7k/

R Markdown Basics

8:45 - 9:00

R Markdown

From within your RStudio Project:

First thing: Render!

Create a new R Markdown doc

### Try it out!

YAML Front Matter

---
title: Example Markdown document
author:
  - Daniel Anderson
date: "2019-05-07"
---

Three dashes before and after the YAML fields
Case sensitive
Many other fields are possible.
- For example, you may want to include an output: argument (pdf_document, html_document, word_document). If it is not supplied, specify the format when rendering; otherwise rmarkdown falls back to html_document.
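
To see how R reads these fields, here is a minimal sketch that writes a small document to a temporary file and parses its header with `rmarkdown::yaml_front_matter()`. The file contents are made up for illustration, and the sketch assumes the {rmarkdown} package is installed.

```r
library(rmarkdown)

# Write a tiny R Markdown file (illustrative content only) to a temp location
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: Example Markdown document",
  "author:",
  "  - Daniel Anderson",
  "date: \"2019-05-07\"",
  "output: html_document",
  "---",
  "",
  "Some text."
), rmd_file)

# Parse only the YAML header into a named list
meta <- yaml_front_matter(rmd_file)

meta$title   # the title string
meta$output  # the requested output format
meta$Title   # NULL, because field names are case sensitive
```

Checking the parsed list like this is a quick way to catch a misspelled or mis-capitalized field before rendering.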

Example: Change syntax highlighting

The YAML will control a lot of how a document looks. For example, if you wanted to render with a different syntax highlighter:

Standard Rmd

---
title: "Doc Title"
output: pdf_document
---

kate

---
title: "Doc Title"
output: 
  pdf_document:
    highlight: kate
---

Code chunks versus text

Code chunks

Start a code chunk with ```{r}, then write some R code, then close the chunk with three additional backticks ```.

a <- 3
b <- 5
a + b * (exp(a)/b)

## [1] 23.08554

R Markdown basics

Headings, lists, chunk options, and inline code

9:00 - 9:30am

Headings and Lists

Not R-lists

# Level 1
## Level 2 
### Level 3 (etc.)

 * Unordered list
  - inset
    + inset more
  - etc.
1. Ordered list
  a. blah blah
2. More stuff

echo and eval

You can show code without evaluating it, using eval = FALSE.

a + b * (exp(a)/b)

Alternatively, you can evaluate the code without displaying it, using echo = FALSE.
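
A rough way to see both options side by side, assuming the {knitr} package is installed: the sketch below builds a three-chunk document in memory and knits it to markdown with `knitr::knit(text = ...)`. The `chunk()` helper and the chunk labels are hypothetical, introduced only for this illustration.

```r
library(knitr)

# Build the fence string programmatically so this example nests cleanly
fence <- strrep("`", 3)

# Hypothetical helper: wrap lines of R code in a chunk with the given header
chunk <- function(header, code) {
  c(paste0(fence, "{r ", header, "}"), code, fence, "")
}

rmd <- c(
  chunk("setup-values", "a <- 3; b <- 5"),
  chunk("shown-not-run, eval = FALSE", "a + b * (exp(a)/b)"),
  chunk("run-not-shown, echo = FALSE", "a - b")
)

# With `text` supplied, knit() returns the resulting markdown as a string
out <- knit(text = rmd, quiet = TRUE)
cat(out)
```

In the printed markdown, the second chunk's code appears without a result, while the third chunk contributes only its result.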

warning

warning = `FALSE`

ggplot(msleep, aes(sleep_rem, sleep_total)) + 
  geom_point()

Warning is printed to the console when rendering.

warning = `TRUE`

ggplot(msleep, aes(sleep_rem, sleep_total)) + 
  geom_point()

## Warning: Removed 22 rows containing missing values (geom_point).

Show errors

error = TRUE

ggplot(msleep, aes(sleep, sleep_total)) + 
  geom_point()

## Error: Aesthetics must be either length 1 or the same as the data (83): x

If error = FALSE, the document won't render if it encounters an error.

Message

Some functions will return messages. You may want to suppress these.

message = `FALSE`

ggplot(msleep, aes(sleep_total)) +
  geom_histogram()

Message

message = `TRUE`

ggplot(msleep, aes(sleep_total)) +
  geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
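
The warning, error, and message options can be compared without ggplot2. The sketch below is only an illustration, assuming {knitr} is installed: it knits the same three lines of base R twice with different settings. The `knit_chunk()` helper and the example code lines are hypothetical.

```r
library(knitr)

fence <- strrep("`", 3)

# Hypothetical helper: knit a single chunk and return the markdown output
knit_chunk <- function(options, code) {
  knit(text = c(paste0(fence, "{r ", options, "}"), code, fence), quiet = TRUE)
}

code <- c(
  "as.numeric('abc')",                # triggers a warning
  "message('a note from the code')",  # emits a message
  "stop('something went wrong')"      # raises an error
)

# Warning, message, and error all appear in the document
loud <- knit_chunk(
  "echo = FALSE, warning = TRUE, message = TRUE, error = TRUE", code
)

# Warning and message are kept out of the document; the error is still shown
quiet <- knit_chunk(
  "echo = FALSE, warning = FALSE, message = FALSE, error = TRUE", code
)

cat(loud)
cat(quiet)
```

With `warning = FALSE` and `message = FALSE` the text still reaches the console, so nothing is lost while you are working interactively.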

include

Setting include = FALSE evaluates the code but leaves both the code and its output out of the document. This is useful, for example, when setting up your global options.

Setting global options

Change the default behavior

knitr::opts_chunk$set(...) # insert options here

For example, you can set echo = FALSE and fig.width = 6.5 and fig.height = 8 with the following code.

knitr::opts_chunk$set(echo = FALSE, fig.width = 6.5, fig.height = 8)

This is most useful when producing a report for somebody who doesn't use R and has no use for, or knowledge of, the code.

You can always override the global options within a particular chunk, e.g.

```{r, chunkName, echo = TRUE}

```
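
When experimenting with global options in an interactive session, it can help to inspect and reset them. A small sketch, assuming {knitr} is installed; the variable names are only for illustration.

```r
library(knitr)

# Set document-wide defaults, as a setup chunk would
opts_chunk$set(echo = FALSE, fig.width = 6.5, fig.height = 8)

# Inspect the defaults that later chunks will inherit
echo_after_set  <- opts_chunk$get("echo")       # FALSE
width_after_set <- opts_chunk$get("fig.width")  # 6.5

# Return to knitr's built-in defaults
opts_chunk$restore()
echo_after_restore <- opts_chunk$get("echo")    # TRUE
```

Inside a document you would normally call only `opts_chunk$set()`, once, in the first chunk.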

Other things to consider setting globally:
- warning = FALSE
- message = FALSE
- error = TRUE
- echo = FALSE
- Caching options (next slides)

More complete chunk options
 

Options	Arguments	Default	Result
eval	logical	TRUE	Evaluate the code?
echo	logical	TRUE	Show the code?
results	markup, asis, hold, hide	markup	Render the results
warning	logical	TRUE	Print warnings?
error	logical	TRUE	Preserve errors? (if FALSE, quit)
message	logical	TRUE	Print any messages?
include	logical	TRUE	Include any of the code or output?
tidy	logical	FALSE	Tidy code? (see formatR package)

(and a few more)

Options	Arguments	Default	Result
cache	logical, 0:3	FALSE	Cache code chunks?
cache.comments	logical	NULL	Cache invalidated by comment changes?
dependson	char, num	NULL	Current chunk depend on prior cached chunks?
autodep	logical	FALSE	Depends determined automatically?
fig.height/fig.width	numeric	7, 7	Height and width of figure
fig.show	asis, hold, animate, hide	asis	How the figure should be displayed
interval	numeric	1	Animate speed
For complete documentation, see http://yihui.name/knitr/options/

Inline code

A single backtick followed by r, then the code and a closing backtick, produces inline code to be evaluated.

This is an example of inline code, where I want to refer to the sum of a and b, which is 8.

This is extremely useful in writing reports. You never have to update any numbers in the text, regardless of changes to your models or data (if you are careful about it).
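
As a minimal sketch of inline code in action (assuming {knitr} is installed), the document below defines a and b in a hidden chunk and then refers to their sum in a sentence. The chunk label and the sentence are made up for the example.

```r
library(knitr)

fence <- strrep("`", 3)

rmd <- c(
  # include = FALSE runs the chunk but keeps it out of the output
  paste0(fence, "{r values, include = FALSE}"),
  "a <- 3",
  "b <- 5",
  fence,
  "",
  # The inline expression is replaced by its value when knitting
  "The sum of a and b is `r a + b`."
)

out <- knit(text = rmd, quiet = TRUE)
cat(out)
```

If a or b changes, the sentence updates the next time the document is knit.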

Real example

Rendering and documents

9:30 - 10:00am

Rendering R Markdown documents

Modify the YAML

Get the same document to render to different formats by modifying the YAML to output HTML, PDF, and .docx

From

---
title: "My Document"
author: "Stephanie Lawson"
output: html_document
---

To

---
title: "My Document"
author: "Stephanie Lawson"
output: pdf_document
---
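
Editing the YAML is not the only route. As a sketch under stated assumptions ({rmarkdown} and pandoc are installed, and the file contents are invented for illustration), the format can also be chosen at render time through the `output_format` argument of `rmarkdown::render()`.

```r
library(rmarkdown)

# Illustrative document written to a temporary file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"My Document\"",
  "output: html_document",
  "---",
  "",
  "Some text."
), rmd_file)

html_path <- NULL
docx_path <- NULL

# Rendering needs pandoc, so skip quietly when it is not available
if (pandoc_available()) {
  # output_format selects the format regardless of the YAML `output:` field
  html_path <- render(rmd_file, output_format = "html_document", quiet = TRUE)
  docx_path <- render(rmd_file, output_format = "word_document", quiet = TRUE)
  # "pdf_document" follows the same pattern but also needs a TeX distribution
}
```

`render()` returns the path of the file it wrote, which is handy when scripting several formats in a row.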

Rendering to HTML

---
title: "My Document"
author: "Stephanie Lawson"
output: 
  html_document:
    toc: true
    toc_depth: 2
    toc_float: true
    number_sections: true
    highlight: kate
---

Rendering to a PDF

Need a TeX (pronounced "tek") distribution
- Our recommendation for this workshop, which probably includes everything you'll ever need: {tinytex}

install.packages("tinytex")
tinytex::install_tinytex()

This is another amazing package by Yihui Xie. See more about it here

Rendering to PDF

---
title: "My Document"
author: "Stephanie Lawson"
output: 
  pdf_document:
    toc: true
    number_sections: true
    highlight: kate
---

Rendering to .docx

A key feature is the ability to use a reference document; see here

---
title: "My Document"
author: "Stephanie Lawson"
output: word_document
---

Break

10:00 - 10:15
