class: center, middle, inverse, title-slide # Transparent and Reproducible Research with R (Part 1) ## And a bit of git/GitHub ### Daniel Anderson ### Joshua M Rosenberg ### April 7, 2019 --- class: inverse background-image:url(https://media.giphy.com/media/yoJC2A59OCZHs1LXvW/giphy.gif) background-size:contain --- # Agenda ### First two hours - Introduction (8:00 - 8:45) - Who we are, who participants are and why they're here (15 min.) - Reproducible research and literate programming (20 min.) <!-- DA --> - Conducting science in the public and in an open way & OSF (10 min.) <!-- JMR --> -- - R Markdown (8:45 - 10:00) - Delineating code chunks from plain text (15 min.) <!-- JMR --> - Creating headers (5 min.) <!-- JMR --> - Creating lists and using other features of markdown (10 min.) <!-- DA --> - Whole-document and code chunk-specific options (15 min.) <!-- DA --> - Rendering and sharing documents in different formats (15 min.) <!-- JMR --> - Lab (practice) (15 min.) --- class:inverse right background-image:url(img/break.jpg) background-size:cover # Break (10:00 - 10:15) --- # Last two hours - Advanced R Markdown functionality (10:15 - 11:00 minutes) - Formatting tables (20 min.) <!-- DA --> - Creating manuscripts to submit for publication (via {papaja}; 25 min.) - Use of git/GitHub for version control and collaboration (11:00 - 11:45) - Introduction to GitHub, RStudio interface, and GitKraken GUI (20 min.) <!-- DA --> - Making changes, committing them, and pushing them to the repository (15 min.) <!-- DA --> - Use of GitHub (and ignoring specific files via `.gitignore`; 10 min.) <!-- JMR --> - Wrap-up/ideas for next steps/staying in touch (11:45 - 12:00) --- # #whoami .pull-left[ * Research Assistant Professor: Behavioral Research and Teaching, University of Oregon ([#goducks](https://twitter.com/search?q=%23goducks&src=typd)) * Dad (two daughters: 6 (almost 7) and 4) * Primary areas of interest + ššRšš and computational research + Open data, open science, and reproducible workflows + Growth modeling, achievement gaps, and variance between educational institutions (particularly spatially) ] <img src="img/thefam.jpg" width="350px" style="display: block; margin: auto;" /> --- # #whoami 2 .pull-left[ * Assistant Professor: STEM Education, University of Tennessee, Knoxville * Also a Dad (one-year-old toddler!) * Primary areas of interest + Data science in education (network analytic methods, experience sampling method, computational grounded theory) + Data science education (integrating data science and science education) ] <img src="img/kr-joro.jpg" width="350px" style="display: block; margin: auto;" /> --- class: inverse bottom right background-image: url(https://images.pexels.com/photos/1385627/pexels-photo-1385627.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=750&w=1260) background-size: cover # #whoyouis Introduce yourself What's your prior experience with R/R Markdown and git/GitHub? Why are you here? --- class: da center middle # This slide means we're # transitioning to Daniel --- class: jmr center middle # This slide means we're ## transitioning to Josh --- # Before we really get started * Realize this is all going to be an intro + Covering a lot of content - that's purposeful + Idea is to give you exposure and some basic familiarity -- * Resources to learn more + Daniel's class(es) + [Intro course](http://www.datalorax.com/vita/ds/ds1/) + [Data viz course](https://uo-datasci-specialization.github.io/c2-communicate_transform_data/) + [Functional programming](https://uo-datasci-specialization.github.io/c3-fun_program_r/) + [r4ds](https://r4ds.had.co.nz) + [R Markdown book](https://bookdown.org/yihui/rmarkdown/) + [(Developing) data science in education book](https://github.com/data-edu/data-science-in-education) + .gray[(plenty of others out there too)] --- class: da center middle # Reproducible Research and # Literate Programming <br> ### 8:15 - 8:30am --- # A couple caveats * Much of what we're going to be discussing represents an ideal that we have only recently begun working towards. -- * None of what we will talk about should be taken as a referendum on you or your current practices. However, we hope to help to convince you that you should be working toward the reproducible research ideal, and that, as a field, we should be moving toward reproducible research being the *minimal standard*. -- * We will be focusing on reproducible research with R (obviously). Other options are available but, in our view, none are as clear, comprehensive, and easy to implement as the tools at your disposal through R. --- # What is reproducible research? * .bolder[Replicability] is the gold standard for research. Ideally, most research would be verified through replication. -- * Reproducibility represents a .bolder[minimal] standard, which itself can aid replication (tremendously), by conducting and documenting the research sufficiently that .blue[an independent researcher could reproduce all the results from a study], provided the data were available -- * Turns out this is a more difficult standard than we would generally like to admit. --- # Why should we care? * Reproducibility as an ethical standard + More transparency + More potential for results to be verified (and errors found/corrected) -- * If your work **is not** reproducible, it is often not truly replicable. -- * If your work **is** reproducible, then others have a "recipe" for replication. --- # Are journal articles research? * Initially, we may think of journal articles as research, but really the research is everything that went into the article, not the article itself. -- * Some (Buckheit & Donoho, 2015) conceive of the article as the "advertisement". -- * If all we have is the advertisement, can we really fully understand the steps and decisions made during the research? + In large-scale data analysis, the answer is generally "no". --- # Tangential benefits Striving toward reproducible research will: * Make your own code more efficient/easily interpretable + Can help with collaboration on a project * Reduce errors * Increase efficiency by not having to redo tables and figures with each tweak to a model. --- # What does literate programming look like? ### What this workshop is about! -- 1. Start with a basic text document (not Word, text) -- 2. Use the text document to write your article -- 3. Embed code within the text document that corresponds to your analysis. Note this is not just copying the code in. The code should be live and what you're working with while conducting your research. -- 4. Render the document into a different format (pdf, html, etc.). + Select which code (if any) will be displayed + Build tables of results and plots to be produced --- # End result * Readers can then read the "advertisement", but if they are interested in reproducing your results they can access the text file that contains the analysis code. * Single product that has the advertisement and the research process embedded. --- # Other reasons dynamic documents are useful Outside of reproducibility, you may want to use R Markdown to: * Produce slides * Keep track of your analysis (notes, essentially), even if you end up using something like Word * Share code with others * Quickly share results with others * Produce professional looking data products --- # Challenges * Word is the industry standard (frustratingly so, to us) + Word output is less than ideal * Can be difficult when collaborating with others * Some journal articles *require* papers submitted in Word + Potentially get a pdf to word converter, but still less than ideal * Advanced features have a relatively steep learning curve --- class: jmr center middle # Open science and R Markdown basics ### 8:35 - 8:45 am --- # Open science and public work .pull-left[ ### Benefits * Working in the public has potential benefits: * People know what your expertise is * Potential colleagues can reach out to you for collaborations * Allow others to build upon your work * Build a network ] .pull-right[ ### Dilemmas * Working in the public has *some* potential dilemmas/drawbacks to manage: * Sometimes difficult to share pre-prints (due to copyright issues) * Getting 'scooped' * Often difficult to share data ] --- # Open Science Framework - Offers a platform in which you can store *private* data - Has an Application Programming Interface (API) to connect with R - Example: [https://osf.io/9ex7k/](https://osf.io/9ex7k/) --- # R Markdown Basics ## 8:45 - 9:00 --- # R Markdown From within your R Studio Project: ![](img/rmd_template.png ) --- # First thing: Render! ![](img/knit.png) --- class: inverse center middle # Create new a R Markdown doc ### ### Try it out! # YAML Front Matter ``` --- title: Example Markdown document author: - Daniel Anderson - : "2019-05-07" --- ``` ![](img/frontMatter.png) * Three dashes before and after the YAML fields * Case sensitive * Many other fields are possible. + For example, you may want to include an `output:` argument (`pdf_document`, `html_document`, `word_document`). Must be specified as it is rendered, if not supplied. --- # Example: Change syntax highlighting The YAML will control a lot of how a document looks. For example, if you wanted to render with a different syntax highlighter: .pull-left[ ### Standard Rmd ``` --- title: "Doc Title" output: pdf_document --- ``` ] .pull-right[ ### kate ``` --- title: "Doc Title" output: pdf_document: highlight: kate --- ``` ] --- # Code chunks versus text ![](img/code_chunks.png) --- # Code chunks Start a code chunk with ` ```{r}`, then produce some r code, then close the chunk with three additional back ticks ` ``` `. ![](img/codeChunk1.png) -- ```r a <- 3 b <- 5 a + b * (exp(a)/b) ``` ``` ## [1] 23.08554 ``` --- class: da center middle # R Markdown basics ## Headings, lists, chunk options, and inline code ### 9:00 - 9:30am --- # Headings and Lists ### Not R-lists .pull-left[ ``` # Level 1 ## Level 2 ### Level 3 (etc.) ``` <br/> ``` * Unordered list - inset + inset more - etc. 1. Ordered list a. blah blah 2. More stuff ``` ] .pull-right[ ![](img/headersLists.png) ] --- # echo and eval .pull-left[ You can show code without evaluating it, using `eval = FALSE`. <div align = "left"> <img src = img/codeChunk2.png height = 100> </div> ```r a + b * (exp(a)/b) ``` ] .pull-right[ Alternatively, you can evaluate the code without displaying it, using `echo = FALSE`. <div align = "left"> <img src = img/codeChunk3.png height = 100> </div> ![](workshop-presentation-p1_files/figure-html/plotExample-1.png)<!-- --> ] --- # warning ### Warning = `FALSE` ```r ggplot(msleep, aes(sleep_rem, sleep_total)) + geom_point() ``` ![](workshop-presentation-p1_files/figure-html/ggplotWarning2-1.png)<!-- --> Warning is printed to the console when rendering. --- ### Warning = `TRUE` ```r ggplot(msleep, aes(sleep_rem, sleep_total)) + geom_point() ``` ``` ## Warning: Removed 22 rows containing missing values (geom_point). ``` ![](workshop-presentation-p1_files/figure-html/ggplotWarning1-1.png)<!-- --> --- # Show errors `error = TRUE` ```r ggplot(msleep, aes(sleep, sleep_total)) + geom_point() ``` ``` ## Error: Aesthetics must be either length 1 or the same as the data (83): x ``` -- <br> If `error = FALSE`, the document won't render if it encounters an error. ![](img/errorRendering.png) --- # Message Some functions will return messages. You may want to suppress these. ### message = `FALSE` ```r ggplot(msleep, aes(sleep_total)) + geom_histogram() ``` ![](workshop-presentation-p1_files/figure-html/messages2-1.png)<!-- --> --- # Message ### message = `TRUE` ```r ggplot(msleep, aes(sleep_total)) + geom_histogram() ``` ``` ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`. ``` ![](workshop-presentation-p1_files/figure-html/messages1-1.png)<!-- --> --- # include ![](img/includeFALSE.png) The `include` argument is used to evaluate code that is not included in the document at all. For example, when setting up your global options. --- # Setting global options Change the default behavior ```r opts_chunk$set(...) # insert options here ``` For example, you can set `echo = FALSE` and `fig.width = 6.5` and `fig.height = 8` with the following code. ```r opts_chunk$set(echo = FALSE, fig.width = 6.5, fig.height = 8) ``` This is most useful when producing a report for somebody who doesn't use R and has no use or knowledge of the code. You can always override the global options within a particular chunk, e.g. ` ```{r, chunkName, echo = TRUE}` ` ``` ` --- ## Other things to consider setting globally: * `warnings = FALSE` * `message = FALSE` * `errors = TRUE` * `echo = FALSE` * Caching options (next slides) --- # More complete chunk options <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Options </th> <th style="text-align:left;"> Arguments </th> <th style="text-align:left;"> Default </th> <th style="text-align:left;"> Result </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> eval </td> <td style="text-align:left;"> logical </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> Evaluate the code? </td> </tr> <tr> <td style="text-align:left;"> echo </td> <td style="text-align:left;"> logical </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> Show the code? </td> </tr> <tr> <td style="text-align:left;"> results </td> <td style="text-align:left;"> markup, asis, hold, hide </td> <td style="text-align:left;"> markup </td> <td style="text-align:left;"> Render the results </td> </tr> <tr> <td style="text-align:left;"> warning </td> <td style="text-align:left;"> logical </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> Print warnings? </td> </tr> <tr> <td style="text-align:left;"> error </td> <td style="text-align:left;"> logical </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> Preserve errors? (if FALSE, quit) </td> </tr> <tr> <td style="text-align:left;"> message </td> <td style="text-align:left;"> logical </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> Print any messages? </td> </tr> <tr> <td style="text-align:left;"> include </td> <td style="text-align:left;"> logical </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> Include any of the code or output or code? </td> </tr> <tr> <td style="text-align:left;"> tidy </td> <td style="text-align:left;"> logical </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> Tidy code? (see formatR package) </td> </tr> </tbody> </table> --- # (and a few more) <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> Options </th> <th style="text-align:left;"> Arguments </th> <th style="text-align:left;"> Default </th> <th style="text-align:left;"> Result </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 9 </td> <td style="text-align:left;"> cache </td> <td style="text-align:left;"> logical, 0:3 </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> Cache code chunks? </td> </tr> <tr> <td style="text-align:left;"> 10 </td> <td style="text-align:left;"> cache.comments </td> <td style="text-align:left;"> logical </td> <td style="text-align:left;"> NULL </td> <td style="text-align:left;"> Cache invalidated by comment changes? </td> </tr> <tr> <td style="text-align:left;"> 11 </td> <td style="text-align:left;"> dependson </td> <td style="text-align:left;"> char, num </td> <td style="text-align:left;"> NULL </td> <td style="text-align:left;"> Current chunk depend on prior cached chunks? </td> </tr> <tr> <td style="text-align:left;"> 12 </td> <td style="text-align:left;"> autodep </td> <td style="text-align:left;"> logical </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> Depends determined automatically? </td> </tr> <tr> <td style="text-align:left;"> 13 </td> <td style="text-align:left;"> fig.height/fig.width </td> <td style="text-align:left;"> numeric </td> <td style="text-align:left;"> 7, 7 </td> <td style="text-align:left;"> Height and width of figure </td> </tr> <tr> <td style="text-align:left;"> 14 </td> <td style="text-align:left;"> fig.show </td> <td style="text-align:left;"> asis, hold, animate, hide </td> <td style="text-align:left;"> asis </td> <td style="text-align:left;"> How the figure should be displayed </td> </tr> <tr> <td style="text-align:left;"> 15 </td> <td style="text-align:left;"> interval </td> <td style="text-align:left;"> numeric </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> Animate speed </td> </tr> </tbody> </table> .footnote[For complete documentation, see http://yihui.name/knitr/options/] --- # Inline code A single back tick followed by `r` produces inline code to be evaluated. <div align = "center"> <img src = img/inlineCode.png width = 1000> </div> <br> This is an example of inline code, where I want to refer to the sum of `a` and `b`, which is 8. This is *extremely* useful in writing reports. Never have to update any numbers in text, regardless of changes to your models or data (if you are careful about it). --- # Real example ![](img/ell_means.png) --- class: jmr center middle # Rendering and documents ## 9:30 - 10:00am- # Rendering R Markdown documents ## Modify the YAML Get the same document to render to different formats by modifying the YAML to output HTML, PDF, and .docx .pull-left[ ### From ``` --- title: "My Document" author: "Stephanie Lawson" output: html_document --- ``` ] .pull-right[ ### To ``` --- title: "My Document" author: "Stephanie Lawson" output: pdf_document --- ``` ] --- # Rendering to HTML ``` --- title: "My Document" author: "Stephanie Lawson" output: html_document: toc: true toc_depth: 2 toc_float: true number_sections: true higlight: kate --- ``` --- # Re# Rendering to a PDF * Need a tex (pronounced tek) distribution + Our recommendation for this workshop with probably everything you'll ever need: {tinytex} -- ```r install.packages("tinytex") tinytex::install_tinytex() ``` This is another amazing package by Yihui Xie. See more about it [here](https://yihui.name/tinytex/) --- ndering to PDF ``` --- title: "My Document" author: "Stephanie Lawson" output: pdf_document: toc: true number_sections: true higlight: kate --- ``` --- # Rendering to .docx A key feature is the ability to use a reference document; see [here](https://bookdown.org/yihui/rmarkdown/word-document.html) ``` --- title: "My Document" author: "Stephanie Lawson" output: word_document --- ``` --- s: da crse center middle # Break ### 10:00 - 10:15