class: center, middle, inverse, title-slide # Transparent and Reproducible Research with R (Part 2) ## And a bit of git/GitHub ### Daniel Anderson ### Joshua M Rosenberg ### April 7, 2019 --- # Agenda # Last two hours - Advanced R Markdown functionality (10:15 - 11:00 minutes) - Formatting tables (20 min.) <!-- DA --> - Creating manuscripts to submit for publication (via {papaja}; 25 min.) - Use of git/GitHub for version control and collaboration (11:00 - 11:45) - Introduction to GitHub, RStudio interface, and GitKraken GUI (20 min.) <!-- DA --> - Making changes, committing them, and pushing them to the repository (15 min.) <!-- DA --> - Use of GitHub (and ignoring specific files via `.gitignore`; 10 min.) <!-- JMR --> - Wrap-up/ideas for next steps/staying in touch (11:45 - 12:00) --- class: da center middle # R Markdown Tables <br> ### 10:15 - 10:35am --- class: inverse background-image:url(https://github.com/rstudio/gt/raw/master/man/figures/logo.svg?sanitize=TRUE) background-size: contain --- # Overview * New package (still very actively under development) by RStudio * Really promising + Pipe-oriented + Beautiful tables easy + Spanner heads/grouping used to be a total pain - not so anymore + Renders to HTML/PDF without even thinking about it * May run into bumps because of the active development --- # Install ```r remotes::install_github("rstudio/gt") ``` --- # The hard part * Getting your data in the format you want a table in * Utilize your `gather`/`spread` skills regularly ```r library(fivethirtyeight) flying ``` ``` ## # A tibble: 1,040 x 27 ## respondent_id gender age height children_under_18 household_income ## <dbl> <chr> <ord> <ord> <lgl> <ord> ## 1 3436139758 <NA> <NA> <NA> NA <NA> ## 2 3434278696 Male 30-44 "6'3\… TRUE <NA> ## 3 3434275578 Male 30-44 "5'8\… FALSE $100,000 - $149… ## 4 3434268208 Male 30-44 "5'11… FALSE $0 - $24,999 ## 5 3434250245 Male 30-44 "5'7\… FALSE $50,000 - $99,9… ## 6 3434245875 Male 30-44 "5'9\… TRUE $25,000 - $49,9… ## 7 3434235351 Male 30-44 "6'2\… TRUE <NA> ## 8 3434218031 Male 30-44 "6'0\… TRUE $0 - $24,999 ## 9 3434213681 <NA> <NA> "6'0\… TRUE <NA> ## 10 3434172894 Male 30-44 "5'6\… FALSE $0 - $24,999 ## # … with 1,030 more rows, and 21 more variables: education <ord>, ## # location <chr>, frequency <ord>, recline_frequency <ord>, ## # recline_obligation <lgl>, recline_rude <ord>, recline_eliminate <lgl>, ## # switch_seats_friends <ord>, switch_seats_family <ord>, ## # wake_up_bathroom <ord>, wake_up_walk <ord>, baby <ord>, ## # unruly_child <ord>, two_arm_rests <chr>, middle_arm_rest <chr>, ## # shade <chr>, unsold_seat <ord>, talk_stranger <ord>, get_up <ord>, ## # electronics <lgl>, smoked <lgl> ``` --- ```r smry <- flying %>% count(gender, age, recline_frequency) %>% filter(!is.na(age), !is.na(recline_frequency)) %>% spread(age, n) smry ``` ``` ## # A tibble: 10 x 6 ## gender recline_frequency `18-29` `30-44` `45-60` `> 60` ## <chr> <ord> <int> <int> <int> <int> ## 1 Female Never 24 21 19 23 ## 2 Female Once in a while 36 25 30 36 ## 3 Female About half the time 10 22 18 17 ## 4 Female Usually 13 22 26 28 ## 5 Female Always 10 21 29 12 ## 6 Male Never 24 17 20 18 ## 7 Male Once in a while 19 39 40 29 ## 8 Male About half the time 11 11 16 11 ## 9 Male Usually 14 30 15 27 ## 10 Male Always 11 14 21 14 ``` --- # Turn into table ### Disclaimer These all look slightly different on the slides .pull-left[ ```r library(gt) smry %>% gt() ``` ] .pull-right[
gender
recline_frequency
18-29
30-44
45-60
> 60
Female
Never
24
21
19
23
Female
Once in a while
36
25
30
36
Female
About half the time
10
22
18
17
Female
Usually
13
22
26
28
Female
Always
10
21
29
12
Male
Never
24
17
20
18
Male
Once in a while
19
39
40
29
Male
About half the time
11
11
16
11
Male
Usually
14
30
15
27
Male
Always
11
14
21
14
] --- ## Add gender as a grouping variable .pull-left[ ```r smry %>% group_by(gender) %>% gt() ``` ] .pull-right[
recline_frequency
18-29
30-44
45-60
> 60
Female
Never
24
21
19
23
Once in a while
36
25
30
36
About half the time
10
22
18
17
Usually
13
22
26
28
Always
10
21
29
12
Male
Never
24
17
20
18
Once in a while
19
39
40
29
About half the time
11
11
16
11
Usually
14
30
15
27
Always
11
14
21
14
] --- # Add a spanner head ```r smry %>% group_by(gender) %>% gt() %>% tab_spanner(label = "Age Range", columns = vars(`18-29`, `30-44`, `45-60`, `> 60`)) ``` ---
recline_frequency
Age Range
18-29
30-44
45-60
> 60
Female
Never
24
21
19
23
Once in a while
36
25
30
36
About half the time
10
22
18
17
Usually
13
22
26
28
Always
10
21
29
12
Male
Never
24
17
20
18
Once in a while
19
39
40
29
About half the time
11
11
16
11
Usually
14
30
15
27
Always
11
14
21
14
--- # Change column names ```r smry %>% group_by(gender) %>% gt() %>% tab_spanner(label = "Age Range", columns = vars(`18-29`, `30-44`, `45-60`, `> 60`)) %>% cols_label(recline_frequency = "Recline") ``` ---
Recline
Age Range
18-29
30-44
45-60
> 60
Female
Never
24
21
19
23
Once in a while
36
25
30
36
About half the time
10
22
18
17
Usually
13
22
26
28
Always
10
21
29
12
Male
Never
24
17
20
18
Once in a while
19
39
40
29
About half the time
11
11
16
11
Usually
14
30
15
27
Always
11
14
21
14
--- # Align columns ```r smry %>% group_by(gender) %>% gt() %>% tab_spanner(label = "Age Range", columns = vars(`18-29`, `30-44`, `45-60`, `> 60`)) %>% cols_label(recline_frequency = "Recline") %>% cols_align(align = "left", columns = vars(recline_frequency)) ``` ---
Recline
Age Range
18-29
30-44
45-60
> 60
Female
Never
24
21
19
23
Once in a while
36
25
30
36
About half the time
10
22
18
17
Usually
13
22
26
28
Always
10
21
29
12
Male
Never
24
17
20
18
Once in a while
19
39
40
29
About half the time
11
11
16
11
Usually
14
30
15
27
Always
11
14
21
14
--- # Add a title ```r smry %>% group_by(gender) %>% gt() %>% tab_spanner(label = "Age Range", columns = vars(`18-29`, `30-44`, `45-60`, `> 60`)) %>% cols_label(recline_frequency = "Recline") %>% cols_align(align = "left", columns = vars(recline_frequency)) %>% tab_header(title = "Airline Passengers", subtitle = "Leg space is limited, what do you do?") ``` ---
Airline Passengers
Leg space is limited, what do you do?
Recline
Age Range
18-29
30-44
45-60
> 60
Female
Never
24
21
19
23
Once in a while
36
25
30
36
About half the time
10
22
18
17
Usually
13
22
26
28
Always
10
21
29
12
Male
Never
24
17
20
18
Once in a while
19
39
40
29
About half the time
11
11
16
11
Usually
14
30
15
27
Always
11
14
21
14
--- # Format columns ```r smry %>% mutate_at(vars(`18-29`, `30-44`, `45-60`, `> 60`), ~./100) %>% group_by(gender) %>% gt() %>% tab_spanner(label = "Age Range", columns = vars(`18-29`, `30-44`, `45-60`, `> 60`)) %>% fmt_percent(vars(`18-29`, `30-44`, `45-60`, `> 60`), decimals = 0) %>% cols_label(recline_frequency = "Recline") %>% cols_align(align = "left", columns = vars(recline_frequency)) %>% tab_header(title = "Airline Passengers", subtitle = "Leg space is limited, what do you do?") ``` ---
Airline Passengers
Leg space is limited, what do you do?
Recline
Age Range
18-29
30-44
45-60
> 60
Female
Never
24%
21%
19%
23%
Once in a while
36%
25%
30%
36%
About half the time
10%
22%
18%
17%
Usually
13%
22%
26%
28%
Always
10%
21%
29%
12%
Male
Never
24%
17%
20%
18%
Once in a while
19%
39%
40%
29%
About half the time
11%
11%
16%
11%
Usually
14%
30%
15%
27%
Always
11%
14%
21%
14%
--- # Add a source note ```r smry %>% mutate_at(vars(`18-29`, `30-44`, `45-60`, `> 60`), ~./100) %>% group_by(gender) %>% gt() %>% tab_spanner(label = "Age Range", columns = vars(`18-29`, `30-44`, `45-60`, `> 60`)) %>% fmt_percent(vars(`18-29`, `30-44`, `45-60`, `> 60`), decimals = 0) %>% cols_label(recline_frequency = "Recline") %>% cols_align(align = "left", columns = vars(recline_frequency)) %>% tab_header(title = "Airline Passengers", subtitle = "Leg space is limited, what do you do?") %>% tab_source_note(source_note = md("Data from [fivethirtyeight](https://fivethirtyeight.com/features/airplane-etiquette-recline-seat/)")) ``` ---
Airline Passengers
Leg space is limited, what do you do?
Recline
Age Range
18-29
30-44
45-60
> 60
Female
Never
24%
21%
19%
23%
Once in a while
36%
25%
30%
36%
About half the time
10%
22%
18%
17%
Usually
13%
22%
26%
28%
Always
10%
21%
29%
12%
Male
Never
24%
17%
20%
18%
Once in a while
19%
39%
40%
29%
About half the time
11%
11%
16%
11%
Usually
14%
30%
15%
27%
Always
11%
14%
21%
14%
Data from
fivethirtyeight
--- # Color cells ```r smry %>% mutate_at(vars(`18-29`, `30-44`, `45-60`, `> 60`), ~./100) %>% group_by(gender) %>% gt() %>% tab_spanner(label = "Age Range", columns = vars(`18-29`, `30-44`, `45-60`, `> 60`)) %>% fmt_percent(vars(`18-29`, `30-44`, `45-60`, `> 60`), decimals = 0) %>% cols_label(recline_frequency = "Recline") %>% data_color(vars(`18-29`, `30-44`, `45-60`, `> 60`), colors = scales::col_numeric(palette = c(c("#FFFFFF", "#FF0000")), domain = NULL)) %>% cols_align(align = "left", columns = vars(recline_frequency)) %>% tab_header(title = "Airline Passengers", subtitle = "Leg space is limited, what do you do?") %>% tab_source_note(source_note = md("Data from [fivethirtyeight](https://fivethirtyeight.com/features/airplane-etiquette-recline-seat/)")) ``` ---
Airline Passengers
Leg space is limited, what do you do?
Recline
Age Range
18-29
30-44
45-60
> 60
Female
Never
24%
21%
19%
23%
Once in a while
36%
25%
30%
36%
About half the time
10%
22%
18%
17%
Usually
13%
22%
26%
28%
Always
10%
21%
29%
12%
Male
Never
24%
17%
20%
18%
Once in a while
19%
39%
40%
29%
About half the time
11%
11%
16%
11%
Usually
14%
30%
15%
27%
Always
11%
14%
21%
14%
Data from
fivethirtyeight
--- # What else? .pull-left[ * Lots more it can do, and lots more in development * See the [website](https://gt.rstudio.com) ] -- .pull-right[ gtcars case study is worth going through ![](https://gt.rstudio.com/articles/images/gtcars.svg) ] --- class: da center middle # APA Manuscripts ## 10:35 - 11:00 am --- # {papaja} Despite having been around for about 4 years [{papaja}](https://github.com/crsh/papaja) is still not on CRAN. More evidence that some of the best packages are not on CRAN. -- Install with devtools ```r devtools::install_github("crsh/papaja") ``` --- # WIP The package is seemingly perpetually under development. What does this mean? -- * Re-install regularly. -- * Not everything may work perfect - don't worry though, most things do -- * You may want to peruse the current [issues](https://github.com/crsh/papaja/issues) + If you run into one (and you're sure it's an issue) consider opening one yourself + Bonus - the developer is very kind, so even if you open up a silly issue, he's likely to be understanding --- # Use the template .pull-left[ ![](img/new-rmd.png) ] .pull-right[ ![](img/from-template.png) ] --- # YAML A few more options than the default <div> <img src = img/papaja-yaml.png height = 450> </div> --- # First thing - Render! <div> <img src = img/papaja-pdf1.png height = 450> </div> --- # Modifications * Obvious ones + title + author & author info + abstract + keywords -- * Less obvious + `shorttitle` (running head) + `authornote` (can fully delete or modify) + `wordcount` (fairly useless at this point imo) + `bibliography` (we'll talk more about this momentarily) + `linenumbers` + floats + mask (for blind peer-review) + `classoption` --- # Let's play for a minute! Modify some of the options on the previous slide. Specifically, try changing `classoption` from `man` to `jou`. Try other things too. --- # Add some LaTeX options ``` header-includes: - \raggedbottom - \setlength{\parskip}{0pt} ``` This will help .gray[(save you lots of time googling)] remove the extra space between paragraphs. --- class: da center middle # git/GitHub ## An Introduction <br> ### 11:00 - 11:40am More info can be found here: http://www.datalorax.com/vita/ds/ds1-slides/w4p2/ --- class: inverse middle background-image:url(img/final-doc.png) background-size:contain .footnote[“Piled Higher and Deeper” by Jorge Cham, http://www.phdcomics.com] --- ### From swcarpentry ![](http://swcarpentry.github.io/git-novice/fig/play-changes.svg) We can think of the changes as separate from the document --- ![](http://swcarpentry.github.io/git-novice/fig/versions.svg) This means there are many possible versions of the same document --- ![](http://swcarpentry.github.io/git-novice/fig/merge.svg) Unless there are conflicts, two changes from the same document can be merged together <!-- Show repo for NCME pres & talk about publishing --> --- class: inverse middle center # How? That's what we'll do today! --- # Some basic terminology * Version Control System + A tool to help us track changes. *git* is one such system. -- * Commit + Changes that have been made to the file(s) -- * Repository (repo) + Entire project (including history) -- * Clone + Download files locally -- * Pull + Get latest changes -- * Push + Send latest commits to remote repo --- # Demo (30 minutes) ### Goal: Get your work from this morning in a GitHub repo! * Create new repo * Clone it locally * Add your files * Commit * Push * Publish --- class: jmr center middle # Using GitHub ### 11:35 - 11:45am --- # Example ### [tidyLPA](https://github.com/data-edu/tidyLPA) --- # Ignoring Files * When we initialized the repo, we started it with a `.gitignore` file * The `.gitignore` file tells the repo not to track certain files + e.g., proprietary data --- class: inverse center middle # Wrapping up ### 11:45am - 12:00pm Daniel Anderson - [daniela@uoregon.edu](mailto:daniela@uoregon.edu) - [@datalorax_](https://twitter.com/datalorax_) Joshua Rosenberg -[jmrosenberg@utk.edu](mailto:jmrosenberg@utk.edu) -[@jrosenberg6432][https://twitter.com/jrosenberg6432] Mailing list: [rr-in-edu@googlegroups.com](rr-in-edu@googlegroups.com) **Questions? Ideas? Thank you!** --- class: inverse center middle # Appendix --- class: inverse middle center # Citations --- ## Citations To include references in your paper, you must: * Create an external .bib file using LaTeX formatting (we'll get to this) * Include `bibliography: nameOfYourBibFile.bib` in your YAML front matter. * Refer to the citations in text using `@` --- # Creating a .bib doc ![](../img/googleScholarSearch.png) -- <div> <img src = ../img/googleCite.png height = 300> </div> --- ![](img/googleBibTex.png) -- ![](img/bibEntry.png) --- # In text citations <table> <thead> <tr> <th style="text-align:left;"> Citation Style </th> <th style="text-align:left;"> Output </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> @Briggs11 </td> <td style="text-align:left;"> Briggs and Weeks (2011) </td> </tr> <tr> <td style="text-align:left;"> [see @Baldwin2014; @Caruso2000] </td> <td style="text-align:left;"> (see Baldwin et al. 2014; Caruso 2000) </td> </tr> <tr> <td style="text-align:left;"> [@Linn02, p. 9] </td> <td style="text-align:left;"> (Linn and Haug 2002, 9) </td> </tr> <tr> <td style="text-align:left;"> [-@Goldhaber08] </td> <td style="text-align:left;"> (2008) </td> </tr> </tbody> </table> <br> Note this is not APA. However, references are included automatically at the end of the document. Include `# References` as the last line of your document to give it a title. --- ## A few real examples ![](img/intext_ref1.png) ![](img/intext_ref2.png) ![](img/intext_ref3.png) --- ## References ![](img/references.png)