These slides were produced with R
See the source code here
The focus of this particular talk is not on the code itself
Text
How is the text displayed (e.g., font, face, location)?
What is the purpose of the text?
Transparency
Are there overlapping pieces?
Can transparency help?
How would you encode these data into a display?
Month | Day | Location | Temperature |
---|---|---|---|
Jan | 1 | Chicago | 25.6 |
Jan | 1 | San Diego | 55.2 |
Jan | 1 | Houston | 53.9 |
Jan | 1 | Death Valley | 51.0 |
Jan | 2 | Chicago | 25.5 |
Jan | 2 | San Diego | 55.3 |
Jan | 2 | Houston | 53.8 |
Jan | 2 | Death Valley | 51.2 |
Jan | 3 | Chicago | 25.3 |
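One possible encoding, sketched in Python for illustration: map day to the x position, temperature to the y position, and location to color, so each location becomes one line. The `to_series` helper is a hypothetical name, and the rows are copied from the table above. A heatmap (day on x, location on y, temperature as fill) would use the same three variables with a different mapping.

```python
from collections import defaultdict

# Rows copied from the table above: (month, day, location, temperature).
rows = [
    ("Jan", 1, "Chicago", 25.6),
    ("Jan", 1, "San Diego", 55.2),
    ("Jan", 1, "Houston", 53.9),
    ("Jan", 1, "Death Valley", 51.0),
    ("Jan", 2, "Chicago", 25.5),
    ("Jan", 2, "San Diego", 55.3),
    ("Jan", 2, "Houston", 53.8),
    ("Jan", 2, "Death Valley", 51.2),
    ("Jan", 3, "Chicago", 25.3),
]


def to_series(rows):
    """Group rows by location: x = day, y = temperature, color = location."""
    series = defaultdict(list)
    for _month, day, location, temperature in rows:
        series[location].append((day, temperature))
    return dict(series)


series = to_series(rows)
for location, points in series.items():
    # Each entry is one line to draw, in one color.
    print(location, points)
```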
Both represent three scales
Additional scales can become lost without high structure in the data
Distinguish groups from each other
Represent data values
Highlight
See more about the Okabe Ito palette origins here: http://jfly.iam.u-tokyo.ac.jp/color/
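A minimal Python sketch of using the Okabe Ito palette to distinguish groups. The hex values are the commonly published ones for this palette; `assign_colors` is a hypothetical helper, not part of any library, and the location names are borrowed from the temperature table purely as example categories.

```python
# Okabe Ito palette, hex values as commonly published.
OKABE_ITO = {
    "orange": "#E69F00",
    "sky blue": "#56B4E9",
    "bluish green": "#009E73",
    "yellow": "#F0E442",
    "blue": "#0072B2",
    "vermillion": "#D55E00",
    "reddish purple": "#CC79A7",
    "black": "#000000",
}


def assign_colors(categories, palette=OKABE_ITO):
    """Map each distinct category to one palette color, in first-seen order."""
    unique = list(dict.fromkeys(categories))  # de-duplicate, keep order
    if len(unique) > len(palette):
        raise ValueError("more categories than colors in the palette")
    return dict(zip(unique, palette.values()))


colors = assign_colors(["Chicago", "San Diego", "Houston", "Death Valley"])
print(colors)
```

Raising an error instead of recycling colors is a deliberate choice in this sketch: reused colors would make two groups indistinguishable.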
More than five or so categories generally become too difficult to track
still too many...
Get a subset
(but could still be improved)
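One way to get a subset, sketched in Python under the assumption that "subset" means keeping only the most frequent categories (the slides may have chosen the subset differently). `keep_top_categories` and the example records are hypothetical.

```python
from collections import Counter


def keep_top_categories(records, key, n=5):
    """Keep only records whose category is among the n most frequent."""
    counts = Counter(record[key] for record in records)
    keep = {category for category, _count in counts.most_common(n)}
    return [record for record in records if record[key] in keep]


# Made-up records: group "a" appears 3 times, "b" twice, "c" once.
records = [
    {"group": "a"}, {"group": "b"}, {"group": "a"},
    {"group": "c"}, {"group": "b"}, {"group": "a"},
]
subset = keep_top_categories(records, key="group", n=2)
print(sorted({record["group"] for record in subset}))
```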
Do some research, find what you like and what tends to work well
Check for colorblindness
Look into http://colorbrewer2.org/
Above all else, show the data
-Edward Tufte
Data-Ink Ratio = Ink devoted to the data / total ink used to produce the figure
Common goal: Maximize the data-ink ratio
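The ratio is simple arithmetic. A small Python sketch, assuming "ink" is approximated by pixel counts (the numbers below are made up for illustration, and `data_ink_ratio` is a hypothetical helper):

```python
def data_ink_ratio(data_ink, total_ink):
    """Share of the figure's ink that is devoted to the data."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must be between 0 and total_ink")
    return data_ink / total_ink


# Made-up pixel counts: same data ink, with and without heavy decoration.
decorated = data_ink_ratio(data_ink=300, total_ink=1200)
stripped = data_ink_ratio(data_ink=300, total_ink=600)
print(decorated, stripped)
```

Removing non-data ink raises the ratio without changing the data shown.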
Empirically, Tufte's plot was the most difficult for viewers to interpret.
Visual cues (labels, gridlines) reduce the data-ink ratio, but can also reduce cognitive load.
Whenever possible, visualize your data with solid, colored shapes rather than with lines that outline those shapes. Solid shapes are more easily perceived, are less likely to create visual artifacts or optical illusions, and do more immediately convey amounts than do outlines.
emphasis added
Prior slide is a great example of when annotations can be used in place of a legend to
How do we display more than one distribution at a time?
## # A tibble: 1,313 x 5
##   name                                          class   age sex    survived
##   <chr>                                         <chr> <dbl> <chr>     <int>
## 1 Allen, Miss Elisabeth Walton                  1st   29    female        1
## 2 Allison, Miss Helen Loraine                   1st    2    female        0
## 3 Allison, Mr Hudson Joshua Creighton           1st   30    male          0
## 4 Allison, Mrs Hudson JC (Bessie Waldo Daniels) 1st   25    female        0
## 5 Allison, Master Hudson Trevor                 1st    0.92 male          1
## 6 Anderson, Mr Harry                            1st   47    male          1
## # … with 1,307 more rows
Note the default colors really don't work well in most of these
## # A tibble: 6 x 13
##   State `2004-05` `2005-06` `2006-07` `2007-08` `2008-09` `2009-10` `2010-11`
##   <chr>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
## 1 Alab…  5682.838  5840.550  5753.496  6008.169  6475.092  7188.954  8071.134
## 2 Alas…  4328.281  4632.623  4918.501  5069.822  5075.482  5454.607  5759.153
## 3 Ariz…  5138.495  5415.516  5481.419  5681.638  6058.464  7263.204  8839.605
## 4 Arka…  5772.302  6082.379  6231.977  6414.900  6416.503  6627.092  6900.912
## 5 Cali…  5285.921  5527.881  5334.826  5672.472  5897.888  7258.771  8193.739
## 6 Colo…  4703.777  5406.967  5596.348  6227.002  6284.137  6948.473  7748.201
## # … with 5 more variables: `2011-12` <dbl>, `2012-13` <dbl>, `2013-14` <dbl>,
## #   `2014-15` <dbl>, `2015-16` <dbl>
Pie charts are just stacked bar charts with a radial coordinate system
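To make that equivalence concrete, here is a small Python sketch: the segments of a single stacked bar (as cumulative proportions) become pie slices once the 0 to 1 position is mapped onto 0 to 360 degrees. The counts are made up, and both function names are hypothetical.

```python
from itertools import accumulate


def stacked_segments(counts):
    """Return (start, end) of each category in one 100% stacked bar."""
    total = sum(counts.values())
    ends = list(accumulate(value / total for value in counts.values()))
    starts = [0.0] + ends[:-1]
    return {key: (s, e) for key, s, e in zip(counts, starts, ends)}


def to_angles(segments):
    """Radial coordinates: map bar positions in [0, 1] to degrees."""
    return {key: (s * 360, e * 360) for key, (s, e) in segments.items()}


counts = {"a": 1, "b": 1, "c": 2}  # made-up category counts
segments = stacked_segments(counts)
angles = to_angles(segments)
print(segments)
print(angles)
```

The data and the proportions are identical in both views; only the coordinate system changes, so lengths in the bar become angles in the pie.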
See many examples here: http://www.tylervigen.com/spurious-correlations
It is tempting to lay down inflexible rules about what to do in terms of producing your graphs, and to dismiss people who don’t follow them as producing junk charts or lying with statistics. But being honest with your data is a bigger problem than can be solved by rules of thumb about making graphs. In this case there is a moderate level of agreement that bar charts should generally include a zero baseline (or equivalent) given that bars encode their variables as lengths. But it would be a mistake to think that a dot plot was by the same token deliberately misleading, just because it kept itself to the range of the data instead.
Avoid line drawings
Sort bar charts in ascending/descending order as long as the other axis does not have implicit meaning
Consider dropping legends and using annotations, when possible
Use color to your advantage, but be sensitive to color-blindness, and use the right kind of palette
Consider double-encoding data (shapes and color)
Make your labels bigger! We didn't talk about this one much, but it's a very common problem and really important
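The bar-sorting recommendation above can be sketched in a few lines of Python. `order_bars` is a hypothetical helper; the values are the January 1 temperatures from the table earlier in the talk, used only as example bar heights.

```python
def order_bars(values, ordered_axis=False):
    """Sort bars by value unless the category axis has its own order."""
    if ordered_axis:
        # e.g. months or age groups: keep the given order.
        return list(values.items())
    return sorted(values.items(), key=lambda item: item[1], reverse=True)


jan_1 = {
    "Chicago": 25.6,
    "San Diego": 55.2,
    "Houston": 53.9,
    "Death Valley": 51.0,
}
bars = order_bars(jan_1)
print([name for name, _value in bars])
```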
Essentially never
Use dual axes (produce separate plots instead)
Use 3D unnecessarily
Be wary of
Truncated axes
Pie charts (particularly with lots of categories)
@datalorax_ @datalorax daniela@uoregon.edu
Slides available at http://www.datalorax.com/talks/psych-seminar/