This is the main landing page for the course,
Introduction to managing and visualizing data in R: a short course on spatial and non-spatial data.
Page url: https://michaeldgarber.github.io/teach-r/
My background using R: https://michaeldgarber.github.io/teach-r/who-am-i.html
This short course will introduce participants to popular tools in R for manipulating (“wrangling”) and visualizing data through public-health-focused examples. Specifically, students will become familiar with ways to wrangle and visualize both aspatial (i.e., common spreadsheet-like) data and spatial data. The course will focus on tools that are part of, or that work well with, the tidyverse, a set of packages with a common philosophy that, in my opinion, makes R intuitive and fun. Packages covered will include dplyr and sf for manipulating data, and ggplot2, mapview, and tmap for visualizing data. Many examples will transition back and forth between data with and without a spatial component, showing that working with spatial data need not require separate specialized geographic information systems (GIS) software. The course will conclude with a somewhat advanced but important topic: creating functions and iterating over them using the purrr toolkit.
Background modules:
Background and set-up
This module is intended for those who have never used R. It describes key terms, shows how to install R and RStudio, and offers some tips for managing projects and files.
Link: https://michaeldgarber.github.io/teach-r/pre-reqs.html
Managing packages: attached vs loaded via a namespace and other nuances
R’s vast package ecosystem provides abundant functionality, but packages can also be a source of frustration if they don’t load as expected. In this module, we explore some technical details on ways packages are loaded that may facilitate troubleshooting and understanding. We specifically cover the difference between a package being attached to the search path versus loaded via a namespace and how to check this distinction. We also learn how to remove packages, both from the current session and from the computer altogether. This module has a fair amount of technical detail and doesn’t necessarily need to be reviewed before proceeding with the main modules.
Link: https://michaeldgarber.github.io/teach-r/manage-packages.html
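As a rough illustration of the attached-versus-loaded distinction described above, here is a minimal sketch using only base R. The helper pkg_status() is a hypothetical name of my own, not something from the module, and the tools package is used only because it ships with R.

```r
# Hypothetical helper: report whether a package's namespace is loaded
# and whether the package is attached to the search path.
pkg_status <- function(pkg) {
  c(
    loaded = isNamespaceLoaded(pkg),
    attached = paste0("package:", pkg) %in% search()
  )
}

# Calling a function with `::` loads the namespace without attaching the package.
invisible(tools::file_ext("report.csv"))
status_after_colons <- pkg_status("tools")

# library() attaches the package to the search path (and loads it if needed).
library(tools)
status_after_library <- pkg_status("tools")

# detach() removes the package from the search path;
# by default the namespace stays loaded.
detach("package:tools")
status_after_detach <- pkg_status("tools")

print(rbind(status_after_colons, status_after_library, status_after_detach))

# Removing a package from the computer altogether is a separate step, e.g.
# remove.packages("somePackage")  # not run here; "somePackage" is a placeholder
```

Printing the three status vectors side by side is one way to see which step changed what during a troubleshooting session.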
Main course modules:
Introduction to R and data wrangling: learning the basics of dplyr with publicly available COVID-19 data.
Dplyr, as stated on its documentation page, “is a grammar of data manipulation providing a consistent set of verbs that help you solve the most common data manipulation challenges.” In this module, we learn the fundamentals of dplyr in the context of an applied example with publicly available COVID-19 data from the New York Times.
Link: https://michaeldgarber.github.io/teach-r/dplyr-1-nyt-covid.html
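To give a flavor of what those verbs look like when chained together, here is a minimal sketch. The data frame covid_toy and all of its numbers are invented for illustration; they are not the New York Times data used in the module.

```r
library(dplyr)

# Hypothetical toy data: county-level counts invented for this sketch.
covid_toy <- data.frame(
  state = c("Georgia", "Georgia", "Michigan", "Michigan"),
  county = c("Fulton", "DeKalb", "Wayne", "Washtenaw"),
  cases = c(100, 50, 80, 20),
  deaths = c(4, 1, 4, 0)
)

state_summary <- covid_toy %>%
  filter(cases > 0) %>%                  # keep rows meeting a condition
  group_by(state) %>%                    # define groups for the summary
  summarise(
    n_counties = n(),                    # number of rows per state
    cases_total = sum(cases),
    deaths_total = sum(deaths)
  ) %>%
  mutate(deaths_per_100_cases = 100 * deaths_total / cases_total) %>%
  arrange(desc(cases_total))             # sort states by total cases

print(state_summary)
```

Each verb takes a data frame and returns a data frame, which is what makes the steps easy to chain with the pipe.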
R for data wrangling 2: more dplyr with tidycensus, mapview, and ggplot2
In this module, we continue with the COVID-19 example from the first tutorial and introduce four additional packages along the way: tidycensus, mapview, sf, and ggplot2. Tidycensus allows you to conveniently download census data into R. Mapview lets you make an interactive map with one line of code; sf is a toolbox for managing and manipulating spatial data; and ggplot2 is a prominent package for making graphs and figures. The next sessions elaborate on sf and mapview.
Link: https://michaeldgarber.github.io/teach-r/dplyr-2-mapview-tidycensus
R for spatial data wrangling: using sf to wrangle OpenStreetMap-downloaded pharmacies in Atlanta and assess population living nearby
The sf package and its corresponding object class have become standard tools for managing and representing (vector-based) spatial data in R. An appealing aspect of the sf object class is that it behaves like regular rectangular data and is thus amenable to common data manipulation techniques. In this module, we will work through an example analysis of pharmacies downloaded from OpenStreetMap in the Atlanta area. Through the example, we describe several operations in the sf ecosystem, including ways to merge spatial data, manipulate coordinate systems, and create buffers.
Part 1: https://michaeldgarber.github.io/teach-r/sf-atl-pharm-part-1.html
Part 2: https://michaeldgarber.github.io/teach-r/sf-atl-pharm-part-2.html
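The coordinate-system and buffer operations mentioned above might look roughly like the sketch below. Everything specific here is an assumption for illustration: the two points are made-up locations near Atlanta rather than OpenStreetMap pharmacies, the projected coordinate system (UTM zone 16N, EPSG:32616) is simply one meter-based option that covers the area, and the 800 m radius is arbitrary.

```r
library(sf)

# Hypothetical points near Atlanta (longitude/latitude in WGS 84).
pharm_toy <- data.frame(
  name = c("Pharmacy A", "Pharmacy B"),
  lon = c(-84.388, -84.350),
  lat = c(33.749, 33.775)
)

# Convert the regular data frame to an sf object with point geometry.
pharm_sf <- st_as_sf(pharm_toy, coords = c("lon", "lat"), crs = 4326)

# Transform to a projected coordinate system measured in meters
# (assumption: UTM zone 16N) so buffer distances are in meters.
pharm_utm <- st_transform(pharm_sf, crs = 32616)

# Create an 800 m buffer around each point (radius chosen arbitrarily).
pharm_buffer <- st_buffer(pharm_utm, dist = 800)

# Area of each buffer in square kilometers.
buffer_area_km2 <- as.numeric(st_area(pharm_buffer)) / 1e6

print(pharm_buffer)
```

Because the result is still a data frame with a geometry column, the name column carries through each step, and dplyr verbs can be applied to it as usual.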
Making static and interactive maps in R
This module will elaborate upon the capabilities of some of R’s mapping tools covered in previous sessions. Specifically, it will illustrate tools for both static maps (as in an image) and interactive maps (as in zooming in and out like Google Maps). Mapview and ggplot2 will again be covered, and tmap will be introduced. This module will also cover color palettes available in R for visualizing continuous and categorical data.
Link: https://michaeldgarber.github.io/teach-r/map-making.html (work in progress)
Monte Carlo simulations and bootstrapping with purrr::map_dfr()
Creating your own functions can help reduce tedious and error-prone repetition in code. Functions can also be used to iteratively modify steps in the data analysis to estimate uncertainty of results through Monte Carlo simulation techniques and bootstrap resampling. This module will motivate, describe, and provide demo examples of Monte Carlo simulations and bootstrapping using map_dfr() from purrr.
Link: https://michaeldgarber.github.io/teach-r/monte-carlo-sim-bootstrapping-purrr.html
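As a minimal sketch of the idea, the code below bootstraps the mean of a simulated variable by writing one function and iterating over it with map_dfr(). The data are simulated, and the function name boot_mean(), the 500 replicates, and the percentile interval are my own choices for illustration rather than the module's exact approach. Note that recent versions of purrr mark map_dfr() as superseded in favor of map() followed by list_rbind(), although it still works (it relies on dplyr being installed).

```r
library(purrr)

set.seed(2022)

# Simulated data standing in for a real analysis variable (hypothetical).
x <- rnorm(200, mean = 10, sd = 2)

# One bootstrap replicate: resample with replacement and compute the mean.
# rep_id is the iteration number supplied by map_dfr().
boot_mean <- function(rep_id, values) {
  resample <- sample(values, size = length(values), replace = TRUE)
  data.frame(rep_id = rep_id, mean_boot = mean(resample))
}

# Run the function 500 times and row-bind the results into one data frame.
boot_results <- map_dfr(1:500, boot_mean, values = x)

# Percentile-based 95% interval for the mean.
ci_95 <- quantile(boot_results$mean_boot, probs = c(0.025, 0.975))

print(ci_95)
```

The same pattern extends to Monte Carlo simulation: the function would draw new parameter values on each iteration instead of resampling rows.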
Extra material:
Using R to gather bicycle infrastructure data from OpenStreetMap
Part 1 (background): https://michaeldgarber.github.io/teach-r/osmdata-for-bikes-part-1
Part 2 (code demo): https://michaeldgarber.github.io/teach-r/osmdata-for-bikes-part-2
Dropbox folder with all recordings
Google Drive folder with all recordings
Session 1 (August 2, 2022)
Session recording on Dropbox (please download to watch past 1 hour)
Session recording on Google Drive
Modules covered:
R scripts used during session
Session 2 (August 3, 2022)
Session recording on Dropbox
Session recording on Google Drive
Module covered: https://michaeldgarber.github.io/teach-r/dplyr-2-mapview-tidycensus
R script used during session: https://github.com/michaeldgarber/teach-r/blob/main/scripts/2-dplyr-mapview-tidycensus.R
Session 3 (August 4, 2022)
Session recording on Dropbox
Session recording on Google Drive
Module covered: https://michaeldgarber.github.io/teach-r/sf-atl-pharm-part-1.html
R script used during session: https://github.com/michaeldgarber/teach-r/blob/main/scripts/3-sf-atl-pharm.R
Session 4 (August 8, 2022)
Session recording on Dropbox
Session recording on Google Drive
Modules covered:
R scripts used during session:
Session 5 (August 9, 2022)
Session recording on Dropbox
Session recording on Google Drive
Modules covered:
R script used during session:
I mostly use packages that are part of the tidyverse or that work well with it (e.g., sf). The R universe has lots of great resources on these and other topics. Here are some that I’d recommend:
https://michaeldgarber.github.io/teach-r/recommended-resources.html
Copyright © 2022 Michael D. Garber