This is the main landing page for the course:

Introduction to managing and visualizing data in R: a short course on spatial and non-spatial data.

Page url: https://michaeldgarber.github.io/teach-r/

My background using R: https://michaeldgarber.github.io/teach-r/who-am-i.html


Course summary

This short course will introduce participants to popular tools in R for manipulating (“wrangling”) and visualizing data through public-health-focused examples. Specifically, students will become familiar with ways to wrangle and visualize both aspatial (i.e., common spreadsheet-like) data and spatial data. The course will focus on tools that are part of, or work well with, the tidyverse, a set of packages with a common philosophy that, in my opinion, makes R intuitive and fun. Packages covered will include dplyr and sf for manipulating data, and ggplot2, mapview, and tmap for visualizing data. Many examples will transition back and forth between data with and without a spatial component, showing that working with spatial data need not require separate specialized geographic information systems (GIS) software. The course will conclude with a somewhat advanced but important topic: creating functions and iterating over them using the purrr toolkit.


Modules

Background modules:

  1. Background and set-up

    This module is intended for those who have never used R. It describes key terms, shows how to install R and RStudio, and offers some tips for managing projects and files.

    Link: https://michaeldgarber.github.io/teach-r/pre-reqs.html

  2. Managing packages: attached vs loaded via a namespace and other nuances

    R’s vast package ecosystem provides abundant functionality, but packages can also be a source of frustration if they don’t load as expected. In this module, we explore some technical details on ways packages are loaded that may facilitate troubleshooting and understanding. We specifically cover the difference between a package being attached to the search path versus loaded via a namespace and how to check this distinction. We also learn how to remove packages, both from the current session and from the computer altogether. This module has a fair amount of technical detail and doesn’t necessarily need to be reviewed before proceeding with the main modules.

    Link: https://michaeldgarber.github.io/teach-r/manage-packages.html
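
As a rough illustration of the attached-versus-loaded distinction described in module 2, the sketch below uses only base R. The helper `pkg_status()` is a hypothetical name introduced here, and the `tools` package is chosen only because it ships with R and is usually not attached at startup; the module itself may use different packages and functions.

```r
# Hypothetical helper (not from the course): report whether a package is
# attached to the search path and whether its namespace is loaded.
pkg_status <- function(pkg) {
  c(
    attached = paste0("package:", pkg) %in% search(),
    loaded   = isNamespaceLoaded(pkg)
  )
}

pkg_status("tools")

# Calling a function with :: loads the namespace without attaching the package.
tools::file_ext("cases.csv")
pkg_status("tools")

# library() attaches the package to the search path.
library(tools)
pkg_status("tools")

# detach() removes the package from the search path for the current session;
# by default its namespace stays loaded.
detach("package:tools", character.only = TRUE)
pkg_status("tools")

# Removing a package from the computer altogether is a separate step, e.g.
# remove.packages("somePackage")  # placeholder name; not run here
```

Comparing the output of `pkg_status()` after each step shows that a package can be loaded via its namespace without being attached.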

Main course modules:

  1. Introduction to R and data wrangling: learning the basics of dplyr with publicly available COVID-19 data.

    As its documentation page states, dplyr “is a grammar of data manipulation providing a consistent set of verbs that help you solve the most common data manipulation challenges.” In this module, we learn the fundamentals of dplyr in the context of an applied example with publicly available COVID-19 data from the New York Times.

    Link: https://michaeldgarber.github.io/teach-r/dplyr-1-nyt-covid.html

  2. R for data wrangling 2: more dplyr with tidycensus, mapview, and ggplot2

    In this module, we continue with the COVID-19 example from the first module and introduce four additional packages along the way: tidycensus, mapview, sf, and ggplot2. The tidycensus package lets you conveniently download census data into R; mapview lets you make an interactive map with one line of code; sf is a toolbox for managing and manipulating spatial data; and ggplot2 is a prominent package for making graphs and figures. The next sessions elaborate on sf and mapview.

    Link: https://michaeldgarber.github.io/teach-r/dplyr-2-mapview-tidycensus

  3. R for spatial data wrangling: using sf to wrangle OpenStreetMap-downloaded pharmacies in Atlanta and assess population living nearby

    The sf package and its corresponding object class have become standard tools for managing and representing (vector-based) spatial data in R. An appealing aspect of the sf object class is that it behaves like regular rectangular data and is thus amenable to common data manipulation techniques. In this module, we will work through an example analysis of pharmacies downloaded from OpenStreetMap in the Atlanta area. Through the example, we describe several operations in the sf ecosystem, including ways to merge spatial data, manipulate coordinate systems, and create buffers.

    Part 1: https://michaeldgarber.github.io/teach-r/sf-atl-pharm-part-1.html

    Part 2: https://michaeldgarber.github.io/teach-r/sf-atl-pharm-part-2.html

  4. Making static and interactive maps in R

    This module will elaborate upon the capabilities of some of R’s mapping tools covered in previous sessions. Specifically, it will illustrate tools for both static maps (as in an image) and interactive maps (as in zooming in and out like Google Maps). Mapview and ggplot2 will again be covered, and tmap will be introduced. This module will also cover color palettes available in R for visualizing continuous and categorical data.

    Link: https://michaeldgarber.github.io/teach-r/map-making.html (work in progress)

  5. Monte Carlo simulations and bootstrapping with purrr::map_dfr()

    Creating your own functions can help reduce tedious and error-prone repetition in code. Functions can also be used to iteratively modify steps in the data analysis to estimate uncertainty of results through Monte Carlo simulation techniques and bootstrap resampling. This module will motivate, describe, and provide demo examples of Monte Carlo simulations and bootstrapping using map_dfr() from purrr.

    Link: https://michaeldgarber.github.io/teach-r/monte-carlo-sim-bootstrapping-purrr.html
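
To give a flavor of the approach described in module 5, here is a minimal sketch of bootstrap resampling with `map_dfr()`. The data are simulated, and the names `x`, `boot_mean()`, and `boot_results` are made up for this illustration; the module's own examples may differ. It assumes purrr and dplyr are installed. In recent purrr versions `map_dfr()` is marked as superseded, but it remains available.

```r
library(purrr)
library(dplyr)

set.seed(1)

# Simulated data standing in for a real measurement (illustration only).
x <- rnorm(100, mean = 5, sd = 2)

# Hypothetical function: draw one bootstrap resample of x (same size,
# with replacement) and return its mean as a one-row tibble.
boot_mean <- function(rep_id) {
  resample <- sample(x, size = length(x), replace = TRUE)
  tibble(rep_id = rep_id, mean_x = mean(resample))
}

# Run the function 500 times; map_dfr() row-binds the results
# into a single data frame with one row per replicate.
boot_results <- map_dfr(1:500, boot_mean)

# Percentile-based 95% interval for the mean.
quantile(boot_results$mean_x, probs = c(0.025, 0.975))
```

Writing the resampling step as a function keeps each replicate identical in structure, and the replicate count can be changed in one place.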

Extra material:

Using R to gather bicycle infrastructure data from OpenStreetMap


Sessions and recordings

Dropbox folder with all recordings

Google Drive folder with all recordings


Session 1 (August 2, 2022)

Session 2 (August 3, 2022)

Session 3 (August 4, 2022)

Session 4 (August 8, 2022)

Session 5 (August 9, 2022)




Copyright © 2022 Michael D. Garber