Part 1 url: https://michaeldgarber.github.io/teach-r/osmdata-for-bikes-part-1

Part 2 url: https://michaeldgarber.github.io/teach-r/osmdata-for-bikes-part-2

In part 1, I described OpenStreetMap’s data structure for bicycle infrastructure and paths. In this one, let’s dive into some R code to manage and visualize those data.

The package I use to get OpenStreetMap data is osmdata, authored by Mark Padgham and Robin Lovelace. The purpose of this package is to download vector data from OSM, for example the roadways and bike infrastructure described previously.

First, let’s get a feel for the osmdata package by downloading Atlanta’s roads. Then we’ll focus on a smaller area to explore bike infrastructure.

1 Use osmdata to download Atlanta’s roads

Install and load osmdata. I’ll also use the tidyverse, sf, mapview, and ggmap.

library(osmdata)
library(tidyverse)
library(sf)
library(mapview)
library(ggmap)

The three workhorse functions of this package are opq(), add_osm_feature(), and osmdata_sf().

  • opq(), short for overpass query, uses the overpass API to query OpenStreetMap data. The great thing about this package is one need not know about the overpass API to make use of the data. The important argument it takes is a bounding box (bbox=), which can be simply the name of a place like a city. Or, alternatively, four lat/lon values can be used. This function can be used on its own to simply request all of the data within a bounding box. It’s often more useful to request specific features, which is done using:
  • add_osm_feature(). This function takes a key and returns objects with values for that key. Recall, a highway (i.e., a roadway or path) is a type of key.
  • osm_data_sf() passes the queries specified by the bounding box and the requested features to the overpass API and returns an sf object.

These three functions can be piped (%>%) together. For example, suppose we’d like vector data corresponding to the primary roadways (minor arterials) in Atlanta. The .$osm_lines line of code extracts the line-based geometry, specifically. Polygons might also be included.

atl_primary = opq(bbox = "Atlanta, Georgia, USA") %>%
  add_osm_feature(key = "highway", value = "primary") %>%
  osmdata_sf() %>%  
  .$osm_lines %>% #Extract the osm_lines, specifically.
  st_as_sf() #Confirm it's sf.
atl_primary %>% mapview()

For context, let’s add secondary roads, tertiary roads, trunk roads, and the interstate highways. We’ll leave out residential roads, as the data become too big. The code is repetitive, but I’ve had better luck creating an sf object for each highway type than calling all types in one query using sequential add_osm_feature() function calls.

atl_motorway = opq(bbox = "Atlanta, Georgia, USA") %>%
  add_osm_feature(key = "highway", value = "motorway") %>%
  osmdata_sf() %>%  
  .$osm_lines %>%  
  st_as_sf()  

atl_trunk = opq(bbox = "Atlanta, Georgia, USA") %>%
  add_osm_feature(key = "highway", value = "trunk") %>%
  osmdata_sf() %>%  
  .$osm_lines %>%  
  st_as_sf()  

atl_secondary = opq(bbox = "Atlanta, Georgia, USA") %>%
  add_osm_feature(key = "highway", value = "secondary") %>%
  osmdata_sf() %>%  
  .$osm_lines %>%  
  st_as_sf()  

atl_tertiary = opq(bbox = "Atlanta, Georgia, USA") %>%
  add_osm_feature(key = "highway", value = "tertiary") %>%
  osmdata_sf() %>%  
  .$osm_lines %>%  
  st_as_sf()  

Each of those objects are sf objects, so they can be manipulated using dplyr verbs.

class(atl_primary)
## [1] "sf"         "data.frame"

Accordingly, we can bind the sf data together using bind_rows().

atl_roads =  atl_primary %>%
  bind_rows(
    atl_motorway,
    atl_trunk,
    atl_secondary,
    atl_tertiary
  )

#Take a look at the number of features of each highway type.
table(atl_roads$highway)
## 
##      motorway motorway_link       primary     secondary      tertiary 
##          1234             7          1255          3899          2854 
##         trunk 
##           496

1.0.1 Tangent to visualize a smaller radius

To visualize these all together, I’m going to pare down to a 1-mile radius around Five Points, as I was having issues rendering the larger mapview to html. In your own console, this simple call to mapview(atl_roads) should work.

atl_roads %>% 
  mapview(zcol = "highway")

To visualize in a smaller radius, I’ll use mutate_geocode() from ggmap to geocode Five Points and will create a 1-mile buffer around it using st_buffer() from sf.

library(ggmap)
register_google(key = "your_ggmap_key") #https://cran.r-project.org/web/packages/ggmap/readme/README.html
five_points_1mi_rad = as_tibble("Five Points, GA")  %>% 
  as_tibble() %>%
  mutate_geocode(value, force = TRUE) %>% 
  st_as_sf(coords = c("lon", "lat"),crs = 4326) %>% #The output at this step is a point-based sf geometry.
  #Change the coordinate system to one feet.
  st_transform("+proj=tmerc +lat_0=30 +lon_0=-84.16666666666667 +k=0.9999 +x_0=699999.9999999999
               +y_0=0 +datum=NAD83 +units=us-ft +no_defs +ellps=GRS80 +towgs84=0,0,0") %>%
  st_buffer(5280) %>% 
  #Transform back to coordinate system 4326 so it jives with the default coordinate system output of the OpenStreetMap data above.
  st_transform(4326) 
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Five+Points,+GA&key=xxx
## "Five Points, GA" not uniquely geocoded, using "five points, atlanta, ga 30303, usa"

Find the intersection of the OSM vector data with this buffer radius using st_intersection(), and visualize using mapview().

library(RColorBrewer) #to get a better palette.
set1_5 = RColorBrewer::brewer.pal(n=5, name  = "Set1") 
atl_roads %>%
  st_intersection(five_points_1mi_rad) %>%
  dplyr::select(osm_id, name, HFCS, bicycle, highway) %>%    #pick a few variables
  mapview(
    zcol = "highway",
    color =set1_5,
    map.types = c("CartoDB.DarkMatter", "CartoDB.Positron", "OpenStreetMap")
  )

2 Use osmdata to gather bicycle infrastructure around the intersection of Monroe and Ponce.

Let’s now construct a dataset of bicycle infrastructure. We’ll focus on a 1-mile radius around Monroe St NE and Ponce de Leon Ave NE, as there is a diverse mix of bicycle infrastructure in that area.

Define the bounding box using coordinates. The sequence is xmin, ymin, xmax, ymax, where x is longitude and y is latitude.

ponce_monroe_bbox =  osmdata::opq(bbox =c(-84.38882296093993, 33.786737, -84.352018, 33.759368705358256))

2.1 Gather roadway features for 1-mile radius around Ponce and Monroe.

We’d only expect to find on-road bicycle infrastructure on primary, secondary, tertiary, or residential roadways, so we need not pull in other types like motorways.

pm_primary = ponce_monroe_bbox %>% 
  add_osm_feature(key = "highway", value = "primary") %>%
  osmdata_sf() %>%  
  .$osm_lines %>%  
  st_as_sf()  

pm_secondary = ponce_monroe_bbox %>% 
  add_osm_feature(key = "highway", value = "secondary") %>%
  osmdata_sf() %>%  
  .$osm_lines %>%  
  st_as_sf()  

pm_tertiary = ponce_monroe_bbox %>% 
  add_osm_feature(key = "highway", value = "tertiary") %>%
  osmdata_sf() %>%  
  .$osm_lines %>%  
  st_as_sf()  

pm_residential = ponce_monroe_bbox %>% 
  add_osm_feature(key = "highway", value = "residential") %>%
  osmdata_sf() %>%  
  .$osm_lines %>%  
  st_as_sf()  

As mentioned, because these are each sf objects, we can wrangle them using sf and dplyr functions. Use dplyr::select() to narrow down the relevant variables (keys) and dplyr::mutate() to add an indicator variable.

pm_roadways = pm_primary %>% 
  bind_rows(
    pm_secondary,
    pm_tertiary,
    pm_residential
  ) %>% 
  #narrow down the variable names (keys). Soft code some of them in case they're not there.
  dplyr::select(
    osm_id, name, contains("HFCS"), 
    starts_with("highway"), #there is sometimes a key called highway_1, which this grabs.
    contains("surface"), contains("sidewalk"),
    contains("foot"),
    contains("cycle"), contains("bicycle"),
      contains("segregated")
  ) %>% 
  #make an indicator variable for this feature collection
  mutate(
    motor_or_nah = "motor"
  )

pm_roadways %>%
  mapview(
    zcol = "highway",
    color =set1_5,
    map.types = c("CartoDB.DarkMatter", "CartoDB.Positron", "OpenStreetMap")
  )

Some observations:

  • On 10th St NE, there is a second highway key called highway_1. Its value is cycleway, referring to the protected bike lane along that corridor. It’s not clear if one would have enough information on this alone to know that this is a protected bike lane and not a conventional or buffered bike lane, though.
  • Lanier Boulevard Northeast has a tag that bicycles are designated. From experiencce, this seems right. It is a pleasant biking street.
  • Ponce de Leon Ave NE has a buffered bike lane. The OSM data does indicate that Ponce has a bike lane (cycleway = lane), but there is no note about it being buffered.
  • 5th St NE west of Piedmont is correctly noted as having a lane.
  • Peachtree Center (osm_id = 591623771) has a protected bike lane, but it does not appear as a cycleway here. Perhaps it’s coded as its own feature below?

Explore the cycleway and bicycle keys.

table(pm_roadways$cycleway)
## 
##        lane          no shared_lane 
##          18           1          26
table(pm_roadways$cycleway.left)
## 
## lane 
##    3
table(pm_roadways$cycleway.right)
## 
##        lane shared_lane 
##           2           9
table(pm_roadways$cycleway.both)
## 
## no 
##  2
table(pm_roadways$highway_1)
## 
## cycleway 
##       15
table(pm_roadways$bicycle)
## 
## designated        yes 
##         69          2

Create a new variable indicating the type of on-street infrastructure per OSM.

pm_roadways_wrangle = pm_roadways %>%  
  mutate(
    bike_infra = case_when(
      #---Protected lanes----#
      highway_1 == "cycleway" ~ "protected lane",
      
      #---Lanes (conventional or buffered)-------#
      cycleway == "lane" ~ "lane of unknown type",
      cycleway.right == "lane" ~ "lane of unknown type",
      cycleway.left == "lane" ~ "lane of unknown type",
      
      #----Sharrow-------------#
      cycleway == "shared_lane" ~ "sharrow",
      cycleway.right == "shared_lane" ~ "sharrow",
      cycleway.left == "shared_lane" ~ "sharrow" 
      )
  )

dark2_3 = RColorBrewer::brewer.pal(n=3, name  = "Dark2") 
pm_roadways_wrangle %>% 
  filter(is.na(bike_infra)==FALSE) %>% 
  mapview(zcol = "bike_infra", color = dark2_3)

2.2 Gather paths for 1-mile radius around Ponce and Monroe

Grab the generic paths, cycleways, and footways.

pm_path = ponce_monroe_bbox %>% 
  add_osm_feature(key = "highway", value = "path") %>%
  osmdata_sf() %>%  
  .$osm_lines %>%  
  st_as_sf()  

pm_cycleway = ponce_monroe_bbox %>% 
  add_osm_feature(key = "highway", value = "cycleway") %>%
  osmdata_sf() %>%  
  .$osm_lines %>%  
  st_as_sf()  

pm_footway = ponce_monroe_bbox %>% 
  add_osm_feature(key = "highway", value = "footway") %>%
  osmdata_sf() %>%  
  .$osm_lines %>%  
  st_as_sf()  

Bind them together and visualize.

pm_non_motor = pm_path %>% 
  bind_rows(
    pm_cycleway, pm_footway
  ) %>% 
    #narrow down the variable names again.
  dplyr::select(
    osm_id, name, contains("HFCS"),
    starts_with("highway"), #There is sometimes a key called highway_1, which this picks up.
    contains("surface"), contains("sidewalk"),
    contains("cycle"), contains("bicycle"),
  contains("segregated")
    ) %>% 
   #make an indicator variable for this feature collection
  mutate(
    motor_or_nah = "nah"
  )

dark2_3 = RColorBrewer::brewer.pal(n=3, name  = "Dark2") 
pm_non_motor %>% 
    mapview(
      zcol = "highway",
      color =dark2_3,
       map.types = c("CartoDB.Positron", "CartoDB.DarkMatter", "OpenStreetMap")
  )

Some additional observations:

  • The Beltline Eastside Trail is clearly coded as a cycleway, as is most of the Freedom Parkway aka Stone Mountain Trail.
  • It looks like most of the footways are sidewalks, although the footway key is also grabbing some paths in Piedmont Park. Let’s omit footways from consideration for parsimony here.
  • Along 10th St, another feature (osm_id=741964056) is coded as a cycleway, in addition to the highway_1=cycleway noted above. That is, the same roadway has two geometries - one for the roadway itself and one for the protected bike lane paralleling it. As we noted
  • It’s not clear what the difference between some of the paths and cycleways are. For example, osm_id=169957934 in Freedom Park is coded as a path, but is adjacent to very similar paved trails coded as cycleways.

Consolidate the off-road infrastructure so it can be visualized together with the other on-road infrastructure.

pm_non_motor_wrangle = pm_non_motor %>% 
  mutate(
    bike_infra =
  case_when(
    osm_id == 741964056 ~ "protected lane",
    name == "Portman PATH" ~ "protected lane", #using some local knowledge
    name == "Peachtree Center Cycle Track" ~ "protected lane",
    name == "Stone Mountain Trail" ~ "protected lane", #probably would have been picked up with softer coding..

    #otherwise...
    highway == "cycleway" ~"paved trail",
    highway == "path" & surface == "paved" ~ "paved trail"
  )
  )

pm_non_motor_wrangle %>% 
  filter(is.na(bike_infra)==FALSE) %>% 
  mapview(
    lwd=4,
    zcol = "bike_infra" ,color = c("purple", "green"))

Combine the on-street infrastructure with the off-street infrastructure.

dark2_5 = RColorBrewer::brewer.pal(n=4, name  = "Dark2") 
pm_non_motor_wrangle %>% 
  bind_rows(pm_roadways_wrangle) %>% 
  filter(is.na(bike_infra)==FALSE) %>% 
  mapview(
    zcol = "bike_infra",
     layer.name = "Bike infrastructure", #Label this once since it's the last one.
    color = dark2_5,
    lwd=4,
    map.types = c("CartoDB.Positron", "CartoDB.DarkMatter", "OpenStreetMap")
    )

3 Summary

3.1 Advantages of using OpenStreetMap for gathering bicycle-infrastructure data

  • osmdata makes it pretty simple to bring OpenStreetMap vector data into R as an sf object, where it can then be managed using sf and dplyr.
  • It’s a single source. You don’t have to hunt down data from several portals or bug people at government agencies.
  • It’s used by many apps and additional data sources, which can make linking data easier. For example, I used Strava data for a project, and Strava delivered their data snapped to the OpenStreetMap basemap. As a result, I could link some of the values using an aspatial cross-walk rather than having to merge on spatial attributes.
  • It’s regularly updated by the OSM community.
  • Any edits to the map you might make for a particular project can be used by the worldwide mapping community (hooray for citizen science).
  • It’s free.

3.2 Some caveats

  • The data on bicycle infrastructure are not without their issues.
  • The same type of infrastructure may be coded in many ways.
  • The same coding scheme may refer to different types of infrastructure.
  • There is no clear classification scheme for buffered bike lanes.
  • It will still be wise to check with other sources, such as Google Streetview or some spot in-person audits.

For further reading, I’d suggest the article by Ferster & colleagues (2020):

Using OpenStreetMap to inventory bicycle infrastructure: A comparison with open data from cities.



Copyright © 2022 Michael D. Garber