Data exploration is a key aspect of any analytical workflow, and one nice way to explore and analyze data is by using maps. In this post, we will use a map to visualize housing prices in the U.S., specifically the state-by-state House Price Index (HPI) published by Freddie Mac.
Our goal is to articulate how to quickly build an interactive Shiny app that allows a user to click on a state that has been shaded by its annual house price appreciation and display a chart of house prices over time.
The final app can be viewed below or in a new window.
In this post we will focus on how to construct a map and shade it according to the annual house price appreciation (HPA) of each state. We will need to do the following:
- import the Freddie HPI data from Quandl
dplyrand pipes to wrangle to the format we want
- calculate annual HPA for each state and save in tidy format
- import geospatial object to construct a map of the USA
- add our HPA data to that object
- use leaflet to build a map with states shaded by HPA
First, let's load up the necessary packages.
install.packages('tigris') # Note that things might get tricky with this particular package. # If you are using a Linux OS, run the following before installing 'tigris': # 1. sudo apt-get update && sudo apt-get install libgdal-dev libproj-dev # 2. install.packages("rgdal") # 3. install.packages("rgeos") # 4. library(rgdal) # 5. library(rgeos) install.packages('tidyverse') install.packages('leaflet') install.packages('Quandl') install.packages('lubridate')
library(tigris) library(tidyverse) library(leaflet) library(Quandl) library(lubridate) # You might want to supply an api key # Quandl.api_key("[your key here]")
Now we can import the data from Quandl using the
Quandl() function and supplying the data set code
states_hpi <- Quandl("FMAC/HPI", order = 'asc') %>% select(-53:-54)
Quandl supplies a nicely formatted
Date column and defaults the import to a
data.frame object. That's helpful because we will use
dplyr and pipes for our wrangling. As happens quite often, the wrangling is going to be the most challenging part of our project. We need to take those 50 state HPIs and calculate the annual price appreciation for each state.
We will use pipes to go through the following logic:
- We need only two data points, the most recent price and the price 12-months previous to that. So, we'll use
filter()to grab those two rows.
- Our data are currently in a wide format, but we need it to be in a long, tidy format. We'll use
gather(state, value, -Date)to change from wide to long.
- Then, we
- It's worth lingering on these two steps because they are the least intuitive at first glance. Originally, we had 1 date column and 51 state/DC columns.
group_by()will reformat our data to one very long column called
state, one very long column called
valueand one very long column called
Date. Why is this useful?
- We want to calculate HPA by state, meaning we are going to run 51 separate calculations and store the results in 51 different locations. Changing our data to a long format and grouping by state will allow us to apply our HPA calculation to 1 column (the value column), and store the results in 1 newly created column. This keeps the data in a tidy format.
- Back to the substance, to calculate HPA and create a new column for it, we call
mutate(hpa = (value - lag(value))/ lag(value)). This creates the
hpacolumn and gives a value, by state, of the house price appreciation in the last 12 months. Remember we have already filtered the dates in step 1.
- Next, we reformat the new
- We don't need the date column anymore, so we can
- Finally, we
rename()the state column to
STUSPS, which will be explained later!
states_wrangled <- filter(Date == ymd("2016-03-31") | Date == ymd("2017-03-31")) %>% gather(state, value, -Date) %>% group_by(state) %>% mutate(hpa = (value - lag(value))/ lag(value)) %>% mutate(hpa = round(hpa, digits = 4) * 100) %>% na.omit() %>% select(state, hpa) %>% rename(STUSPS = state)
We now have wrangled our original 52 column data frame of state HPIs to a 2-column data frame with state HPAs.
Next, let's build a map of the USA that is shaded according to that second column!
First, we need to import an object that holds geometric data for all 50 states and then add that
hpa data to the object.
tigris package makes it easy to import a
simple features data frame that has geospatial data for the 50 states. To do so, use the aptly named
states() function and set
class = "sf".
states <- states(cb = TRUE, class = "sf")
states object contains the longitude and latitude coordinates of each state polygon in the
geometry column, and also has a column for
STUSPS, which are the state abbreviations used by the USPS. This is the reason that when we created our
hpa data frame, we renamed the state column to
STUSPS. We will use that common column name to merge our HPA data into the simple features spatial data object by calling
merge(states, states_wrangled, by = "STUSPS"...). This function will add an
hpa column in our spatial data frame according to matches in the
STUSPScolumn (there is a column called
STUSPS in both of these data frames). For any columns whose
STUSPS values don't match, the function will place an NA in the
# Now we want to merge by a common column name. states_hpa_leaflet <- merge(states, states_wrangled, by = "STUSPS", all.x = TRUE)
Notice that there is now a column called
hpa. The fourth row has an NA because our spatial object had an entry for American Samoa but our Quandl HPI data did not. When we shade the map, that NA will be a gray polygon.
Next we'll create a shading scheme with the
colorNumeric() function from the
leaflet package. We'll go with a the blue-green palette by setting
palette = "GnBu", and use the argument
domain = states_hpa_leaflet$hpa to shade by
# Build states map statesPal<- colorNumeric( palette = "GnBu", domain = states_hpa_leaflet$hpa)
We want something to happen when a user clicks the map so let's create a pop-up to display state
NAME and exact
statesPopup <- paste0( states_hpa_leaflet$NAME, "Annual House Price Percent Change:", states_hpa_leaflet$hpa, "%")
Now it's time to put it all together and call the
leaflet() function on our spatial object.
leaf_states <- addProviderTiles("CartoDB.Positron") %>% setView(-95, 40, zoom = 4) %>% addPolygons(stroke = TRUE, color = "black", weight = .4, opacity = 1.0, smoothFactor = 0.5, fill = TRUE, fillColor = ~statesPal(hpa), fillOpacity = .8, layerId = ~STUSPS, popup = statesPopup) leaf_states
Click on a state to test the popup. Do the darker blues appear to be in states where we expect large house price increases? Do lighter greens/yellows appear in states where we expect low/negative house price movement? Alright, this is the exact map that we'll use in Shiny - literally the exact map because we are just going to load up that leaf-states object in the app. From there we wire up Quandl so a user can access HPI data by clicking the map, then pass it to highcharter for visualizing. The full code for making that connection is available in the source code button on the finished Shiny app. Thanks for reading!
About the author:
Jonathan works with RStudio's financial services customers. He studied International Relations as an undergraduate at Harvard, worked in finance at JPMorgan and then did graduate work in Political Economy at Emory University before joining RStudio.