Prerequisites

Experience with the specific topic: None

Professional experience: None

RStudio Version: 1.0.136

R Version: 3.3.1

In this article, we will take a look at R-shiny and leaflet for an interactive mapping web application. We will use the NYPD Seven Major Felonies dataset. In order to follow along, please download the dataset here. It’s a large file so it may take some time to download.

Introduction to Shiny

Shiny is an open-source R package for building very quick and powerful web applications just using the R syntax. It is easy to use, has great video and written tutorials, and has a great community that can provide answers to most of your questions. In order to build a dashboard with shiny, you don’t have to know any HTML, CSS, or JavaScript. Of course, if you know them you can make your apps prettier but you can still build strong and to-the-point dashboards without them. You can get an overview of what can be done using shiny here.

Introduction to Leaflet

Leaflet is a popular interactive mapping library written in JavaScript. We will be using the R integration for leaflet. When it comes to interactive mapping, I personally haven’t used any other mapping libraries because leaflet’s R package has been more than enough in providing a solution to most of the tasks I’ve been faced with. You can find the documentation to the leaflet R package here.

Step 1: Install the Required R Packages

In order to follow along with this tutorial, you first need to install and load the required packages. Here is the list of packages that are required:

  • shiny
  • leaflet
  • tidyverse (a powerful collection of data manipulation libraries)

To install an R package you can use the install.packages function from the RStudio console. You can also pass a vector of package names to the install.packages function and it’ll do the rest:

package_vector = c("shiny", "leaflet", "tidyverse")
install.packages(package_vector)
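If some of these packages are already installed, you may prefer to install only the missing ones. The following is a minimal sketch of that idea; the helper name missing_packages is hypothetical and not part of any package, and the install step is guarded so that it only runs in an interactive session.

```r
# Hypothetical helper: returns the packages from `pkgs` that cannot be loaded
missing_packages <- function(pkgs) {
  is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_available]
}

package_vector <- c("shiny", "leaflet", "tidyverse")
to_install <- missing_packages(package_vector)

# Only install what is missing, and only when run interactively
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Running this a second time should leave to_install empty once every package has been installed successfully.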

Note: While installing leaflet, you may get the following warning: Warning in install.packages: installation of package 'leaflet' had non-zero exit status. This may cause an error when you try to load the library with library(leaflet). If you get this warning and an error when you try to load the package, try installing leaflet from GitHub:

  • install.packages("devtools")
  • devtools::install_github("rstudio/leaflet")

Step 2: Create the Base Shiny App

You can structure your shiny app in two ways. You can either have an app.R file that has all of your ui components and the server logic, or you can create three separate files: ui.R, server.R and global.R. You can have additional files like helpers.R but—for the purpose of this tutorial—either app.R or the three files suffice.

We will follow the three-file structure in this tutorial because I personally think it is more organized when the ui and server are in different files, which makes it easier to debug.

We will also be using the RStudio IDE, which can be downloaded here.

Now, let’s create a new project in RStudio, create our three separate files, and save them into our project directory. Here are the file names:

  • ui.R
  • server.R
  • global.R

When you run your app, global.R runs first and loads everything in that file to the global R session environment. I personally find it easier to do everything that has to be done before the app starts in global.R. So now, in order to launch the base app, we have to put some code in all the files.

In the global.R file, load the following required libraries:

library(shiny)
library(leaflet)
library(dplyr)

In the server.R file, create our server function, which will be called once for each session (for more detailed information about the server function, please refer to the Shiny documentation):

server <- function(input,output, session){
}

In the ui.R file, create an empty page and print Hello World!:

ui <- fluidPage(
  print("Hello World!")
)

Now that we have our base app ready, you can do runApp() from the RStudio console and an empty page with Hello World! printed will pop up. Congratulations on launching your first shiny web app!
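For comparison, here is a minimal sketch of the same base app in the single-file app.R layout mentioned earlier. In a real app.R, the shinyApp() call would normally be the last expression in the file; it is assigned to a variable here (app, a name chosen for illustration) so the snippet can be run without launching anything.

```r
library(shiny)

# UI: an otherwise empty page that shows a single line of text
ui <- fluidPage(
  "Hello World!"
)

# Server: no logic yet, called once per session
server <- function(input, output, session) {
}

# Combine both parts into one app object
app <- shinyApp(ui = ui, server = server)
```

Passing the app object to runApp(app) from the console should open the same Hello World! page.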

Step 3: Create the Base Map

Now that we have the basic app running, let’s add a base map. Since we are going to map out the Seven Major Felonies data for New York City (NYC), set the map so that the initial view will be NYC.

In order to create a map, we need to do two things:

  1. We need to create an output from ui.R. So in our ui.R file, add this line of code:
  leafletOutput("mymap",height = 1000)
  2. We need to render that output in server.R by creating the map. In order to do this, we will use a function called renderLeaflet inside our server function:
output$mymap <- renderLeaflet({
   m <- leaflet() %>%
          addTiles() %>%
          setView(lng=-73.935242, lat=40.730610 , zoom=10)
   m
 }) 

Here we are chaining methods in order to create our map:

  • leaflet() creates the map widget
  • addTiles() adds the default OpenStreetMap tiles
  • setView() sets the view to the provided coordinates with the provided zoom level, in this case NYC. (You can also create the map without the setView method.)
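The chaining above relies on the %>% pipe, which passes the result on its left as the first argument of the call on its right. As a small sketch, the same map can be written with and without the pipe; the variable names map_piped and map_nested are only for illustration.

```r
library(leaflet)

# Piped form, as used in server.R
map_piped <- leaflet() %>%
  addTiles() %>%
  setView(lng = -73.935242, lat = 40.730610, zoom = 10)

# Equivalent nested form: each call wraps the previous one
map_nested <- setView(
  addTiles(leaflet()),
  lng = -73.935242, lat = 40.730610, zoom = 10
)
```

Both objects are leaflet map widgets; the piped form simply reads top to bottom in the order the steps are applied.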

Step 4: Load the Data and Prepare It for Mapping

Now we are ready to load our data for mapping. If the data is small enough, you can do all of the data preparation in the global.R file. However, our dataset has more than 1,000,000 rows, and everything in global.R runs at every launch, which would noticeably increase the app launch time. Instead, we will do all the processing we need beforehand and save the processed dataset as an .rds file, a serialized version of the dataset that R compresses with gzip by default. This will save us time every time we launch our app.
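To illustrate the idea, here is a minimal sketch of the save-once, read-many pattern using base R. The small data frame and the temporary file path are made up for this example and stand in for the real processed dataset.

```r
# Toy data frame standing in for the processed felony dataset (values are made up)
processed <- data.frame(
  id = 1:3,
  Latitude = c(40.73, 40.68, 40.75),
  Longitude = c(-73.93, -73.94, -73.99)
)

# Save once, after processing; saveRDS uses gzip compression by default
rds_path <- file.path(tempdir(), "processed_example.rds")
saveRDS(processed, rds_path)

# Read back at app launch, which skips the processing step entirely
restored <- readRDS(rds_path)
```

The restored object is the same data frame that was saved, including its column types, so nothing needs to be converted again at launch.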

Let’s create a process.R file to do the data preparation and load our dataset like the following:

df = read.csv("./NYPD_7_Major_Felony_Incidents.csv", stringsAsFactors = F)

In the dataset, we can see that we have a Location.1 column that contains the crucial latitude-longitude information for mapping. At the moment, however, this column is a single string that combines both values. To map the data in leaflet, we need separate columns named lat/latitude and lon/lng/long/longitude (case insensitive).
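As a small sketch of why the column names matter: when a data frame has columns with these names, leaflet can pick up the coordinates without being told which columns to use. The toy_points data frame below is made up for illustration.

```r
library(leaflet)

# Toy coordinates (made up); the column names follow the convention described above
toy_points <- data.frame(
  Latitude = c(40.7306, 40.6782),
  Longitude = c(-73.9352, -73.9442)
)

# No lng/lat arguments: leaflet looks for suitably named columns in the data
guessed_map <- leaflet(data = toy_points) %>%
  addTiles() %>%
  addMarkers()
```

Passing lng and lat explicitly, as done later in this tutorial, makes the mapping independent of this naming convention.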

First, we will split the column by , and create two separate columns named Latitude and Longitude. In order to do this, we will use the separate function from the tidyverse/tidyr package:

df <- tidyr::separate(data=df, 
                      col=Location.1, 
                      into=c("Latitude", "Longitude"), 
                      sep=",",
                      remove=FALSE)

Also, we don’t need the “(” or “)” so we will remove them by stringr::str_replace_all from the tidyverse/stringr package:

df$Latitude <- stringr::str_replace_all(df$Latitude, "[(]", "")
df$Longitude <- stringr::str_replace_all(df$Longitude, "[)]", "")

We now have clean Latitude and Longitude columns, but their data type is still string. We need to convert them to numeric so that they can be interpreted as coordinates. But how many digits are we going to keep after the decimal point? You can find a detailed answer to this question in this forum post. By default, R prints seven significant digits, which for these coordinates means five digits after the decimal point (the stored values keep their full precision). Five decimal places correspond to roughly 1.1 m on the ground, which is more than enough for the purpose of this article.
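One point worth checking for yourself is that the seven-digit default concerns how R displays numbers, not how it stores them. A short sketch, using a made-up coordinate string:

```r
# A made-up latitude with more decimal places than R prints by default
lat_string <- "40.730610123"
lat_value <- as.numeric(lat_string)

# Default display uses getOption("digits"), which is 7 unless changed
shown_default <- format(lat_value, digits = 7)

# Asking for more digits reveals that the extra decimals were kept
shown_full <- format(lat_value, digits = 12)

print(shown_default)
print(shown_full)
```

So converting with as.numeric() does not round the coordinates; only the printed representation is shortened.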

In order to convert the Latitude and Longitude columns to numeric, we will use the as.numeric() base R function:

df$Latitude <- as.numeric(df$Latitude)
df$Longitude <- as.numeric(df$Longitude)

Now that our longitude and latitude columns are in the structure we want, we can continue to mapping.
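To see the three preparation steps working together, here is a sketch that runs them on a tiny data frame. The two Location.1 values are made up to mimic the "(lat, lng)" format described above and are not taken from the dataset.

```r
library(tidyr)
library(stringr)

# Toy input mimicking the combined "(lat, lng)" string format (values are made up)
toy <- data.frame(
  Location.1 = c("(40.730610, -73.935242)", "(40.678200, -73.944200)"),
  stringsAsFactors = FALSE
)

# 1. Split on the comma into two string columns, keeping the original column
toy <- separate(data = toy, col = Location.1,
                into = c("Latitude", "Longitude"),
                sep = ",", remove = FALSE)

# 2. Strip the leftover parentheses
toy$Latitude <- str_replace_all(toy$Latitude, "[(]", "")
toy$Longitude <- str_replace_all(toy$Longitude, "[)]", "")

# 3. Convert to numeric; as.numeric() ignores the space left after the comma
toy$Latitude <- as.numeric(toy$Latitude)
toy$Longitude <- as.numeric(toy$Longitude)

str(toy)
```

Checking the result with str() on a few rows like this is a quick way to confirm the columns are numeric before processing the full file.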

Step 5: Map the Data with Leaflet

In our dataset, we have more than 1,000,000 points. It may take a really long time to render and plot all of the points on the map depending on your machine specs. Since the purpose of this article is to introduce shiny and leaflet for mapping points, I took 1,000 points and saved them as a .rds file. Then, in my global.R file, I read the small dataset which takes a much shorter amount of time:

sample_data <- df[c(1:1000),]
saveRDS(sample_data, "./sample_data.rds")
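Taking the first 1,000 rows is the simplest option, but those rows may not be spread evenly across the dataset. As an alternative, not used in this tutorial, a random sample could be drawn instead. The sketch below assumes a made-up data frame, df_example, in place of the real data.

```r
# Made-up data frame standing in for the full dataset
set.seed(42)  # fixed seed so the same rows are drawn on every run
df_example <- data.frame(id = 1:5000, value = rnorm(5000))

# Draw up to 1,000 distinct row indices at random
n_rows <- min(1000, nrow(df_example))
sampled_rows <- sample(nrow(df_example), size = n_rows)
random_sample <- df_example[sampled_rows, ]
```

The resulting data frame can be saved with saveRDS() in exactly the same way as sample_data above.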

To map these 1,000 points, we will first remove the setView call, since leaflet automatically fits the view to the plotted points, which in this case means NYC. If we want to add a filter functionality into the map in the future, we’ll have to first create a reactive object and put our dataset in it. This will allow us to instantly filter the dataset and update the map accordingly. Our server.R file will look like this:

server <- function(input,output, session){
  
 data <- reactive({
   x <- df
 })  
  
 output$mymap <- renderLeaflet({
   df <- data()
   
   m <- leaflet(data = df) %>%
          addTiles() %>%
          addMarkers(lng = ~Longitude, 
                     lat = ~Latitude,
                     popup = paste("Offense", df$Offense, "<br>",
                                   "Year:", df$CompStat.Year))
   m
 }) 
}

The first reactive object is our dataset which we read in the global.R file. In the renderLeaflet method, we are assigning our reactive dataframe data() to a local df in the renderLeaflet. Then we are passing that into the leaflet function and adding the base map. The magic happens with the addMarkers method. We are passing the already structured Latitude and Longitude and also adding a pop-up. The pop-up will appear when we click on the point and it will print the offense type and the year. You can add as much information as you want. You can also change the pop-up window so that it will appear when you hover over the point.
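As a sketch of where the reactive object could lead, the example below adds a drop-down filter and shows the information on hover by using the label argument of addMarkers instead of popup. The toy data frame, the "offense" input id, and the offense values are assumptions made for illustration; in the real app, df would be the dataset loaded in global.R.

```r
library(shiny)
library(leaflet)

# Toy stand-in for the df loaded in global.R (values are made up)
df <- data.frame(
  Offense = c("BURGLARY", "ROBBERY", "BURGLARY"),
  CompStat.Year = c(2014, 2015, 2015),
  Latitude = c(40.73, 40.68, 40.75),
  Longitude = c(-73.93, -73.94, -73.99),
  stringsAsFactors = FALSE
)

ui <- fluidPage(
  # Hypothetical input id "offense"; choices come from the data
  selectInput("offense", "Offense", choices = sort(unique(df$Offense))),
  leafletOutput("mymap", height = 600)
)

server <- function(input, output, session) {
  # Re-evaluated whenever the selected offense changes
  data <- reactive({
    df[df$Offense == input$offense, ]
  })

  output$mymap <- renderLeaflet({
    filtered <- data()
    leaflet(data = filtered) %>%
      addTiles() %>%
      addMarkers(lng = ~Longitude,
                 lat = ~Latitude,
                 # label is shown on hover, popup on click
                 label = ~paste(Offense, CompStat.Year))
  })
}

# Assigned rather than run, so the snippet does not launch the app by itself
app <- shinyApp(ui = ui, server = server)
```

Because the map depends on data(), changing the selection redraws the markers without any extra code.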

After all five steps, our files look like this:

  • ui.R
ui <- fluidPage(
  leafletOutput("mymap",height = 1000)
)
  • server.R
server <- function(input,output, session){
  
 data <- reactive({
   x <- df
 })  
  
 output$mymap <- renderLeaflet({
   df <- data()
   
   m <- leaflet(data = df) %>%
          addTiles() %>%
          addMarkers(lng = ~Longitude, 
                     lat = ~Latitude,
                     popup = paste("Offense", df$Offense, "<br>",
                                   "Year:", df$CompStat.Year))
   m
 }) 
}
  • global.R
library(shiny)
library(leaflet)
library(dplyr)

df <- readRDS("./sample_data.rds")
  • The file where we processed our data, process.R:
library(shiny)
library(leaflet)
library(dplyr)
library(tidyr)
library(tidyverse)


df = read.csv("./NYPD_7_Major_Felony_Incidents.csv", stringsAsFactors = F)


df <- tidyr::separate(data=df, 
                      col=Location.1, 
                      into=c("Latitude", "Longitude"), 
                      sep=",",
                      remove=FALSE)

df$Latitude <- stringr::str_replace_all(df$Latitude, "[(]", "")
df$Longitude <- stringr::str_replace_all(df$Longitude, "[)]", "")


df$Latitude <- as.numeric(df$Latitude)
df$Longitude <- as.numeric(df$Longitude)
saveRDS(df, "./data.rds")


sample_data <- df[c(1:1000),]
saveRDS(sample_data, "./sample_data.rds")

 

Conclusion

The purpose of this tutorial was an introduction to interactive mapping using shiny and leaflet. Here is what the map looks like after the five steps:

[Image: interactive map NYC]

As demonstrated, leaflet and shiny are very user-friendly tools for interactive mapping for beginner to advanced level data scientists. In just five steps, we structured our data and created a map that looks nice and is easy to interpret. You can explore other leaflet functionality by building on this tutorial. If you want to render all the points, you can try spinning up a powerful VM and using the free distribution of shiny-server.

Author

Arda Kosar

Arda is currently a data scientist on the Search & Data Science Team at Publicis Worldwide. He started his career in data science at NYC Data Science Academy and holds a Bachelor’s degree in Mechatronics Engineering and an MBA. He develops unique tools, analyses, and quantification designs that help his team meet digital business objectives and uncover insights to make data-driven decisions for clients. Arda considers himself a continuous learner who is enthusiastic about constantly exploring new technologies in his field. He has also recently contributed to Kubeflow, an open-source project by Google, and is dedicated to deploying machine learning workflows on Kubernetes.