Welcome to the WiDS Data Dive, in partnership with the Chicago/Northern Illinois Red Cross.
You will have 90 minutes to complete the following task.
Below is a plot of the number of incidents per month from 2014 to 2018 that the Chicago/Northern Illinois Red Cross responded to. As you can see, even ignoring seasonality, we observe a drop in total incidents in 2018. Our question is simple: why?
NOTE: This is not a question that we or the Red Cross definitively know the answer to. Feel free to be as creative as time allows!
Current Red Cross hypotheses include: weather, more stable economic trends, successful implementation of preparedness initiatives (including the Red Cross’s Home Fire Campaign), a reduction in fire-related crimes, or better cooking habits. Jim McGowan, our collaborator, provided a helpful link on some of these hypotheses. We do not provide datasets on these hypotheses, but you can also identify other data sources!
We will give you a zip file to download that has:
Below you’ll also find:
NOTE: This is not a modeling exercise; we simply are trying to look into interesting pathways to explore further.
Your team will need to create 2-3 slides (keep it simple) with your best visualization and text takeaways and/or further inquiries.
Please email your slides, with the email subject “DATA DIVE” and a list of team members’ names, to als1@u.northwestern.edu by 6:45 pm.
A winning team will be announced shortly before 9 pm. Winner gets…bragging rights and eternal WiDS-dom and fame.
The code book will list each variable name with a description followed by the variable type in italics. If variable is categorical, each possible response will be listed.
Below is some R code for reading in the already cleaned data. All dates and time stamps are coded to the correct types, and incident_num has been converted to an integer with the dash removed.
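As a sketch of that incident_num conversion, dropping the dash and casting to integer might look like this (the "YY-NNNN" ID format below is an assumption for illustration, not taken from the actual data):

```r
# Hypothetical incident numbers; the "YY-NNNN" format is an assumption
incident_num <- c("18-0427", "17-1193")

# Drop the dash and convert to integer, as described above
incident_int <- as.integer(sub("-", "", incident_num, fixed = TRUE))
incident_int  # 180427 171193
```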
Below is the code to create the cleaned rds file.
Our incident data has a zip code variable. The example code below joins Red Cross incident data with spatial data based on zip code, and makes a map of the number of incidents in 2018.
The code below also creates a data frame (zip_income_pop_dat) from the American Community Survey (ACS) 2010 data, with income (mean, median) and population per zip code. It should be ready for you to join and map away!
A small percentage of incidents occurred in Indiana and Wisconsin, but we chose to focus on Illinois for mapping purposes.
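Once both per-zip tables are loaded, the join itself is a one-liner. A minimal sketch with toy data (the column names income_median and population are assumptions based on the description above, and all values are made up):

```r
# Toy stand-ins for the real per-zip tables; values are made up
zip_raw_count <- data.frame(zip = c("60601", "60602"),
                            incident_count = c(12L, 4L))
zip_income_pop_dat <- data.frame(zip = c("60601", "60602"),
                                 income_median = c(55000, 48000),
                                 population = c(15000, 9000))

# Join on zip, then compute incidents per 10,000 residents
joined <- merge(zip_raw_count, zip_income_pop_dat, by = "zip")
joined$incidents_per_10k <- joined$incident_count / joined$population * 1e4
joined$incidents_per_10k  # 8 for 60601, ~4.44 for 60602
```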
# Illinois Zip Code Map ----------------------------------------------------
# Loading packages ------------------------------------------------------------
require(tidyverse)
require(rgdal)
require(maptools)
require(rgeos)
require(maps)
require(mapproj)
require(ggthemes)
require(RCurl)
require(lubridate)
require(janitor)
require(hms)
keep_zip_codes <- read_rds("Redcross.rds") %>%
as_tibble() %>%
clean_names() %>% # make column names lowercase
mutate(zip = zip %>% as.character()) %>% # store zip codes as character
select(zip)
# Forming Chicago Zip Map Tibble ------------------------------------------
#Make sure all spatial data files from the zipfile are downloaded into your working directory!
il_map_dat <- readOGR(dsn="tl_2010_17_zcta510.shp") #read in IL zip codes shapefile
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\Mena Whalen\Dropbox\WiDS_Data_Dive\tl_2010_17_zcta510.shp", layer: "tl_2010_17_zcta510"
## with 1384 features
## It has 11 fields
## Integer64 fields read as strings: ALAND10 AWATER10
il_map_dat@data$id <- rownames(il_map_dat@data)
il_points <- fortify(il_map_dat, region = "ZCTA5CE10") %>% # fortify turns the spatial object into a plain data frame of points
as_tibble() %>%
filter(id %in% keep_zip_codes$zip)
#creating mapping dataset
il_map_df <- full_join(il_points, il_map_dat@data, by="id") %>%
as_tibble()
#loads data with zip code, mean/median income and population
zip_income_pop_dat <- read_csv(file = "population_income.csv") %>%
clean_names() %>% #all lowercase
mutate(zip = zip %>% as.character()) # store zip codes as character, to match the map data
red_cross <- read_rds("Redcross.rds") %>%
as_tibble() %>%
clean_names() %>%
mutate(zip = zip %>% as.character())
zip_raw_count <- red_cross %>%
mutate(year = year(date)) %>%
group_by(zip, year) %>%
summarise(incident_count = n()) %>% #counts the number of incidents in each zip code in each year
ungroup() %>%
filter(year == 2018) %>% #we just will look at 2018
mutate(more_than_10 = case_when(incident_count <= 5 ~ "<= 5",
incident_count <= 10 ~ "6-10",
TRUE ~ "> 10")) # make a discrete coloring scale
# make the plot!
il_map_df %>%
left_join(zip_raw_count, by = c("id" = "zip")) %>%
ggplot(aes(long, lat, group = group, fill = more_than_10)) +
geom_polygon() +
geom_path(color = "black") +
coord_quickmap() +
theme_map() + ggtitle('Incidents Per Zip Code in 2018')
Some of the time stamp variables, such as verified and on_scene, have errors from incorrect data entry. The two main errors were: time stamps dated before 2014, outside the range of the data, and on_scene (Red Cross arrives at the incident) time stamps EARLIER than dispatched (Red Cross learns about the incident and dispatches volunteers). The latter doesn’t make any sense. The code below changes these erroneous time stamps to NA.
require(dplyr)
require(lubridate)
# creates a new variable for the time it takes for the Red Cross to be on the scene in minutes
red_cross$time_to_inc <- difftime(red_cross$on_scene, red_cross$verified, units = "min")
# creates a new variable for the time the Red Cross is at the incident in minutes
red_cross$time_of_inc <- difftime(red_cross$departed, red_cross$on_scene, units = "min")
# creates a new data frame in which any time stamp before 2014 is set to NA,
# and any negative time_to_inc / time_of_inc value is set to NA
red_cross_time <- red_cross %>%
mutate(verified = if_else(verified > ymd("2013-12-31"), verified, as.POSIXct(NA)),
dispatched = if_else(dispatched > ymd("2013-12-31"), dispatched, as.POSIXct(NA)),
responders_id = if_else(responders_id > ymd("2013-12-31"), responders_id, as.POSIXct(NA)),
on_scene = if_else(on_scene > ymd("2013-12-31"), on_scene, as.POSIXct(NA)),
departed = if_else(departed > ymd("2013-12-31"), departed, as.POSIXct(NA)),
# durations are kept numeric (minutes); negative values become NA
time_to_inc = if_else(time_to_inc > -1, as.numeric(time_to_inc), NA_real_),
time_of_inc = if_else(time_of_inc > -1, as.numeric(time_of_inc), NA_real_))
# look at the median time the Red Cross takes to get to the scene
median(red_cross_time$time_to_inc, na.rm = T)
## [1] 82
# look at the median time the Red Cross spends at an incident
median(red_cross_time$time_of_inc, na.rm = T)
## [1] 62
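One pitfall behind the cleaning code above: base R’s ifelse() strips the POSIXct class from time stamps and returns bare numbers, which is why dplyr::if_else(), which preserves the type, is the safer choice for the time stamp columns. A small base-R demonstration:

```r
x <- as.POSIXct("2018-06-01 12:00:00", tz = "UTC")

# base ifelse() drops the POSIXct class and returns the underlying number
y <- ifelse(TRUE, x, NA)
class(y)               # "numeric", not "POSIXct"
inherits(y, "POSIXct") # FALSE
```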
Here is a survey to give us some feedback on how to improve WiDS in the future. Thank you for your participation this year.
From Toni Stapleton: looking at the drop in “no response needed” incidents over time.
From the WiDS Team: Emily Hittner, Aleksandra Sasha Shirman, Blake Shirman, Katherine Simeon, and Oscar Zarate. First, looking at incident type, they found that most incidents are no-response and fire.
Then, looking at just fires per day.
Next, taking a Fourier transform to find the dominant signal.
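Their exact analysis isn’t reproduced here, but the idea of pulling a dominant period out of a daily count series with a discrete Fourier transform can be sketched in base R on synthetic data (the yearly cycle and all numbers below are illustrative assumptions):

```r
set.seed(1)
n <- 730                                  # two years of synthetic daily fire counts
t <- 1:n
x <- 10 + 3 * sin(2 * pi * t / 365) + rnorm(n, sd = 0.5)

# magnitude spectrum, dropping the DC term and keeping frequencies up to Nyquist
spec <- Mod(fft(x - mean(x)))[2:(n / 2)]

# position k in spec corresponds to a period of n / k days
dominant_period <- n / which.max(spec)
dominant_period  # 365: the yearly cycle dominates
```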
They then looked at county-level normalized fire frequency (per 10k people, as a rough adjustment for county size).
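The per-10k normalization itself is simple arithmetic; a minimal sketch with made-up county figures:

```r
# Hypothetical county-level fire counts and populations (illustration only)
fires <- c(cook = 480, lake = 35)
pop   <- c(cook = 5150000, lake = 700000)

# fires per 10,000 residents, so large and small counties are comparable
rate_per_10k <- fires / pop * 1e4
round(rate_per_10k, 2)  # cook 0.93, lake 0.50
```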
From WiDS Team: Frank Fineis, Martha Eichlersmith, and Noelle Samia. Running a generalized linear mixed model, they could not detect a meaningful difference between pre- and post-01/01/2018.
# Acknowledgments
We’d like to thank Jim McGowan and the Chicago/Northern Illinois Red Cross for their partnership, and Kisa Kowal, Arend Kuyper, Karen Smilowitz, and Reut Nocham for their technical and logistical support.