Preface

Motivation

Extreme weather events, such as major storms, hurricanes, tornadoes, and floods, can cause major damage and health effects in the areas they strike. However, assessing the public health impact of major events can be difficult because attributing health problems to storms is largely inferential. Official death counts for events like hurricanes tend to understate the overall impact of a hurricane on the population because of how narrowly a death “directly caused” by the event is defined. The National Weather Service defines a direct fatality as follows: “A direct fatality or injury is defined as a fatality or injury directly attributable to the hydro-meteorological event itself, or impact by airborne/falling/moving debris, i.e., missiles generated by wind, water, ice, lightning, tornado, etc.”

In a recently publicized example, the official death count for Hurricane Maria, which hit the island of Puerto Rico, was 64. One systematic investigation of mortality after the hurricane estimated 4,645 (95% CI, 793 to 8,498) deaths from September 20 through December 31, 2017. A simpler analysis using just time series data estimated about 1,750 deaths in the same time period.

What about all of the other storms and floods that hit U.S. cities every year? What do we know about their effects on mortality? The goal of this homework is to scratch the surface of that question and to look at some data that might be useful for addressing questions like it.

Overall Objective

The goal of this assignment is to develop an estimate for the health impact of major storm events in the United States. For this problem we will focus on mortality impacts. Using the NOAA Storm Event Database and the NMMAPS mortality data, you must link the two together, fit whatever models are needed to develop your estimate, and then report on your estimate while noting the limitations and uncertainties.

During the course of your analysis there will be many options to explore and various approaches you may take. Part of your job will be to choose among these options and focus on a specific approach that you find most interesting or that has the greatest potential for success.

Data

The datasets that we will focus on here are the NOAA Storm Event Database and the NMMAPS mortality data.

Problem 1: Exploring the Mortality Data

Problem 1.1

The mortality data are available in a single file, mortality.zip. Inside the zip file is a single bzip2-compressed CSV file named mortality_1987-2005.csv.bz2. Download and unzip mortality.zip, then read mortality_1987-2005.csv.bz2 into R using the readr package. Next, read the nmmaps_cities.csv file into R to get the metadata on the cities in the NMMAPS study.

## Add your code
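One possible starting point for the reading step is sketched below. It assumes mortality.zip and nmmaps_cities.csv have already been downloaded into the same directory (the download links are not reproduced here) and that the .bz2 file sits at the top level of the archive. The helper name read_nmmaps is hypothetical; readr's read_csv() decompresses .bz2 files automatically based on the file extension.

```r
library(readr)

# Hypothetical helper: unzip the archive if needed, then read the mortality
# data and the city metadata. Assumes both downloads live in `dir`.
read_nmmaps <- function(dir = ".") {
  bz2 <- file.path(dir, "mortality_1987-2005.csv.bz2")
  if (!file.exists(bz2)) {
    utils::unzip(file.path(dir, "mortality.zip"), exdir = dir)
  }
  list(
    # read_csv() uncompresses .bz2 files on the fly
    mortality = read_csv(bz2, show_col_types = FALSE),
    cities    = read_csv(file.path(dir, "nmmaps_cities.csv"), show_col_types = FALSE)
  )
}
```

With the files in the working directory, `nmmaps <- read_nmmaps()` returns a list holding both tables; `names(nmmaps$mortality)` is a good first check of the actual column names.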

Note that many of the “cities” in the NMMAPS data are actually combinations of several counties. For example, “New York City” is a combination of 6 separate counties. The nmmaps_cities.csv file provides the mapping of counties to “cities”, along with the 5-digit FIPS code identifying each county.

Problem 1.2

Make a plot of the daily mortality from all non-accidental deaths (death) versus date for the city of New Orleans, LA in the year 2004.

## Add your code
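A minimal base-R sketch of this plot follows. It assumes the mortality table has a `city` column, a `date` column of class Date, and the outcome column `death`; the city code "no" for New Orleans is a guess, so check `unique(mort$city)` against the real data. The function name is hypothetical, and its arguments make it easy to reuse for other cities, years, and causes.

```r
# Hypothetical helper: line plot of one city's daily counts for one year.
# Assumes columns `city`, `date` (Date class) and the chosen outcome column.
plot_city_year <- function(mort, city_code = "no", year = 2004, outcome = "death") {
  sub <- mort[mort$city == city_code & format(mort$date, "%Y") == as.character(year), ]
  sub <- sub[order(sub$date), ]   # make sure the line is drawn in time order
  plot(sub$date, sub[[outcome]], type = "l",
       xlab = "Date", ylab = paste("Daily count:", outcome),
       main = paste(city_code, year))
  invisible(sub)                  # return the plotted rows for further checks
}
```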

Are there interesting features such as when mortality tends to be high or low?

Add a summary of your findings here

Problem 1.3

Try making this same plot for different years, different cities, and different causes of death.

## Add your code
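One practical consideration when choosing cities is size: with count data, day-to-day random variation shrinks relative to the mean as the mean grows, so cities with more deaths per day make small storm effects easier to see. The sketch below (hypothetical function name, same assumed `city` column) ranks cities by their average daily count and reports how many days of non-missing data each has.

```r
# Hypothetical helper: mean daily count and number of non-missing days per city,
# sorted from the largest to the smallest mean.
city_summary <- function(mort, outcome = "death") {
  y <- mort[[outcome]]
  means <- tapply(y, mort$city, mean, na.rm = TRUE)
  ndays <- tapply(!is.na(y), mort$city, sum)
  res <- data.frame(city = names(means),
                    mean_daily = as.numeric(means),
                    n_days = as.numeric(ndays[names(means)]),
                    stringsAsFactors = FALSE)
  res[order(-res$mean_daily), ]
}
```

Combining this ranking with knowledge of which regions see frequent storms (for example, Gulf Coast cities) is one way to shortlist candidates.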

If you were to focus your analysis on a single city, or a few cities, which ones might be the most practical or interesting ones to choose?

Add a summary of your findings here

Problem 1.4

Take all of the data on non-accidental deaths for New York City, NY and divide them by season of the year. In which season does mortality tend to be the highest?

## Add your code
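A sketch of the seasonal comparison is shown below, assuming meteorological seasons (December to February is winter, and so on); astronomical seasons are an equally valid choice. It compares mean daily deaths rather than totals because the seasons contain different numbers of days. The city code "ny" and the column names are assumptions.

```r
# Map each date to a meteorological season (an assumption, not the only option).
month_to_season <- function(date) {
  m <- as.integer(format(date, "%m"))
  c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
    "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")[m]
}

# Hypothetical helper: mean daily count by season for one city.
season_means <- function(mort, city_code = "ny", outcome = "death") {
  sub <- mort[mort$city == city_code, ]
  s <- factor(month_to_season(sub$date), levels = c("Winter", "Spring", "Summer", "Fall"))
  tapply(sub[[outcome]], s, mean, na.rm = TRUE)
}
```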

Add a summary of your findings here

Problem 1.5

Is the seasonal pattern of mortality the same in every city in the NMMAPS data? Summarize your results below.

## Add your code
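To compare cities of very different sizes, it can help to summarize each city's seasonal cycle in relative terms. The sketch below (hypothetical function, assumed `city`, `date`, and `death` columns) reports, for every city, the calendar month with the highest mean daily count and the ratio of the highest to the lowest monthly mean, which is unaffected by city size.

```r
# Hypothetical helper: peak month and peak-to-trough ratio of monthly means per city.
peak_month_by_city <- function(mort, outcome = "death") {
  month <- as.integer(format(mort$date, "%m"))
  # Mean daily count for every city x month cell (rows = cities, columns = months)
  tab <- tapply(mort[[outcome]], list(mort$city, month), mean, na.rm = TRUE)
  data.frame(city = rownames(tab),
             peak_month = month.abb[as.integer(colnames(tab))[apply(tab, 1, which.max)]],
             peak_to_trough = apply(tab, 1, max, na.rm = TRUE) / apply(tab, 1, min, na.rm = TRUE),
             stringsAsFactors = FALSE)
}
```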

Add a summary of your findings here

Problem 1.6

Take a look at any other temporal patterns in the data. Are there day-of-week effects? Weekly or monthly trends? Yearly trends?

## Add your code
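For the day-of-week question, a minimal sketch is to average daily counts by weekday. It uses the `%u` format code (1 = Monday through 7 = Sunday) instead of `weekdays()`, whose output depends on the system language. Column names and the function name are assumptions; the same `tapply()` pattern with `format(date, "%Y")` or `"%m"` gives yearly or monthly means.

```r
# Hypothetical helper: mean daily count by day of week for one city.
dow_means <- function(mort, city_code, outcome = "death") {
  sub <- mort[mort$city == city_code, ]
  # %u gives 1 (Monday) to 7 (Sunday) regardless of locale
  dow <- factor(as.integer(format(sub$date, "%u")), levels = 1:7,
                labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
  tapply(sub[[outcome]], dow, mean, na.rm = TRUE)
}
```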

Add a summary of your findings here

Problem 2: Exploring the Storm Event Data

The storm event database goes back to 1950, with one file per year. You will likely not need all of it.

Problem 2.1

To start, download the data for 2004 (these files are labeled StormEvents_details-*), read the data into R, and take a look at it. You will see that the storm event data have a column for the year and the date/time of each event. There are also separate columns for when each event began (BEGIN_DATE_TIME) and when it ended (END_DATE_TIME). Convert these columns into R date/time objects and add a new column that contains the duration of each event.

## Add your code
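A sketch of the conversion follows, assuming the date/time strings look like "20-JAN-04 06:00:00" (day, abbreviated month, two-digit year, time); inspect a few rows of the real file before relying on that format. The `%b` code only matches English month abbreviations in an English or C time locale, hence the `Sys.setlocale()` call. Time zones are ignored here (everything is treated as UTC), which is a simplification; the new column names are hypothetical.

```r
# %b must match English abbreviations such as "JAN", so use the C time locale
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical helper: parse begin/end stamps and add the event duration in hours.
add_event_duration <- function(storms) {
  fmt <- "%d-%b-%y %H:%M:%S"   # assumed layout, e.g. "20-JAN-04 06:00:00"
  storms$begin <- as.POSIXct(storms$BEGIN_DATE_TIME, format = fmt, tz = "UTC")
  storms$end   <- as.POSIXct(storms$END_DATE_TIME,   format = fmt, tz = "UTC")
  storms$duration_hours <- as.numeric(difftime(storms$end, storms$begin, units = "hours"))
  storms
}
```

Note that `%y` maps "00" to "68" onto 2000 to 2068, which is fine for 2004 but would misread years from the 1950s files.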

Problem 2.2

Using your new begin/end date columns, we can look at temporal patterns of specific storm events. How many flash floods occurred in each of the four seasons of 2004 in the state of Texas?

## Add your code
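One way to count events by season is sketched below, keyed on each event's begin date. It assumes EVENT_TYPE values such as "Flash Flood" and upper-case state names such as "TEXAS" in the STATE column (check with `unique()`), and uses meteorological seasons. Within a single calendar year, "Winter" lumps January and February together with December, and one storm can produce several county-level records, so these are record counts rather than storm counts.

```r
invisible(Sys.setlocale("LC_TIME", "C"))   # so %b matches "JAN", "FEB", ...

# Hypothetical helper: number of event records per season for one event type and state.
count_by_season <- function(storms, event = "Flash Flood", state = "TEXAS") {
  ev <- storms[storms$EVENT_TYPE == event & storms$STATE == state, ]
  # as.Date() ignores the trailing time part of "15-MAY-04 10:00:00"
  m <- as.integer(format(as.Date(ev$BEGIN_DATE_TIME, format = "%d-%b-%y"), "%m"))
  season <- c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
              "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")[m]
  table(factor(season, levels = c("Winter", "Spring", "Summer", "Fall")))
}
```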

Add a summary of your findings here

Problem 2.3

What is the seasonal pattern for other major storm events in the database? Adapt your code from above to explore these patterns for other event types and other locations.

## Add your code
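To survey many event types at once, a cross-tabulation of event type by month can be easier to scan than one table per type. The sketch below keeps only the most frequent types so the table stays readable; the STATE and EVENT_TYPE column names and the date format are the same assumptions as before, and the function name is hypothetical.

```r
invisible(Sys.setlocale("LC_TIME", "C"))   # English month abbreviations for %b

# Hypothetical helper: event-type x month counts for the `top` most frequent types.
event_month_table <- function(storms, state = NULL, top = 5) {
  if (!is.null(state)) storms <- storms[storms$STATE == state, ]
  mon <- as.integer(format(as.Date(storms$BEGIN_DATE_TIME, format = "%d-%b-%y"), "%m"))
  month <- factor(month.abb[mon], levels = month.abb)
  counts <- sort(table(storms$EVENT_TYPE), decreasing = TRUE)
  keep <- head(names(counts), top)
  sel <- storms$EVENT_TYPE %in% keep
  table(event = storms$EVENT_TYPE[sel], month = month[sel])
}
```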

Add a summary of your findings here

Problem 3: Linking the Datasets

Having thoroughly explored the mortality and storm event databases, you will eventually need to link the two together to determine what connection, if any, there is between major storms and mortality.

Problem 3.1

The mortality data are presented as a time series with the number of deaths on each day. The storm event data, however, are presented as events, with one record per event. One transformation that is likely to be useful is converting the storm event data into a time series. Such a dataset will have one column for the date and another indicating whether an event (e.g., a flood or hurricane) occurred on that date.

Create a time series for flash floods for the state of Louisiana in the year 2004. Then, make a time series plot of the flash floods.

## Add your code
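The sketch below turns event records into a daily series by counting, for each calendar day, how many records were active (begin date on or before that day and end date on or after it), plus a 0/1 indicator. It assumes upper-case state names, the "Flash Flood" event label, and date strings like "03-JAN-04 06:00:00"; the function and output column names are hypothetical.

```r
invisible(Sys.setlocale("LC_TIME", "C"))   # English month abbreviations for %b

# Hypothetical helper: one row per day with the number of active event records.
daily_event_series <- function(storms, event = "Flash Flood", state = "LOUISIANA",
                               from = as.Date("2004-01-01"), to = as.Date("2004-12-31")) {
  ev <- storms[storms$EVENT_TYPE == event & storms$STATE == state, ]
  begin <- as.Date(ev$BEGIN_DATE_TIME, format = "%d-%b-%y")
  end   <- as.Date(ev$END_DATE_TIME,   format = "%d-%b-%y")
  days  <- seq(from, to, by = "day")
  # An event counts on every calendar day between its begin and end dates
  n_events <- vapply(seq_along(days),
                     function(i) sum(begin <= days[i] & end >= days[i]), integer(1))
  data.frame(date = days, n_events = n_events, any_event = as.integer(n_events > 0))
}
```

With the result stored as `fl`, `plot(fl$date, fl$n_events, type = "h")` gives a simple needle plot of flood activity over the year.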

Problem 3.2

Link the flood time series with the mortality data for New Orleans, LA from the NMMAPS dataset. Then, make a scatterplot of storm events versus deaths in New Orleans for 2004.

## Add your code
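A minimal sketch of the link is an inner join on the calendar date. It assumes a daily event table with columns `date` and `n_events` (one row per day) and the same assumed mortality columns as before. Because the event count takes only a few integer values, the x-axis is jittered to reduce overplotting. Keep in mind that pairing statewide events with a single city's deaths is a crude exposure measure.

```r
# Hypothetical helper: join daily events to one city's deaths and plot them.
link_and_plot <- function(events, mort, city_code = "no") {
  m <- mort[mort$city == city_code, c("date", "death")]
  merged <- merge(m, events, by = "date")   # keeps only dates present in both tables
  # n_events is a small integer, so jitter it horizontally to avoid overplotting
  plot(jitter(merged$n_events, amount = 0.1), merged$death,
       xlab = "Flash flood records active that day", ylab = "Non-accidental deaths")
  invisible(merged)
}
```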

Problem 3.3

Make the same scatterplot as above, but now stratified by season so that there are four plots.

## Add your code
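A sketch of the four-panel version is below. It assumes a merged daily data frame with `date`, `death`, and `n_events` columns and uses meteorological seasons; a season with no rows gets an empty labeled panel instead of an error. The function name is hypothetical.

```r
# Hypothetical helper: 2 x 2 grid of deaths vs. event counts, one panel per season.
plot_by_season <- function(merged) {
  m <- as.integer(format(merged$date, "%m"))
  season <- c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
              "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")[m]
  old <- par(mfrow = c(2, 2))
  on.exit(par(old))                 # restore the previous layout afterwards
  for (s in c("Winter", "Spring", "Summer", "Fall")) {
    d <- merged[season == s, ]
    if (nrow(d) == 0) {             # nothing to draw for this season
      plot.new(); title(main = paste(s, "(no data)")); next
    }
    plot(jitter(d$n_events, amount = 0.1), d$death, main = s,
         xlab = "Flash flood records active", ylab = "Deaths")
  }
  invisible(table(season))
}
```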

At this point you may want to modify and adapt the code you’ve written above to explore the following additional questions:

  • What does the relationship between deaths and the timing of hurricanes or other major events look like?

  • How does the scatterplot look in different years or states?

  • Does it make sense to look at a smaller geographical unit than the state? What about county/city?

  • Which major storm events are most frequent in which states? It might make sense to focus on events that are somewhat regularly occurring rather than very rare events.
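For a county or city level analysis, one approach is to build a 5-digit county FIPS code for each storm record and match it against the county-to-city mapping. The sketch assumes the storm files identify counties through STATE_FIPS plus CZ_FIPS when CZ_TYPE is "C" (records with other CZ_TYPE values refer to forecast zones and are dropped here, which may exclude many hurricane records); confirm this against the NCEI file-format documentation. The mapping columns `city` and `fips` are hypothetical names, so rename them to match nmmaps_cities.csv.

```r
# Hypothetical helper: storm records whose county lies inside one NMMAPS "city".
storms_for_city <- function(storms, city_map, city_name) {
  county <- storms[storms$CZ_TYPE == "C", ]   # county-based records only
  # e.g. state 22 and county 71 become "22071"
  county$fips <- sprintf("%02d%03d", as.integer(county$STATE_FIPS), as.integer(county$CZ_FIPS))
  fips_in_city <- sprintf("%05d", as.integer(city_map$fips[city_map$city == city_name]))
  county[county$fips %in% fips_in_city, ]
}
```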

Problem 4: Narrowing Down the Question

Once you’ve linked the data together and had the chance to work through some plots and visuals of the data, you will need to narrow things down and focus on a specific question for this homework.

Consider the following questions as you narrow things down:

At this point you should have a good idea of what specific aspect of the data you would like to use to develop an estimate of the mortality effect of major storms.

Briefly state what your approach will be here

Note: You do not have to cover everything. Rather, you should produce a reasonable answer given the data and time available.

Problem 5: Modeling

For this part, you will need to develop a model relating your chosen storm factor (or combination of factors) to a mortality outcome.

## Add your code
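One possible starting point is a quasi-Poisson regression of daily deaths on a 0/1 event indicator, adjusting for seasonality and long-term trend with a natural spline of time and for day of week, a design commonly used in time-series studies of mortality. The sketch assumes a merged daily data frame with columns `date`, `death`, and `any_event`; the 6 degrees of freedom per year is an arbitrary choice that is worth varying as a sensitivity check.

```r
library(splines)   # ns(): natural cubic splines (ships with R)

# Hypothetical helper: quasi-Poisson time-series model for an event indicator.
fit_event_model <- function(merged, df_per_year = 6) {
  merged <- merged[order(merged$date), ]
  merged$time <- as.numeric(merged$date)            # days since 1970-01-01
  merged$dow  <- factor(format(merged$date, "%u"))  # day of week, 1 = Monday
  n_years <- max(1, round(nrow(merged) / 365.25))
  glm(death ~ any_event + ns(time, df = df_per_year * n_years) + dow,
      family = quasipoisson, data = merged)
}
```

For a fitted model `fit`, `exp(coef(fit)[["any_event"]])` estimates the mortality rate ratio on event days relative to non-event days, and `exp(confint.default(fit)["any_event", ])` gives an approximate interval. Lagged indicators (for example, an event on any of the previous few days) are a natural extension, since storm-related deaths need not occur on the day of the event.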

Problem 6: Narrative

Given the model results you produced in Problem 5, it is time to distill them into a presentable format.

To get full credit for the problem, you will need the following: