Preface

Motivation

Your parents retired a few years ago and are considering relocating to a retirement home. Currently, their top two choices for states to move to are Florida and California. While they are excited to live somewhere new, they are worried about driving in their old age, given the motor vehicle traffic fatalities in Florida and California.

You want to recommend to your parents which state might be better or maybe even recommend a different state based on other preferences they may have, such as access to mass transit, weather, hiking, etc. You did some research and found the Fatality Analysis Reporting System (FARS) of the National Highway Traffic Safety Administration (NHTSA). You read that “FARS is a nationwide census providing NHTSA, Congress and the American public yearly data regarding fatal injuries suffered in motor vehicle traffic crashes”. You decide to explore the dataset and other state-level features to be able to recommend a state for your parents to move to.

Some datasets that you will find useful are:

Problem 1

Problem 1.1

Read the accident.dbf file from the 2015 FARS dataset into R and create a data frame called acc.

Hint: You might find the foreign R package useful here.
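As a minimal sketch of the hint, the round trip below writes a tiny made-up table to a temporary .dbf file and reads it back with read.dbf() from the foreign package. The values are toy data, not FARS records, and the names toy, dbf_path and toy_acc are illustrative; only the call pattern is meant to carry over to accident.dbf.

```r
library(foreign)  # provides read.dbf() and write.dbf()

# Toy stand-in for accident.dbf (made-up values, not FARS data)
toy <- data.frame(STATE = c(6L, 12L, 12L), FATALS = c(1L, 2L, 1L))

# Write the toy table to a temporary .dbf file, then read it back
dbf_path <- tempfile(fileext = ".dbf")
write.dbf(toy, dbf_path)
toy_acc <- read.dbf(dbf_path)

# Inspect the column names and types that came back
str(toy_acc)
```

For the real file, the path passed to read.dbf() would point at wherever accident.dbf was unpacked on your machine.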

## add your code here

Problem 1.2

You see that the first column in the acc dataset (STATE) contains the Geographic Locator Code (GLC) for each US state. Read in the GLCs dataset for the US and US territories. Create a data frame called states that contains a state name and state GLC code in each row.

## add your code here 

Problem 1.3

Add the state abbreviation and region to the states data frame using the state dataset in R.
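The built-in state vectors can be lined up in a small lookup table, as in the sketch below. The name state_info is illustrative. Note that state.name, state.abb and state.region cover the 50 states only, so the District of Columbia and the territories in a GLC list would have no match in this table.

```r
# state.name, state.abb and state.region ship with base R (datasets package)
# and are aligned element by element across the 50 states.
state_info <- data.frame(
  name   = state.name,
  abb    = state.abb,
  region = state.region,
  stringsAsFactors = FALSE
)

# Look up two states by name
state_info[state_info$name %in% c("California", "Florida"), ]
```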

## add your code here

Problem 1.4

Add the state name, abbreviation and region to the acc dataset.
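One way to attach state-level columns to crash-level rows is merge(), sketched here on toy data. The rows are made up, and the lookup table toy_states with its column names (code, name, abb) is hypothetical; the real states data frame may be laid out differently.

```r
# Toy crash records keyed by numeric state code (made-up rows)
toy_acc <- data.frame(STATE = c(12, 6, 12, 48), FATALS = c(1, 2, 1, 3))

# Hypothetical lookup table; column names are illustrative
toy_states <- data.frame(
  code = c(6, 12, 48),
  name = c("California", "Florida", "Texas"),
  abb  = c("CA", "FL", "TX"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every crash row even if its code has no match
toy_merged <- merge(toy_acc, toy_states,
                    by.x = "STATE", by.y = "code", all.x = TRUE)
toy_merged
```

By default merge() sorts the result on the key column, so the row order of the original crash table is not preserved.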

## add your code here

Problem 1.5

Add a column to the acc dataset containing the 2015 population total for each state.

## add your code here 

Problem 2

Which states have the most motor vehicle fatalities?

Problem 2.1

Calculate the total number of fatalities by state and visualize the results with a barplot. The x-axis should be names of the states and the y-axis should be the total number of fatalities. Order the barplot in descending order with the largest number of fatalities on the left side and the smallest on the right side.
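The aggregate-then-sort pattern behind such a barplot can be sketched on toy data as follows. The values are made up, and the column names abb and FATALS are assumptions about how the crash table is laid out (one row per crash, with a per-crash fatality count).

```r
# Toy data: one row per crash, FATALS = deaths in that crash (made-up values)
toy <- data.frame(
  abb    = c("FL", "CA", "TX", "CA", "FL", "TX", "TX"),
  FATALS = c(1, 2, 1, 1, 3, 2, 2)
)

# Sum fatalities within each state, then sort from largest to smallest
totals <- sort(tapply(toy$FATALS, toy$abb, sum), decreasing = TRUE)

# las = 2 rotates the state labels so they stay readable with many bars
barplot(totals, las = 2, xlab = "State", ylab = "Total fatalities")
```

Sorting before plotting is what puts the largest bar on the left, since barplot() draws bars in the order it receives them.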

## add your code here

Which states have the most fatalities? Which states have the fewest?

Add a summary of your findings here

Problem 2.2

If we consider the top 3 states with the most fatalities, are there certain times of the year which are more or less problematic for these states? Create a data visualization (plot) to explore this question and summarize your findings.

Hint: Create a plot of fatalities across time. Read about the as.Date() function.
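As a rough sketch of the as.Date() hint, separate year, month and day columns can be pasted into ISO-style strings and converted to Date values, which can then be grouped by month. The rows below are made up, and the column names YEAR, MONTH, DAY and FATALS are assumptions about the crash table.

```r
# Toy crash records with separate date parts (made-up values)
toy <- data.frame(
  YEAR   = c(2015, 2015, 2015, 2015),
  MONTH  = c(1, 7, 7, 12),
  DAY    = c(15, 4, 20, 31),
  FATALS = c(1, 2, 1, 1)
)

# Build strings such as "2015-07-04" and convert them to Date
toy$date <- as.Date(sprintf("%04d-%02d-%02d", toy$YEAR, toy$MONTH, toy$DAY))

# Total fatalities per calendar month, labelled "01" to "12"
monthly <- tapply(toy$FATALS, format(toy$date, "%m"), sum)
monthly
```

Grouping by format(date, "%m") is one choice among several; weekly or daily totals would use a different format string or the Date values directly.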

## add your code here

If there are states that have differences across time, why do you think that is?

Add a summary of your findings here

Problem 3

Based on the lectures, we learned that healthcare spending and coverage were highly related to total population size. Let’s explore how traffic fatalities are related to population size and other variables that might be relevant to your parents.

Problem 3.1

Is there a relationship between total fatalities and population size? How does this change across regions and states? Create a data visualization (plot) to explore this question and summarize your results.

Hint: Try coloring the states by regions and add the state abbreviations to the plot.
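One base-graphics way to follow this hint is to draw an empty plot and place the state abbreviations with text(), colored by region. The sketch below uses a made-up table with placeholder abbreviations; the column names and the color choices are illustrative assumptions.

```r
# Toy state-level table (made-up numbers, placeholder abbreviations)
toy <- data.frame(
  abb        = c("AA", "BB", "CC", "DD"),
  region     = factor(c("South", "West", "South", "Northeast")),
  population = c(5e6, 2e7, 1e7, 8e6),
  fatalities = c(600, 1500, 1100, 400)
)

# A factor's integer codes can index a color vector, one color per level
palette_cols <- c("steelblue", "firebrick", "darkgreen")
region_cols  <- palette_cols[as.integer(toy$region)]

# type = "n" sets up the axes without drawing points; text() adds the labels
plot(toy$population, toy$fatalities, type = "n",
     xlab = "Population", ylab = "Total fatalities")
text(toy$population, toy$fatalities, labels = toy$abb, col = region_cols)
legend("topleft", legend = levels(toy$region),
       text.col = palette_cols, bty = "n")
```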

## add your code here

Add a summary of your findings here

Problem 3.2

Instead of total number of fatalities, calculate the fatality rate (total number of fatalities divided by population size). How does the fatality rate change across regions and states? Create a data visualization (plot) to explore these questions and summarize your results.
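The rate calculation itself is a single division, sketched here on made-up numbers with placeholder state abbreviations. Scaling to fatalities per 100,000 residents is a presentation choice assumed for readability, not something the problem requires.

```r
# Toy state-level totals (made-up numbers, for illustration only)
toy <- data.frame(
  abb        = c("AA", "BB"),
  fatalities = c(500, 1200),
  population = c(5e6, 2e7)
)

# Fatalities divided by population, scaled to "per 100,000 residents"
toy$rate_per_100k <- toy$fatalities / toy$population * 1e5
toy
```

In this toy table, BB has more fatalities in total but a lower rate than AA, which is the kind of reversal the rate is meant to expose.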

## add your code here

Which states have the highest fatality rate? Which states have the lowest fatality rate?

Add a summary of your findings here

Problem 3.3

Is rate of traffic fatalities related to life expectancy? Create a data visualization (plot) to explore this question and summarize your results.

Hint: Color the states by regions and add the state abbreviations on top of your plot.

## add your code here

Add a summary of your findings here

Problem 4

As your parents are now retired, they have three additional concerns:

  1. How much a state spends on Medicare
  2. Living in warm weather
  3. Access to several large cosmopolitan cities

Given what you have learned in this homework assignment, investigate these three additional concerns from your parents by looking at other datasets that would be useful in helping them decide where to live. You can use whatever data sources you think best support a recommendation for your parents on where to retire.

Note: There is no wrong answer, but to get full credit, a complete response to this question should include: (1) recommendations of states to move to, (2) additional exploratory data analyses investigating the three additional concerns from your parents, and (3) a summary of the findings from the data analyzed in the three previous problems and/or other datasets that you think are relevant given the three additional concerns listed above.

Add a summary of your findings here