Interested in taking this course and want to prepare over the summer?

We will use the R programming language. To learn more about R, follow these tutorials:

Several R packages from the tidyverse will be used heavily, including ggplot2, dplyr, and tidyr. Working through tutorials that use the functions in these packages is good preparation.
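To give a sense of what working with these packages looks like, here is a minimal sketch. It is an illustration only, not course material: it assumes the three packages are installed and uses the `mtcars` data set that ships with R; the object names (`cyl_summary`, `cyl_long`, `p`) and the `hp > 100` cutoff are arbitrary choices made for this example.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# dplyr: keep the higher-horsepower cars, then summarise by cylinder count.
# mtcars is a built-in data set, so nothing needs to be downloaded.
cyl_summary <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), mean_wt = mean(wt), .groups = "drop")

# tidyr: reshape from one column per measure to one row per (cyl, measure).
cyl_long <- cyl_summary %>%
  pivot_longer(
    cols = c(mean_mpg, mean_wt),
    names_to = "measure",
    values_to = "value"
  )

# ggplot2: one bar per cylinder group, one panel per measure.
p <- ggplot(cyl_long, aes(x = factor(cyl), y = value)) +
  geom_col() +
  facet_wrap(~ measure, scales = "free_y") +
  labs(x = "Cylinders", y = "Mean value")

print(p)
```

If each step here (filter, group and summarise, pivot, plot) reads naturally to you, you are likely comfortable with the level of R the course assumes.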

This course will be very hands-on and focused on analyzing real-world data, both in the lectures and in homework assignments. We will not go into much detail about the statistical methods themselves; instead, we will refer you to other courses where you can learn about their theoretical aspects. Our aim is to provide the intuition for why a particular method is relevant, appropriate, or inappropriate for a given type of data, data set, or application. If you feel uncomfortable with certain statistical concepts, we suggest refreshing them this summer. As a reminder, this is a PhD-level class for our PhD students in Biostatistics: it assumes knowledge of R and some statistical knowledge, and it moves relatively quickly. Generally speaking, Biostatistics 620 is not sufficient to prepare you for this data science series.

Previous versions of the class

Books of interest

Blog posts of interest

Journal articles of interest

Other resources