Instructors

TAs

Class Information

Resources

Books

Course Description

Provides an intensive introduction to applied statistics and data analysis. Trains students to become data scientists capable of both applied data analysis and critical evaluation of the next generation of statistical methods. Since both data analysis and methods development require substantial hands-on experience, the course focuses on hands-on data analysis.

Course Objectives

Upon successfully completing this course, students will be able to:

  1. Obtain, clean, transform, and process raw data into usable formats
  2. Formulate quantitative models to address scientific questions
  3. Organize and perform a complete data analysis, from exploration to analysis to synthesis to communication
  4. Apply a range of statistical methods for inference and prediction

Evaluation and feedback

Our goal is to return feedback to you on projects within 1 week of submission. This is an intensive process because each project receives individualized feedback, so we hope you can bear with us if there are minor delays.

Grading philosophy

I believe the purpose of graduate education is to train you to think for yourself and to initiate and complete your own projects. I am super excited to talk to you about ideas, work out solutions with you, and help you figure out statistical methods and/or data analysis. I don’t think that graduate school grades are important for this purpose, so I don’t care very much about graduate student grades.

That being said, I have to give you a grade, so I will use grades to help communicate your progress.

  1. A - Excellent
  2. B - Passing
  3. C - Needs improvement

If you receive A’s and B’s, you would have passed our old qualifying exam on this project and are doing acceptable data analyses. If you receive C’s, that is my way of letting you know that your work would not pass the qualifying exam and probably isn’t up to speed. I don’t feel comfortable assigning percentages to data analyses, but to be able to calculate grades at the completion of the course I will use the following percentages: A = 100%, B = 85%, C = 75% of available points. This will be based on the rubric for data analyses.

Data analysis assignments

(For more on my project philosophy see: http://bit.ly/wQT5uI)

Each student will be required to perform two data analysis projects during the course. Students will be given 2 weeks to perform each analysis. The project assignments will consist of a scientific description of the problem. Students are responsible for all stages of each data analysis, from obtaining the data to the final report. At the conclusion of each analysis, each student must turn in:

  1. A write-up of their data analysis in a synthesized format, with numbered figures and references. (You may also include supplementary material with additional detailed calculations/analyses.)
  2. A reproducible Rmd file that produces all of the numbers, figures and results in your write-up.

All documents should be submitted electronically. The grades will be broken down according to the following characterization of your data analysis.

  1. Did you answer the scientific question? (30%)
  2. Did you use appropriate statistical methods? (40%)
  3. Was your write-up simple, clear, and precise? (20%)
  4. Was your code reproducible? (10%)
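To make the arithmetic concrete, here is a minimal Python sketch of one way the rubric weights could combine with the letter-grade percentages (A = 100%, B = 85%, C = 75%). It assumes, hypothetically, that each of the four rubric components receives its own letter grade; the syllabus does not spell out exactly how letters and weights are combined, and the function name `project_score` and the component keys are illustrative only.

```python
# Letter-grade to percentage mapping stated in the syllabus.
GRADE_PERCENT = {"A": 100, "B": 85, "C": 75}

# Rubric weights (percent of the project grade) stated in the syllabus.
RUBRIC_WEIGHTS = {
    "answered_question": 30,
    "appropriate_methods": 40,
    "clear_writeup": 20,
    "reproducible_code": 10,
}


def project_score(letters):
    """Return a weighted project score on a 0-100 scale.

    ASSUMPTION (hypothetical): each rubric component is graded A/B/C
    separately; the syllabus does not say exactly how they combine.
    """
    if set(letters) != set(RUBRIC_WEIGHTS):
        raise ValueError("need exactly one letter grade per rubric component")
    total = 0.0
    for component, weight in RUBRIC_WEIGHTS.items():
        # Each component contributes its weight times the letter's percentage.
        total += weight * GRADE_PERCENT[letters[component]] / 100
    return total


example = {
    "answered_question": "A",
    "appropriate_methods": "B",
    "clear_writeup": "A",
    "reproducible_code": "C",
}
print(project_score(example))  # 30 + 34 + 20 + 7.5 = 91.5
```

Under this assumption, straight B's give 85 and straight C's give 75, so the weighted scheme reduces to the stated percentages whenever all components receive the same letter.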

Keep in mind that this is a data science class. In some cases standard methodology will be sufficient to answer the question of interest, in some cases you will need to go beyond the course, and in general the goal is to answer the question and provide an estimate of uncertainty. You may speak to your fellow students about specific statistical questions related to the projects, but the overall idea, analysis, and write-up should be your own individual work. You should cite any help you get from fellow students/TAs in your report in standard citation format.

Data analysis reviews

After each data analysis is turned in, it will be randomly assigned to another student for review. Your review will be due one week after it is assigned. Your comments should have the format of a typical peer review. You can find a template and instructions for these reviews here: https://github.com/jtleek/reviews. You should include a summary of the analyses and conclusions in the project you are reviewing, any major revisions, and any minor revisions. We will also evaluate each data analysis independently to assign a grade.

Final Project

The final project will have the same format as the data analyses. It will be slightly longer than the other data analysis projects and more in depth in terms of analysis. You will have an opportunity to submit this analysis, get feedback from the instructors, and re-analyze the data for this project.

We will suggest a final project, but you may also propose a different project to the instructor; as long as it is approved, you can do that project. The project should involve data/code that you can obtain, process, analyze, and synthesize yourself. Keep in mind that real scientists make their own data. You may use any of the methods you learn during the course, or any other methods you know or look up.

Structure of Class Time

Class will consist of both lectures on statistical methodology and practice and hands-on activities. The hands-on practice will be assigned in advance of each lecture, which will give you time to look it over and come up with questions. The plan is for students to work on the problem and ask questions, followed by the instructor or a chosen student presenting their solution.

Schedule

Day Date Slides Resources
M1 2015-08-31 Introduction, Google Slides Introduction, pdf day_1.R
M1 2015-08-31 Organizing example project
W1 2015-09-02 Introduction, Google Slides
W1 2015-09-02 Version control rmarkdown lab
W1 2015-09-07 Getting data
W2 2015-09-07 Introduction
W2 2015-09-07 Getting data web + api lab
M3 2015-09-14 Introduction
M3 2015-09-14 Tidying data dplyr lab merging lab
W3 2015-09-16 Introduction
W3 2015-09-16 Tidying data regex lab final lab
W4 2015-09-23 Introduction
W4 2015-09-23 Exploratory graphs
M5 2015-09-28 Introduction
M5 2015-09-28 Unsupervised analysis I
M6 2015-10-05 Introduction
M6 2015-10-05 Dimension reduction
W6 2015-10-07 Introduction
W6 2015-10-07 Simulation
M8 2015-10-19 Introduction
M8 2015-10-19 Bootstrapping Stamps example
W8 2015-10-21 Introduction
W8 2015-10-21 Multiple Testing
M9 2015-10-26 Introduction
M9 2015-10-26 Multiple Testing
M10 2015-11-02 Introduction
M10 2015-11-02 Prediction
W10 2015-11-4 Introduction
W10 2015-11-4 Prediction methods
M11 2015-11-9 Introduction
M11 2015-11-9 Prediction methods
W11 2015-11-11 Introduction
W11 2015-11-11 Prediction methods
M12 2015-11-16 Introduction
M12 2015-11-16 Prediction methods Smoothing
W12 2015-11-18 Introduction
W12 2015-11-18 Prediction methods Smoothing
M13 2015-11-23 Introduction
M13 2015-11-23 GAMs EM algorithm EM algorithm R example -html EM algorithm R example -Rmd
M14 2015-12-1 Introduction
M14 2015-12-1 EM algorithm Causality
M15 2015-12-7 Introduction
M15 2015-12-7 Causality
W15 2015-12-9 Introduction
W15 2015-12-9 Causality em algorithm (swirl)
M16 2015-12-14 Introduction
M16 2015-12-14 R packages r package creation
W16 2015-12-16 Introduction
W16 2015-12-16 R packages Wrap-up r package creation

Projects

Project Assigned Date Due Date Link Description
Project 1 2015-09-02 2015-09-14 Project 1 Guns and homicide, email for link
Project 2 2015-09-16 2015-09-27 Project 2 Fitbit activity, email for link
Final Project 2015-10-07 2015-10-16 Final project Please submit via Courseplus
712 Project 1 2015-11-04 2015-11-25 (poster) / 2015-12-02 (app) Project Please submit via Courseplus/Github
712 Project 2 2015-12-07 2015-12-18 Project Please submit via Courseplus

Miscellaneous

Feel free to submit typos, errors, etc. via the GitHub repository associated with the class: https://github.com/jtleek/advdatasci

This web-page is modified from Andrew Jaffe’s Summer 2015 R course, which also has great material if you want to learn R.

This page was last updated on 2015-12-16 13:05:23 Eastern Time.