Philosophy

I believe the purpose of graduate education is to train you to be able to think for yourself and initiate and complete your own projects. I am super excited to talk to you about ideas, work out solutions with you, and help you to figure out statistical methods and/or data analysis. I don’t think that graduate school grades are important for this purpose. This means that I don’t care very much about graduate student grades.

That being said, I have to give you a grade so they will be:

  1. A - Excellent - 90%+
  2. B - Passing - 80%+
  3. C - Needs improvement - 70%+

If you are getting a grade below a C it is because you basically aren’t trying/working. I rarely give them out.

Relative weights

The percentages will be assigned in the following way:

You get the points for the Datacamp modules as long as you complete them before class starts (no exceptions without prior approval). You get 50% of the points for attendance at labs and 50% for having your current version of the code up-to-date. The data analysis assignment will be graded on a 1-5 scale for each category described below and the percentages assigned as described below.

Assignments

Data analysis assignment

(For more on my project philosophy see: http://bit.ly/wQT5uI)

Each student will be required to perform a data analysis project during the course of the class. Students will have the entire term to perform the data analysis. The project assignments will consist of a scientific description of the problem. Students are responsible for all stages of each data analysis from obtaining the data to the final report. At the conclusion of each analysis each student must turn in:

  • A write-up of their data analysis in a synthesized format, with numbered figures and references. (You may also include supplementary material for detailed additional calculations/analyses)
  • A reproducible Rmd file that produces all of the numbers, figures and results in your write-up.

All documents should be submitted electronically. The grades will be broken down according to the following characterization of your data analysis.

  1. Did you answer the scientific question? (30%)
  2. Did you use appropriate statistical methods? (40%)
  3. Was your write-up simple, clear, and precise? (20%)
  4. Was your code reproducible? (10%)

Keep in mind that this is a data science class. In some cases standard methodology will be sufficient to answer the question of interest, in some cases you will need to go beyond the course, and in general the goal is to answer the question and provide an estimate of uncertainty. You may speak to your fellow students about specific statistical questions related to the projects, but the overall idea, analysis, and write-up should be your own individual work. You should cite any help you get from fellow students/TAs in your report in standard citation format.

Data analysis project options

You are required to pick one of the data analysis options below and perform that analysis over the course of the class.

Option 1

A major disaster is currently underway with hurricane Harvey. Use social media data (Twitter, Instagram, Reddit etc.) to identify times and areas that are hardest hit from the hurricane.

Option 2

We teach thousands of students data analysis in online classes on the Coursera platform. Each of these classes includes a final project. Collect the write-ups people use for the Getting and Cleaning Data project and summarize the main sources of variation in how people complete the project.

Option 3

Perform an analysis of “data scientist” jobs listed on job boards and on the employment pages of major companies. What are the most common skills that employers look for? What are the most unique skills that employers look for? Where are the types of companies that employ the most data scientists?

Option 4

Perform an analysis of the statistical analyses in all published PLoS papers. What are the most common techniques? How do they vary by field? Are there any trends over the last 10-15 years?

Option 5

Compete for the Zillow prize and write up your results: https://www.zillow.com/promo/Zillow-prize/

Option 6

You may petition to do your own analysis. You must submit your petition by the 3rd day of class (Wednesday, September 6). We will let you know before the 4th day of class whether you have approval. The minimum requirements for the project include:

  • You must be obtaining your own raw data
  • You must be doing your own data processing
  • The data must be available to be made public by end of class
  • You must specify your own question you are asking from the data
  • You need to provide reasonable justification you can answer that question with your data.

If you are looking for ideas consider these resources:

Data analysis reviews

To keep you on track, starting in the 2nd week you will bring your current writeup (in .Rmd format or later on in pdf format) to the course lab. The labs will be run by John and Stephen. You will take turns projecting your labs and getting detailed feedback from the instructors and the other students. You will receive credit for being prepared each week to present your data analysis even if you aren’t selected on that day.

Lab attendance and participation are mandatory. This is where you will learn how to write up and perform a data analysis. It is also the best way to get “hands dirty” with the projects people are working on.

The times for the reviews each week will be:

  • Stephen: TBD
  • John: TBD

Datacamp modules

Please see individual lectures for assigned DataCamp modules and due dates. You must sign up with John to get on the Datacamp module list. Please visit the Advanced Data Science link for the assignments