Provides an intensive introduction to applied statistics and data analysis. Trains students to become data scientists capable of both applied data analysis and critical evaluation of the next generation of statistical methods. Since both data analysis and methods development require substantial hands-on experience, focuses on hands-on data analysis.
Upon successfully completing this course, students will be able to:
Our goal is to return feedback to you on projects within 1 week of submission. This is an intensive process because it requires individualized feedback, so we hope you will bear with us if there are minor delays.
I believe the purpose of graduate education is to train you to be able to think for yourself and initiate and complete your own projects. I am super excited to talk to you about ideas, work out solutions with you, and help you to figure out statistical methods and/or data analysis. I don’t think that graduate school grades are important for this purpose. This means that I don’t care very much about graduate student grades.
That being said, I have to give you a grade, so I will use grades to help communicate your progress.
If you receive A’s and B’s you would have passed our old qualifying exam on this project and are doing acceptable data analyses. If you receive C’s that is my way of letting you know that your work would not pass on the qualifying exam and probably isn’t up to speed. I don’t feel comfortable assigning percentages to data analyses, but to be able to calculate grades at the completion of the course I will use the following percentages: A = 100%, B = 85%, C = 75% of available points. This will be based on the rubric for data analyses.
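The letter-to-percentage conversion above can be sketched in Python. This is only an illustration of the arithmetic: the mapping (A = 100%, B = 85%, C = 75%) comes from the paragraph above, while the function names and the point values in the example are hypothetical and are not part of the course rubric.

```python
# Fraction of available points awarded for each letter grade,
# as stated in the syllabus (A = 100%, B = 85%, C = 75%).
LETTER_TO_FRACTION = {"A": 1.00, "B": 0.85, "C": 0.75}


def earned_points(letter, available_points):
    """Convert a letter grade on one assignment into points earned."""
    return LETTER_TO_FRACTION[letter] * available_points


def course_percentage(graded_items):
    """Overall percentage from a list of (letter, available_points) pairs."""
    total_available = sum(points for _, points in graded_items)
    total_earned = sum(
        earned_points(letter, points) for letter, points in graded_items
    )
    return 100.0 * total_earned / total_available


# Hypothetical point values, chosen only for illustration.
example_items = [("A", 30), ("B", 30), ("C", 40)]
print(course_percentage(example_items))
```

Because each letter maps to a fixed fraction of the available points, the overall percentage is a weighted average of those fractions, with weights given by the points each assignment is worth.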
(For more on my project philosophy see: http://bit.ly/wQT5uI)
Each student will be required to perform two data analysis projects during the course of the class. Students will be given 2 weeks to perform each analysis. The project assignments will consist of a scientific description of the problem. Students are responsible for all stages of each data analysis from obtaining the data to the final report. At the conclusion of each analysis each student must turn in:
All documents should be submitted electronically. The grades will be broken down according to the following characterization of your data analysis.
Keep in mind that this is a data science class. In some cases standard methodology will be sufficient to answer the question of interest; in others you will need to go beyond the course. In general, the goal is to answer the question and provide an estimate of uncertainty. You may speak to your fellow students about specific statistical questions related to the projects, but the overall idea, analysis, and write-up should be your own individual work. You should cite any help you get from fellow students/TAs in your report in standard citation format.
After each data analysis is turned in, it will be randomly assigned to another student for review. Your review will be due one week after it is assigned. Your comments should have the format of a typical peer review. You can find a template and instructions for these reviews here: https://github.com/jtleek/reviews. You should include a summary of the analyses and conclusions in the project you are reviewing, any major revisions, and any minor revisions. We will also evaluate each data analysis independently to assign a grade.
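One way to think about the random assignment of analyses to reviewers is sketched below in Python. This is an assumption about how such an assignment could be done, not a description of the procedure the instructors actually use; the function name `assign_reviewers` and the student identifiers are hypothetical. The sketch shuffles the class list and has each student review the next student in the shuffled order, which guarantees that nobody reviews their own analysis and every analysis gets exactly one reviewer.

```python
import random


def assign_reviewers(students, seed=None):
    """Return a dict mapping each reviewer to the author they review.

    Assumes `students` contains unique identifiers. The list is shuffled
    and each student reviews the next one in the shuffled order (wrapping
    around at the end), so no student is assigned their own analysis.
    """
    if len(students) < 2:
        raise ValueError("need at least two students to assign reviews")
    rng = random.Random(seed)  # seed is optional, for reproducibility
    order = list(students)
    rng.shuffle(order)
    n = len(order)
    return {order[i]: order[(i + 1) % n] for i in range(n)}


# Hypothetical class list, for illustration only.
assignments = assign_reviewers(["s1", "s2", "s3", "s4", "s5"], seed=2015)
for reviewer, author in sorted(assignments.items()):
    print(f"{reviewer} reviews the analysis by {author}")
```

Note that this shuffle-and-rotate scheme only produces assignments that form a single cycle through the class, so it does not sample uniformly from every possible self-avoiding assignment. For a peer-review rotation that is usually acceptable.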
The final project will have the same format as the data analyses. It will be slightly longer than the weekly projects in terms of space and more in depth in terms of analysis. You will have an opportunity to submit this analysis, get feedback from the instructors, and re-analyze the data on this project.
We will give you an option for a final project, but you may also propose a different project to the instructor; as long as it is approved, you can do that project. The project should involve data/code that you can obtain, process, analyze, and synthesize yourself. Keep in mind that real scientists make their own data. You may use any of the methods you learn during the course, or any other methods you know or look up.
Class will consist of both lectures on statistical methodology and practice and hands-on activities. The hands-on practice will be assigned in advance of each lecture, which will give you time to look it over and come up with questions. The plan will be for students to work on the problem and ask questions, followed by the instructor or a chosen student presenting their solution.
|M1||2015-08-31||Introduction, Google Slides Introduction, pdf||day_1.R|
|W1||2015-09-02||Introduction, Google Slides|
|W1||2015-09-02||Version control||rmarkdown lab|
|W2||2015-09-07||Getting data||web + api lab|
|M3||2015-09-14||Tidying data||dplyr lab merging lab|
|W3||2015-09-14||Tidying data||regex lab final lab|
|M5||2015-09-28||Unsupervised analysis I|
|M12||2015-11-16||Prediction methods Smoothing|
|W12||2015-11-18||Prediction methods Smoothing|
|M13||2015-11-23||GAMs EM algorithm||EM algorithm R example -html EM algorithm R example -Rmd|
|M14||2015-12-01||EM algorithm Causality|
|W15||2015-12-09||Causality||em algorithm (swirl)|
|M16||2015-12-14||R packages||r package creation|
|M16||2015-12-14||R packages Wrap-up||r package creation|
|Project||Assigned Date||Due Date||Link||Description|
|Project 1||2015-09-02||2015-09-14||Project 1||Guns and homicide, email for link|
|Project 2||2015-09-16||2015-09-27||Project 2||Fitbit activity, email for link|
|Final Project||2015-10-07||2015-10-16||Final project||Please submit via Courseplus|
|712 Project 1||2015-11-04||2015-11-25 (poster) / 2015-12-02 (app)||Project||Please submit via Courseplus/Github|
|712 Project 2||2015-12-07||2015-12-18||Project||Please submit via Courseplus|
Feel free to submit typos, errors, etc. via the GitHub repository associated with the class: https://github.com/jtleek/advdatasci
This web page is modified from Andrew Jaffe’s Summer 2015 R course, which also has great material if you want to learn R.
This page was last updated on 2015-12-16 13:05:23 Eastern Time.