The Data Scientist's Toolbox


Johns Hopkins Bloomberg School of Public Health

What do data scientists do?

  • Define the question
  • Define the ideal data set
  • Determine what data you can access
  • Obtain the data
  • Clean the data
  • Perform exploratory data analysis
  • Perform statistical prediction/modeling
  • Interpret results
  • Challenge results
  • Synthesize/write up results
  • Create reproducible code
  • Distribute results to other people

The main workhorse of data science

Where we will work on coding

RStudio's interface

Primary file types - R script

Primary file types - R Markdown document

Sharing your results - GitHub & Git

Where to run Git commands - the shell
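
As a rough illustration of what running Git from the shell can look like, here is a minimal sketch of a first local commit. The file name analysis.R, its contents, the commit message, and the user name and email are all made-up placeholders, and the work happens in a temporary directory so that nothing else on the machine is changed.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so nothing outside it is touched.
repo="$(mktemp -d)"
cd "$repo"

# Start tracking this directory with Git.
git init --quiet

# Identity for this repository only (placeholder values, not real ones).
git config user.name "Example Student"
git config user.email "student@example.com"

# A hypothetical R script standing in for real analysis code.
printf 'x <- c(1, 2, 3)\nprint(mean(x))\n' > analysis.R

# Stage the file, then record a snapshot of it.
git add analysis.R
git commit --quiet -m "Add first analysis script"

# Show the recorded history, one line per commit.
git log --oneline
```

Everything here stays on the local machine. Publishing to GitHub would typically add a `git remote add` step and a `git push`, both of which need an existing GitHub repository and are left out of this sketch.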