Once you know something there is temptation to be a snob. Don't be.
http://simplystatistics.org/2014/01/28/marie-curie-says-stop-hating-on-quilt-plots-already/
Jeffrey Leek
Johns Hopkins Bloomberg School of Public Health
Once you know something there is temptation to be a snob. Don't be.
http://simplystatistics.org/2014/01/28/marie-curie-says-stop-hating-on-quilt-plots-already/
Statistical inference for exploratory data analysis and model diagnostics
"We propose to furnish visual statistical methods with an inferential framework and protocol, modelled on confirmatory statistical testing. In this framework, plots take on the role of test statistics, and human cognition the role of statistical tests."
http://rsta.royalsocietypublishing.org/content/367/1906/4361.full
http://seanjtaylor.com/post/41463778912/real-scientists-make-their-own-data
http://brianknaus.com/software/srtoolbox/s_4_1_sequence80.txt
http://spectrum.ieee.org/tech-talk/biomedical/imaging/sorting-fmri-with-textmining-software
http://rickosborne.org/blog/2010/02/infographic-migrating-from-sql-to-mapreduce-with-mongodb/
Data are values of qualitative or quantitative variables, belonging to a set of items.
Data are values of qualitative or quantitative variables, belonging to a
http://en.wikipedia.org/wiki/Data
Set of items: Sometimes called the population; the set of objects you are interested in
Data are values of qualitative or quantitative
http://en.wikipedia.org/wiki/Data
Variables: A measurement or characteristic of an item.
Data are values of
http://en.wikipedia.org/wiki/Data
Qualitative: Country of origin, sex, treatment
Quantitative: Height, weight, blood pressure
Raw data
http://en.wikipedia.org/wiki/Raw_data
Processed data
http://www.illumina.com.cn/support/sequencing/sequencing_instruments/hiseq_1000.asp
http://www.cbcb.umd.edu/~hcorrada/CMSC858B/lectures/lect22_seqIntro/seqIntro.pdf
You know the raw data is in the right format if you
Some other important tips
Some other important tips
In some cases it will not be possible to script every step. In that case you should provide instructions like:
getwd()
and setwd()
. setwd("./data")
, setwd("../")
setwd("/Users/jtleek/data/")
setwd("C:\\Users\\Andrew\\Downloads")
file.exists("directoryName")
will check to see if the directory existsdir.create("directoryName")
will create a directory if it doesn't existif(!file.exists("data")){
dir.create("data")
}
https://data.baltimorecity.gov/Transportation/Baltimore-Fixed-Speed-Cameras/dz54-2aru
https://data.baltimorecity.gov/Transportation/Baltimore-Fixed-Speed-Cameras/dz54-2aru
fileUrl <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
download.file(fileUrl,destfile="./data/cameras.csv",method="curl")
list.files("./data")
[1] "cameras.csv"
dateDownloaded <- date()
dateDownloaded
[1] "Tue Feb 4 09:19:23 2014"
cameraData <- read.table("./data/cameras.csv",sep=",",header=TRUE)
head(cameraData)
address direction street crossStreet intersection
1 S CATON AVE & BENSON AVE N/B Caton Ave Benson Ave Caton Ave & Benson Ave
2 S CATON AVE & BENSON AVE S/B Caton Ave Benson Ave Caton Ave & Benson Ave
3 WILKENS AVE & PINE HEIGHTS AVE E/B Wilkens Ave Pine Heights Wilkens Ave & Pine Heights
4 THE ALAMEDA & E 33RD ST S/B The Alameda 33rd St The Alameda & 33rd St
5 E 33RD ST & THE ALAMEDA E/B E 33rd The Alameda E 33rd & The Alameda
6 ERDMAN AVE & N MACON ST E/B Erdman Macon St Erdman & Macon St
Location.1
1 (39.2693779962, -76.6688185297)
2 (39.2693157898, -76.6689698176)
3 (39.2720252302, -76.676960806)
4 (39.3285013141, -76.5953545714)
5 (39.3283410623, -76.5953594625)
6 (39.3068045671, -76.5593167803)
read.csv sets sep="," and header=TRUE
cameraData <- read.csv("./data/cameras.csv")
head(cameraData)
address direction street crossStreet intersection
1 S CATON AVE & BENSON AVE N/B Caton Ave Benson Ave Caton Ave & Benson Ave
2 S CATON AVE & BENSON AVE S/B Caton Ave Benson Ave Caton Ave & Benson Ave
3 WILKENS AVE & PINE HEIGHTS AVE E/B Wilkens Ave Pine Heights Wilkens Ave & Pine Heights
4 THE ALAMEDA & E 33RD ST S/B The Alameda 33rd St The Alameda & 33rd St
5 E 33RD ST & THE ALAMEDA E/B E 33rd The Alameda E 33rd & The Alameda
6 ERDMAN AVE & N MACON ST E/B Erdman Macon St Erdman & Macon St
Location.1
1 (39.2693779962, -76.6688185297)
2 (39.2693157898, -76.6689698176)
3 (39.2720252302, -76.676960806)
4 (39.3285013141, -76.5953545714)
5 (39.3283410623, -76.5953594625)
6 (39.3068045671, -76.5593167803)
In my experience, the biggest trouble with reading flat files are quotation marks ` or " placed in data values, setting quote="" often resolves these.
read.csv sets sep="," and header=TRUE
cameraData <- scan("./data/cameras.csv",sep=",",what="character")
str(cameraData)
chr [1:486] "address" "direction" "street" "crossStreet" "intersection" ...
cameraData <- matrix(cameraData,byrow=TRUE,ncol=6)
Still probably the most widely used format for sharing data
if(!file.exists("data")){dir.create("data")}
fileUrl <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.xlsx?accessType=DOWNLOAD"
download.file(fileUrl,destfile="./data/cameras.xlsx",method="curl")
dateDownloaded <- date()
library(xlsx)
cameraData <- read.xlsx("./data/cameras.xlsx",sheetIndex=1,header=TRUE)
head(cameraData)
address direction street crossStreet intersection
1 S CATON AVE & BENSON AVE N/B Caton Ave Benson Ave Caton Ave & Benson Ave
2 S CATON AVE & BENSON AVE S/B Caton Ave Benson Ave Caton Ave & Benson Ave
3 WILKENS AVE & PINE HEIGHTS AVE E/B Wilkens Ave Pine Heights Wilkens Ave & Pine Heights
4 THE ALAMEDA & E 33RD ST S/B The Alameda 33rd St The Alameda & 33rd St
5 E 33RD ST & THE ALAMEDA E/B E 33rd The Alameda E 33rd & The Alameda
6 ERDMAN AVE & N MACON ST E/B Erdman Macon St Erdman & Macon St
Location.1
1 (39.2693779962, -76.6688185297)
2 (39.2693157898, -76.6689698176)
3 (39.2720252302, -76.676960806)
4 (39.3285013141, -76.5953545714)
5 (39.3283410623, -76.5953594625)
6 (39.3068045671, -76.5593167803)
colIndex <- 2:3
rowIndex <- 1:4
cameraDataSubset <- read.xlsx("./data/cameras.xlsx",sheetIndex=1,
colIndex=colIndex,rowIndex=rowIndex)
cameraDataSubset
direction street
1 N/B Caton Ave
2 S/B Caton Ave
3 E/B Wilkens Ave
<section>
</section>
<line-break />
<Greeting> Hello, world </Greeting>
<img src="jeff.jpg" alt="instructor"/>
<step number="3"> Connect A to B. </step>
library(XML)
fileUrl <- "http://www.w3schools.com/xml/simple.xml"
doc <- xmlTreeParse(fileUrl,useInternal=TRUE)
rootNode <- xmlRoot(doc)
xmlName(rootNode)
[1] "breakfast_menu"
names(rootNode)
food food food food food
"food" "food" "food" "food" "food"
rootNode[[1]]
<food>
<name>Belgian Waffles</name>
<price>$5.95</price>
<description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
<calories>650</calories>
</food>
rootNode[[1]][[1]]
<name>Belgian Waffles</name>
xmlSApply(rootNode,xmlValue)
food
"Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650"
food
"Strawberry Belgian Waffles$7.95Light Belgian waffles covered with strawberries and whipped cream900"
food
"Berry-Berry Belgian Waffles$8.95Light Belgian waffles covered with an assortment of fresh berries and whipped cream900"
food
"French Toast$4.50Thick slices made from our homemade sourdough bread600"
food
"Homestyle Breakfast$6.95Two eggs, bacon or sausage, toast, and our ever-popular hash browns950"
xmlSApply(rootNode,xmlValue)
food
"Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650"
food
"Strawberry Belgian Waffles$7.95Light Belgian waffles covered with strawberries and whipped cream900"
food
"Berry-Berry Belgian Waffles$8.95Light Belgian waffles covered with an assortment of fresh berries and whipped cream900"
food
"French Toast$4.50Thick slices made from our homemade sourdough bread600"
food
"Homestyle Breakfast$6.95Two eggs, bacon or sausage, toast, and our ever-popular hash browns950"
Information from: http://www.stat.berkeley.edu/~statcur/Workshop2/Presentations/XML.pdf
xpathSApply(rootNode,"//name",xmlValue)
[1] "Belgian Waffles" "Strawberry Belgian Waffles" "Berry-Berry Belgian Waffles"
[4] "French Toast" "Homestyle Breakfast"
xpathSApply(rootNode,"//price",xmlValue)
[1] "$5.95" "$7.95" "$8.95" "$4.50" "$6.95"
fileUrl <- "http://espn.go.com/nfl/team/_/name/bal/baltimore-ravens"
doc <- htmlTreeParse(fileUrl,useInternal=TRUE)
scores <- xpathSApply(doc,"//li[@class='score']",xmlValue)
teams <- xpathSApply(doc,"//li[@class='team-name']",xmlValue)
scores
[1] "49-27" "14-6" "30-9" "23-20" "26-23" "19-17" "19-16" "24-18"
[9] "20-17 OT" "23-20 OT" "19-3" "22-20" "29-26" "18-16" "41-7" "34-17"
teams
[1] "Denver" "Cleveland" "Houston" "Buffalo" "Miami" "Green Bay"
[7] "Pittsburgh" "Cleveland" "Cincinnati" "Chicago" "New York" "Pittsburgh"
[13] "Minnesota" "Detroit" "New England" "Cincinnati"
library(jsonlite)
jsonData <- fromJSON("https://api.github.com/users/jtleek/repos")
names(jsonData)
[1] "id" "name" "full_name" "owner"
[5] "private" "html_url" "description" "fork"
[9] "url" "forks_url" "keys_url" "collaborators_url"
[13] "teams_url" "hooks_url" "issue_events_url" "events_url"
[17] "assignees_url" "branches_url" "tags_url" "blobs_url"
[21] "git_tags_url" "git_refs_url" "trees_url" "statuses_url"
[25] "languages_url" "stargazers_url" "contributors_url" "subscribers_url"
[29] "subscription_url" "commits_url" "git_commits_url" "comments_url"
[33] "issue_comment_url" "contents_url" "compare_url" "merges_url"
[37] "archive_url" "downloads_url" "issues_url" "pulls_url"
[41] "milestones_url" "notifications_url" "labels_url" "releases_url"
[45] "created_at" "updated_at" "pushed_at" "git_url"
[49] "ssh_url" "clone_url" "svn_url" "homepage"
[53] "size" "stargazers_count" "watchers_count" "language"
[57] "has_issues" "has_downloads" "has_wiki" "forks_count"
[61] "mirror_url" "open_issues_count" "forks" "open_issues"
[65] "watchers" "default_branch" "master_branch"
jsonData$name
[1] "ballgown" "dataanalysis" "datascientist" "datasharing" "derfinder"
[6] "derfinder-1" "DSM" "EDA-Project" "futureofstats" "googleCite"
[11] "graduate" "healthvis" "jhsph753" "jhsph753and4" "leekasso"
[16] "modules" "portfolio" "rdsmGeneSig" "reviews" "rfitbit"
[21] "rpackages" "sva" "swfdr" "talks" "testrepository"
[26] "tornado" "tsp-devel" "tspreg"
names(jsonData$owner)
[1] "login" "id" "avatar_url" "gravatar_id"
[5] "url" "html_url" "followers_url" "following_url"
[9] "gists_url" "starred_url" "subscriptions_url" "organizations_url"
[13] "repos_url" "events_url" "received_events_url" "type"
[17] "site_admin"
jsonData$owner$login
[1] "jtleek" "jtleek" "jtleek" "jtleek" "jtleek" "jtleek" "jtleek" "jtleek" "jtleek" "jtleek"
[11] "jtleek" "jtleek" "jtleek" "jtleek" "jtleek" "jtleek" "jtleek" "jtleek" "jtleek" "jtleek"
[21] "jtleek" "jtleek" "jtleek" "jtleek" "jtleek" "jtleek" "jtleek" "jtleek"
myjson <- toJSON(iris, pretty=TRUE)
cat(myjson)
[
{
"Sepal.Length" : 5.1,
"Sepal.Width" : 3.5,
"Petal.Length" : 1.4,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.9,
"Sepal.Width" : 3,
"Petal.Length" : 1.4,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.7,
"Sepal.Width" : 3.2,
"Petal.Length" : 1.3,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.6,
"Sepal.Width" : 3.1,
"Petal.Length" : 1.5,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 5,
"Sepal.Width" : 3.6,
"Petal.Length" : 1.4,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.4,
"Sepal.Width" : 3.9,
"Petal.Length" : 1.7,
"Petal.Width" : 0.4,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.6,
"Sepal.Width" : 3.4,
"Petal.Length" : 1.4,
"Petal.Width" : 0.3,
"Species" : "setosa"
},
{
"Sepal.Length" : 5,
"Sepal.Width" : 3.4,
"Petal.Length" : 1.5,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.4,
"Sepal.Width" : 2.9,
"Petal.Length" : 1.4,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.9,
"Sepal.Width" : 3.1,
"Petal.Length" : 1.5,
"Petal.Width" : 0.1,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.4,
"Sepal.Width" : 3.7,
"Petal.Length" : 1.5,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.8,
"Sepal.Width" : 3.4,
"Petal.Length" : 1.6,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.8,
"Sepal.Width" : 3,
"Petal.Length" : 1.4,
"Petal.Width" : 0.1,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.3,
"Sepal.Width" : 3,
"Petal.Length" : 1.1,
"Petal.Width" : 0.1,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.8,
"Sepal.Width" : 4,
"Petal.Length" : 1.2,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.7,
"Sepal.Width" : 4.4,
"Petal.Length" : 1.5,
"Petal.Width" : 0.4,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.4,
"Sepal.Width" : 3.9,
"Petal.Length" : 1.3,
"Petal.Width" : 0.4,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.1,
"Sepal.Width" : 3.5,
"Petal.Length" : 1.4,
"Petal.Width" : 0.3,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.7,
"Sepal.Width" : 3.8,
"Petal.Length" : 1.7,
"Petal.Width" : 0.3,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.1,
"Sepal.Width" : 3.8,
"Petal.Length" : 1.5,
"Petal.Width" : 0.3,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.4,
"Sepal.Width" : 3.4,
"Petal.Length" : 1.7,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.1,
"Sepal.Width" : 3.7,
"Petal.Length" : 1.5,
"Petal.Width" : 0.4,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.6,
"Sepal.Width" : 3.6,
"Petal.Length" : 1,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.1,
"Sepal.Width" : 3.3,
"Petal.Length" : 1.7,
"Petal.Width" : 0.5,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.8,
"Sepal.Width" : 3.4,
"Petal.Length" : 1.9,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 5,
"Sepal.Width" : 3,
"Petal.Length" : 1.6,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 5,
"Sepal.Width" : 3.4,
"Petal.Length" : 1.6,
"Petal.Width" : 0.4,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.2,
"Sepal.Width" : 3.5,
"Petal.Length" : 1.5,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.2,
"Sepal.Width" : 3.4,
"Petal.Length" : 1.4,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.7,
"Sepal.Width" : 3.2,
"Petal.Length" : 1.6,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.8,
"Sepal.Width" : 3.1,
"Petal.Length" : 1.6,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.4,
"Sepal.Width" : 3.4,
"Petal.Length" : 1.5,
"Petal.Width" : 0.4,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.2,
"Sepal.Width" : 4.1,
"Petal.Length" : 1.5,
"Petal.Width" : 0.1,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.5,
"Sepal.Width" : 4.2,
"Petal.Length" : 1.4,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.9,
"Sepal.Width" : 3.1,
"Petal.Length" : 1.5,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 5,
"Sepal.Width" : 3.2,
"Petal.Length" : 1.2,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.5,
"Sepal.Width" : 3.5,
"Petal.Length" : 1.3,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.9,
"Sepal.Width" : 3.6,
"Petal.Length" : 1.4,
"Petal.Width" : 0.1,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.4,
"Sepal.Width" : 3,
"Petal.Length" : 1.3,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.1,
"Sepal.Width" : 3.4,
"Petal.Length" : 1.5,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 5,
"Sepal.Width" : 3.5,
"Petal.Length" : 1.3,
"Petal.Width" : 0.3,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.5,
"Sepal.Width" : 2.3,
"Petal.Length" : 1.3,
"Petal.Width" : 0.3,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.4,
"Sepal.Width" : 3.2,
"Petal.Length" : 1.3,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 5,
"Sepal.Width" : 3.5,
"Petal.Length" : 1.6,
"Petal.Width" : 0.6,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.1,
"Sepal.Width" : 3.8,
"Petal.Length" : 1.9,
"Petal.Width" : 0.4,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.8,
"Sepal.Width" : 3,
"Petal.Length" : 1.4,
"Petal.Width" : 0.3,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.1,
"Sepal.Width" : 3.8,
"Petal.Length" : 1.6,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 4.6,
"Sepal.Width" : 3.2,
"Petal.Length" : 1.4,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 5.3,
"Sepal.Width" : 3.7,
"Petal.Length" : 1.5,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 5,
"Sepal.Width" : 3.3,
"Petal.Length" : 1.4,
"Petal.Width" : 0.2,
"Species" : "setosa"
},
{
"Sepal.Length" : 7,
"Sepal.Width" : 3.2,
"Petal.Length" : 4.7,
"Petal.Width" : 1.4,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.4,
"Sepal.Width" : 3.2,
"Petal.Length" : 4.5,
"Petal.Width" : 1.5,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.9,
"Sepal.Width" : 3.1,
"Petal.Length" : 4.9,
"Petal.Width" : 1.5,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.5,
"Sepal.Width" : 2.3,
"Petal.Length" : 4,
"Petal.Width" : 1.3,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.5,
"Sepal.Width" : 2.8,
"Petal.Length" : 4.6,
"Petal.Width" : 1.5,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.7,
"Sepal.Width" : 2.8,
"Petal.Length" : 4.5,
"Petal.Width" : 1.3,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.3,
"Sepal.Width" : 3.3,
"Petal.Length" : 4.7,
"Petal.Width" : 1.6,
"Species" : "versicolor"
},
{
"Sepal.Length" : 4.9,
"Sepal.Width" : 2.4,
"Petal.Length" : 3.3,
"Petal.Width" : 1,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.6,
"Sepal.Width" : 2.9,
"Petal.Length" : 4.6,
"Petal.Width" : 1.3,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.2,
"Sepal.Width" : 2.7,
"Petal.Length" : 3.9,
"Petal.Width" : 1.4,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5,
"Sepal.Width" : 2,
"Petal.Length" : 3.5,
"Petal.Width" : 1,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.9,
"Sepal.Width" : 3,
"Petal.Length" : 4.2,
"Petal.Width" : 1.5,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6,
"Sepal.Width" : 2.2,
"Petal.Length" : 4,
"Petal.Width" : 1,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.1,
"Sepal.Width" : 2.9,
"Petal.Length" : 4.7,
"Petal.Width" : 1.4,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.6,
"Sepal.Width" : 2.9,
"Petal.Length" : 3.6,
"Petal.Width" : 1.3,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.7,
"Sepal.Width" : 3.1,
"Petal.Length" : 4.4,
"Petal.Width" : 1.4,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.6,
"Sepal.Width" : 3,
"Petal.Length" : 4.5,
"Petal.Width" : 1.5,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.8,
"Sepal.Width" : 2.7,
"Petal.Length" : 4.1,
"Petal.Width" : 1,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.2,
"Sepal.Width" : 2.2,
"Petal.Length" : 4.5,
"Petal.Width" : 1.5,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.6,
"Sepal.Width" : 2.5,
"Petal.Length" : 3.9,
"Petal.Width" : 1.1,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.9,
"Sepal.Width" : 3.2,
"Petal.Length" : 4.8,
"Petal.Width" : 1.8,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.1,
"Sepal.Width" : 2.8,
"Petal.Length" : 4,
"Petal.Width" : 1.3,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.3,
"Sepal.Width" : 2.5,
"Petal.Length" : 4.9,
"Petal.Width" : 1.5,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.1,
"Sepal.Width" : 2.8,
"Petal.Length" : 4.7,
"Petal.Width" : 1.2,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.4,
"Sepal.Width" : 2.9,
"Petal.Length" : 4.3,
"Petal.Width" : 1.3,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.6,
"Sepal.Width" : 3,
"Petal.Length" : 4.4,
"Petal.Width" : 1.4,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.8,
"Sepal.Width" : 2.8,
"Petal.Length" : 4.8,
"Petal.Width" : 1.4,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.7,
"Sepal.Width" : 3,
"Petal.Length" : 5,
"Petal.Width" : 1.7,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6,
"Sepal.Width" : 2.9,
"Petal.Length" : 4.5,
"Petal.Width" : 1.5,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.7,
"Sepal.Width" : 2.6,
"Petal.Length" : 3.5,
"Petal.Width" : 1,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.5,
"Sepal.Width" : 2.4,
"Petal.Length" : 3.8,
"Petal.Width" : 1.1,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.5,
"Sepal.Width" : 2.4,
"Petal.Length" : 3.7,
"Petal.Width" : 1,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.8,
"Sepal.Width" : 2.7,
"Petal.Length" : 3.9,
"Petal.Width" : 1.2,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6,
"Sepal.Width" : 2.7,
"Petal.Length" : 5.1,
"Petal.Width" : 1.6,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.4,
"Sepal.Width" : 3,
"Petal.Length" : 4.5,
"Petal.Width" : 1.5,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6,
"Sepal.Width" : 3.4,
"Petal.Length" : 4.5,
"Petal.Width" : 1.6,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.7,
"Sepal.Width" : 3.1,
"Petal.Length" : 4.7,
"Petal.Width" : 1.5,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.3,
"Sepal.Width" : 2.3,
"Petal.Length" : 4.4,
"Petal.Width" : 1.3,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.6,
"Sepal.Width" : 3,
"Petal.Length" : 4.1,
"Petal.Width" : 1.3,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.5,
"Sepal.Width" : 2.5,
"Petal.Length" : 4,
"Petal.Width" : 1.3,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.5,
"Sepal.Width" : 2.6,
"Petal.Length" : 4.4,
"Petal.Width" : 1.2,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.1,
"Sepal.Width" : 3,
"Petal.Length" : 4.6,
"Petal.Width" : 1.4,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.8,
"Sepal.Width" : 2.6,
"Petal.Length" : 4,
"Petal.Width" : 1.2,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5,
"Sepal.Width" : 2.3,
"Petal.Length" : 3.3,
"Petal.Width" : 1,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.6,
"Sepal.Width" : 2.7,
"Petal.Length" : 4.2,
"Petal.Width" : 1.3,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.7,
"Sepal.Width" : 3,
"Petal.Length" : 4.2,
"Petal.Width" : 1.2,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.7,
"Sepal.Width" : 2.9,
"Petal.Length" : 4.2,
"Petal.Width" : 1.3,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.2,
"Sepal.Width" : 2.9,
"Petal.Length" : 4.3,
"Petal.Width" : 1.3,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.1,
"Sepal.Width" : 2.5,
"Petal.Length" : 3,
"Petal.Width" : 1.1,
"Species" : "versicolor"
},
{
"Sepal.Length" : 5.7,
"Sepal.Width" : 2.8,
"Petal.Length" : 4.1,
"Petal.Width" : 1.3,
"Species" : "versicolor"
},
{
"Sepal.Length" : 6.3,
"Sepal.Width" : 3.3,
"Petal.Length" : 6,
"Petal.Width" : 2.5,
"Species" : "virginica"
},
{
"Sepal.Length" : 5.8,
"Sepal.Width" : 2.7,
"Petal.Length" : 5.1,
"Petal.Width" : 1.9,
"Species" : "virginica"
},
{
"Sepal.Length" : 7.1,
"Sepal.Width" : 3,
"Petal.Length" : 5.9,
"Petal.Width" : 2.1,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.3,
"Sepal.Width" : 2.9,
"Petal.Length" : 5.6,
"Petal.Width" : 1.8,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.5,
"Sepal.Width" : 3,
"Petal.Length" : 5.8,
"Petal.Width" : 2.2,
"Species" : "virginica"
},
{
"Sepal.Length" : 7.6,
"Sepal.Width" : 3,
"Petal.Length" : 6.6,
"Petal.Width" : 2.1,
"Species" : "virginica"
},
{
"Sepal.Length" : 4.9,
"Sepal.Width" : 2.5,
"Petal.Length" : 4.5,
"Petal.Width" : 1.7,
"Species" : "virginica"
},
{
"Sepal.Length" : 7.3,
"Sepal.Width" : 2.9,
"Petal.Length" : 6.3,
"Petal.Width" : 1.8,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.7,
"Sepal.Width" : 2.5,
"Petal.Length" : 5.8,
"Petal.Width" : 1.8,
"Species" : "virginica"
},
{
"Sepal.Length" : 7.2,
"Sepal.Width" : 3.6,
"Petal.Length" : 6.1,
"Petal.Width" : 2.5,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.5,
"Sepal.Width" : 3.2,
"Petal.Length" : 5.1,
"Petal.Width" : 2,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.4,
"Sepal.Width" : 2.7,
"Petal.Length" : 5.3,
"Petal.Width" : 1.9,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.8,
"Sepal.Width" : 3,
"Petal.Length" : 5.5,
"Petal.Width" : 2.1,
"Species" : "virginica"
},
{
"Sepal.Length" : 5.7,
"Sepal.Width" : 2.5,
"Petal.Length" : 5,
"Petal.Width" : 2,
"Species" : "virginica"
},
{
"Sepal.Length" : 5.8,
"Sepal.Width" : 2.8,
"Petal.Length" : 5.1,
"Petal.Width" : 2.4,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.4,
"Sepal.Width" : 3.2,
"Petal.Length" : 5.3,
"Petal.Width" : 2.3,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.5,
"Sepal.Width" : 3,
"Petal.Length" : 5.5,
"Petal.Width" : 1.8,
"Species" : "virginica"
},
{
"Sepal.Length" : 7.7,
"Sepal.Width" : 3.8,
"Petal.Length" : 6.7,
"Petal.Width" : 2.2,
"Species" : "virginica"
},
{
"Sepal.Length" : 7.7,
"Sepal.Width" : 2.6,
"Petal.Length" : 6.9,
"Petal.Width" : 2.3,
"Species" : "virginica"
},
{
"Sepal.Length" : 6,
"Sepal.Width" : 2.2,
"Petal.Length" : 5,
"Petal.Width" : 1.5,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.9,
"Sepal.Width" : 3.2,
"Petal.Length" : 5.7,
"Petal.Width" : 2.3,
"Species" : "virginica"
},
{
"Sepal.Length" : 5.6,
"Sepal.Width" : 2.8,
"Petal.Length" : 4.9,
"Petal.Width" : 2,
"Species" : "virginica"
},
{
"Sepal.Length" : 7.7,
"Sepal.Width" : 2.8,
"Petal.Length" : 6.7,
"Petal.Width" : 2,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.3,
"Sepal.Width" : 2.7,
"Petal.Length" : 4.9,
"Petal.Width" : 1.8,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.7,
"Sepal.Width" : 3.3,
"Petal.Length" : 5.7,
"Petal.Width" : 2.1,
"Species" : "virginica"
},
{
"Sepal.Length" : 7.2,
"Sepal.Width" : 3.2,
"Petal.Length" : 6,
"Petal.Width" : 1.8,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.2,
"Sepal.Width" : 2.8,
"Petal.Length" : 4.8,
"Petal.Width" : 1.8,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.1,
"Sepal.Width" : 3,
"Petal.Length" : 4.9,
"Petal.Width" : 1.8,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.4,
"Sepal.Width" : 2.8,
"Petal.Length" : 5.6,
"Petal.Width" : 2.1,
"Species" : "virginica"
},
{
"Sepal.Length" : 7.2,
"Sepal.Width" : 3,
"Petal.Length" : 5.8,
"Petal.Width" : 1.6,
"Species" : "virginica"
},
{
"Sepal.Length" : 7.4,
"Sepal.Width" : 2.8,
"Petal.Length" : 6.1,
"Petal.Width" : 1.9,
"Species" : "virginica"
},
{
"Sepal.Length" : 7.9,
"Sepal.Width" : 3.8,
"Petal.Length" : 6.4,
"Petal.Width" : 2,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.4,
"Sepal.Width" : 2.8,
"Petal.Length" : 5.6,
"Petal.Width" : 2.2,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.3,
"Sepal.Width" : 2.8,
"Petal.Length" : 5.1,
"Petal.Width" : 1.5,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.1,
"Sepal.Width" : 2.6,
"Petal.Length" : 5.6,
"Petal.Width" : 1.4,
"Species" : "virginica"
},
{
"Sepal.Length" : 7.7,
"Sepal.Width" : 3,
"Petal.Length" : 6.1,
"Petal.Width" : 2.3,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.3,
"Sepal.Width" : 3.4,
"Petal.Length" : 5.6,
"Petal.Width" : 2.4,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.4,
"Sepal.Width" : 3.1,
"Petal.Length" : 5.5,
"Petal.Width" : 1.8,
"Species" : "virginica"
},
{
"Sepal.Length" : 6,
"Sepal.Width" : 3,
"Petal.Length" : 4.8,
"Petal.Width" : 1.8,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.9,
"Sepal.Width" : 3.1,
"Petal.Length" : 5.4,
"Petal.Width" : 2.1,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.7,
"Sepal.Width" : 3.1,
"Petal.Length" : 5.6,
"Petal.Width" : 2.4,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.9,
"Sepal.Width" : 3.1,
"Petal.Length" : 5.1,
"Petal.Width" : 2.3,
"Species" : "virginica"
},
{
"Sepal.Length" : 5.8,
"Sepal.Width" : 2.7,
"Petal.Length" : 5.1,
"Petal.Width" : 1.9,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.8,
"Sepal.Width" : 3.2,
"Petal.Length" : 5.9,
"Petal.Width" : 2.3,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.7,
"Sepal.Width" : 3.3,
"Petal.Length" : 5.7,
"Petal.Width" : 2.5,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.7,
"Sepal.Width" : 3,
"Petal.Length" : 5.2,
"Petal.Width" : 2.3,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.3,
"Sepal.Width" : 2.5,
"Petal.Length" : 5,
"Petal.Width" : 1.9,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.5,
"Sepal.Width" : 3,
"Petal.Length" : 5.2,
"Petal.Width" : 2,
"Species" : "virginica"
},
{
"Sepal.Length" : 6.2,
"Sepal.Width" : 3.4,
"Petal.Length" : 5.4,
"Petal.Width" : 2.3,
"Species" : "virginica"
},
{
"Sepal.Length" : 5.9,
"Sepal.Width" : 3,
"Petal.Length" : 5.1,
"Petal.Width" : 1.8,
"Species" : "virginica"
}
]
http://www.r-bloggers.com/new-package-jsonlite-a-smarter-json-encoderdecoder/
iris2 <- fromJSON(myjson)
head(iris2)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
http://www.r-bloggers.com/new-package-jsonlite-a-smarter-json-encoderdecoder/
install.packages("RMySQL")
library(RMySQL)
ucscDb <- dbConnect(MySQL(),user="genome",
host="genome-mysql.cse.ucsc.edu")
result <- dbGetQuery(ucscDb,"show databases;"); dbDisconnect(ucscDb);
[1] TRUE
result
Database
1 information_schema
2 ailMel1
3 allMis1
4 anoCar1
5 anoCar2
6 anoGam1
7 apiMel1
8 apiMel2
9 aplCal1
10 bosTau2
11 bosTau3
12 bosTau4
13 bosTau5
14 bosTau6
15 bosTau7
16 bosTauMd3
17 braFlo1
18 caeJap1
19 caePb1
20 caePb2
21 caeRem2
22 caeRem3
23 calJac1
24 calJac3
25 canFam1
26 canFam2
27 canFam3
28 cavPor3
29 cb1
30 cb3
31 ce10
32 ce2
33 ce4
34 ce6
35 cerSim1
36 choHof1
37 chrPic1
38 ci1
39 ci2
40 danRer1
41 danRer2
42 danRer3
43 danRer4
44 danRer5
45 danRer6
46 danRer7
47 dasNov3
48 dipOrd1
49 dm1
50 dm2
51 dm3
52 dp2
53 dp3
54 droAna1
55 droAna2
56 droEre1
57 droGri1
58 droMoj1
59 droMoj2
60 droPer1
61 droSec1
62 droSim1
63 droVir1
64 droVir2
65 droYak1
66 droYak2
67 echTel1
68 echTel2
69 equCab1
70 equCab2
71 eriEur1
72 felCat3
73 felCat4
74 felCat5
75 fr1
76 fr2
77 fr3
78 gadMor1
79 galGal2
80 galGal3
81 galGal4
82 gasAcu1
83 geoFor1
84 go
85 go080130
86 gorGor3
87 hetGla1
88 hetGla2
89 hg16
90 hg17
91 hg18
92 hg19
93 hg19Patch10
94 hg19Patch2
95 hg19Patch5
96 hg19Patch9
97 hgFixed
98 hgTemp
99 hgcentral
100 latCha1
101 loxAfr3
102 macEug1
103 macEug2
104 melGal1
105 melUnd1
106 micMur1
107 mm10
108 mm10Patch1
109 mm5
110 mm6
111 mm7
112 mm8
113 mm9
114 monDom1
115 monDom4
116 monDom5
117 musFur1
118 myoLuc2
119 nomLeu1
120 nomLeu2
121 nomLeu3
122 ochPri2
123 oreNil1
124 oreNil2
125 ornAna1
126 oryCun2
127 oryLat2
128 otoGar3
129 oviAri1
130 oviAri3
131 panTro1
132 panTro2
133 panTro3
134 panTro4
135 papAnu2
136 papHam1
137 performance_schema
138 petMar1
139 petMar2
140 ponAbe2
141 priPac1
142 proCap1
143 proteins120806
144 proteins121210
145 proteome
146 pteVam1
147 rheMac1
148 rheMac2
149 rheMac3
150 rn3
151 rn4
152 rn5
153 sacCer1
154 sacCer2
155 sacCer3
156 saiBol1
157 sarHar1
158 sorAra1
159 sp120323
160 sp121210
161 speTri2
162 strPur1
163 strPur2
164 susScr2
165 susScr3
166 taeGut1
167 tarSyr1
168 test
169 tetNig1
170 tetNig2
171 triMan1
172 tupBel1
173 turTru2
174 uniProt
175 vicPac1
176 vicPac2
177 visiGene
178 xenTro1
179 xenTro2
180 xenTro3
hg19 <- dbConnect(MySQL(),user="genome", db="hg19",
host="genome-mysql.cse.ucsc.edu")
allTables <- dbListTables(hg19)
length(allTables)
[1] 10968
allTables[1:5]
[1] "HInv" "HInvGeneMrna" "acembly" "acemblyClass" "acemblyPep"
dbListFields(hg19,"affyU133Plus2")
[1] "bin" "matches" "misMatches" "repMatches" "nCount" "qNumInsert"
[7] "qBaseInsert" "tNumInsert" "tBaseInsert" "strand" "qName" "qSize"
[13] "qStart" "qEnd" "tName" "tSize" "tStart" "tEnd"
[19] "blockCount" "blockSizes" "qStarts" "tStarts"
dbGetQuery(hg19, "select count(*) from affyU133Plus2")
count(*)
1 58463
affyData <- dbReadTable(hg19, "affyU133Plus2")
head(affyData)
bin matches misMatches repMatches nCount qNumInsert qBaseInsert tNumInsert tBaseInsert strand
1 585 530 4 0 23 3 41 3 898 -
2 585 3355 17 0 109 9 67 9 11621 -
3 585 4156 14 0 83 16 18 2 93 -
4 585 4667 9 0 68 21 42 3 5743 -
5 585 5180 14 0 167 10 38 1 29 -
6 585 468 5 0 14 0 0 0 0 -
qName qSize qStart qEnd tName tSize tStart tEnd blockCount
1 225995_x_at 637 5 603 chr1 249250621 14361 15816 5
2 225035_x_at 3635 0 3548 chr1 249250621 14381 29483 17
3 226340_x_at 4318 3 4274 chr1 249250621 14399 18745 18
4 1557034_s_at 4834 48 4834 chr1 249250621 14406 24893 23
5 231811_at 5399 0 5399 chr1 249250621 19688 25078 11
6 236841_at 487 0 487 chr1 249250621 27542 28029 1
blockSizes
1 93,144,229,70,21,
2 73,375,71,165,303,360,198,661,201,1,260,250,74,73,98,155,163,
3 690,10,32,33,376,4,5,15,5,11,7,41,277,859,141,51,443,1253,
4 99,352,286,24,49,14,6,5,8,149,14,44,98,12,10,355,837,59,8,1500,133,624,58,
5 131,26,1300,6,4,11,4,7,358,3359,155,
6 487,
qStarts
1 34,132,278,541,611,
2 87,165,540,647,818,1123,1484,1682,2343,2545,2546,2808,3058,3133,3206,3317,3472,
3 44,735,746,779,813,1190,1195,1201,1217,1223,1235,1243,1285,1564,2423,2565,2617,3062,
4 0,99,452,739,764,814,829,836,842,851,1001,1016,1061,1160,1173,1184,1540,2381,2441,2450,3951,4103,4728,
5 0,132,159,1460,1467,1472,1484,1489,1497,1856,5244,
6 0,
tStarts
1 14361,14454,14599,14968,15795,
2 14381,14454,14969,15075,15240,15543,15903,16104,16853,17054,17232,17492,17914,17988,18267,24736,29320,
3 14399,15089,15099,15131,15164,15540,15544,15549,15564,15569,15580,15587,15628,15906,16857,16998,17049,17492,
4 14406,20227,20579,20865,20889,20938,20952,20958,20963,20971,21120,21134,21178,21276,21288,21298,21653,22492,22551,22559,24059,24211,24835,
5 19688,19819,19845,21145,21151,21155,21166,21170,21177,21535,24923,
6 27542,
query <- dbSendQuery(hg19, "select * from affyU133Plus2 where misMatches between 1 and 3")
affyMis <- fetch(query); quantile(affyMis$misMatches)
0% 25% 50% 75% 100%
1 1 2 2 3
affyMisSmall <- fetch(query,n=10); dbClearResult(query);
[1] TRUE
dim(affyMisSmall)
[1] 10 22
dbDisconnect(hg19)
[1] TRUE
myapp = oauth_app("twitter",
key="yourConsumerKeyHere",secret="yourConsumerSecretHere")
sig = sign_oauth1.0(myapp,
token = "yourTokenHere",
token_secret = "yourTokenSecretHere")
homeTL = GET("https://api.twitter.com/1.1/statuses/home_timeline.json", sig)
json1 = content(homeTL)
json2 = jsonlite::fromJSON(toJSON(json1))
json2[1,1:4]
created_at id id_str
1 Mon Jan 13 05:18:04 +0000 2014 4.225984e+17 422598398940684288
text
1 Now that P. Norvig's regex golf IPython notebook hit Slashdot, let's see if our traffic spike tops the previous one: http://t.co/Vc6JhZXOo8
GET
, POST
, PUT
, DELETE
requests if you are authorizedlibrary(data.table)
DF = data.frame(x=rnorm(9),y=rep(c("a","b","c"),each=3),z=rnorm(9))
head(DF,3)
x y z
1 -0.9690 a -0.6361
2 0.8895 a 0.4879
3 -1.2139 a -0.1279
DT = data.table(x=rnorm(9),y=rep(c("a","b","c"),each=3),z=rnorm(9))
head(DT,3)
x y z
1: -1.8915 a -0.06235
2: -0.4859 a 1.37020
3: 0.2364 a -0.63040
tables()
NAME NROW MB COLS KEY
[1,] DT 9 1 x,y,z
Total: 1MB
DT[2,]
x y z
1: -0.4859 a 1.37
DT[DT$y=="a",]
x y z
1: -1.8915 a -0.06235
2: -0.4859 a 1.37020
3: 0.2364 a -0.63040
DT[c(2,3)]
x y z
1: -0.4859 a 1.3702
2: 0.2364 a -0.6304
DT[,c(2,3)]
[1] 2 3
{
x = 1
y = 2
}
k = {print(10); 5}
[1] 10
print(k)
[1] 5
DT = data.table(x=rnorm(9),y=rep(c("a","b","c"),each=3),z=rnorm(9))
DT[,list(mean(x),sum(z))]
V1 V2
1: -0.5263 -1.42
DT[,table(y)]
y
a b c
3 3 3
DT[,w:=z^2]
x y z w
1: -1.23685 a 0.06448 0.004157
2: 0.94369 a 0.95965 0.920931
3: -0.79338 a -1.35306 1.830771
4: 0.06876 b 0.21998 0.048390
5: -2.52831 b 0.77248 0.596728
6: -0.25771 b -2.13156 4.543566
7: 0.44889 c -0.44320 0.196423
8: -0.68266 c 1.08423 1.175562
9: -0.69946 c -0.59315 0.351831
DT2 <- DT
DT[, y:= 2]
x y z w
1: -1.23685 2 0.06448 0.004157
2: 0.94369 2 0.95965 0.920931
3: -0.79338 2 -1.35306 1.830771
4: 0.06876 2 0.21998 0.048390
5: -2.52831 2 0.77248 0.596728
6: -0.25771 2 -2.13156 4.543566
7: 0.44889 2 -0.44320 0.196423
8: -0.68266 2 1.08423 1.175562
9: -0.69946 2 -0.59315 0.351831
head(DT,n=3)
x y z w
1: -1.2369 2 0.06448 0.004157
2: 0.9437 2 0.95965 0.920931
3: -0.7934 2 -1.35306 1.830771
head(DT2,n=3)
x y z w
1: -1.2369 2 0.06448 0.004157
2: 0.9437 2 0.95965 0.920931
3: -0.7934 2 -1.35306 1.830771
DT[,m:= {tmp <- (x+z); log2(tmp+5)}]
x y z w m
1: -1.23685 2 0.06448 0.004157 1.936
2: 0.94369 2 0.95965 0.920931 2.787
3: -0.79338 2 -1.35306 1.830771 1.513
4: 0.06876 2 0.21998 0.048390 2.403
5: -2.52831 2 0.77248 0.596728 1.698
6: -0.25771 2 -2.13156 4.543566 1.384
7: 0.44889 2 -0.44320 0.196423 2.324
8: -0.68266 2 1.08423 1.175562 2.433
9: -0.69946 2 -0.59315 0.351831 1.890
DT[,a:=x>0]
x y z w m a
1: -1.23685 2 0.06448 0.004157 1.936 FALSE
2: 0.94369 2 0.95965 0.920931 2.787 TRUE
3: -0.79338 2 -1.35306 1.830771 1.513 FALSE
4: 0.06876 2 0.21998 0.048390 2.403 TRUE
5: -2.52831 2 0.77248 0.596728 1.698 FALSE
6: -0.25771 2 -2.13156 4.543566 1.384 FALSE
7: 0.44889 2 -0.44320 0.196423 2.324 TRUE
8: -0.68266 2 1.08423 1.175562 2.433 FALSE
9: -0.69946 2 -0.59315 0.351831 1.890 FALSE
DT[,b:= mean(x+w),by=a]
x y z w m a b
1: -1.23685 2 0.06448 0.004157 1.936 FALSE 0.3840
2: 0.94369 2 0.95965 0.920931 2.787 TRUE 0.8757
3: -0.79338 2 -1.35306 1.830771 1.513 FALSE 0.3840
4: 0.06876 2 0.21998 0.048390 2.403 TRUE 0.8757
5: -2.52831 2 0.77248 0.596728 1.698 FALSE 0.3840
6: -0.25771 2 -2.13156 4.543566 1.384 FALSE 0.3840
7: 0.44889 2 -0.44320 0.196423 2.324 TRUE 0.8757
8: -0.68266 2 1.08423 1.175562 2.433 FALSE 0.3840
9: -0.69946 2 -0.59315 0.351831 1.890 FALSE 0.3840
.N
An integer, length 1, containing the numbe r
set.seed(123);
DT <- data.table(x=sample(letters[1:3], 1E5, TRUE))
DT[, .N, by=x]
x N
1: a 33387
2: c 33201
3: b 33412
DT <- data.table(x=rep(c("a","b","c"),each=100), y=rnorm(300))
setkey(DT, x)
DT['a']
x y
1: a 0.25959
2: a 0.91751
3: a -0.72232
4: a -0.80828
5: a -0.14135
6: a 2.25701
7: a -2.37955
8: a -0.45425
9: a -0.06007
10: a 0.86090
11: a -1.78466
12: a -0.13074
13: a -0.36984
14: a -0.18066
15: a -1.04973
16: a 0.37832
17: a -1.37079
18: a -0.31612
19: a 0.39435
20: a -1.68988
21: a -1.46234
22: a 2.55838
23: a 0.08789
24: a 1.73141
25: a 1.21513
26: a 0.29954
27: a -0.17246
28: a 1.13250
29: a 0.02320
30: a 1.33587
31: a -1.09879
32: a -0.58176
33: a 0.03892
34: a 1.07315
35: a 1.34970
36: a 1.19528
37: a -0.02218
38: a 0.69849
39: a 0.67241
40: a -0.79165
41: a -0.21791
42: a 0.02307
43: a 0.11539
44: a -0.27708
45: a 0.03688
46: a 0.47520
47: a 1.70749
48: a 1.07601
49: a -1.34571
50: a -1.44025
51: a -0.39393
52: a 0.58106
53: a -0.17079
54: a -0.90585
55: a 0.15621
56: a -0.37323
57: a -0.34587
58: a -0.35829
59: a -0.13307
60: a -0.08960
61: a 0.62793
62: a -1.42883
63: a 0.17255
64: a -0.79115
65: a 1.26204
66: a -0.26941
67: a 0.15698
68: a -0.76060
69: a 1.37060
70: a 0.03758
71: a 0.44949
72: a 2.78869
73: a -0.46849
74: a 1.01261
75: a -0.04374
76: a 1.40670
77: a 0.41993
78: a 0.31009
79: a 1.11905
80: a -1.29814
81: a -1.28248
82: a 1.65943
83: a 0.78375
84: a 0.57771
85: a -0.26725
86: a -0.64569
87: a -0.44953
88: a -0.82620
89: a 1.05504
90: a -0.87927
91: a -1.27713
92: a -0.63412
93: a 0.66470
94: a -0.50958
95: a 0.40736
96: a 1.67775
97: a -1.05206
98: a -0.63691
99: a 0.56539
100: a 0.38016
x y
DT1 <- data.table(x=c('a', 'a', 'b', 'dt1'), y=1:4)
DT2 <- data.table(x=c('a', 'b', 'dt2'), z=5:7)
setkey(DT1, x); setkey(DT2, x)
merge(DT1, DT2)
x y z
1: a 1 5
2: a 2 5
3: b 3 6
big_df <- data.frame(x=rnorm(1E6), y=rnorm(1E6))
file <- tempfile()
write.table(big_df, file=file, row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE)
system.time(fread(file))
user system elapsed
0.301 0.012 0.313
system.time(read.table(file, header=TRUE, sep="\t"))
user system elapsed
8.519 0.063 8.622
melt
and dcast
for data.tables