## Dependencies

This document depends on the following packages:

library(devtools)
library(Biobase)

To install these packages, you can use the code below (or, if you are compiling the document, remove eval=FALSE from the chunk):

install.packages(c("devtools"))
source("http://www.bioconductor.org/biocLite.R")
biocLite(c("Biobase"))

## General principles

• Can we find patterns in matrices of data?

We will use an expression set that combines two studies: "Transcriptome genetics using second generation sequencing in a Caucasian population" and "Understanding mechanisms underlying human gene expression variation with RNA sequencing". The two studies sampled different populations, but the same genes were counted in both. We will then explore the differences between them.

con = url("http://bowtie-bio.sourceforge.net/recount/ExpressionSets/montpick_eset.RData")
load(file=con)
close(con)
mp = montpick.eset
pdata=pData(mp)
edata=as.data.frame(exprs(mp))
fdata = fData(mp)
ls()
## [1] "con"           "edata"         "fdata"         "montpick.eset"
## [5] "mp"            "pdata"         "tropical"

## Calculate the singular vectors

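Here we keep only genes with a mean count above 100, apply a log2(x + 1) transform, center each gene by subtracting its row mean, and then calculate the singular vectors: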

edata = edata[rowMeans(edata) > 100, ]
edata = log2(edata + 1)
edata_centered = edata - rowMeans(edata)
svd1 = svd(edata_centered)
names(svd1)
## [1] "d" "u" "v"
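To make the three components concrete, here is a minimal sketch on a small simulated matrix. The matrix `toy` and its dimensions are assumptions chosen for illustration and are not the montpick data. It shows that `u` holds one left singular vector per column, `d` holds the singular values, `v` holds one right singular vector per column, and that the product of the three recovers the centered matrix.

```r
# Simulated stand-in for a filtered, log-transformed expression matrix
# (rows = genes, columns = samples). This is NOT the montpick data.
set.seed(1)
toy = matrix(rnorm(20 * 6), nrow = 20, ncol = 6)

# Center each gene (row) by subtracting its row mean
toy_centered = toy - rowMeans(toy)

toy_svd = svd(toy_centered)

dim(toy_svd$u)     # 20 x 6: left singular vectors (patterns across genes)
length(toy_svd$d)  # 6 singular values, sorted in decreasing order
dim(toy_svd$v)     # 6 x 6: right singular vectors (patterns across samples)

# U %*% D %*% t(V) reproduces the centered matrix up to floating-point error
reconstructed = toy_svd$u %*% diag(toy_svd$d) %*% t(toy_svd$v)
max(abs(reconstructed - toy_centered))
```

Because each row has been centered, the columns of the centered matrix sum to the zero vector, so the smallest singular value is numerically zero.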

## Look at the percent variance explained

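The percent of variance explained by the $i$th singular vector is given by $$\frac{d_{ii}^2}{\sum_{j}d_{jj}^2}$$ where $d_{ii}$ is the $i$th singular value (the $i$th entry of svd1$d).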
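As a rough sketch of how this quantity can be computed, the example below squares the singular values and divides by their sum. It uses a small simulated matrix (`toy`, an assumption for illustration, not the montpick data) so that it runs on its own.

```r
# Simulated stand-in for the centered expression matrix (NOT the montpick data)
set.seed(1)
toy = matrix(rnorm(20 * 6), nrow = 20, ncol = 6)
toy_centered = toy - rowMeans(toy)
toy_svd = svd(toy_centered)

# Proportion of variance explained by each singular vector:
# squared singular value divided by the sum of all squared singular values
pve = toy_svd$d^2 / sum(toy_svd$d^2)

round(100 * pve, 1)  # as percentages
sum(pve)             # the proportions sum to 1

# Sanity check: the squared singular values add up to the total
# sum of squares of the centered matrix
all.equal(sum(toy_svd$d^2), sum(toy_centered^2))
```

With the objects defined earlier in this document, the same calculation would presumably be `svd1$d^2 / sum(svd1$d^2)`.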