Lecture 11
========================================================
author: John S
date: October 19, 2015

The "apply family" and timing code
========================================================

- Things that can be done in a loop can often be done with an "apply type" function.
- Sometimes this is faster.
- It can be easier to program and read.

Today
=====
- How to time code.
- apply() versus a loop to compute functions across rows or columns
- tapply(): apply a function to different splits of the data
- split(): exists too!
- Can do with a loop, but tapply() is easier.


Timing a block of code
========================================================

```{r, tidy=T}
t1 <- proc.time()      # start the clock
for (i in 1:1000000)
  runif(10)
t2 <- proc.time()      # stop the clock
t2 - t1                # user, system and elapsed time in seconds
```
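
Timing with system.time()
========================================================
A possible shortcut, not used in the chunk above: base R's system.time() wraps an expression and reports the same kind of timing as subtracting two proc.time() calls. The name `timing` and the smaller loop count are only for illustration.

```{r,tidy=T}
# Time an expression directly instead of bracketing it with proc.time()
timing <- system.time(
  for (i in 1:1e5) runif(10)
)
timing               # user, system and elapsed time in seconds
timing["elapsed"]    # wall-clock time only
```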

In general, for loops are slow
========================================================

Have 1000000 numbers. How many are >= 0.8? Use a loop:
```{r,tidy=T,cache=T}
x <- rnorm(n=1e6)
t1 <- proc.time()
count <- 0
for (i in seq_along(x)) {
  if (x[i] >= 0.8) count <- count + 1
}
count
proc.time()-t1
```

In general, for loops are slow
========================================================
Solution 1: use a function that acts on a vector
```{r,tidy=T,cache=T}
t1 <- proc.time()
sum(x >= 0.8)
proc.time() - t1
```
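
Checking that the two approaches agree
========================================================
A small sanity check, sketched on a short made-up vector `y` (not the `x` above): comparing a vector to a number gives a logical vector, and sum() counts the TRUEs, so the result should match the loop's count.

```{r,tidy=T}
set.seed(1)                  # arbitrary seed, for reproducibility only
y <- rnorm(1000)

# Loop version: check one element at a time
count_loop <- 0
for (i in seq_along(y)) {
  if (y[i] >= 0.8) count_loop <- count_loop + 1
}

# Vectorized version: TRUE counts as 1, FALSE as 0
count_vec <- sum(y >= 0.8)

count_loop == count_vec      # the two counts should agree
mean(y >= 0.8)               # proportion instead of count
```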

apply()
========
x has 10000 rows and 1000 columns. Want the mean of each row.
```{r,tidy=T,cache=T}
x <- matrix(rexp(1000*10000),10000,1000)
x[1:3,1:5]  
```

Try with a loop
===
```{r,tidy=T}
t1 <- proc.time()
means <- rep(NA, nrow(x))      # one slot per row
for (i in seq_len(nrow(x)))
  means[i] <- mean(x[i, ])
proc.time() - t1
means[1:5]
```

Try with apply()
===
```{r,tidy=T}
t1 <- proc.time()
means <- apply(x, 1, mean)     # 1 = apply over rows
proc.time() - t1
means[1:5]
```
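
A built-in alternative: rowMeans()
===
For the specific case of row means, base R also provides rowMeans() (along with colMeans(), rowSums() and colSums()). A minimal sketch on a smaller made-up matrix `m`; timings vary by machine, so none are quoted here.

```{r,tidy=T}
set.seed(1)
m <- matrix(rexp(1000 * 100), 1000, 100)

t1 <- proc.time()
means_apply <- apply(m, 1, mean)
proc.time() - t1

t1 <- proc.time()
means_builtin <- rowMeans(m)
proc.time() - t1

# Both approaches should give the same row means
all.equal(means_apply, means_builtin)
```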

tapply()
========
Apply a function to groups of elements of a vector, where the groups are defined by the unique values of another variable.
```{r,tidy=T,cache=T}
library(MASS)
data(cats)
head(cats)
```

tapply()
========
```{r,tidy=T}
tapply(cats$Bwt,cats$Sex,mean)
```
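
What tapply() is doing
========
A rough sketch of the same kind of group-wise mean written as a loop, to show what tapply() saves us. The data frame `toy` and its values are made up for illustration; they are not the cats data.

```{r,tidy=T}
toy <- data.frame(
  wt  = c(2.0, 2.4, 3.0, 3.2, 2.6, 3.4),
  sex = factor(c("F", "F", "M", "M", "F", "M"))
)

# Loop version: one pass per group
groups <- levels(toy$sex)
means_loop <- rep(NA, length(groups))
names(means_loop) <- groups
for (g in groups) {
  means_loop[g] <- mean(toy$wt[toy$sex == g])
}
means_loop

# tapply() version: one line
means_tapply <- tapply(toy$wt, toy$sex, mean)
means_tapply
```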

tapply(): two factors!
========
```{r,tidy=T,cache=T}
data(Cars93)
tapply(Cars93$Price,list(Cars93$Type,Cars93$Origin),mean)
```
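
Empty combinations give NA
========
One thing to watch for with two factors, sketched on made-up data (the data frame `toy2` is hypothetical): a combination of levels with no observations shows up as NA in the table.

```{r,tidy=T}
toy2 <- data.frame(
  price  = c(10, 12, 20, 22, 30),
  type   = factor(c("Small", "Small", "Large", "Large", "Small")),
  origin = factor(c("USA", "non-USA", "USA", "USA", "non-USA"))
)

# Rows are levels of type, columns are levels of origin
tab <- tapply(toy2$price, list(toy2$type, toy2$origin), mean)
tab

# toy2 has no Large / non-USA rows, so that cell is NA
is.na(tab["Large", "non-USA"])
```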

related: split()
========
```{r,tidy=T,cache=T}
divided <- split(Cars93,list(Cars93$Type,Cars93$Origin))
names(divided)
```

related: split()
=======
```{r,tidy=T}
divided$`Van.non-USA`[1:3, 1:5]
```
We'll see more about split() later.
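
split() then sapply()
=======
A minimal sketch of how split() relates to tapply(), on a made-up vector and grouping factor (`wt` and `sex` are hypothetical): split() returns a list with one piece per group, and sapply() then applies a function to each piece.

```{r,tidy=T}
wt  <- c(2.0, 2.4, 3.0, 3.2, 2.6, 3.4)
sex <- factor(c("F", "F", "M", "M", "F", "M"))

pieces <- split(wt, sex)    # list with elements F and M
pieces

# Apply mean() to each piece; compare with tapply()
means_split <- sapply(pieces, mean)
means_split
tapply(wt, sex, mean)
```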

Example: 
========
Long time series. Want the sd of each block of 10 consecutive measurements.

```{r,tidy=T,cache=T}
x <- rgamma(1e6,1,3)
t1 <- proc.time()
sds <- rep(NA,1e5)
for (i in 1:1e5)
  sds[i] <- sd(x[(1+(i-1)*10):(i*10)])
proc.time()-t1
sds[1:3]
```


Try with tapply(): simpler, but not always faster...
======
```{r,tidy=T,cache=T}
t1 <- proc.time()
sds <- tapply(x,rep(1:1e5,each=10),sd)
proc.time()-t1
sds[1:3]
```
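
Another option: reshape into a matrix
======
Because the blocks all have the same length, a third approach is possible. A sketch on a shorter made-up series `y`, relying on the fact that matrix() fills column by column, so each column holds 10 consecutive values.

```{r,tidy=T}
set.seed(1)
y <- rgamma(1000, 1, 3)

# Each column of m is one block of 10 consecutive measurements
m <- matrix(y, nrow = 10)
sds_matrix <- apply(m, 2, sd)

# Same thing with tapply(), for comparison
sds_tapply <- tapply(y, rep(1:100, each = 10), sd)

all.equal(sds_matrix, as.vector(sds_tapply))
```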

Next: 
==========
- Have data from 18 countries.
- Want to fit a separate model to each country's data.
- Could loop. 
- Other options: sapply(), by().
- We will see both and time them.
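
A preview sketch
==========
To make the plan above concrete, here is a rough sketch on simulated data. Everything in it is an assumption for illustration: the data frame `dat`, its columns `country`, `x` and `y`, the three country labels and the straight-line model are all made up, not the real 18-country data.

```{r,tidy=T}
set.seed(1)
dat <- data.frame(
  country = rep(c("A", "B", "C"), each = 20),
  x = runif(60),
  stringsAsFactors = FALSE
)
dat$y <- 1 + 2 * dat$x + rnorm(60, sd = 0.1)

# Loop version: fit one lm() per country, keep the slope
countries <- unique(dat$country)
slopes_loop <- rep(NA, length(countries))
names(slopes_loop) <- countries
for (cn in countries) {
  fit <- lm(y ~ x, data = dat[dat$country == cn, ])
  slopes_loop[cn] <- coef(fit)[["x"]]
}

# sapply() over the pieces returned by split()
slopes_sapply <- sapply(split(dat, dat$country),
                        function(d) coef(lm(y ~ x, data = d))[["x"]])

slopes_loop
slopes_sapply
```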