Advanced statistical computing (140.778)

(3 units) Second term, 2000-2001
Mon and Wed, 9:00 - 10:30am
W4007 Hygiene

Summary

This is a course in modern statistics (i.e., statistics using the computer) for the sophisticated user of statistics and computers. We will introduce topics in numerical analysis useful for statistical modeling and analysis. We will emphasize computing over statistics and algorithms over programming. Example methods include deterministic and stochastic methods for optimization and integration, the EM algorithm, Monte Carlo simulation (both non-iterative and iterative), and kernel density estimation. Applications include Bayesian hierarchical models, mixture models, time series, nonlinear regression, smoothing, classification, and modern variable selection.
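
To give a flavor of the stochastic integration methods listed in the summary, here is a minimal sketch of non-iterative Monte Carlo integration. It is written in Python purely for illustration (Python is not one of the course languages), and the function name mc_integrate, the integrand, the seed, and the sample size are all assumptions made for this example rather than anything taken from the course notes.

```python
import math
import random


def mc_integrate(f, a, b, n, seed=0):
    """Estimate the integral of f over [a, b] from n uniform draws.

    Returns the estimate and its estimated standard error.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)          # private generator, reproducible
    width = b - a
    values = [f(rng.uniform(a, b)) for _ in range(n)]
    mean = sum(values) / n
    # Sample variance of f(U), with U uniform on [a, b]
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    estimate = width * mean
    std_error = width * math.sqrt(var / n)
    return estimate, std_error


# Integrate exp(-x^2 / 2) over [0, 1]; the exact value is available
# through the error function, which lets us check the estimate.
estimate, std_error = mc_integrate(
    lambda x: math.exp(-x * x / 2), 0.0, 1.0, 100_000
)
exact = math.sqrt(math.pi / 2) * math.erf(1 / math.sqrt(2))

print(f"estimate = {estimate:.5f} (SE {std_error:.5f}), exact = {exact:.5f}")
```

The standard error shrinks like 1/sqrt(n) whatever the dimension of the integral, which is the usual argument for preferring stochastic to deterministic quadrature in high-dimensional problems.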

Prerequisites: 140.751-752 and 140.771-772 or equivalent; computer programming required (e.g. R/S-Plus/Matlab and/or C/C++/Fortran).

Student evaluation will be based on two programming projects (to be done in R/S-Plus/Matlab or whatever else you may prefer).

Note: I will try to prepare documents to help people get started with C, Perl, and R. See the following links: [ C | perl | R ]

Syllabus (from 2000-2001)

October 30 Introduction; statistical computing in practice
Notes: [ pdf (560k) ]
November 1 R (in brief)
Notes: [ pdf (191k) ]
R problem set: [ Data | Problems (pdf 13k) | Solutions: Part A / Part B ]
Reading: MASS (ch 1-4)
Additional comments
6 Random number generation
Notes: [ pdf (362k) ]
Reading: NAS (ch 20); NRC (ch 7); MASS (§ 5.2)
8 Permutation test and the bootstrap
Notes: [ pdf (208k) ]
Reading: NAS (ch 22), Efron and Tibshirani (§ 9.5)
Additional comments
13 Numerical linear algebra
Notes: [ pdf (301k) ]
Reading: NAS (ch 8-9), Thisted (ch 3)
Additional comments
Assignment 1 (due Nov 29): [ latex (3.6k) | pdf (17k) | data ]
15 EM algorithm
Notes: [ pdf (243k) ]
Reading: NAS (ch 10)
20 Newton-Raphson, Fisher scoring
Notes: [ pdf (133k) ]
Reading: NAS (ch 11), Thisted (ch 4)
22 Nonlinear regression, iteratively reweighted least squares
Notes: [ pdf (254k) ]
Reading: NAS (§ 11.4, 11.5), Thisted (§ 4.5.5, 4.5.6)
27 EM algorithm extensions
Notes: [ pdf (222k) ]
Reading: NAS (ch 12)
29 Downhill simplex method, Lp regression and constrained optimization
Notes: [ pdf (269k) ]
Reading: NAS (ch 14), NRC (§ 10.4), Thisted (§ 4.5.7)
Assignment 2 (due Dec 20): [ latex (2.4k) | pdf (13k) | data | code ]
December 4 Numerical integration
Notes: [ pdf (345k) ]
Reading: NAS (ch 16), NRC (ch 4)
6 Hidden Markov models
Notes: [ pdf (231k) ]
Reading: NAS (§ 23.3)
11 Markov chain Monte Carlo I
Notes: [ pdf (749k) ]
Reading: NAS (ch 24)
Additional comments
13 Markov chain Monte Carlo II
Notes: [ pdf (1,334k) ]
Additional comments
18 Tree-based models (aka recursive partitioning) and neural networks
Notes: [ pdf (220k) ]
Reading: MASS (ch 10, § 9.4)
20 Program design
Notes: [ pdf (423k) ]
Reading: S programming (§ 8.4), Writing R extensions [pdf (387k; 60 pgs)]

Recommended books

Recommended web sites


kbroman at jhsph.edu
http://www.biostat.wisc.edu/~kbroman

Last modified: Wed Apr 15 23:02:19 2009