Here is the R code for creating the table and calculating the expected counts under the null hypothesis of pi=1/10 for all i.
mydata <- c(98, 99, 100, 89, 107, 114, 100, 112, 85, 96) n <- sum(mydata) ex <- n * rep(1/10, 10)
Here is the code for calculating the chi-square test statistic and corresponding P-value.
chi <- sum( (mydata-ex)^2 / ex) # value = 7.56 1 - pchisq(chi, 9) # P-value = 0.58
Here is the code for calculating the LRT statistic and corresponding P-value.
lrt <- 2 * sum( mydata * log(mydata/ex) ) # value = 7.58 1 - pchisq(lrt, 9) # P-value = 0.58
Here is some code for using computer simulations to estimate P-values.
n.sim <- 1000 results <- matrix(ncol=2, nrow=n.sim) for(i in 1:n.sim) { # simulate data simdat <- rmultinom(1, n, rep(0.1,10)) # calculate statistics chi.sim <- sum( (simdat - ex)^2 / ex) lrt.sim <- 2 * sum( simdat * log(simdat/ex) ) results[i,] <- c(chi.sim, lrt.sim) } # p-value for chi-square test mean(results[,1] >= chi) # p-value = 0.57 (for my simulations) # p-value for likelihood ratio test (LRT) mean(results[,2] >= lrt) # p-value = 0.58 (for my simulations)
With the second version of the table, in applying either the chi-square test or likelihood ratio test, we would get identical statistics and P-values. However, these tests look for general differences from equal frequencies, and not the sort of trend in this observed table.
With such data, we would want to use a different statistical test, such as of whether the mean (or median) is equal to 4.5 or not. Of course, one would ideally choose to perform such a test before obtaining the data rather than after having observed such a trend.
[ Main page | 4th term syllabus | R for Windows ] | Last modified: Tue Mar 28 17:52:44 EST 2006 |