1. Load the data
First we load the R/qtl package and print the version we’re using.
> library(qtl)
> qtlversion()
[1] "1.25-15"
We load the raw genotype data, which includes data on the parental strains and F1. We drop the control mouse with a blank for “strain.”
> rawg <- read.cross("csv", "Data", "genotypes4rqtl.csv", genotypes=0:2,
+ convertXdata=FALSE, alleles=c("B","R"))
> rawg <- subset(rawg, ind=(rawg$pheno$Strain!=""))
The mouse with missing strain also had missing sex, we clean up the sex and strain phenotypes as follows:
> rawg$pheno$Sex <- factor(as.character(rawg$pheno$Sex), levels=c("Female", "Male"))
> rawg$pheno$Strain <- factor(as.character(rawg$pheno$Strain), levels=unique(as.character(rawg$pheno$Strain)))
A quick summary:
> summary(rawg)
F2 intercross
No. individuals: 565
No. phenotypes: 5
Percent phenotyped: 100 100 100 100 100
No. chromosomes: 21
Autosomes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 un
X chr: X
Total markers: 4853
No. markers: 386 343 340 282 256 274 234 248 216 219 257 218 230 240 197 243 183 194 117 169 7
Percent genotyped: 98.6
Genotypes (%): BB:40 BR:21.3 RR:38.8 not RR:0 not BB:0
Note that the chromosome "un"
is a group of 7
markers that I couldn’t find in the mouse genome build 37.
2. Missing data pattern
We first look at the number of missing genotypes at each mouse and each marker.
> par(mfrow=c(2,1), las=1)
> hist(nmissing(rawg), main="No. missing genotypes by mouse",
+ xlab="No. missing genotypes", breaks=140)
> rug(nmissing(rawg), col="blue")
> hist(nmissing(rawg, what="mar"), main="No. missing genotypes by marker",
+ xlab="No. missing genotypes", breaks=50)
> rug(nmissing(rawg, what="mar"), col="blue")
There are three mice with no genotype data.
> rawg$pheno[ntyped(rawg)==0,]
longid MouseNum Sex Strain PlatePos
265 tid28116_Mouse3377_F2_Male Mouse3377 Male F2 SMP4_0001630:C04
311 tid28077_Mouse3338_F2_Male Mouse3338 Male F2 SMP4_0001632:B10
498 tid28004_Mouse3265_F2_Female Mouse3265 Female F2 SMP4_0001632:A01
I might as well drop these.
> rawg <- subset(rawg, ind=(ntyped(rawg)>0))
We should look at the histogram of the number of missing genotypes per mouse again.
> par(las=1)
> hist(nmissing(rawg), main="No. genotypes by mouse",
+ xlab="No. markers genotyped", breaks=140)
> rug(nmissing(rawg), col="blue")
There remain a number of mice with a good amount of missing data, which might suggest poor-quality DNAs or other problems with the genotyping arrays. There are 13 mice missing genotypes at more than 5% of the markers (that is, at more than 242 of the 4853 markers). There are also a number of markers with elevated rates of missing data. There are 188 markers with more than 5% missing data (that is, at more than 28 of the 562 mice). But none of these seem terrible.
3. Study X chromosome genotypes
Let’s start by looking at the X chromosome genotypes. The cross was (BTBR x B6) x (BTBR x B6), with females listed first, and so the F1 males should all be hemizygous B6 or BTBR while the F1 females should be heterozygous. The F2 males should be hemizygous B6 or BTBR and the F2 females should be heterozygous or homozygous BTBR.
We pull out the X chromosome genotypes and the “strain” and sex for each mouse.
> gx <- pull.geno(rawg, chr="X")
> dim(gx)
[1] 562 169
> strain <- rawg$pheno$Strain
> sex <- rawg$pheno$Sex
3.1. Not segregating
First, let’s look at which markers seem to segregate in the F2. We create a table of the three possible genotypes (1, 2, 3) and look at the number of mice with the least-frequent genotype. Since there are 289 F2 males and 262 F2 females, all markers should have an appreciable number of mice with each of the three genotypes.
> ming <- apply(apply(gx[strain=="F2",], 2, function(a) table(factor(a, levels=1:3))), 2, min)
> table(ming)
ming
0 1 2 3 6 120 125 126 131 132 133 134 136 138 143
138 6 2 2 1 1 2 1 5 2 3 3 1 1 1
There are 149 markers that appear to not be
segregating. Let’s go ahead and omit those. We’ll create a new cross
object cleang
that will contain the cleaned genotypes.
> todrop <- names(ming[ming<120])
> cleang <- drop.markers(rawg, todrop)
> gx <- pull.geno(cleang, chr="X")
> dim(gx)
[1] 562 20
We have just 20 markers left on the X chromosome.
3.2. Genotypes in B6 and BTBR
Let’s first look at the B6 mice. There are 3 such; while 2 are male and 1 is female, they should all be homozygous for the same allele.
> all(apply(gx[strain=="B6",], 2, function(a) length(unique(a[!is.na(a)]))) == 1)
[1] TRUE
> b6xg <- apply(gx[strain=="B6",], 2, function(a) unique(a[!is.na(a)]))
All of the B6 mice are homozygous for the same allele, on the X chromosome.
We repeat this for the 4 BTBR mice, of which 2 are male and 2 are female.
> all(apply(gx[strain=="BTBR",], 2, function(a) length(unique(a[!is.na(a)])) == 1))
[1] TRUE
> btbrxg <- apply(gx[strain=="BTBR",], 2, function(a) unique(a[!is.na(a)]))
All of the BTBR mice are homozygous for the same allele, on the X chromosome.
We further check that the B6 and BTBR genotypes are really homozygous and for different alleles at these 20 markers.
> all((b6xg == 1 & btbrxg==3) | (b6xg == 3 & btbrxg==1))
[1] TRUE
Let’s swap the genotype codes for the markers where the B6 mice have
the 3
genotype (to make it so that 1
is homozygous B6 and 3
is homozygous BTBR).
> toswap <- names(b6xg)[b6xg==3]
> for(mar in toswap)
+ cleang$geno[["X"]]$data[,mar] <- 4 - cleang$geno[["X"]]$data[,mar]
> gx <- pull.geno(cleang, chr="X")
3.3. Genotypes in F1
Let’s look at the 4 F1 mice, of which 2 are male and 2 are female. The males should all by homozygous BTBR and the females should all by heterozygous.
> f1male.gx <- gx[strain=="F1" & sex=="Male",]
> f1female.gx <- gx[strain=="F1" & sex=="Female",]
> all(is.na(f1male.gx) | f1male.gx==3)
[1] TRUE
> all(is.na(f1female.gx) | f1female.gx==2)
[1] TRUE
Everything looks good.
3.4. Genotypes in F2 males
We now turn to the F2 males. They should all be hemizgous B6 (1
)
or BTBR (3
).
> f2male.gx <- gx[strain=="F2" & sex=="Male",]
> sum(!is.na(f2male.gx) & f2male.gx==2)
[1] 171
There are 171 heterozygous genotypes in the 289 male mice. Let’s look at the males that have heterozygous calls.
> wh <- apply(f2male.gx, 1, function(a) any(!is.na(a) & a==2))
> sum(wh)
[1] 18
> tab <- apply(f2male.gx[wh,], 1, function(a) table(factor(a[!is.na(a)], levels=1:3)))
> tab[,order(tab[2,])]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
1 7 5 13 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 1 1 4 4 5 7 8 10 11 13 14 15 16 20 20 20
3 12 13 5 3 15 16 14 13 12 9 9 7 6 5 4 0 0 0
The first 4 of these look to be male but with a single genotyping error. The other 14 look to be really female. Let’s save the IDs for these mice.
> malef2id <- as.character(cleang$pheno$MouseNum)[strain=="F2" & sex=="Male"]
> notmale <- malef2id[apply(f2male.gx, 1, function(a) sum(!is.na(a) & a==2)>1)]
Note that there are additional male mice that might really be females; if all of their genotypes on the X chromosome are homozygous/hemizygous BTBR, we can’t tell.
> sum(apply(f2male.gx, 1, function(a) all(is.na(a) | a==3)))
[1] 50
3.5. Genotypes in F2 females
We turn to the F2 females. They should all be heterozygous (2
)
or homozygous BTBR (3
).
> f2female.gx <- gx[strain=="F2" & sex=="Female",]
> sum(!is.na(f2female.gx) & f2female.gx==1)
[1] 270
There are 270 homozygous B6 genotypes in the 262 female mice. Let’s look at the females that have homozygous B6 calls.
> wh <- apply(f2female.gx, 1, function(a) any(!is.na(a) & a==1))
> sum(wh)
[1] 21
> tab <- apply(f2female.gx[wh,], 1, function(a) table(factor(a[!is.na(a)], levels=1:3)))
> tab[,order(tab[1,])]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
1 5 5 5 7 8 9 10 11 11 13 13 13 15 16 16 16 18 19 20 20 20
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 15 15 15 13 8 10 10 9 9 7 7 7 5 4 4 4 2 0 0 0 0
These all look to be clearly males. Let’s save the IDs for these mice.
> femalef2id <- as.character(cleang$pheno$MouseNum)[strain=="F2" & sex=="Female"]
> notfemale <- femalef2id[apply(f2female.gx, 1, function(a) sum(!is.na(a) & a==1)>1)]
Note that there are additional female mice that might really be males; if all of their genotypes on the X chromosome are homozygous/hemizygous BTBR, we can’t tell.
> sum(apply(f2female.gx, 1, function(a) all(is.na(a) | a==3)))
[1] 53
3.6. Swap sexes
Let’s go ahead and swap the sexes of these mice whose X chromosome genotypes
indicate they are the opposite sex. We’ll change the phenotype Sex
to SexID
and make a new “phenotype” Sex
.
> names(cleang$pheno)[names(cleang$pheno)=="Sex"] <- "SexID"
> cleang$pheno$Sex <- cleang$pheno$SexID
> cleang$pheno$Sex[match(notmale, cleang$pheno$MouseNum)] <- "Female"
> cleang$pheno$Sex[match(notfemale, cleang$pheno$MouseNum)] <- "Male"
> table(cleang$pheno$SexID, cleang$pheno$Sex)
Female Male
Female 246 21
Male 14 281
I’ll re-create my sex
vector to match the new values.
> sex <- cleang$pheno$Sex
3.7. Re-code genotypes
Now let’s recode the X chromosome genotypes. F2 females should have
genotype codes 1
for homozygous BTBR and 2
for heterozygous. F2
males should have genotype codes 1
for homozygous B6 and 2
for
homozygous BTBR.
First, we handle the males.
> gx <- pull.geno(cleang, chr="X")
> whmale <- strain=="F2" & sex=="Male"
> gxmale <- gx[whmale,]
> sum(!is.na(gxmale) & gxmale==2)
[1] 4
> gxmale[!is.na(gxmale) & gxmale==2] <- NA
> gxmale[!is.na(gxmale) & gxmale==3] <- 2
> gx[whmale,] <- gxmale
Now, we handle the females.
> whfemale <- strain=="F2" & sex=="Female"
> gxfemale <- gx[whfemale,]
> sum(!is.na(gxfemale) & gxfemale==1)
[1] 0
> gxfemale[!is.na(gxfemale) & gxfemale==1] <- NA
> gxfemale[!is.na(gxfemale) & gxfemale==3] <- 1
> gx[whfemale,] <- gxfemale
Now we paste the genotypes back into the cross object.
> cleang$geno[["X"]]$data <- gx
Let’s further add a pgm
phenotype, indicating cross
direction, which will be 1 for all F2 mice.
> cleang$pheno$pgm <- rep(NA, nind(cleang))
> cleang$pheno$pgm[strain=="F2"] <- 1
The summary of the cross (with just the F2 mice) should be clean. (But then, the warnings don’t get printed here anyway!)
> summary(subset(cleang, ind=(strain=="F2")))
F2 intercross
No. individuals: 551
No. phenotypes: 7
Percent phenotyped: 100 100 100 100 100 100 100
No. chromosomes: 21
Autosomes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 un
X chr: X
Total markers: 4704
No. markers: 386 343 340 282 256 274 234 248 216 219 257 218 230 240 197 243 183 194 117 20 7
Percent genotyped: 99.1
Genotypes (%): BB:39.6 BR:22.2 RR:38.3 not RR:0 not BB:0
3.8. Segregation patterns
Let’s look at the segregation of the genotypes on the X chromosome. First, let’s look at the output of geno.table. What we’re looking for here are departures from 1:1 segregation that might indicate markers with high rates of genotyping errors.
> geno.table(subset(cleang, ind=(strain=="F2")), chr="X")
chr missing BB BR RR not.RR not.BB BY RY P.value
rs13483704 X 2 0 120 134 0 0 150 145 0.6517
rs13483715 X 1 0 125 130 0 0 149 146 0.9377
rs13483716 X 3 0 125 128 0 0 150 145 0.9416
rs13483718 X 0 0 126 129 0 0 155 141 0.7056
rs13483737 X 1 0 131 123 0 0 151 145 0.8296
rs13483738 X 1 0 131 124 0 0 149 146 0.8946
rs13483741 X 3 0 131 123 0 0 148 146 0.8757
rs13483746 X 1 0 134 121 0 0 147 148 0.7167
rs13483750 X 19 0 132 120 0 0 146 134 0.5811
rs13483765 X 2 0 131 123 0 0 154 141 0.6620
rs13483766 X 7 0 132 122 0 0 152 138 0.5858
rs13483767 X 1 0 132 122 0 0 154 142 0.6440
rs13483770 X 1 0 134 121 0 0 148 147 0.7167
rs13484015 X 12 0 138 112 0 0 151 138 0.1931
rs13484018 X 25 0 141 107 0 0 144 134 0.0812
rs13459160 X 2 0 133 122 0 0 152 142 0.6654
rs3715531 X 3 0 133 122 0 0 150 143 0.7255
rs13484105 X 6 0 133 121 0 0 152 139 0.5634
rs13459161 X 3 0 131 124 0 0 147 146 0.9068
rs13484113 X 1 0 136 119 0 0 156 139 0.3477
Everything looks okay. Let’s make a couple of plots, of genotype frequencies along the X chromosome for each sex.
> par(mfrow=c(2,1), las=1)
> f2male <- subset(cleang, ind=(strain=="F2" & sex=="Male"))
> plot(geno.table(f2male, chr="X", scanone=TRUE), lod=8:9,
+ main="X chr genotype frequencies; F2 males",
+ ylab="Genotype frequency", col=c("blue","red"),
+ ylim=c(0.3, 0.7))
> abline(h=0.5, lty=2, col="gray")
> legend("topleft", lwd=2, col=c("blue","red"),
+ legend=c("hem B6", "hem BTBR"))
> f2female <- subset(cleang, ind=(strain=="F2" & sex=="Female"))
> plot(geno.table(f2female, chr="X", scanone=TRUE), lod=4:5,
+ main="X chr genotype frequencies; F2 females",
+ ylab="Genotype frequency", col=c("green", "red"),
+ ylim=c(0.3, 0.7))
> abline(h=0.5, lty=2, col="gray")
> legend("topleft", lwd=2, col=c("green","red"),
+ legend=c("het", "hom BTBR"))
The departures from 1:1 segregation seem not too worrisome.
3.9. Counts of crossovers
Let’s look at the number of crossovers in the F2 mice on the X chromosome.
> nxo <- countXO(subset(cleang, ind=(strain=="F2")), chr="X")
> table(nxo)
nxo
0 1 2 3 4 6
216 261 65 3 3 3
There are 9 mice with >2 crossovers. Let’s plot
the genotype data for these mice. We run calc.errorlod
to avoid a
warning message. (The genotyping error LOD scores are used in
plot.geno
to flag potential genotyping errors.)
> temp <- subset(cleang, ind=(strain=="F2"))
> temp <- subset(temp, ind=(nxo>2), chr="X")
> temp <- calc.errorlod(temp, error.prob=0.002, map.function="c-f")
> plot.geno(temp, chr="X")
These look unusual, but there are no obvious problems.
3.10. Re-estimate map
Finally, let’s re-estimate the genetic map for the X chromosome and see how it differs from the map from Cox et al. (Genetics, 182:1335-1344, 2009). I’ll use a genotype error rate of 2/1000, and the Carter-Falconer map function (which corresponds, approximately, to the high level of crossover interference in the mouse).
> f2g <- subset(cleang, ind=(strain=="F2"), chr="X")
> newmap <- est.map(f2g, err=0.002, map.function="c-f")
Let’s look at summaries of the original and the newly estimated maps.
> summary.map(f2g)[1,,drop=FALSE]
n.mar length ave.spacing max.spacing
20.0 75.5 4.0 23.9
> summary.map(newmap)[1,,drop=FALSE]
n.mar length ave.spacing max.spacing
20.0 77.7 4.1 24.0
Overall, the map hasn’t changed much.
Let’s plot the map next to the one embedded in the cross (which corresponds to that of Cox et al. 2009). The map on the bottom is the newly estimated one
> plot.map(f2g, newmap, horizontal=TRUE)
It sure looks similar. It might be better to study the individual interval lengths.
> dom <- diff(pull.map(f2g)[[1]])
> dnm <- diff(newmap[[1]])
> par(las=1)
> plot(dom, dnm-dom, type="n", xlab="Original interval length (cM)",
+ ylab="Change in interval length (cM)")
> abline(h=0, lty=2, col="gray")
> text(dom, dnm-dom, seq(along=dnm))
Note that the numbers indicate the intervals. Some of the smaller intervals have changed a great deal in length, but overall it looks pretty good.
4. Study autosomal genotypes
We now turn to the autosomal genotypes.
> ga <- pull.geno(cleang, chr="-X")
> dim(ga)
[1] 562 4684
4.1. Not segregating
First, let’s look at which markers seem to segregate in the F2. We create a table of the three possible genotypes (1, 2, 3) and look at the proportion of mice with the least-frequent genotype. Since there are 4684 markers, we’ll look at a histogram.
> ming <- apply(apply(ga[strain=="F2",], 2, function(a) table(factor(a, levels=1:3))/sum(!is.na(a))), 2, min)
> par(las=1)
> hist(ming, breaks=seq(0, ceiling(max(ming)*100)/100, by=0.005),
+ xlab="Frequency of least-frequent genotype", main="")
> rug(ming, col="blue")
Note the big gap between 4% and 17%. There are 2640 markers that appear not to be segregating, as the least-frequent genotype is present in <10% of mice. There are 2044 markers that do appear to be segregating, as the least-frequent genotype is present in >10% of mice.
Let’s go ahead and omit the non-segregating markers.
> todrop <- names(ming[ming<0.1])
> cleang <- drop.markers(cleang, todrop)
> ga <- pull.geno(cleang, chr="-X")
> dim(ga)
[1] 562 2044
4.2. Genotypes in B6 and BTBR
Let’s look at the B6 mice. There are 3 such; they should all be homozygous for the same allele.
> bad.b6 <- apply(ga[strain=="B6",], 2, function(a) (length(unique(a[!is.na(a)]))) != 1 || all(is.na(a) | a==2))
> sum(bad.b6)
[1] 8
> ga[strain=="B6",bad.b6]
rs6163246 rs13478222 rs13478387 rs4226443 rs6188470 rs8259478 rs13481107 rs6358703
[1,] 3 2 2 2 3 2 2 2
[2,] 2 NA 3 3 2 NA 1 3
[3,] NA NA 3 3 2 3 1 3
There are 8 markers with heterozygous calls, including one with no homozygous calls.
Let’s grab the inferred genotype (1
, 3
, or NA
) for B6 mice.
> b6ga <- apply(ga[strain=="B6",], 2, function(a) if(all(is.na(a) | a==2)) return(NA) else return(unique(a[!is.na(a) & a!=2])))
> sum(is.na(b6ga))
[1] 1
> table(b6ga)
b6ga
1 3
1036 1007
We repeat this for the 4 BTBR mice.
> bad.btbr <- apply(ga[strain=="BTBR",], 2, function(a) (length(unique(a[!is.na(a)]))) != 1 || all(is.na(a) | a==2))
> sum(bad.btbr)
[1] 7
> ga[strain=="BTBR",bad.btbr]
rs3022901 rs6183172 rs6173859 rs8259478 rs13481105 rs6302629 rs13483472
[1,] 1 1 1 1 1 3 2
[2,] 1 1 1 1 1 3 3
[3,] 1 1 1 1 1 3 3
[4,] 2 2 2 2 2 2 3
There are 7 markers with heterozygous calls.
Again, we pull out the inferred genotype for BTBR mice.
> btbrga <- apply(ga[strain=="BTBR",], 2, function(a) if(all(is.na(a) | a==2)) return(NA) else return(unique(a[!is.na(a) & a!=2])))
> sum(is.na(btbrga))
[1] 0
> table(btbrga)
btbrga
1 3
1007 1037
We further check that the B6 and BTBR genotypes are homozygous for different alleles at these 2044 markers.
> all(is.na(b6ga) | (b6ga==1 & btbrga==3) | (b6ga==3 & btbrga==1))
[1] TRUE
Let’s swap the genotype codes for the markers where the BTBR mice have
the 1
genotype (to make it so that 1
is homozygous B6 and 3
is homozygous BTBR).
> toswap <- names(btbrga)[btbrga==1]
> for(chr in names(cleang$geno)[-20]) {
+ thisswap <- toswap[!is.na(match(toswap, markernames(cleang, chr=chr)))]
+ cleang$geno[[chr]]$data[,thisswap] <- 4 - cleang$geno[[chr]]$data[,thisswap]
+ }
> ga <- pull.geno(cleang, chr="-X")
4.3. Genotypes in F1
Let’s look at the 4 F1 mice. They should all be heterozygous.
> f1ga <- ga[strain=="F1",]
> sum(!is.na(f1ga) & f1ga != 2)
[1] 18
> f1ga[,apply(f1ga, 2, function(a) !all(is.na(a) | a==2))]
rs13476089 rs13476669 rs4223511 rs6183172 rs13477660 rs6355453 rs13478004 rs13478983 rs13479059 rs13479197 rs13479440
[1,] 1 1 2 2 3 2 2 2 2 2 2
[2,] 2 2 1 2 2 3 NA 2 3 1 2
[3,] 2 2 2 1 2 2 2 3 2 2 2
[4,] 2 2 2 2 2 2 3 2 2 2 3
rs13480554 rs3023809 rs13481105 rs3698807 rs13481939 rs13483373 rs3090325
[1,] 2 1 2 2 2 1 1
[2,] 3 2 2 3 1 2 NA
[3,] 2 2 2 2 2 2 2
[4,] 2 2 1 2 2 2 2
There are 18 markers with a homozygous F1 mouse, but at each of these markers there is just one such mouse.
4.4. Segregation in F2
We now turn to the segregation patterns of these autosomal markers in
the F2 mice. As in our study of the X-linked markers, we are
looking for markers whose aberrant segregation indicates a high
genotyping error rate. We use geno.table
to inspect the segregation
patterns; the p-values are for tests of 1:2:1 segregation. We do the
calculation for all markers, and then pull out the rows that show
significant departure at a 5% level, after a Bonferroni correction for
the multiple tests.
> gt <- geno.table(subset(cleang, chr="-X", ind=(strain=="F2")))
> bad <- gt$P.value < 0.05/nrow(gt)
> gt[bad,]
chr missing BB BR RR not.RR not.BB P.value
rs3689700 3 29 196 197 129 0 0 2.82e-11
rs13477546 4 46 120 212 173 0 0 5.80e-06
rs13478004 4 73 116 196 166 0 0 2.34e-06
rs13459096 6 0 219 228 104 0 0 1.05e-14
rs13478602 6 0 219 227 105 0 0 1.12e-14
rs6172481 6 0 220 226 105 0 0 5.17e-15
rs13478606 6 0 221 225 105 0 0 2.37e-15
rs13478610 6 1 224 230 96 0 0 7.32e-17
rs13478667 6 1 197 243 110 0 0 2.55e-08
rs13481814 13 46 174 207 124 0 0 1.95e-06
The markers on chromosome 6 show real segregation distortion rather than genotyping errors, due to the study design (that’s where leptin sits). The other 4 markers are probably error-prone. (The two markers on chromosome 4 are not near each other.) It’s probably best to remove them.
> cleang <- drop.markers(cleang, rownames(gt)[bad & gt$chr != 6])
Let’s look at plots of the segregation patterns. First, we re-run
geno.table
with the argument scanone=TRUE
(more formally,
scanone.output=TRUE
), so that the output is as that from a genome
scan.
> gt <- geno.table(subset(cleang, chr="-X", ind=(strain=="F2")), scanone=TRUE)
Now let’s plot the -log (base 10) p-values and the genotype frequencies, for each chromosome. For the p-values, I’ll use a different y-axis for chromosome 6 than for the others.
This is a very long set of figures; click here to jump to below them.
> par(mfrow=c(20,2))
> ym1a <- max(gt$neglog10P)
> ym1b <- max(gt$neglog10P[gt$chr != 6])
> ym2 <- max(gt$BR)
> for(i in c(1:19,"un")){
+ plot(gt, chr=i, main=paste("Chr", i), ylim=c(0,ifelse(i==6,ym1a,ym1b)))
+ plot(gt, chr=i, lod=3:5, main=paste("Chr", i), ylab="Genotype frequency",
+ ylim=c(0, ym2), col=c("green","blue","red"))
+ legend("bottomleft", lwd=2, col=c("green","blue","red"), c("BB","BR","RR"))
+ }
(Click here to jump back to the top of those figures.)
There are lots of little spikes in these figures (e.g., chromosome 13 at 46.4 cM): single markers with somewhat aberrant segregation patterns relative to the surrounding markers. These likely correspond to a small proportion of genotyping errors.
The bad marker on chromosome 13 is quite clearly indicated to be due to a large batch of double-crossovers (indicating likely genotyping errors). To show that, we look at two-locus genotype tables for the problem marker and the adjacent markers to the left and right.
> f2g <- subset(cleang, ind=(strain=="F2"))
> themar <- rownames(gt)[gt$chr==13 & gt[,3]>4]
> c13mar <- markernames(cleang, chr=13)
> themarnum <- which(c13mar == themar)
> c13mar[themarnum+c(-1,0,1)]
[1] "rs6322774" "rs13481939" "rs6288319"
> geno.crosstab(f2g, c13mar[themarnum+c(-1,1)])
rs6288319
rs6322774 - BB BR RR
- 0 1 0 0
BB 1 135 0 0
BR 0 0 264 1
RR 0 0 0 149
> geno.crosstab(f2g, c13mar[themarnum+c(-1,0)])
rs13481939
rs6322774 - BB BR RR
- 0 1 0 0
BB 1 134 1 0
BR 22 31 212 0
RR 9 0 1 139
> geno.crosstab(f2g, c13mar[themarnum+c(0,1)])
rs6288319
rs13481939 - BB BR RR
- 0 1 22 9
BB 1 134 30 1
BR 0 1 212 1
RR 0 0 0 139
As seen in the above tables, the markers to the left and right of rs13481939 show just one recombinant, but there are about 30 double-recombination events. This marker appears to have a genotyping error rate of about 5%. It is interesting to note that it also has 5% missing data. Probably I should look at the quality scores for this and other markers, and perhaps also the intensities from the genotyping arrays.
For now, it is probably okay to leave this and similar markers in the
data, as if we run calc.genoprob
with error.prob=0.002
, the errors
will be smoothed over. We can check that, for this marker, by looking
at the two-locus genotype tables after imputing in the genotypes with
fill.geno
.
> f2gi <- fill.geno(f2g, error.prob=0.002, map.function="c-f", method="argmax")
> geno.crosstab(f2gi, c13mar[themarnum+c(-1,1)])
rs6288319
rs6322774 - BB BR RR
- 0 0 0 0
BB 0 137 0 0
BR 0 0 264 1
RR 0 0 0 149
> geno.crosstab(f2gi, c13mar[themarnum+c(-1,0)])
rs13481939
rs6322774 - BB BR RR
- 0 0 0 0
BB 0 137 0 0
BR 0 0 265 0
RR 0 0 0 149
> geno.crosstab(f2gi, c13mar[themarnum+c(0,1)])
rs6288319
rs13481939 - BB BR RR
- 0 0 0 0
BB 0 137 0 0
BR 0 0 264 1
RR 0 0 0 149
As seen in these tables, in the imputed data, the genotypes that led to double-crossovers were deemed to be errors.
4.5. Linkage to other chromosomes
We next study whether the markers are all linked to their respective
chromosomes and not to other chromosomes. We do so by first
considering each pair of markers and calculating the
estimated recombination fractions and LOD scores indicating evidence
for linkage. This is accomplished with est.rf
. The calculation can
take quite a long time, since with have a total of
2040 autosomal markers (including
the 3 markers on chromosome "un"
).
We use pull.rf
to pull out matrices of the LOD scores and
estimated recombination fractions.
> temp <- est.rf(f2g)
> lod <- pull.rf(temp, "lod")
> rf <- pull.rf(temp, "rf")
We now calculate the maximum LOD score between a marker another on its chromosome, and then the maximum LOD score between a marker and another on some other chromosome.
> start <- cumsum(c(1, nmar(f2g)))
> end <- cumsum(nmar(f2g))
> maxlodsame <- maxloddiff <- rep(NA, sum(nmar(f2g)))
> names(maxlodsame) <- names(maxloddiff) <- markernames(f2g)
> for(i in seq(along=end)) {
+ maxlodsame[start[i]:end[i]] <- apply(lod[start[i]:end[i],start[i]:end[i]],1,max, na.rm=TRUE)
+ maxloddiff[start[i]:end[i]] <- apply(lod[start[i]:end[i],-c(start[i]:end[i],start[21]:end[21])],1,max, na.rm=TRUE)
+ }
> max(maxloddiff[-(start[21]:end[21])])
[1] 3.94
For unlinked marker pairs (ignoring the markers on chromosome "un"
), the biggest
LOD score is
3.94.
So everything looks clean.
The three markers on the "un"
chromosome map quite cleanly to
chromosomes 7 and 8, but since we don’t have reliable build 37
positions for them, it’s probably best to just ignore them.
Let me clarify that a bit. Let’s find, for each of the markers on
chromosome "un"
, the marker with the strongest evidence for linkage.
> unmar <- markernames(f2g, chr="un")
> linkedmar <- unmar
> for(i in 1:3)
+ linkedmar[i] <- colnames(lod)[!is.na(lod[,end[20]+i]) & lod[,end[20]+i]==max(lod[,end[20]+i], na.rm=TRUE)]
> find.markerpos(f2g, linkedmar)
chr pos
rs13479902 8 43.1
rs13479254 7 33.1
rs13479825 8 39.6
Now let’s plot the LOD score for each of those markers against each other marker in the genome.
> unmar <- markernames(f2g, chr="un")
> par(mfrow=c(3,1))
> for(i in unmar)
+ qtl:::plot.rfmatrix(lod, i, chr="-un", ylim=c(0,260), main=i)
4.6. Counts of crossovers
Let’s look again at the observed number of crossovers in each
individual. We’ll first consider just the autosomes (and ignore the
"un"
chromosome). These will be greatly affected by apparent
genotyping errors, and so we’ll do this both with the “raw” data as
well as with imputed genotypes.
> nxo.raw <- countXO(f2g, chr=c("-X", "-un"))
> nxo.clean <- countXO(fill.geno(f2g, method="argmax", error.prob=0.002, map.function="c-f"),
+ chr=c("-X", "-un"))
> sort(nxo.clean, decreasing=TRUE)[1:10]
235 312 419 512 356 132 245 298 43 242
376 347 342 334 327 320 52 39 38 38
There are a handful of individuals with a large number of crossovers. Let’s look at scatter plots of the raw counts against those from the imputed data.
> par(mfrow=c(1,2), las=1)
> plot(nxo.clean, nxo.raw, xlab="No. crossovers (imputed)",
+ ylab="No. crossovers (raw data)")
> plot(nxo.clean, nxo.raw, xlab="No. crossovers (imputed)",
+ ylab="No. crossovers (raw data)", xlim=c(0,60), ylim=c(0,340))
4.7. Apparent genotyping errors
Let’s look more closely at the apparent genotyping errors. We’ll look
at all of the markers except those in the "un"
group, and we’ll
compare the observed genotypes to the inferred genotypes allowing a
small amount of genotyping error (using fill.geno
with
method="argmax"
).
> gi <- pull.geno(f2gi, chr="-un")
> g <- pull.geno(f2g, chr="-un")
> rownames(g) <- rownames(gi) <- as.character(f2g$pheno$MouseNum)
Now let’s count the number of apparent errors (that is, where the observed genotype doesn’t match the imputed genotype) by marker and by mouse.
> err.by.mar <- apply((!is.na(g) & gi != g), 2, function(a) sum(a))
> err.by.mar.prop <- err.by.mar/nind(f2g)
> err.by.ind <- apply((!is.na(g) & gi != g), 1, function(a) sum(a))
> tot.mar <- totmar(subset(f2g, chr="-un"))
> err.by.ind.prop <- err.by.ind/tot.mar
Now let’s plot histograms of the proportion of errors, by marker and by mouse.
> par(las=1, mfrow=c(2,1))
> hist(err.by.mar.prop, breaks=seq(-0.25, max(err.by.mar)+ 0.25, by=0.5)/nind(f2g),
+ main="Errors by marker", xlab="Proportion of apparent genotyping errors")
> rug(err.by.mar.prop, col="blue")
> u <- par("usr")
> text(mean(u[1:2]), mean(u[3:4]), paste(sum(err.by.mar.prop > 0.02), "markers with > 2% errors"), col="red")
> text(mean(u[1:2]), mean(u[3:4])*0.7, paste(sum(err.by.mar.prop < 0.01), "markers with < 1% errors"), col="blue")
> hist(err.by.ind.prop, breaks=seq(-1, max(err.by.ind)+ 1, by=2)/tot.mar,
+ main="Errors by mouse", xlab="Proportion of apparent genotyping errors")
> rug(err.by.ind.prop, col="blue")
> u <- par("usr")
> text(mean(u[1:2]), mean(u[3:4]), paste(sum(err.by.ind.prop > 0.05), "mice with > 5% errors"), col="red")
Let’s now look at the proportion of genotyping errors by marker, after we omit the 7 mice with error rates > 5%.
> err.by.mar.dropbadmice <- apply((!is.na(g) & gi != g)[err.by.ind.prop < 0.05,], 2, function(a) sum(a))
> err.by.mar.dropbadmice.prop <- err.by.mar.dropbadmice/sum(err.by.ind.prop < 0.05)
> par(las=1)
> hist(err.by.mar.dropbadmice.prop, breaks=seq(-0.25, max(err.by.mar.dropbadmice)+ 0.25, by=0.5)/sum(err.by.ind.prop < 0.05),
+ main="Errors by marker, dropping bad mice", xlab="Proportion of apparent genotyping errors")
> rug(err.by.mar.dropbadmice.prop, col="blue")
> u <- par("usr")
> text(mean(u[1:2]), mean(u[3:4]), paste(sum(err.by.mar.dropbadmice.prop > 0.02), "markers with > 2% errors"), col="red")
> text(mean(u[1:2]), mean(u[3:4])*0.7, paste(sum(err.by.mar.dropbadmice.prop < 0.01), "markers with < 1% errors"), col="blue")
5. Study the raw genotype array information
In the last section above, we saw that there are a number of mice with very high error rates, and also a number of markers with very high error rates.
In order to figure out what’s going on with these markers and mice, it seemed best to look at the raw genotype information. The big genotype data file from Rosetta includes the allele intensities for each mouse at each marker on the Affymetrix genotyping arrays, as well as a quality score (which itself is of dubious quality).
Via a Perl script, I pulled out just the 2060 segregating markers for all mice (except the control mouse). At each marker, I grabbed the intensities for the two alleles, and also all of the other quality-related columns. The resulting file is still quite large (about 160 Mb) and takes a while to load (about 2 min). After loading the file, we’ll split it into separate data frames, one for each marker.
> allgeno <- read.csv("Data/detailed_genotypes.csv", as.is=TRUE)
> allgeno <- split(allgeno, allgeno$snp)
Let’s now sort the markers and mice by their position in cleang
.
> allgeno <- allgeno[markernames(cleang)]
> allgeno <- lapply(allgeno, function(a,b) a[match(b,a$Sample.Name),], cleang$pheno$longid)
Since these data include not just the F2 mice but also the F1, B6 and BTBR mice, let’s get imputed genotypes for all mice.
> cleangi <- fill.geno(cleang, err=0.002, map.function="c-f", method="argmax")
> cg <- pull.geno(cleang, chr="-un")
> cgi <- pull.geno(cleangi, chr="-un")
5.1. The bad marker on chromosome 13
Let’s first look at that badly behaved marker on chromosome 13, rs13481939, and the adjacent markers to the left and right.
> mnam <- markernames(cleang)
> wh <- which(mnam == themar)
> c13mar <- mnam[wh+c(-1,0,1)]
> par(mfrow=c(3,1), las=1)
> for(i in c13mar) {
+ plot(A2.Normalized.Signal ~ A1.Normalized.Signal, data=allgeno[[i]],
+ main=i, xlab="Allele 1 intensity", ylab="Allele 2 intensity",
+ col=c("blue","red","green","","","orange")[allgeno[[i]]$Measured.Genotype+1],
+ lwd=2)
+ legend("topright", pt.lwd=2, pch=c(1,1,1,1,16),
+ col=c("blue","red","green","orange", "hotpink"),
+ legend=c("11","12","22","missing","error"))
+ points(A2.Normalized.Signal ~ A1.Normalized.Signal,
+ data=allgeno[[i]], col="hotpink", pch=16,
+ subset=!is.na(cg[,i]) & cg[,i] != cgi[,i])
+ }
It’s clear, from this, why marker rs13481939 is badly behaved. The two adjacent markers show clear separation of the three clusters of genotypes, while for rs13481939, the clusters blend together. The inferred genotyping errors, highlighted in pink in the middle figure above, were called homozygous but were probably heterozygotes that melded into one of the clusters of homoyzgotes.
5.2. The best and worst markers
Let’s look at the same sort of figures for the top five and bottom five markers.
> worst5 <- names(sort(err.by.mar.dropbadmice, decreasing=TRUE)[1:5])
> best5 <- names(sort(err.by.mar.dropbadmice, decreasing=FALSE)[1:5])
> par(mfrow=c(5,2), las=1)
> for(i in 1:5) {
+ plot(A2.Normalized.Signal ~ A1.Normalized.Signal, data=allgeno[[worst5[i]]],
+ main=worst5[i], xlab="Allele 1 intensity", ylab="Allele 2 intensity",
+ col=c("blue","red","green","","","orange")[allgeno[[worst5[i]]]$Measured.Genotype+1],
+ lwd=2)
+ legend("topright", pt.lwd=2, pch=c(1,1,1,1,16),
+ col=c("blue","red","green","orange", "hotpink"),
+ legend=c("11","12","22","missing","error"))
+ points(A2.Normalized.Signal ~ A1.Normalized.Signal,
+ data=allgeno[[worst5[i]]], col="hotpink", pch=16,
+ subset=!is.na(cg[,worst5[i]]) & cg[,worst5[i]] != cgi[,worst5[i]])
+
+ plot(A2.Normalized.Signal ~ A1.Normalized.Signal, data=allgeno[[best5[i]]],
+ main=best5[i], xlab="Allele 1 intensity", ylab="Allele 2 intensity",
+ col=c("blue","red","green","","","orange")[allgeno[[best5[i]]]$Measured.Genotype+1],
+ lwd=2)
+ legend("topright", pt.lwd=2, pch=c(1,1,1,1,16),
+ col=c("blue","red","green","orange", "hotpink"),
+ legend=c("11","12","22","missing","error"))
+ points(A2.Normalized.Signal ~ A1.Normalized.Signal,
+ data=allgeno[[best5[i]]], col="hotpink", pch=16,
+ subset=!is.na(cg[,best5[i]]) & cg[,best5[i]] != cgi[,best5[i]])
+ }
The figures on the left, above, correspond to the bad markers (as defined by their apparent genotyping error rate); the figures on the right correspond to the good markers. The genotyping errors all seem to derive from the same problem: the intensities for heterozygotes overlap those for the two homozygotes.
5.3. Those seven bad mice
Now let’s try to figure out what’s going on with those seven bad
mice. I’ll grab the seven bad mice and a random seven of the other
“good” mice. We’ll first look at the "contrast"
column, which I
take to be a measure distinguishing between the two homozygous
clusters.
> badmice <- match(as.character(f2g$pheno$longid)[err.by.ind.prop > 0.05], allgeno[[1]]$Sample.Name)
> f2g$pheno$MouseNum[badmice]
[1] Mouse3635 Mouse3121 Mouse3225 Mouse3188 Mouse3580 Mouse3450 Mouse3566
566 Levels: 755356 Mouse133 Mouse134 Mouse135 Mouse136 Mouse2000 Mouse2001 Mouse2011 Mouse2012 Mouse2031 Mouse2272 ... Mouse3670
> goodmice <- sample(match(as.character(f2g$pheno$longid)[err.by.ind.prop < 0.05], allgeno[[1]]$Sample.Name), length(badmice))
> contr <- sapply(allgeno, function(a,b) a$Contrast[b], c(badmice, goodmice))
> par(mfrow=c(7,2), las=1)
> for(i in 1:7) {
+ hist(contr[i,], breaks=seq(-1.5, 1.5, len=101), xlab="Contrast",
+ main=cleang$pheno$MouseNum[badmice[i]])
+ hist(contr[i+7,], breaks=seq(-1.5, 1.5, len=101), xlab="Contrast",
+ main=cleang$pheno$MouseNum[goodmice[i]])
+ }
It looks like the “bad” mice are enriched for apparent homozygous
genotypes. Let’s look at that more directly by plotting the
number of genotyping errors against the proportion of markers with
"contrast"
< 0.5.
> allcontr <- sapply(allgeno, function(a) a$Contrast)
> plot(apply(allcontr, 1, function(a) mean(abs(a) < 0.5)),
+ apply(!is.na(cg) & (cg != cgi), 1, sum),
+ col=c("black", "blue", "red","green")[as.numeric(strain)],
+ ylab="Number of errors", xlab="Prop contrast < 0.5")
> legend("topright", col=c("black","blue","red","green"), pch=1,
+ pt.lwd=2, c("F2","B6","BTBR","F1"))
Six of the seven “bad” mice are at the lower range of the proportion of heterozygous genotypes.
We can use the a triangle plot (aka Holmans' triangle) to depict the genotype frequencies for each mouse, as another way to indicate this aspect of the problem in these mice.
> library(broman)
> gf <- apply(pull.geno(cleang, chr=c("-un", "-X")), 1,
+ function(a) table(factor(a, levels=1:3))/sum(!is.na(a)))
> triplot(labels=c("BB","BR","RR"))
> tripoints(gf[,-badmice], col=c("black", "blue", "red", "green")[as.numeric(strain)][-badmice], lwd=2)
> tripoints(gf[,badmice], col="orange", lwd=2)
> legend("topright", col=c("black","blue","red","green","orange"), pch=1,
+ pt.lwd=2, c("F2","B6","BTBR","F1","bad"))
6. How to deal with the genotyping errors?
I’m inclined to drop the seven mice with high genotyping error rates and so high numbers of observed crossovers. We might further seek to omit particularly bad markers, and even omit particular genotypes that look to be errors, but I think for the markers and the individual genotyping errors, it is sufficient to rely on the hidden Markov model (HMM) with assumed genotyping error rate.
So, let’s just drop the seven bad mice (having genotyping error rate > 5%).
> badmice <- names(err.by.ind.prop)[err.by.ind.prop > 0.05]
> badg <- subset(cleang, ind=!is.na(match(cleang$pheno$MouseNum, badmice)))
> cleang <- subset(cleang, ind=is.na(match(cleang$pheno$MouseNum, badmice)))
> f2g <- subset(cleang, ind=(cleang$pheno$Strain=="F2"))
7. Duplicate individuals
Let’s look for F2 individuals with nearly duplicate genotype data.
> cg <- comparegeno(f2g)
> cgu <- cg[lower.tri(cg)]
> par(las=1, mar=c(5.1,4.1,0.6,0.6))
> hist(cgu, main="", xlab="Proportion of matching genotypes",
+ breaks=seq(0, 1, len=201))
> rug(c(sort(cgu)[1:100], sort(cgu, dec=TRUE)[1:100]), col="blue")
There are five pairs of mice with nearly matching genotypes.
> wh <- which(cg > 0.8, arr=TRUE)
> wh <- wh[wh[,1] < wh[,2],]
> cbind(as.character(f2g$pheno$MouseNum)[wh[,1]], as.character(f2g$pheno$MouseNum)[wh[,2]])
[,1] [,2]
[1,] "Mouse3318" "Mouse3317"
[2,] "Mouse3287" "Mouse3290"
[3,] "Mouse3354" "Mouse3353"
[4,] "Mouse3559" "Mouse3553"
[5,] "Mouse3259" "Mouse3269"
It’s safe to assume that these are sample mix-ups, and that any mismatches are genotyping errors in one or the other cases. We’ll omit genotypes that mismatch.
> n.omit <- rep(0,nrow(wh))
> for(i in 1:nrow(wh)) {
+ for(j in 1:nchr(f2g)) {
+ g1 <- f2g$geno[[j]]$data[wh[i,1],]
+ g2 <- f2g$geno[[j]]$data[wh[i,2],]
+ toomit <- !is.na(g1) & !is.na(g2) & g1 != g2
+ f2g$geno[[j]]$data[wh[i,1], toomit] <- NA
+ f2g$geno[[j]]$data[wh[i,2], toomit] <- NA
+ n.omit[i] <- n.omit[i] + sum(toomit)
+ }
+ }
> n.omit
[1] 32 4 5 10 5
As seen, the five pairs have between 4 and 32 genotyping errors.
8. Estimated genetic map
Let’s re-estimate the genetic map to see how it differs from the map from Cox et al. (Genetics, 182:1335-1344, 2009). I’ll use a genotype error rate of 2/1000, and the Carter-Falconer map function (which corresponds, approximately, to the high level of crossover interference in the mouse).
First we should drop one mouse from each of the 5 duplicate pairs.
> f2g.nodup <- subset(f2g, ind=(-wh[,2]))
Now we pull out the current map and re-estimate the map with the
current data. In est.map()
, I use n.cluster=8
to run multiple
chromosomes in parallel (a feature added in R/qtl version 1.20-4).
> origmap <- pull.map(f2g.nodup, chr="-un")
> newmap <- est.map(f2g.nodup, chr="-un", err=0.002,
+ map.function="c-f", n.cluster=8)
-Running est.map via a cluster of 8 nodes.
Let’s look at summaries of the original and the newly estimated maps.
> cbind(summary.map(origmap), summary.map(newmap))[,c(1,5,2,6,3,7,4,8)]
n.mar n.mar.1 length length.1 ave.spacing ave.spacing.1 max.spacing max.spacing.1
1 156 156 96.8 98.2 0.625 0.634 8.18 5.89
2 135 135 98.1 100.8 0.732 0.752 11.06 9.78
3 157 157 79.9 75.0 0.512 0.481 5.35 4.77
4 126 126 80.6 83.0 0.645 0.664 9.31 7.65
5 125 125 87.0 92.7 0.701 0.747 18.44 18.40
6 102 102 76.7 96.0 0.759 0.950 12.42 18.15
7 109 109 86.7 85.3 0.803 0.789 12.03 14.34
8 91 91 73.3 76.6 0.814 0.851 8.67 10.80
9 93 93 69.6 70.0 0.756 0.760 6.40 5.56
10 123 123 74.7 84.2 0.612 0.690 5.15 6.59
11 124 124 81.6 77.8 0.664 0.632 7.73 9.37
12 116 116 50.1 50.4 0.436 0.439 3.09 3.69
13 116 116 65.2 66.7 0.567 0.580 5.37 5.47
14 91 91 57.5 60.6 0.639 0.674 5.27 6.00
15 102 102 56.2 57.6 0.556 0.571 5.65 5.25
16 66 66 55.2 53.3 0.849 0.820 11.45 15.99
17 60 60 55.3 54.2 0.938 0.919 4.65 5.83
18 95 95 56.8 53.2 0.604 0.566 5.05 4.08
19 50 50 43.7 46.5 0.891 0.950 6.91 8.67
X 20 20 75.5 73.5 3.972 3.868 23.90 23.77
overall 2057 2057 1420.4 1455.6 0.697 0.715 23.90 23.77
Overall, the map hasn’t changed much. Let’s plot the chromosome lengths against each other.
> par(las=1, mar=c(5.1,4.1,0.6,0.6))
> plot(summary.map(origmap)[-21,2], summary.map(newmap)[-21,2],
+ type="n", xlab="Original chr length (cM)",
+ ylab="New chr length (cM)")
> abline(0,1, lty=2, col="gray")
> text(summary.map(origmap)[-21,2], summary.map(newmap)[-21,2], c(1:19,"X"))
Let’s plot the map next to the one embedded in the cross (which corresponds to that of Cox et al. 2009). The map on the bottom is the newly estimated one.
> par(mar=c(5.1,4.1,0.6,0.6))
> plot.map(origmap, newmap, horizontal=TRUE, main="")
We can also plot the individual interval lengths against each other. It’s better to plot the difference against either the average or against one of the two lengths. We’ll plot the difference versus the original length (that is, in the Cox et al. maps).
I’ve highlighted in red those intervals that increased in length by more than 4 cM; the numbers indicate the chromosome for the interval.
> par(las=1, mar=c(5.1,4.1,0.6,0.6))
> do <- unlist(lapply(origmap, diff))
> dn <- unlist(lapply(newmap, diff))
> thechr <- rep(c(1:19,"X"), lapply(newmap, function(a) length(a)-1))
> plot(do, dn-do,
+ type="n", xlab="Original interval length (cM)",
+ ylab="Change (new - old) in interval length (cM)", ylim=c(-5,15))
> abline(h=0, lty=2, col="gray")
> points(do, (dn-do), col=c("black","red")[(dn-do>4)+1])
> text(do[dn-do>4]+0.25, (dn-do)[dn-do>4], thechr[dn-do>4], col="red", adj=c(0,0.5))
9. Load and reformat the physical map
Load and reformat the physical map, so that we can save it with the cross data.
> temp <- read.csv("Maps/finalmap.csv", as.is=TRUE)
> temp <- temp[!is.na(match(temp$marker, markernames(f2g))),]
> pmap <- vector("list", 20)
> names(pmap) <- c(1:19,"X")
> for(i in c(1:19,"X")) {
+ pmap[[i]] <- temp[temp[,2]==i,3]/10^6
+ names(pmap[[i]]) <- temp[temp[,2]==i,1]
+ class(pmap[[i]]) <- ifelse(i=="X", "X", "A")
+ }
> class(pmap) <- "map"
Are there really the same number of markers per chromosome and are the markers in the same order?
> all(sapply(pmap, length) == nmar(f2g["-un",]))
[1] TRUE
> all(unlist(lapply(pmap, names)) == markernames(f2g["-un",]))
[1] TRUE
10. Save the clean cross object
> save(cleang, f2g, newmap, pmap, file="Data/clean_cross.RData")
> save(badg, file="Data/bad_mice.RData")