Whole-Genome Homology Maps (Old methods)
[Introduction]
[Terminology]
[Methods]
[Downloads]
[Acknowledgements]
[References]
Whole-genome homology maps attempt to identify the evolutionary relationships
between and within multiple genomes. The term "syntenic" is often used to
describe regions of multiple genomes that are believed to have evolved from the
same region in an ancestral genome. However, this use of the term is
incorrect (Passarge et al. 1999), so we use the terms "homologous",
"orthologous", and "paralogous" instead.
Ideally, given N genomes, we would like to identify all orthologous genomic
regions as well as paralogous regions within each genome and hypothetical
ancestral genome. Maps listing these relationships are extremely valuable to
researchers performing comparative analyses of genomic sequence. Presented
here is initial work on creating an orthology map for the human, mouse, and rat
genomes.
- Resolution
- The length of the smallest rearrangement that is identified by the
map.
- Segmentation
- The maximum length of mapped regions for each map piece. Segmented maps
are useful for comparative programs that require smaller input sizes.
-
Human (NCBI build 30, Jun. 2002),
Mouse (MGSC v3, Feb. 2002),
Rat (RGSC v2, Nov. 2002) Map
-
The symmetric human-mouse map from the mouse genome paper (Mouse Genome
Sequencing Consortium 2002), constructed by Michael Kamal, served as the
basis for the three-way map.
-
The human-mouse map was broken into 300 kb pieces, with care taken not to
break up known and predicted genes.
-
Using the Berkeley genome alignment (Couronne et al. 2003, Bray et al. 2003)
of the mouse and rat genomes, candidate orthologous rat regions were
determined for each piece of the human-mouse map. For each piece, the
candidate rat region with the largest number of matches in the alignment to
that piece's mouse region was selected.
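The two steps above (splitting the map into pieces without cutting genes, then choosing the best-supported rat region per piece) can be sketched roughly as follows. This is an illustration only, not the code used to build the maps: the function names, the half-open interval representation, the policy of moving a breakpoint to the end of an interrupted gene, and the form of the match counts are all assumptions.

```python
from collections import Counter

PIECE_LEN = 300_000  # target piece length from the text (300 kb)


def split_region(start, end, genes, piece_len=PIECE_LEN):
    """Split [start, end) into pieces of about piece_len without cutting genes.

    genes is a list of (gene_start, gene_end) half-open intervals.
    Assumed policy: a breakpoint that falls inside a gene is moved to the
    end of that gene. The source does not say how breaks were adjusted.
    """
    pieces = []
    pos = start
    while pos < end:
        cut = min(pos + piece_len, end)
        # Keep pushing the cut right until it no longer interrupts a gene
        # (a second pass is needed when genes overlap each other).
        changed = True
        while changed:
            changed = False
            for g_start, g_end in genes:
                if g_start < cut < g_end and cut < end:
                    cut = min(g_end, end)
                    changed = True
        pieces.append((pos, cut))
        pos = cut
    return pieces


def best_rat_region(matches):
    """Return the candidate rat region with the most alignment matches.

    matches is an iterable of (rat_region_id, n_matches) pairs for one map
    piece (a hypothetical summary of the mouse-rat alignment).
    Returns None when there are no candidates.
    """
    totals = Counter()
    for region_id, n_matches in matches:
        totals[region_id] += n_matches
    if not totals:
        return None
    return max(totals, key=totals.get)
```

A piece with no candidate rat region yields `None` here, which corresponds to the "NA" entries in the map files.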
-
Human (NCBI build 31, Nov. 2002),
Mouse (MGSC v3, Feb. 2002),
Rat (RGSC v2, Nov. 2002) Map
-
The Jun. 2002 human, Feb. 2002 mouse, and Nov. 2002 rat map described
above was updated to the Nov. 2002 human freeze by the same methods used
to extend the human-mouse map to the rat. The Berkeley genome
alignment of the Nov. 2002 human and Feb. 2002 mouse genomes was used
for this update.
Lines in the map files are of the form:
[Segment Label] [Piece #] [Chrom] [Start] [End] [Strand] ...
where the last four fields ([Chrom], [Start], [End], [Strand]) are repeated
for each genome in the map. The fields are tab-delimited. For regions on the
reverse strand ("-"), the start coordinate is greater than the end
coordinate. Pieces for
which no orthologous region could be identified in one of the genomes
have "NA" in the fields for the appropriate genomes. The order of the
genomes in each line is given by the order of the genomes in the name
of the map file.
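As a reading aid, here is a minimal sketch of a parser for one line of a three-genome map file, following the format described above. The names `Region` and `parse_map_line`, the genome labels, and the sample line in the usage note are illustrative assumptions; the piece number is kept as a string because its exact form is not specified here.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str

    def span(self):
        """Return (low, high) coordinates regardless of strand."""
        return (min(self.start, self.end), max(self.start, self.end))


def parse_map_line(line, genomes=("human", "mouse", "rat")):
    """Parse one tab-delimited map line into (label, piece, regions).

    regions maps each genome name to a Region, or to None when the
    piece has "NA" in the fields for that genome. The genome order must
    match the order of genomes in the map file's name.
    """
    fields = line.rstrip("\n").split("\t")
    expected = 2 + 4 * len(genomes)
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    label, piece = fields[0], fields[1]
    regions = {}
    for i, genome in enumerate(genomes):
        chrom, start, end, strand = fields[2 + 4 * i : 6 + 4 * i]
        if "NA" in (chrom, start, end, strand):
            regions[genome] = None  # no orthologous region identified
        else:
            regions[genome] = Region(chrom, int(start), int(end), strand)
    return label, piece, regions
```

For example, with a made-up line such as `"seg1\t3\tchr1\t1000\t2000\t+\tchr4\t9000\t8000\t-\tNA\tNA\tNA\tNA"`, the mouse region is on the reverse strand (start greater than end), `span()` returns its coordinates in ascending order, and the rat entry is `None`.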
-
Human (NCBI build 30, Jun. 2002),
Mouse (MGSC v3, Feb. 2002),
Rat (RGSC v2, Nov. 2002) Map
-
Human (NCBI build 31, Nov. 2002),
Mouse (MGSC v3, Feb. 2002),
Rat (RGSC v2, Nov. 2002) Map
-
The map construction was done by Colin Dewey, with assistance from
Lior Pachter.
-
Thanks to Alex Poliakov for help with accessing the Berkeley genome
alignments.
-
Thanks to Michael Kamal for his human-mouse map.
-
Bray, N., Dubchak, I., and Pachter, L. 2003. AVID: A global
alignment program. Genome Research, 13:97-102.
-
Couronne, O., Poliakov, A., Bray, N., Ishkhanov, T., Ryaboy, D., Rubin, E.,
Pachter, L., and Dubchak, I. 2003. Strategies and tools
for whole-genome alignments. Genome Research, 13:73-80.
-
Mouse Genome Sequencing Consortium. 2002.
Initial sequencing and comparative analysis of the mouse genome. Nature, 420(6915):520-562.
-
Passarge, E., Horsthemke, B., and Farber, R.A. 1999.
Incorrect use of the term synteny.
Nature Genetics, 23:387.