Mercator/MAVID CAF1 Alignments
Description
Multiple whole-genome alignments of Drosophila species were generated
by Mercator (an orthology mapping program) and MAVID (a multiple
alignment program). These alignments were engineered by Anat Caspi.
Alignments
Click on a node to download the alignment of its descendants. Click
on a species to download its pairwise alignment with D. melanogaster.
File formats
Orthology Maps
The file "map" gives a symmetric, 1-to-1 mapping (indicative of
monotopoorthology) between regions in the different genomes.
Lines in the map files are of the form:
[Segment #] [Chrom] [Start] [End] [Strand] ...
where the last four fields (Chrom, Start, End, Strand) are repeated
for each genome in the map. The fields are tab-delimited. Coordinates
are 0-based and half-open (the end coordinate is one more than the
coordinate of the last base included in the segment), so the length of
a segment is End minus Start. Pieces for which no orthologous region
could be identified in one of the genomes have "NA" in the fields for
the appropriate genomes. The order of the genomes in each line is
given by the order of the genomes in the name of the map (also given
in the file "genomes").
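As an illustration of the line layout described above, the following is a minimal Python sketch of a map-line parser. It is not part of Mercator; the names Region and parse_map_line are hypothetical, and the genome names in the usage note are placeholders. The genome order is passed in as a list because the layout of the "genomes" file is not described here. The sketch assumes that a genome with no orthologous region has "NA" in its fields, as stated above.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Region(NamedTuple):
    """One genome's part of a map segment (0-based, half-open)."""
    chrom: str
    start: int   # first base included
    end: int     # one past the last base included
    strand: str

    @property
    def length(self) -> int:
        # Half-open coordinates: length is simply end - start.
        return self.end - self.start


def parse_map_line(line: str, genomes: List[str]) -> Tuple[str, Dict[str, Optional[Region]]]:
    """Split one tab-delimited map line into per-genome regions.

    `genomes` must list the genomes in the same order as the map.
    Genomes whose fields are "NA" are mapped to None.
    """
    fields = line.rstrip("\n").split("\t")
    expected = 1 + 4 * len(genomes)
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, found {len(fields)}")

    segment = fields[0]  # kept as text; convert with int() if desired
    regions: Dict[str, Optional[Region]] = {}
    for i, genome in enumerate(genomes):
        chrom, start, end, strand = fields[1 + 4 * i: 5 + 4 * i]
        if "NA" in (chrom, start, end, strand):
            regions[genome] = None
        else:
            regions[genome] = Region(chrom, int(start), int(end), strand)
    return segment, regions
```

For example, calling parse_map_line on a line whose first genome has fields 2L, 100, 250, + and whose second genome has four NA fields yields a 150-base Region for the first genome and None for the second.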
AGP Files
Mercator (the orthology mapping program) assembles
draft genomes during the construction of the orthology map.
Therefore, for draft genomes, the coordinates given in the map file
are in terms of the assembled contigs. For each draft genome
assembled in this way, and for a given map, there is a corresponding
AGP file specifying the mapping between the original sequence
contigs/scaffolds and the Mercator assembled contigs. A description
of the AGP file format can be found at NCBI and UCSC.
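The sketch below shows one way a map coordinate on a Mercator-assembled contig might be translated back to an original contig/scaffold using an AGP file. It rests on two assumptions that should be checked against the actual files: that the assembled contig is the AGP "object" (column 1) and the original contig/scaffold is the "component" (column 6), and that the files follow the standard AGP column layout with 1-based, inclusive coordinates. The function names parse_agp and lift_to_component are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_agp(lines: Iterable[str]) -> List[Dict]:
    """Collect the non-gap rows of an AGP file.

    AGP coordinates are 1-based and inclusive; they are converted here
    to 0-based, half-open so they match the map file.
    """
    parts = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        f = line.rstrip("\n").split("\t")
        if f[4] in ("N", "U"):
            continue  # gap rows carry no component
        parts.append({
            "object": f[0],
            "obj_start": int(f[1]) - 1,
            "obj_end": int(f[2]),
            "component": f[5],
            "comp_start": int(f[6]) - 1,
            "comp_end": int(f[7]),
            "orientation": f[8],
        })
    return parts


def lift_to_component(parts: List[Dict], obj: str, pos: int) -> Optional[Tuple[str, int, str]]:
    """Map a 0-based position on an assembled contig to its component.

    Returns (component id, 0-based position, orientation), or None if
    the position falls in a gap or outside the assembled contig.
    """
    for p in parts:
        if p["object"] == obj and p["obj_start"] <= pos < p["obj_end"]:
            offset = pos - p["obj_start"]
            if p["orientation"] == "-":
                # Reverse-oriented component: count back from its end.
                return p["component"], p["comp_end"] - 1 - offset, "-"
            return p["component"], p["comp_start"] + offset, p["orientation"]
    return None
```

Orientations other than "-" are treated as forward in this sketch. A linear scan is used for clarity; for large AGP files an index keyed by object name would be a natural refinement.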
Alignments
Alignments are provided for each colinear orthologous segment set
identified by the map. The multiple alignment for segment number
n is given in a single multi-fasta file "n/mavid.mfa", which
contains a record for each genome that is part of the segment. Note
that the order of the records in the multi-fasta file does not
necessarily correspond with the order in which the genomes are given
in the map file. The title of each record in the multi-fasta file is
the genome from which the sequence for that record is obtained.
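
Because the record order in the multi-fasta file need not match the map, it can help to index the records by title. The following is a minimal Python sketch, not an official reader: the function names read_mfa and rows_in_map_order are hypothetical, and the sketch assumes each title line is ">" followed by the genome name with nothing else after it.

```python
from typing import Dict, Iterable, List, Optional


def read_mfa(lines: Iterable[str]) -> Dict[str, str]:
    """Read a multi-fasta alignment into {record title: sequence}."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one first.
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def rows_in_map_order(records: Dict[str, str], genomes: List[str]) -> List[Optional[str]]:
    """Return alignment rows in map order; None where a genome is absent."""
    return [records.get(genome) for genome in genomes]
```

A typical use would be to open the file "n/mavid.mfa" for a segment, pass the file object to read_mfa, and then call rows_in_map_order with the genome list used for the map, so that rows line up with the per-genome fields of the corresponding map line.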