The name of the query motif, which is unique in the motif database file.
An alternate name for the query motif, which may be provided in the motif database file.
The width of the motif. No gaps are allowed in motifs supplied to MAST
as it only works for motifs of a fixed width.
The sequence that would achieve the best possible match score (and its
reverse complement for nucleotide motifs).
MAST computes the correlation between each pair of motifs.
The correlation between two motifs is the maximum sum of Pearson's
correlation coefficients for aligned columns divided by the width of
the shorter motif. The maximum is found by trying all alignments of the
two motifs. Motifs with correlations below 0.60 have little effect on
the accuracy of the E-values computed by MAST. Motifs with higher
correlations with other motifs should be removed from the query. You can
also ask MAST to remove redundant motifs from its analysis
under Advanced options from the MAST web page,
or by specifying --remcorr
when running MAST on your own computer.
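The correlation described above can be sketched in a few lines of Python. This is only an illustration of the definition given here, not MAST's own code: the function names, the representation of a motif as a list of numeric columns, and the choice to score a constant column as 0 are all assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length columns.
    Assumption: a constant column (zero variance) contributes 0.0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def motif_correlation(m1, m2):
    """Maximum, over all ungapped alignments of the two motifs, of the summed
    column correlations divided by the width of the shorter motif."""
    w1, w2 = len(m1), len(m2)
    shorter = min(w1, w2)
    best = float("-inf")
    # offset is the column of m1 that is aligned with column 0 of m2
    for offset in range(-(w2 - 1), w1):
        total = 0.0
        for j in range(w2):
            i = offset + j
            if 0 <= i < w1:  # only overlapping columns contribute
                total += pearson(m1[i], m2[j])
        best = max(best, total / shorter)
    return best
```

A motif compared with itself gives a correlation of 1, well above the 0.60 level mentioned above, so a duplicated motif would be flagged as redundant under this sketch.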
The name of the (FASTA) sequence database file.
The number of sequences in the database.
The number of letters in the sequence database.
The date of the last modification to the sequence database.
The name of a file of motifs ("motif database file") that contains the (MEME-formatted) motifs used in the search.
The date of the last modification to the motif database.
The name of the alphabet symbol.
The frequency of the alphabet symbol as defined by the background model.
The score for the match of a position in a sequence to a motif is
computed by summing the appropriate entry from each column of the
position-dependent scoring matrix that represents the motif. Sequences shorter than one or more of the motifs are skipped.
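As a rough illustration of the scoring rule, the sketch below sums one matrix entry per motif column. The function names, the `alphabet` argument and the list-of-columns matrix layout are hypothetical and are not taken from MAST.

```python
def match_score(pssm, alphabet, sequence, position):
    """Sum the scoring-matrix entry for the observed letter in each motif column.
    pssm[col][k] is the score of alphabet[k] in motif column col."""
    index = {letter: k for k, letter in enumerate(alphabet)}
    width = len(pssm)
    if position < 0 or position + width > len(sequence):
        raise ValueError("motif does not fit at this position")
    return sum(pssm[col][index[sequence[position + col]]] for col in range(width))

def best_match(pssm, alphabet, sequence):
    """Return (score, position) of the best match, or None when the sequence
    is shorter than the motif (such sequences are skipped)."""
    width = len(pssm)
    if len(sequence) < width:
        return None
    return max(
        (match_score(pssm, alphabet, sequence, p), p)
        for p in range(len(sequence) - width + 1)
    )
```

With a two-column matrix that favours A and then C, `best_match` on the sequence GACT picks position 1, where the subsequence AC matches both columns.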
The p-value of a motif match is the probability of a single random
subsequence of the length of the motif scoring
at least as well as the observed match.
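One way to obtain such a p-value, shown here only as a sketch, is to build the exact distribution of scores column by column. It assumes integer matrix scores and letters drawn independently from the background model; whether MAST computes the value this way is not stated here, and the function names are hypothetical.

```python
from collections import defaultdict

def score_distribution(pssm, background):
    """Exact distribution of match scores for one random subsequence.
    pssm: list of columns of integer scores.
    background: letter probabilities in the same letter order as each column."""
    dist = {0: 1.0}
    for column in pssm:
        nxt = defaultdict(float)
        for total, prob in dist.items():
            for score, freq in zip(column, background):
                nxt[total + score] += prob * freq
        dist = dict(nxt)
    return dist

def position_p_value(pssm, background, observed_score):
    """Probability that a random subsequence scores at least observed_score."""
    dist = score_distribution(pssm, background)
    return sum(prob for score, prob in dist.items() if score >= observed_score)
```

For a two-column matrix over a uniform four-letter background in which only one letter pair reaches the top score, the p-value of that top score is 1/16.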
The identifier of the sequence (from the FASTA sequence header line). This may be linked to search a sequence database for the sequence name.
The description appearing after the identifier of the sequence in the FASTA header line.
This diagram shows the normal spacing of the motifs specified to MAST.
MAST will calculate larger p-values for sites that diverge from the order and spacing in the diagram.
If strands were scored separately then there will be two
E-values for the sequence separated by a slash (/). The score for the
provided sequence will be first and the score for the reverse-complement
will be second.
The block diagram shows the best non-overlapping tiling of motif matches on the sequence.
These motif matches are the ones used by MAST to compute the E-value for the sequence.
Hovering the mouse cursor over a motif match displays the motif name,
the position p-value of the match and other details.
The length of the line shows the length of a sequence relative to all the other sequences.
A block is shown where the position p-value
of a motif is less (more significant) than the significance threshold,
which is 0.0001 by default.
If a significant motif match (as specified above) overlaps other
significant motif matches, then it is only displayed as a block if its
position p-value is less (more significant) than the
product of the position p-values of the significant matches that it
overlaps.
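The display rule for blocks can be written as a small predicate. This is a sketch of the rule as worded here; the function name and arguments are hypothetical, and the default threshold is the 0.0001 mentioned above.

```python
from math import prod

def show_block(p_value, overlapped_p_values, threshold=0.0001):
    """Decide whether a motif match is drawn as a block.
    overlapped_p_values: position p-values of the matches this match overlaps."""
    if p_value >= threshold:
        return False  # not significant, never drawn
    significant = [p for p in overlapped_p_values if p < threshold]
    if not significant:
        return True   # significant and overlaps no other significant match
    # Must beat the product of the p-values of the significant matches it overlaps.
    return p_value < prod(significant)
```

For example, a match with p-value 1e-12 that overlaps matches with p-values 1e-5 and 1e-6 is drawn (1e-12 is below their product, 1e-11), whereas a match with p-value 1e-5 overlapping a match with p-value 1e-6 is not.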
The position of a block shows where a motif has matched the sequence.
Complementable alphabets (like DNA) only: Blocks displayed above the line are a match on the given sequence, whereas blocks
displayed below the line are matches to the reverse-complement of the given sequence.
Complementable alphabets (like DNA) only: When strands are scored separately, then blocks may overlap on opposing strands.
The width of a block shows the width of the motif relative to the length of the sequence.
The colour and border of a block identify the matching motif as in the legend.
Note: You can change the colour of a motif by clicking on the motif in the legend.
The height of a block gives an indication of the significance of the match as
taller blocks are more significant. The height is calculated to be proportional
to the negative logarithm of the position p-value,
truncated at the height for a p-value of 1e-10.
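The height rule can be sketched as follows. The function name, the `max_height` scaling and the use of base-10 logarithms are assumptions made for illustration; only the proportionality to the negative logarithm and the cap at a p-value of 1e-10 come from the description above.

```python
from math import log10

def block_height(p_value, max_height=1.0, floor=1e-10):
    """Height proportional to -log10(p), capped at the height for p = floor.
    p_value must be in (0, 1]; max_height is a hypothetical drawing scale."""
    p = max(p_value, floor)  # anything more significant than the floor is capped
    return max_height * log10(p) / log10(floor)
```

Under these assumptions a match with p-value 1e-5 is drawn at half the maximum height, and matches with p-values of 1e-10 or smaller are all drawn at full height.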
If strands were scored separately with a complementable alphabet then
there will be two p-values for the sequence separated by a slash (/).
The score for the given sequence will be first and the score for the
reverse-complement will be second.
This indicates the offset used for translation of the DNA.
The annotated sequence shows a portion of the sequence with the
matching motif sequences displayed above.
The displayed portion of the sequence can be modified by sliding the
two buttons below the sequence block diagram so that the portion you want
to see is between the two needles attached to the buttons. By default the
two buttons move together but you can drag one individually by holding
shift before you start the drag.
If the strands were scored separately then overlaps in motif sites may
occur so you can choose to display only one strand at a time. This is done
by selecting "Matches on given strand" or "Matches on opposite strand"
from the drop-down list.
The sequence p-value of a score is defined as the probability of a
random sequence of the same length containing some match with as good or
better a score.
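A common way to approximate such a probability, given the position p-value of the best match, is to treat every possible match position as an independent trial. The sketch below makes that independence assumption explicit; it is an illustration of the definition, not necessarily the exact computation MAST performs, and the function name is hypothetical.

```python
from math import expm1, log1p

def sequence_p_value(position_p, seq_length, motif_width):
    """Probability that at least one of the n = L - w + 1 positions of a random
    sequence matches as well, assuming the positions are independent:
    1 - (1 - p)^n, evaluated in a numerically stable way."""
    n = seq_length - motif_width + 1
    if n < 1:
        raise ValueError("sequence is shorter than the motif")
    if position_p >= 1.0:
        return 1.0
    return -expm1(n * log1p(-position_p))
```

For small position p-values the result is close to n times the position p-value: a match with position p-value 1e-6 in a sequence offering 1000 positions has a sequence p-value of about 0.001.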
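The combined p-value of a sequence measures the strength of the match
of the sequence to all the motifs and is calculated by
finding the score
of the single best match of each motif to the sequence (best matches
may overlap), calculating the sequence p-value of each score, forming the
product of the p-values, and taking the p-value of the product.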
The E-value of a sequence is the expected number of sequences in a
random database of the same size that would match the motifs as well as
the sequence does and is equal to the combined p-value of
the sequence times the number of sequences in the database.
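The arithmetic behind the E-value is simple enough to sketch. The `e_value` function follows the definition above directly. The `p_value_of_product` function is an assumption about how per-motif sequence p-values are combined: it uses the standard formula for the p-value of a product of n independent, uniformly distributed p-values, p * sum over i < n of (-ln p)^i / i!. Both function names are hypothetical.

```python
from math import log, prod

def p_value_of_product(p_values):
    """P-value of the product of n independent uniform p-values:
    p * sum_{i=0}^{n-1} (-ln p)^i / i!   where p is the product."""
    n = len(p_values)
    product = prod(p_values)
    if product <= 0.0:
        return 0.0
    x = -log(product)
    term, total = 1.0, 1.0  # the i = 0 term
    for i in range(1, n):
        term *= x / i       # builds x^i / i! incrementally
        total += term
    return min(1.0, product * total)

def e_value(combined_p, num_sequences):
    """Expected number of sequences in a random database of the same size
    that would match the motifs at least as well."""
    return combined_p * num_sequences
```

For instance, a combined p-value of 0.001 in a database of 56 sequences gives an E-value of 0.056.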
Change the portion of annotated sequence by dragging the buttons; hold shift to drag them individually.
If you use MAST in your research, please cite the following paper:
Timothy L. Bailey and Michael Gribskov,
"Combining evidence using p-values: application to sequence homology searches",
Bioinformatics, 14(1):48-54, 1998.
Each of the following 56 sequences has an E-value less than
The motif matches shown have a position p-value less than 0.0001. Hover the cursor over the sequence name to view more information about a sequence. Hover the cursor over a motif for more information about the match. Click on the arrow (↧) next to the E-value to see the sequence surrounding each match.