A motif regular expression is a notation commonly used to represent motifs of amino acids
or nucleotides. Motif regular expressions follow these conventions:
Each character, written by itself, denotes a specific amino acid or nucleotide;
{X} denotes that any amino acid or nucleotide may be used except for 'X';
[XY] denotes that either 'X' or 'Y' may be used;
The question mark symbol ('?') denotes that the preceding item is optional: it may appear once or not at all.
Thus, the amino acid pattern "AR[ND]C?E" encompasses the four protein strings "ARNCE", "ARDCE", "ARNE", and "ARDE".
Similarly, the DNA pattern "CC{T}AG" encompasses the three DNA strings "CCAAG", "CCCAG", and "CCGAG", because the third position may be any nucleotide other than 'T'.
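
The conventions above map closely onto the regular expression syntax of most programming languages. The following is a minimal sketch in Python, not a reference implementation: the helper names `motif_to_regex` and `matches` are hypothetical, and the sketch assumes the motif uses only the four conventions listed here. The only translation needed is `{X}`, which Python's `re` module writes as `[^X]`; `[XY]` and `?` already carry the same meaning.

```python
import re
from itertools import product


def motif_to_regex(motif: str) -> str:
    """Translate motif notation into Python regex syntax (hypothetical helper)."""
    # {X} means "anything except X"; Python spells that as [^X].
    # [XY] and ? are left untouched because they mean the same thing in re.
    return re.sub(r"\{([^}]+)\}", r"[^\1]", motif)


def matches(motif: str, sequence: str) -> bool:
    """Return True if the whole sequence fits the motif (hypothetical helper)."""
    return re.fullmatch(motif_to_regex(motif), sequence) is not None


# The four protein strings from the example all fit the amino acid pattern.
protein = ["ARNCE", "ARDCE", "ARNE", "ARDE"]
print(all(matches("AR[ND]C?E", s) for s in protein))  # True

# Enumerate every DNA string of length 5 and keep those that fit the pattern.
dna = ["".join(p) for p in product("ACGT", repeat=5)]
print([s for s in dna if matches("CC{T}AG", s)])  # ['CCAAG', 'CCCAG', 'CCGAG']
```

Note that `[^T]` in Python accepts any character other than 'T', not only nucleotides, so the sketch relies on the input already being restricted to a valid alphabet, as the enumeration over "ACGT" does here.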