The most common way to use PHYSCObase is to search ESTs, genomic sequences, and full-length cDNA clones with BLAST programs. We usually provide three datasets for BLAST searches.
Every dataset contains nucleotide sequences, so three variants of the BLAST program can be used: blastn, tblastn, and tblastx. If you are searching for protein-coding sequences, tblastn is recommended; in this case, enter an amino acid sequence as the query. Use blastn when you want to know whether a genomic or cDNA sequence has corresponding ESTs. blastn is also useful when you have sequenced 3' RACE products and want to know whether full-length clones are available.
An example tblastn result is available as cuc2-blast.html. The CUC2 amino acid sequence was used as the query. The description list of the hits is as follows.
                                                  Score     E
Sequences producing significant alignments:       (bits)  Value

gnl|contig|Contig3017 Contig3017                    162   1e-40
gnl|contig|pphn33o16 pphn33o16                      128   2e-30
gnl|contig|pphb14p21 pphb14p21                       57   5e-09
gnl|contig|Contig3604 Contig3604                     46   1e-05
gnl|contig|pphb2h16 pphb2h16                         30   1.1
gnl|contig|Contig3526 Contig3526                     28   2.4
gnl|contig|Contig12137 Contig12137                   27   5.3
gnl|contig|Contig7880 Contig7880                     27   6.9
gnl|contig|pphn35f03 pphn35f03                       27   9.0
gnl|contig|Contig3272 Contig3272                     27   9.0
Two contigs and two 5' end sequences hit with E values below 1e-3, and four contigs and two 5' end sequences hit with E values above 1. Similarity with an E value above 1 usually arises just by chance. If strong similarity is observed in a short region, however, the match may be meaningful despite a large E value. Such information can be read from the alignments, which are shown after the list of significant matches. You can jump to the corresponding alignment by clicking the score, which is to the left of the E value and is usually displayed in blue and underlined. In this case, I consider only the four sequences with E values below 1e-3 to have significant similarity.
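The filtering step described above can be sketched in a few lines of Python. This is only an illustration, not part of PHYSCObase: the function names `parse_hits` and `significant` are made up for this example, the input is the description list copied from the example result, and the 1e-3 cutoff is the one used in the text. The handling of E values written without a leading digit (such as `e-105`) is a precaution for older BLAST output and is not needed for the lines shown here.

```python
# Description lines copied from the example tblastn result above.
HITS = """\
gnl|contig|Contig3017 Contig3017     162  1e-40
gnl|contig|pphn33o16 pphn33o16       128  2e-30
gnl|contig|pphb14p21 pphb14p21        57  5e-09
gnl|contig|Contig3604 Contig3604      46  1e-05
gnl|contig|pphb2h16 pphb2h16          30  1.1
gnl|contig|Contig3526 Contig3526      28  2.4
gnl|contig|Contig12137 Contig12137    27  5.3
gnl|contig|Contig7880 Contig7880      27  6.9
gnl|contig|pphn35f03 pphn35f03        27  9.0
gnl|contig|Contig3272 Contig3272      27  9.0
"""


def parse_hits(text):
    """Return a list of (name, bit_score, e_value) tuples."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip headers and blank lines
        _identifier, name, bits, evalue = fields
        if evalue.startswith("e"):
            # Precaution: some BLAST versions print "e-105" for "1e-105".
            evalue = "1" + evalue
        hits.append((name, int(bits), float(evalue)))
    return hits


def significant(hits, max_evalue=1e-3):
    """Keep only hits whose E value is below the cutoff."""
    return [hit for hit in hits if hit[2] < max_evalue]


for name, bits, evalue in significant(parse_hits(HITS)):
    print(name, bits, evalue)
```

A fixed cutoff like this cannot recognize the short but strong matches mentioned above, so the alignments themselves are still worth inspecting.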
Once you find contigs or clones of interest, you will want to see which clones a contig contains and the sequences of those clones from the other end. To do this, enter the contig number or clone name in the search box on the top page, http://moss.nibb.ac.jp/.
This searches for matches in a table that contains the clone name, putative transcript id, 5' contig number, and 3' contig number. If your query was a clone name, you will get just one line containing the clone name, the putative transcript id, the identification number of the contig to which the 5' sequence of the clone belongs (preceded by contig1), and the identification number of the contig to which the 3' sequence of the clone belongs (preceded by contig2).
A putative transcript id has the form Pnnnnnn, where each n represents a digit, and identifies either a pair of contigs or a single contig. When the 5' and 3' sequences of a clone belong to different contigs, the two contigs are considered to represent the 5' and 3' end sequences of a single transcript. When both end sequences are contained in one contig, that contig by itself likely constitutes a putative transcript. Inconsistencies among clones made a slightly more complex rule necessary. When one clone ties contigA (5') to contigB (3') but another clone ties contigC (5') to contigB (3'), we treat them as different putative transcripts. When yet another clone has both sequences in contigB, we cannot tell which transcript it belongs to, so we assign a new putative transcript id.
For example, enter Contig3017 to search for clones that have either end sequence in Contig3017. This returns a list like the following. Only the first two lines are shown here, but you can see the complete page in another window.
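One way to read the rule above is that every distinct (5' contig, 3' contig) pair receives its own putative transcript id. The sketch below illustrates that reading only; it is not the program used to build PHYSCObase. The function name `assign_transcript_ids`, the clone names, and the numbering of the ids are all hypothetical, and clones with only one sequenced end are not handled because the text does not say how they are treated.

```python
def assign_transcript_ids(clones):
    """Map each clone to an id shared by all clones with the same contig pair.

    clones: iterable of (clone_name, contig5, contig3) tuples.
    Ids are numbered in order of first appearance (an assumption).
    """
    pair_to_id = {}
    clone_to_id = {}
    for name, contig5, contig3 in clones:
        pair = (contig5, contig3)
        if pair not in pair_to_id:
            # New contig pair (or single contig): issue the next id.
            pair_to_id[pair] = "P{:06d}".format(len(pair_to_id) + 1)
        clone_to_id[name] = pair_to_id[pair]
    return clone_to_id


# Hypothetical clones reproducing the situation described in the text.
clones = [
    ("clone1", "contigA", "contigB"),  # ties contigA (5') to contigB (3')
    ("clone2", "contigC", "contigB"),  # ties contigC (5') to contigB (3')
    ("clone3", "contigB", "contigB"),  # both ends in contigB
    ("clone4", "contigA", "contigB"),  # same pair as clone1
]
ids = assign_transcript_ids(clones)
for name, _c5, _c3 in clones:
    print(name, ids[name])
```

Here clone1 and clone4 share an id, while clone2 and clone3 each get a different one, matching the three cases the rule distinguishes.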
pphb13d01 P007036 contig1 003017
pphb14e10 P007036 contig1 003017 contig2 003018
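If you save such a list and want to process it further, each row can be split into its fields with a small parser. The following is a minimal sketch that assumes one clone per line and the layout seen in the two rows above (clone name, putative transcript id, then optional contig1 and contig2 entries); the names `parse_row` and `contig_name` are invented for this example. Mapping a zero-padded number such as 003017 back to the name Contig3017 is inferred from this example and may not hold for every entry.

```python
import re

# Layout assumed from the example rows: clone, Pnnnnnn, then optional
# "contig1 <number>" (5' contig) and "contig2 <number>" (3' contig).
ROW = re.compile(
    r"^(?P<clone>\S+)\s+(?P<transcript>P\d{6})"
    r"(?:\s+contig1\s+(?P<contig5>\d+))?"
    r"(?:\s+contig2\s+(?P<contig3>\d+))?$"
)


def parse_row(line):
    """Split one result row into a dict; missing contigs become None."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError("unrecognized row: %r" % line)
    return match.groupdict()


def contig_name(number):
    """Turn a zero-padded contig number into a name like 'Contig3017'."""
    return "Contig%d" % int(number)


rows = [
    "pphb13d01 P007036 contig1 003017",
    "pphb14e10 P007036 contig1 003017 contig2 003018",
]
for row in rows:
    record = parse_row(row)
    print(record["clone"], record["transcript"],
          record["contig5"], record["contig3"])
```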
The putative transcript id is linked to the putative transcript information page. A putative transcript page begins with links to blastx results for the conceptual putative transcript, the 5' contig, and the 3' contig, followed by links to the 5' and 3' contig information pages. The blastx result tells you whether your query is among the strong hits in the nr dataset. In the case of P007036, the top hits are NAC proteins from Arabidopsis, which supports the view that P007036 represents a member of the NAC family. The result looks just like the original NCBI BLAST output, except that the taxon from which each gene was isolated is shown to the right of the E value. The taxon name is colored according to its phylogenetic position. Since the BLAST results are precalculated and stored, viewing them is much faster than performing an actual BLAST search. On the other hand, if you want results against the latest database, retrieve the sequences of both contigs and perform the BLAST search elsewhere, for example at NCBI.
Next comes a list of the clones, produced in our EST project, that belong to the putative transcript. The list contains the clone name, a link to the sequences (Seq.), and a description of the best hit in a blastx search against the nr dataset.
Finally, the Alignment section gives a brief overview of the relative positions of the clones belonging to both contigs. In the Alignment section, clones that have both end sequences in the contigs defining the putative transcript are shown first, from the longest clone to the shortest. Relative lengths are estimated from the start point in the 5' contig and the end point in the 3' contig. Clones with only one end belonging to the contigs follow.
The clone names are linked to a search program, so that you can find the putative transcript a clone belongs to when only one end of the clone is in one of the contigs. 5' sequences are colored blue and 3' sequences are colored red. Dead clones, which did not grow on replica plates, are shown in black, and poorly growing clones are shown in gray. GenBank entries are shown in green and linked to the entry.
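The ordering used in the Alignment section can be illustrated with a short sort. This sketch rests on several assumptions that the page does not confirm: each clone is represented as a dict with a hypothetical `start5` (start point in the 5' contig) and `end3` (end point in the 3' contig), a missing end is stored as None, and the relative length is taken to be `end3 - start5`. The clone names and coordinates are invented.

```python
def alignment_order(clones):
    """Clones with both ends first (longest to shortest), then the rest."""
    both = [c for c in clones
            if c["start5"] is not None and c["end3"] is not None]
    one = [c for c in clones
           if c["start5"] is None or c["end3"] is None]
    # Assumed length estimate: an earlier 5' start and a later 3' end
    # both make the clone relatively longer.
    both.sort(key=lambda c: c["end3"] - c["start5"], reverse=True)
    return both + one


# Hypothetical clones; positions are made up for illustration.
clones = [
    {"name": "cloneA", "start5": 120, "end3": 900},
    {"name": "cloneB", "start5": 10, "end3": None},
    {"name": "cloneC", "start5": 5, "end3": 950},
    {"name": "cloneD", "start5": None, "end3": 400},
]
for clone in alignment_order(clones):
    print(clone["name"])
```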
Last modified: Wed Oct 8 10:27:34 JST 2003 $Id: usage.html,v 1.3 2004/03/22 02:08:14 tomoaki Exp $