10.0  Retrieving Genomic Sequences for Gene Targeting Using the Available Web-Based Browsers

Kari Thompson and Yuji Hiwatashi




To construct expression and knockout moss lines, you must first obtain DNA fragments from the 5' and 3' regions of the target coding sequence. These fragments are usually between 1 and 2 kb long.
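Once you know where the coding sequence lies on a scaffold, choosing the 5' and 3' fragments is simple coordinate arithmetic. The Python sketch below is only a planning aid. It assumes 1-based, inclusive scaffold coordinates (the convention BLAST reports) and a default flank length of 1,500 bp, chosen from the usual 1-2 kb range. The function names `flank_windows` and `subseq` are hypothetical and are not part of the JGI or PHYSCObase sites.

```python
def flank_windows(cds_start, cds_end, flank_len=1500, strand="+", seq_len=None):
    """Return 1-based, inclusive (start, end) windows for the 5' and 3' fragments.

    cds_start/cds_end are scaffold coordinates of the coding sequence, with
    cds_start <= cds_end. flank_len defaults to 1,500 bp (an assumed value
    inside the usual 1-2 kb range). seq_len, if given, clips the window at
    higher coordinates to the end of the scaffold or contig.
    """
    if cds_start > cds_end:
        raise ValueError("cds_start must not be greater than cds_end")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Region at lower scaffold coordinates, clipped at position 1.
    lower = (max(1, cds_start - flank_len), cds_start - 1)
    # Region at higher scaffold coordinates, optionally clipped at seq_len.
    upper_end = cds_end + flank_len
    if seq_len is not None:
        upper_end = min(seq_len, upper_end)
    upper = (cds_end + 1, upper_end)
    # On the minus strand, the gene's 5' end lies at the higher coordinates.
    if strand == "+":
        return {"5prime": lower, "3prime": upper}
    return {"5prime": upper, "3prime": lower}


def subseq(seq, start, end):
    """Slice a sequence string using 1-based, inclusive coordinates."""
    return seq[start - 1:end]


# Hypothetical coding sequence at positions 5001-6000 of a scaffold.
print(flank_windows(5001, 6000))
# {'5prime': (3501, 5000), '3prime': (6001, 7500)}
```

`subseq` returns forward-strand sequence. For a gene on the minus strand, you would still need to reverse-complement the fragments before designing constructs.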


There are two websites you can use to obtain these sequences:

JGI: http://genome.jgi-psf.org/Phypa1_1/Phypa1_1.home.html

PHYSCObase: http://moss.nibb.ac.jp/ (BLAST assemble page: http://moss.nibb.ac.jp/cgi-bin/blast-assemble)



1.) Using JGI to retrieve genomic sequence:




1.)   Click on the BLAST tab.

2.)   Choose the alignment program blastn (nucleotide query vs. nucleotide database).

3.)   Leave the Expect value and word size at their defaults; these are fine to start with.

4.)   Choose the database “Physcomitrella patens v1.1 repeat masked main genome sequence”.

5.)   Paste the genomic or cDNA sequence of your gene of interest into the query sequence box.

6.)   You may enter your e-mail address if you would like the result e-mailed to you, but it is not necessary.

7.)   Click Submit job.

8.)   Wait while the server searches for similar sequences. This may take a few minutes, depending on how many queries are ahead of yours.

9.)   The output shows, in graphical form, the scaffolds that are similar to your sequence. Scaffolds shown in red are the most similar to your sequence, and the first scaffold most likely contains your gene. Make a note of the scaffold number.

10.) Move the mouse over the graph and click the red line for that scaffold.

11.) A table at the top of the screen shows your query length, the scaffold number, and where your query sequence lies in the scaffold.


For example, if the table reports a query length of 564 bp on scaffold_41 at positions 556327 to 556890, your query is 564 bp long and lies in scaffold 41 between 556327 and 556890 bp. Make a note of this location of your cDNA sequence; you will need it in the following steps. You can click on the other scaffolds to view other matches to your gene.


Beneath the table you can see a graph that shows the similarity between your sequence and the scaffold.  





Below the graph is the alignment of your query with the scaffold. If you click “seq” next to “scaffold seq”, you can retrieve the scaffold sequence that corresponds to your query.


12.) Click on the scaffold name found in the first table (shown in step 11).  In this example you would click on scaffold_41.

13.) The table that opens lists all of the contigs that were used to assemble the scaffold.

14.) You can click “get sequence” to retrieve the entire genomic sequence of the scaffold, but it is easier to scroll down and find the contig that contains your cDNA sequence. In this example, find the contig that contains the fragment from 556327 to 556890 bp reported in the first table (step 11). This contig should contain enough genomic sequence for you to design your constructs. If your gene lies near the beginning or end of the contig, you may need to retrieve the entire scaffold sequence to judge the surrounding genomic region properly.
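The checks in steps 11 to 14 (how long the hit is, which contig contains it, and whether it sits too close to a contig end) can be sketched in Python. This is an illustrative sketch only. The `Contig` type and the helper functions are hypothetical, and the contig boundaries in the example are invented. The 2,000 bp margin is an assumed value chosen to match the upper end of the usual fragment size. Coordinates are taken as 1-based and inclusive, so a hit from 556327 to 556890 is 556890 - 556327 + 1 = 564 bp long.

```python
from typing import NamedTuple, Optional, Sequence


class Contig(NamedTuple):
    name: str
    start: int  # scaffold position of the contig's first base (1-based)
    end: int    # scaffold position of the contig's last base (inclusive)


def hit_length(start: int, end: int) -> int:
    """Length of a hit with inclusive coordinates."""
    # abs() allows for a hit reported with start > end (minus-strand match).
    return abs(end - start) + 1


def find_contig(contigs: Sequence[Contig], hit_start: int, hit_end: int) -> Optional[Contig]:
    """Return the contig containing the whole hit, or None if no single contig
    does (for example, when the hit spans a gap between two contigs)."""
    lo, hi = min(hit_start, hit_end), max(hit_start, hit_end)
    for contig in contigs:
        if contig.start <= lo and hi <= contig.end:
            return contig
    return None


def near_contig_edge(contig: Contig, hit_start: int, hit_end: int, margin: int = 2000) -> bool:
    """True if fewer than `margin` bp of contig sequence lie on either side of
    the hit, so the contig may not hold a full-length flanking fragment."""
    lo, hi = min(hit_start, hit_end), max(hit_start, hit_end)
    return lo - contig.start < margin or contig.end - hi < margin


# Contig boundaries below are invented for illustration only.
contigs = [Contig("contig_A", 540001, 561000), Contig("contig_B", 561101, 600000)]
hit = (556327, 556890)
print(hit_length(*hit))  # 564
match = find_contig(contigs, *hit)
print(match.name, near_contig_edge(match, *hit))  # contig_A False
```

If `near_contig_edge` returns True, that corresponds to the situation at the end of step 14, where retrieving the whole scaffold sequence is the safer choice.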


2.) Using PHYSCObase to retrieve genomic sequence:




Blast Assemble Data Submission: http://moss.nibb.ac.jp/cgi-bin/blast-assemble


If a BLAST search of the JGI database does not find a genomic DNA sequence corresponding to your gene, try to identify the sequence using “Blast Assemble Data Submission” on PHYSCObase.




1.)   Click the link above, or go to http://moss.nibb.ac.jp/cgi-bin/blast-assemble. Alternatively, from the PHYSCObase home page, click “DNA database” and then “BLAST raw WGS sequence and assemble into contig”.

2.)   You may enter the name of your gene in the box for sequence name if you wish, but it is not necessary.

3.)   Paste the genomic or cDNA sequence of your gene of interest, in FASTA format, into the sequence box; this is your query.

4.)   Choose “nucleotide” and “Physcomitrella patens”.

5.)   Click Construct Contigs.

6.)   Wait while the server searches for similar sequences. This may take about 10 minutes, depending on how many queries are ahead of yours.

7.)   When the search has finished, click “get the highest score scaffold” to retrieve a genomic DNA sequence corresponding to your gene.

8.)   The result usually contains three sequences. The first and third are shown in lowercase letters, above and below the ‘nnnn…nnnn’ runs, respectively. They represent possible, but not guaranteed, sequences from the outermost 5’ and 3’ genomic regions of your gene. The second sequence is shown in capital letters between the ‘nnnn…nnnn’ runs. It should overlap the sequence you used as your query and is usually sufficient for designing primers to make constructs.

9.)   Copy and paste the second sequence into a new file and use it to construct a contig with your original query sequence.
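If you save the result as plain text, you can partly automate steps 8 and 9. The Python sketch below is an illustration and is not part of PHYSCObase. It keeps the longest run of capital letters, which is the second sequence, and writes it as a FASTA record that you can assemble with your query. It assumes the copied text contains only sequence lines (plus optional ‘>’ header lines). It also assumes, as described in step 8, that the flanks and the ‘nnnn’ runs are lowercase. The function names and the example text are hypothetical.

```python
import re


def extract_core_sequence(result_text: str) -> str:
    """Return the longest stretch of capital letters in the pasted result.

    Assumes the central sequence is the only part written in capitals,
    while the flanking sequences and the 'nnnn' runs are lowercase.
    """
    lines = [
        line.strip()
        for line in result_text.splitlines()
        if line.strip() and not line.lstrip().startswith(">")
    ]
    # Joining the lines restores sequences that were wrapped across lines.
    runs = re.findall(r"[A-Z]+", "".join(lines))
    if not runs:
        raise ValueError("no sequence in capital letters was found")
    return max(runs, key=len)


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record, wrapping lines at `width` characters."""
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{name}\n{body}\n"


# A made-up, shortened example of the three-part result.
example = """>example_result
acgtacgtnnnnnnnnnn
ATGGCTAGCTAGGATCC
TTGACnnnnnnnnnnggcatt
"""
core = extract_core_sequence(example)
print(to_fasta("my_gene_genomic", core), end="")
```

Before relying on the extracted sequence, check it by eye against the page. A result page that also contains capitalized text other than sequence would break the "longest capital run" assumption.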