Assignment 6
Select a sequence of interest from the database (it should be longer than 300 base pairs), run a BLAST search with it, and answer the following questions:
a. What are the differences between the six BLAST programs (blastn, blastp, blastx, tblastn, tblastx, PSI-BLAST)?
blastn: Search a nucleotide database using a nucleotide query
blastp: Search a protein database using a protein query
blastx: Search a protein database using a translated nucleotide query
tblastn: Search a translated nucleotide database using a protein query
tblastx: Search a translated nucleotide database using a translated nucleotide query
PSI-BLAST (position-specific iterated BLAST): Search a protein database using a protein query, building a PSSM (position-specific scoring matrix) from the results of the first blastp run and using it in the following iterations.
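The list above can be summarized as a small lookup table. The following is only a sketch: the names BlastProgram, PROGRAMS, programs_for_query, and frame_comparisons are hypothetical and are not part of any BLAST software. It assumes that a translated nucleotide sequence is read in six reading frames (three per strand).

```python
from typing import NamedTuple


class BlastProgram(NamedTuple):
    query: str                 # sequence type the user supplies
    database: str              # sequence type stored in the database
    query_translated: bool     # query translated before comparison?
    database_translated: bool  # database translated before comparison?


# Hypothetical summary table of the six programs described above.
PROGRAMS = {
    "blastn": BlastProgram("nucleotide", "nucleotide", False, False),
    "blastp": BlastProgram("protein", "protein", False, False),
    "blastx": BlastProgram("nucleotide", "protein", True, False),
    "tblastn": BlastProgram("protein", "nucleotide", False, True),
    "tblastx": BlastProgram("nucleotide", "nucleotide", True, True),
    "PSI-BLAST": BlastProgram("protein", "protein", False, False),
}


def programs_for_query(query_type):
    """Return the programs that accept the given type of query."""
    return [name for name, p in PROGRAMS.items() if p.query == query_type]


def frame_comparisons(name):
    """Number of reading-frame pairs compared for each query/subject pair."""
    p = PROGRAMS[name]
    query_frames = 6 if p.query_translated else 1
    database_frames = 6 if p.database_translated else 1
    return query_frames * database_frames


print(programs_for_query("nucleotide"))
print({name: frame_comparisons(name) for name in PROGRAMS})
```

With a nucleotide query, only blastn, blastx, and tblastx are applicable, which is why these three are the natural choice for a GenBank mRNA record.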
b. Use your sequence to run three of the six BLAST programs and discuss “What are the strengths and weaknesses of the BLAST programs you have selected?”
The sequence chosen is human hexose-6-phosphate dehydrogenase. Retrieve the nucleotide sequence from GenBank (http://www.ncbi.nlm.nih.gov/).
Search the Nucleotide database for Homo sapiens hexose-6-phosphate dehydrogenase (glucose 1-dehydrogenase) (H6PD), NCBI accession number NM_004285.2.
Click on “FASTA” and save the sequence as a text file. Then search with the nucleotide sequence using the BLAST programs available at http://blast.ncbi.nlm.nih.gov/Blast.cgi.
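The retrieval step can also be scripted instead of clicked through. The sketch below assumes the NCBI E-utilities efetch endpoint with the parameters db, id, rettype, and retmode; the helper names efetch_fasta_url, parse_fasta, and long_enough are hypothetical. It only builds the URL and parses FASTA text, so it runs without a network connection.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities efetch.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build the URL that returns a nucleotide record in FASTA format."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def parse_fasta(text):
    """Split a single-record FASTA text into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    header = lines[0].lstrip(">")
    sequence = "".join(lines[1:]).upper()
    return header, sequence


def long_enough(sequence, minimum=300):
    """Check the assignment's requirement of more than 300 base pairs."""
    return len(sequence) > minimum


print(efetch_fasta_url("NM_004285.2"))
```

The text returned by that URL (for example with urllib.request.urlopen) can be passed to parse_fasta, and long_enough then confirms that the sequence meets the length requirement before it is pasted into BLAST.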
The BLAST programs chosen are blastn, blastx, and tblastx.
Click on “blastn” on this page. Paste the nucleotide sequence in FASTA format.
Under “Choose Search Set”, choose the “others” option to search the nucleotide database covering every organism.
Under the “Program Selection” section, choose the “somewhat similar sequence (blastn)” option. Then, click on the “BLAST” button.
The BLAST result page is then shown.
blastx is run in the same way as blastn:
Paste the nucleotide sequence.
Choose “nr” as the database.
Then, click on the “BLAST” button.
The result page is then shown.
tblastx: paste the nucleotide sequence and choose “nr/nt” as the database.
An error occurs: the search is too large for the program to complete within the time allowed, so no result is returned.
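The three web-form searches above differ only in the program and the database. As a rough sketch, the same choices could be written as request parameters, assuming the CMD, PROGRAM, DATABASE, and QUERY parameter names of NCBI's BLAST URL API. The database names "nt" and "nr" are an assumption about how the “nr/nt” and “nr” choices on the web form map to that API, and the helper names are hypothetical. The code only builds the request body and does not submit anything.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"

# Program and database pairs used in this assignment.
# "nt" and "nr" are assumed names for the nucleotide and protein collections.
SEARCHES = [("blastn", "nt"), ("blastx", "nr"), ("tblastx", "nt")]


def submission_params(program, database, query):
    """Parameters for submitting one search (assumed URL API names)."""
    return {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
    }


def submission_body(program, database, query):
    """URL-encoded form body for one search."""
    return urlencode(submission_params(program, database, query))


for program, database in SEARCHES:
    # A short placeholder query; the real query is the saved FASTA sequence.
    print(program, submission_body(program, database, "ACGT"))
```

A large translated search such as tblastx against the whole nucleotide collection can still exceed the time allowed, whichever way it is submitted.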
The strengths and weaknesses of the BLAST programs chosen are:
blastn: it searches a nucleotide database with a nucleotide query. Because it aligns nucleotides directly, without any translation, it does not take much time to run. Matching positions add to the score and mismatched positions are penalized, so the search is rather specific. However, a nucleotide polymorphism lowers the score even when it is a synonymous change that gives the same amino acid. As a result, the total score can be lower than the similarity of the encoded proteins would suggest.
blastx: it searches a protein database with a translated nucleotide query. It takes some time to translate the nucleotide query in all six reading frames. However, because the same amino acid can be encoded by different codons, translating the nucleotide sequence into amino acid sequences increases the chance of identifying a protein whose coding sequence differs from the query only through codon variation. Moreover, the reading frame of the query does not need to be known in advance, since all reading frames are searched, and the result shows which frame corresponds to the real reading frame of the gene.
tblastx: it searches a translated nucleotide database with a translated nucleotide query. Hence, it takes a great deal of time and CPU usage. The program increases the chance of finding a hit, because all six reading frames of the query are aligned against all six reading frames of each database sequence. Hits in incorrect reading frames may be reported, but every possible match is covered.
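The contrast between nucleotide-level and translated comparison can be illustrated with a toy example. This is a minimal sketch, not how BLAST itself is implemented: it uses the standard genetic code, a simple ungapped position-by-position identity, and two made-up 9-base sequences that differ only at synonymous third codon positions. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Translate both strands at offsets 0, 1, and 2, as blastx and tblastx do."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


def identity(a, b):
    """Fraction of identical positions in two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Made-up sequences: every third codon position differs, protein is unchanged.
query = "CTTGCTGGA"
subject = "CTCGCAGGG"

print(identity(query, subject))                        # nucleotide level
print(identity(translate(query), translate(subject)))  # protein level, frame +1
print(six_frame_translations(query))
```

At the nucleotide level the two sequences match at 6 of 9 positions, while their frame +1 translations are identical. This is the situation in which blastx or tblastx can score a hit better than blastn, at the cost of translating and comparing six frames.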
c. Show us the first hit of each BLAST search with its identity and/or similarity score.
blastn: NM_004285.3 Homo sapiens hexose-6-phosphate dehydrogenase, E-value 0.0, Maximum identity 100%
blastx: NP_004276.2 hexose-6-phosphate dehydrogenase precursor, E-value 0.0
tblastx: no result is obtained.
d. Summarize the results from the three BLAST programs you selected.
blastn and blastx returned the same top hit, human hexose-6-phosphate dehydrogenase, with an E-value of 0.0. An E-value of 0.0 means that the number of hits with an equal or better score expected by chance is effectively zero, so the match is highly significant. It does not by itself show that the query and the hit are identical; that is shown by the 100% maximum identity reported by blastn. tblastx could not complete the request, because translating a long nucleotide query and locally aligning it against the translated nucleotide database requires too much CPU time. For this query, blastn is the most practical tool since it is fast and accurate.
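The meaning of the E-value can be made concrete with the usual relation between bit score and E-value, E = m × n × 2^(−S'), where S' is the bit score and m and n are the (effective) lengths of the query and the database. The sketch below is only an illustration: the function name expected_hits is hypothetical, and the lengths used are made-up round numbers, not the real sizes of the query or of any NCBI database.

```python
def expected_hits(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least bit_score.

    Uses E = m * n * 2**(-S'), with m and n taken as given
    (BLAST itself uses corrected, effective lengths).
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Made-up sizes for illustration only.
m = 9_000            # query length in bases (hypothetical)
n = 100_000_000_000  # total database length in bases (hypothetical)

for score in (40, 60, 100, 1000):
    print(score, expected_hits(score, m, n))
```

Each additional bit of score halves the E-value, so a long, nearly perfect alignment gives a value so small that it is reported as 0.0. A shorter or less perfect alignment can still be highly significant without the sequences being identical.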











