Bioinformatics Resources
Frequently Asked Questions
What is the E Value in BLAST?
E-values appear in the results of a BLAST search. (BLAST is a program that finds protein or nucleotide sequences similar to your query sequence.) Here is a sample BLAST search using an arbitrary nucleotide query sequence:
ggtaagtcctctagtacaaacacccccaatattgtgatataattaaaattatattcatat
tctgttgccagaaaaaacacttttaggctatattagagccatcttctttgaagcgttgtc
Sequences producing significant alignments:                     Score (S), bits  E Value
gi|40730|emb|X07547.1|CTPLAS75 C. trachomatis plasmid DNA f...  238              2e-60
gi|4691224|emb|X06707.2|CTPLASCR Chlamydia trachomatis cryp...  224              3e-56
gi|144462|gb|J03321.1|CH1L1CG Plasmid pCHL1 (from C.trachom...  214              3e-53
gi|144607|gb|M19487.1|PLMORF Chlamydia trachomatis plasmid ...  206              7e-51
gi|473196|emb|X78726.1|CTPLORF Chlamydia muridarum Nigg II ...  52               2e-04
gi|7190951|gb|AE002162.1|AE002162 Chlamydia muridarum plasm...  52               2e-04
gi|23494922|gb|AE014830.1| Plasmodium falciparum 3D7 chromo...  44               0.058
gi|16555445|emb|AL139418.9| Human DNA sequence from clone R...  44               0.058
• The S score is a measure of the similarity of the query to the sequence shown.
• The E-value is a measure of the reliability of the S score.
• The definition of the E-value is: the number of alignments with a score equal to or greater than the given S score that are expected to occur purely by chance. When E is well below 1, it is nearly equal to the probability of finding at least one such alignment by chance.

E-value Equation
• The actual equation is $E = K m n \, e^{-\lambda S}$
  ♦ The parameters K and λ represent natural scales for the search space and the scoring system respectively.
  ♦ The rest of the equation represents the size of the query (m), the size of the database (n), and of course the S score.

The Size of the E-value
• The typical threshold for a good E-value from a BLAST search is 1e-5 ($10^{-5}$) or lower.
• The reason for such low values is the size of the search. A chance probability of 0.001 per database entry would still leave about 1000 entries due to chance in a million-entry database, whereas 1e-6 would leave only about one. The E-value already folds the search size (m and n) into this count, so it can be read directly as the expected number of chance hits.

Problems with the E-value
1. Tends to be conservative when the query sequence is short (a short query simply cannot achieve high S scores).
2. The statistical theory breaks down when alignments contain gaps, so gap scores (penalties) are used and the statistical parameters are estimated empirically.
3. Some sequences have areas of "low complexity" that will show artificial similarity with other sequences.
BLAST attempts to control for all of these problems; however, they are important to keep in mind.

E-value Summary
• Ideally one wants to run a query on BLAST with a long, unified sequence.
• An E-value at or below 1e-5 means that the alignment is very unlikely to be due to chance.
• An E-value above 1e-5 means that the alignment might still be meaningful, but more research is needed to verify it.

References
• http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
• http://bioinfo.tau.ac.il/Bioinfo-Course/Un_course2/420,53,Smith and Waterman
• http://folk.uio.no/einarro/Presentations/blast_statistics.html

OSC, 1224 Kinnear Road, Columbus, OH 43212. ph: 614.292.9248, fax: 614.292.7168
OSC is an initiative of the Ohio Board of Regents. Copyright © 2005 OSC. All Rights Reserved.
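
To make the E-value equation from this FAQ concrete, here is a minimal Python sketch that evaluates E = K * m * n * exp(-λ * S) and compares the result with the 1e-5 threshold. The function names, the values of K and λ, and the database size are assumptions chosen only for illustration; they are not taken from the sample search above, whose table reports bit scores rather than raw scores. Only the query length (120 nucleotides) comes from the sample query.

```python
import math


def blast_evalue(raw_score, query_len, db_len, k, lam):
    """Return E = K * m * n * exp(-lambda * S) for a raw alignment score S."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def passes_threshold(evalue, threshold=1e-5):
    """True when the E-value is at or below the threshold named in the FAQ."""
    return evalue <= threshold


if __name__ == "__main__":
    # Illustrative parameters only (assumed, not taken from the sample output).
    K, LAM = 0.711, 1.374
    M = 120              # length of the sample query sequence
    N = 1_000_000_000    # hypothetical database size

    for s in (20, 30, 40):
        e = blast_evalue(s, M, N, K, LAM)
        print(f"S={s:3d}  E={e:.3g}  at or below 1e-5: {passes_threshold(e)}")
```

The sketch shows the two behaviours the equation implies: E grows in direct proportion to the size of the search (doubling n doubles E), and it falls exponentially as the score S rises, which is why a short query that cannot reach a high S score cannot reach a very small E-value either.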