RESTRICTION MAPPING USING THE BIOEDIT TOOL
AIM: To perform restriction mapping of the given sequence using the BioEdit tool.
DESCRIPTION: BioEdit is a biological sequence alignment editor written for Windows 95/98/NT. A rich, intuitive multiple document interface with many convenient features makes alignment, manipulation and viewing of sequences relatively quick and easy on a desktop computer. Several sequence manipulation and analysis options, together with fully automated links to local and WWW-based analysis programs, provide an integrated working environment in which sequences can be viewed, aligned and analysed from a single application with simple point-and-click operations.
SOURCE: http://www.mbio.ncsu.edu/BioEdit/page2.html
METHOD:
1) Collect the sequence for which restriction mapping has to be done in FASTA format from NCBI.
2) Open the source website www.mbio.ncsu.edu/BioEdit/page2.html
3) Download the BioEdit tool from the source website.
4) Open the query sequence in the tool.
5) Select the sequence, then run the restriction map option to map the restriction sites.
6) Save the result page in which the sequence has been mapped.
INPUT: ACCESSION NO: EF183474, Oryza sativa
OUTPUT:
INTERPRETATION: Restriction mapping of the given sequence was performed. The output gives the number of cuts made by various restriction enzymes, such as BsmI and XcaI, and the location of each restriction site. In recombinant DNA technology, this tool is used to find the cutting sites of restriction enzymes present in a particular sequence.
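The idea behind the restriction map can be illustrated with a minimal Python sketch. It is not BioEdit's implementation: the three-enzyme table and the demo sequence are assumptions chosen for illustration, and the function reports the start of each recognition site rather than the exact cut position.

```python
# Hypothetical mini enzyme table. The recognition sites are the standard ones
# for these enzymes, but the selection is an assumption; BioEdit uses a much
# larger enzyme table.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def restriction_map(sequence, enzymes=ENZYMES):
    """Return {enzyme: [1-based start positions of each recognition site]}."""
    seq = sequence.upper()
    result = {}
    for name, site in enzymes.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)        # convert to 1-based coordinates
            start = seq.find(site, start + 1)  # continue after this match
        result[name] = positions
    return result


if __name__ == "__main__":
    demo = "ttGAATTCaaGGATCCccGAATTCgg"  # made-up test sequence
    for enzyme, sites in restriction_map(demo).items():
        print(enzyme, "cuts:", len(sites), "positions:", sites)
```

Because the three sites used here are palindromic, scanning one strand is enough; a non-palindromic site would also need a search on the reverse complement.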
PRIMER DESIGNING
AIM: To design primers for the given query sequence using the Primer3 primer design tool.
DESCRIPTION: Primer3 is a tool used to choose primers for PCR reactions. Its design is heavily based on earlier implementations of similar programs: Primer 0.5 and Primer v2. Primer3 can also design hybridization probes and sequencing primers.
SOURCE: http://biotools.umassmed.edu/bioapps/primer3_www.cgi
METHOD:
1) Collect the sequence for which primers have to be designed in FASTA format from the NCBI home page.
2) Open the source website: biotools.umassmed.edu/bioapps/primer3_www.cgi
3) Paste the sequence in FASTA format into the input box on the home page of the website.
4) Keep the default settings and click ‘Pick Primers’ to get the result.
INPUT: ACCESSION NO: EF183474, Oryza sativa
OUTPUT:
INTERPRETATION: Primers were designed using the Primer3 tool. A left primer and a right primer were obtained, and some additional oligos were also listed in the output.
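Two of the properties that are usually checked for a candidate primer, GC content and melting temperature, can be estimated with a short Python sketch. The Wallace rule used here (Tm = 2(A+T) + 4(G+C)) is a rough approximation for short oligos and is not the thermodynamic model used by Primer3; the example primer is made up.

```python
def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Approximate melting temperature in deg C by the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement; the right primer is read from the opposite strand."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    left = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer
    print("GC%:", gc_content(left))
    print("Tm :", wallace_tm(left))
    print("RC :", reverse_complement(left))
```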
SEQUENCE RETRIEVAL
NCBI
AIM: To retrieve the nucleotide sequence for the given accession number from the NCBI nucleotide sequence database.
DESCRIPTION: Methods for determining DNA sequences were first described in 1972. Since then, a wealth of sequence information has been obtained and deposited in several essential centralized locations. These general databases include:
GenBank
EMBL
DDBJ
Databases and database analysis tools allow a researcher to probe for a desired sequence. The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health. The NCBI has had responsibility for making available the GenBank DNA sequence database since 1992. GenBank coordinates with individual laboratories and other sequence databases such as those of the European Molecular Biology Laboratory (EMBL) and the DNA Database of Japan (DDBJ).
SOURCE: http://www.ncbi.nlm.nih.gov/
METHOD:
1. The NCBI home page was opened using the source website.
2. On the home page, the Nucleotide option was selected in order to retrieve a nucleotide sequence.
3. The accession number or the name of the gene of interest was entered in the search box.
4. The ‘Go’ button next to the search box was clicked.
5. The page containing the results matching the query was displayed.
6. The required record was opened by clicking on the link provided in the result page.
7. The sequence of interest was selected, copied to Notepad and saved.
INPUT: ACCESSION NO: XM_001095299, Brugia malayi protein kinase domain containing protein, partial mRNA.
OUTPUT:
INTERPRETATION: The nucleotide and protein sequences were retrieved using the NCBI sequence database.
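The same retrieval can also be scripted instead of clicking through the web pages. The sketch below assumes the NCBI E-utilities `efetch` service and its `db`, `id`, `rettype` and `retmode` parameters; the helper names `efetch_url` and `fetch_sequence` are hypothetical, and the accession is written without a space. The download function needs network access, so only the URL construction is shown in the usage lines.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build the efetch URL for one accession number (hypothetical helper)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_sequence(accession, db="nuccore", rettype="fasta"):
    """Download the record as text; requires network access."""
    with urlopen(efetch_url(accession, db, rettype)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Print the URL only; call fetch_sequence() to actually download.
    print(efetch_url("XM_001095299"))
```

Using `db="protein"` in place of `nuccore` would request a protein record in the same way.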
EMBL
AIM: To retrieve the nucleotide sequence for the given accession number from the EMBL nucleotide sequence database.
DESCRIPTION: The European Molecular Biology Laboratory (EMBL) is a molecular biology research institution supported by 20 countries, comprising nearly all of western Europe and Israel. The cornerstones of EMBL's mission are: to perform basic research in molecular biology; to train scientists, students and visitors at all levels; to offer vital services to scientists in the member states; to develop new instruments and methods in the life sciences; and to actively engage in technology transfer.
SOURCE: http://www.ebi.ac.uk/embl/
METHOD:
1. The EMBL home page was opened using the source website.
2. The accession number or the name of the gene of interest was entered in the search box.
3. The ‘Go’ button next to the search box was clicked.
4. The page containing the results matching the query was displayed.
5. The required record was opened by clicking on the link provided in the result page.
6. The sequence of interest was selected, copied to Notepad and saved.
INPUT: ACCESSION NO: M36407, alpha-tubulin of Tetrahymena thermophila.
OUTPUT:
INTERPRETATION: The nucleotide sequence was retrieved using the EMBL sequence database.
SWISS-PROT
AIM: To retrieve the protein sequence for the given accession number from the Swiss-Prot protein sequence database.
DESCRIPTION: Swiss-Prot is a manually curated biological database of protein sequences. Swiss-Prot was created in 1986 by Amos Bairoch during his PhD and developed by the Swiss Institute of Bioinformatics and the European Bioinformatics Institute. Swiss-Prot and its automatically curated supplement TrEMBL have joined with the Protein Information Resource protein database to produce the UniProt Knowledgebase, the world's most comprehensive catalogue of information on proteins.[2] As of 3 April 2007, UniProtKB/Swiss-Prot release 52.2 contains 263,525 entries and UniProtKB/TrEMBL release 35.2 contains 4,232,122 entries.
SOURCE: http://www.ebi.ac.uk/swissprot/
METHOD:
1. The Swiss-Prot home page was opened using the source website.
2. The accession number or the name of the gene of interest was entered in the search box.
3. The ‘Go’ button next to the search box was clicked.
4. The page containing the results matching the query was displayed.
5. The required record was opened by clicking on the link provided in the result page.
6. The sequence of interest was selected, copied to Notepad and saved.
INPUT: ACCESSION NO: Q7JQD4, alpha-3 tubulin protein of Caenorhabditis elegans.
OUTPUT:
INTERPRETATION: The sequence for the given accession number was retrieved from the Swiss-Prot protein database.
SEQUENCE FORMAT CONVERSION
SQUIZZ
AIM: To convert the given sequence from NCBI (GenBank) format to EMBL format using SQUIZZ as the format conversion tool.
DESCRIPTION: The tools available for the analysis of biological data (sequences) require data in different formats. Sequence format conversion tools are needed to convert the same data into the format accepted by each sequence analysis tool. Several such tools are available on the web. SQUIZZ allows the verification of a sequence or sequence alignment format and conversion into the following formats:
• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP
SOURCE: http://bioweb.pasteur.fr/sequenal/interface/squizz.html
METHOD:
1. The home page of the sequence conversion tool was opened by typing “sequence conversion tool” in the Google search bar.
2. The sequence format conversion hyperlink was clicked on the page that opened.
3. The SQUIZZ hyperlink was clicked to open the SQUIZZ page.
4. A nucleotide sequence in NCBI format was pasted into the ‘Actual data here’ field.
5. SQUIZZ was run.
6. The sequence was converted into the required format using the ‘Convert into format’ option.
7. The results in the new format were obtained and saved to Notepad.
INPUT: ACCESSION NO: NM_001111544, Zea mays chitinase mRNA
OUTPUT:
INTERPRETATION: The given nucleotide sequence was converted from GenBank to EMBL format using the SQUIZZ sequence format conversion tool.
READSEQ
AIM: To convert the given sequence from EMBL format to FASTA format using the READSEQ format conversion tool.
DESCRIPTION: READSEQ takes a DNA or amino acid sequence as input. The input format is detected automatically, and the sequence can be converted into the following formats:
• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP
In the present exercise, a sequence in EMBL format was converted to FASTA format using the READSEQ conversion tool.
SOURCE: http://bioweb.pasteur.fr/sequenal/interface/readseq.cgi
METHOD:
1. The home page of the sequence conversion tool was opened by typing “sequence conversion tool” in the Google search bar.
2. The sequence format conversion hyperlink was clicked on the page that opened.
3. The READSEQ hyperlink was clicked to open the READSEQ page.
4. A protein sequence in EMBL format was pasted into the ‘Actual data here’ field.
5. READSEQ was run.
6. The sequence was converted into FASTA format using the ‘Convert into format’ option.
7. The results in the new format were obtained and saved to Notepad.
INPUT: ACCESSION NO: Q387771, chitinase precursor
OUTPUT:
INTERPRETATION: The given protein sequence was converted from EMBL to FASTA format using the READSEQ sequence format conversion tool.
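What an EMBL to FASTA conversion involves can be seen in a minimal Python sketch. It is not READSEQ's code: it assumes a single flat-file record that uses the AC (accession), DE (description) and SQ (sequence) line codes and ends with "//", and the function name `embl_to_fasta` and the test record are made up for illustration.

```python
import textwrap


def embl_to_fasta(embl_text, width=60):
    """Convert one EMBL-style flat-file record to FASTA text."""
    accession, description, residues = "", [], []
    in_sequence = False
    for line in embl_text.splitlines():
        if line.startswith("//"):            # end of record
            break
        if in_sequence:
            # Sequence lines carry spaces and position numbers; keep letters only.
            residues.append("".join(ch for ch in line if ch.isalpha()))
        elif line.startswith("AC") and not accession:
            accession = line[2:].strip().split(";")[0]
        elif line.startswith("DE"):
            description.append(line[2:].strip())
        elif line.startswith("SQ"):          # sequence starts on the next line
            in_sequence = True
    header = ">" + " ".join([accession] + description).strip()
    sequence = "".join(residues).upper()
    return "\n".join([header] + textwrap.wrap(sequence, width)) + "\n"


if __name__ == "__main__":
    record = "\n".join([
        "ID   TEST1; SV 1; linear; mRNA; STD; PLN; 24 BP.",
        "AC   X00001;",
        "DE   Hypothetical test record",
        "SQ   Sequence 24 BP; 6 A; 6 C; 6 G; 6 T; 0 other;",
        "     acgtacgtac gtacgtacgt acgt                              24",
        "//",
    ])
    print(embl_to_fasta(record), end="")
```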
MVIEW
AIM: To convert the given sequence from FASTA format to GSF format using MVIEW as the format conversion tool.
DESCRIPTION: The tools available for the analysis of biological data (sequences) require data in different formats. Sequence format conversion tools are needed to convert the same data into the format accepted by each sequence analysis tool. Several such tools are available on the web. Conversion tools such as SQUIZZ allow the verification of a sequence or sequence alignment format and conversion into the following formats:
• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP
SOURCE: http://searchlauncher.bcm.tmc.edu/cgi-bin/seq-util/readseq.pl
METHOD:
1. The home page of the sequence conversion tool was opened by typing “sequence conversion tool” in the Google search bar.
2. The sequence format conversion hyperlink was clicked on the page that opened.
3. The MVIEW hyperlink was clicked to open the MVIEW page.
4. A sequence in FASTA format was pasted into the ‘Actual data here’ field.
5. The output format was selected using the ‘Convert into format’ option.
6. MVIEW was run.
7. The results in the new format were obtained and saved to Notepad.
INPUT: ACCESSION NO: AAA40590
OUTPUT:
INTERPRETATION: The given sequence was converted from FASTA to GSF format using the MVIEW sequence format conversion tool.
FMTSEQ
AIM: To convert the given sequence from EMBL format to CLUSTAL format using the FMTSEQ format conversion tool.
DESCRIPTION: FMTSEQ is a format conversion tool that converts sequences between 22 sequence format types, including:
• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP
SOURCE: http://evol.biology.mcmaster.ca/seqanal/tmp/fmt.seq/A27358120711907/fmtseq.out
METHOD:
1. The home page of the sequence conversion tool was opened by typing “sequence conversion tool” in the Google search bar.
2. The sequence format conversion hyperlink was clicked on the page that opened.
3. The FMTSEQ hyperlink was clicked to open the FMTSEQ page.
4. A sequence in EMBL format was pasted into the ‘Actual data here’ field.
5. FMTSEQ was run.
6. The sequence was converted into CLUSTAL format using the ‘Convert into format’ option.
7. The results in the new format were obtained and saved to Notepad.
INPUT: ACCESSION NO: A3242649, plasminogen protein of Rattus norvegicus
OUTPUT:
INTERPRETATION: The given sequence was converted from EMBL to CLUSTAL format using the FMTSEQ sequence format conversion tool.
SREFORMAT
AIM: To convert the given sequence from NCBI format to PIR format using SreFormat as the format conversion tool.
DESCRIPTION: SreFormat allows the user to convert a sequence from one format to another. It accepts sequences in the following formats:
• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP
SOURCE: http://bioweb.pasteur.fr/sequenal/interface/SreFormat.html
METHOD:
1. The home page of the sequence conversion tool was opened by typing “sequence conversion tool” in the Google search bar.
2. The sequence format conversion hyperlink was clicked on the page that opened.
3. The SreFormat hyperlink was clicked to open the SreFormat page.
4. A protein sequence in NCBI format was pasted into the ‘Actual data here’ field.
5. SreFormat was run.
6. The sequence was converted into PIR format using the ‘Convert into format’ option.
7. The results in the new format were obtained and saved to Notepad.
INPUT: ACCESSION NO: NP_001016540, plasminogen of Macaca mulatta
OUTPUT:
INTERPRETATION: The given protein sequence was converted from NCBI to PIR format using the SreFormat sequence format conversion tool.
ORF FINDER
AIM: To find the open reading frames on the direct and the reverse strands of the given sequence.
DESCRIPTION: ORF Finder searches for open reading frames (ORFs) in the DNA sequence you enter. The program returns the range of each ORF, along with its protein translation. Use ORF Finder to search newly sequenced DNA for potential protein encoding segments. ORF Finder supports the entire IUPAC alphabet and several genetic codes.
SOURCE: www.bioinformatics.org/sms2/
INPUT: ACCESSION NO: EF183474, Oryza sativa indica
OUTPUT:
INTERPRETATION: Using the ORF Finder tool, the open reading frames were found for the Oryza sativa gene with accession number EF183474.
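The search that an ORF finder performs can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the tool's own code: an ORF is taken to run from an ATG to the first in-frame stop codon, only the standard stop codons are used, and coordinates are reported on the strand that was scanned. The function names and the test sequence are made up.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(sequence, min_codons=3):
    """Return a list of (strand, frame, start, end) tuples.

    start and end are 1-based and inclusive on the scanned strand; the ORF
    runs from the ATG through the stop codon. min_codons counts the stop.
    """
    seq = sequence.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame + 1, start + 1, i + 3))
                    start = None                   # look for the next ORF
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG"))  # made-up sequence
```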
HOMOLOGY SEARCH
The term sequence analysis in biology implies subjecting a DNA or peptide sequence to sequence alignment, sequence database searches, repeated sequence searches or other bioinformatics methods on a computer.
In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold. The BLAST program can either be downloaded and run as the command-line utility "blastall" or accessed for free over the web. The BLAST web server, hosted by the NCBI, allows anyone with a web browser to perform similarity searches against constantly updated databases of proteins and DNA that include most of the newly sequenced organisms.
BLAST is actually a family of programs (all included in the blastall executable). The following are some of the programs, ranked mostly in order of importance:
Nucleotide-nucleotide BLAST (blastn): This program, given a DNA query, returns the most similar DNA sequences from the DNA database that the user specifies.
Protein-protein BLAST (blastp): This program, given a protein query, returns the most similar protein sequences from the protein database that the user specifies.
Nucleotide 6-frame translation-protein (blastx): This program compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.
Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx): This program is the slowest of the BLAST family. It translates the query nucleotide sequence in all six possible frames and compares it against the six-frame translations of a nucleotide sequence database. The purpose of tblastx is to find very distant relationships between nucleotide sequences.
Protein-nucleotide 6-frame translation (tblastn): This program compares a protein query against the six-frame translations of a nucleotide sequence database.
NUCLEOTIDE BLAST
Search a nucleotide database using a nucleotide query
AIM: To search a nucleotide database for the sequences most similar to a nucleotide query.
DESCRIPTION: BLAST is one of the most widely used bioinformatics programs, because it addresses a fundamental problem and the algorithm emphasizes speed over sensitivity. To run, BLAST requires two inputs: a query sequence (also called the target sequence) and a sequence database. BLAST will find subsequences in the query that are similar to subsequences in the database.
Nucleotide-nucleotide BLAST (blastn): This program, given a DNA query, returns the most similar DNA sequences from the DNA database that the user specifies.
METHOD:
1. Go to the NCBI home page.
2. Click on BLAST.
3. Click on nucleotide blast.
4. Paste a query sequence in FASTA format.
5. Choose nucleotide collection (nr/nt) as the database.
6. Run BLAST.
7. Select the most similar sequence, that is, the hit with the maximum identity percentage and the lowest E-value.
SOURCE: http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
INPUT: ACCESSION NO: NM_001111544, chitinase of Zea mays
OUTPUT:
INTERPRETATION: Using nucleotide BLAST, the nucleotide sequence with maximum similarity to the query was obtained. The accession number of the homologous sequence is: AY532754
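The last step of the method, picking the best hit, can also be done on a downloaded hit table. The sketch below assumes the 12-column tab-separated BLAST tabular layout (query, subject, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The rule of ranking by lowest E-value first and then by highest identity is an assumption about how to combine the two criteria, and the hit names are made up.

```python
def best_hit(tabular_text):
    """Return the best hit from BLAST tabular output as a dict, or None."""
    hits = []
    for line in tabular_text.strip().splitlines():
        if not line or line.startswith("#"):   # skip blank and comment lines
            continue
        fields = line.split("\t")
        if len(fields) < 12:                   # not a complete hit line
            continue
        hits.append({
            "subject": fields[1],
            "identity": float(fields[2]),
            "evalue": float(fields[10]),
        })
    if not hits:
        return None
    # Assumed ranking: lowest E-value first, ties broken by highest identity.
    return min(hits, key=lambda h: (h["evalue"], -h["identity"]))


if __name__ == "__main__":
    table = "\n".join([
        "q1\thitA\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t360",
        "q1\thitB\t99.00\t100\t1\t0\t1\t100\t1\t100\t2e-45\t180",
        "q1\thitC\t85.00\t200\t30\t0\t1\t200\t1\t200\t1e-100\t250",
    ])
    print(best_hit(table))
```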
PROTEIN BLAST
Search a protein database using a protein query
AIM: To search a protein database for the sequences most similar to a protein query.
DESCRIPTION: BLAST is one of the most widely used bioinformatics programs, because it addresses a fundamental problem and the algorithm emphasizes speed over sensitivity. To run, BLAST requires two inputs: a query sequence (also called the target sequence) and a sequence database. BLAST will find subsequences in the query that are similar to subsequences in the database.
Protein-protein BLAST (blastp): This program, given a protein query, returns the most similar protein sequences from the protein database that the user specifies.
METHOD:
1. Go to the NCBI home page.
2. Click on BLAST.
3. Click on protein blast.
4. Paste a query sequence in FASTA format.
5. Choose the protein collection (nr) as the database.
6. Run BLAST.
7. Select the most similar sequence, that is, the hit with the maximum identity percentage and the lowest E-value.
SOURCE: http://www.ncbi.nlm.nih.gov/blast.cgi#24657901
INPUT: ACCESSION NO: NP_566800, ligase of Arabidopsis thaliana
OUTPUT:
INTERPRETATION: Using protein BLAST, the protein sequence with maximum similarity to the query was obtained. The accession number of the homologous sequence is: NP_563915
BLASTX
Search a protein database using a translated nucleotide query
AIM: To search a protein database for the sequences most similar to a translated nucleotide query.
DESCRIPTION: BLAST is one of the most widely used bioinformatics programs, because it addresses a fundamental problem and the algorithm emphasizes speed over sensitivity. To run, BLAST requires two inputs: a query sequence (also called the target sequence) and a sequence database. BLAST will find subsequences in the query that are similar to subsequences in the database.
Nucleotide 6-frame translation-protein (blastx): This program compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.
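The six-frame conceptual translation mentioned in the description can be illustrated with a short Python sketch. It uses the standard genetic code only and marks any codon containing an ambiguous base as X; the frame labels (+1 to +3, -1 to -3) and the function names are conventions chosen here, not BLAST output fields.

```python
from itertools import product

# Standard genetic code, built from the usual compact layout (bases in TCAG order).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(sequence):
    """Return {frame label: protein string} for all six reading frames."""
    forward = sequence.upper()
    reverse = forward.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    frames = {}
    for strand, s in (("+", forward), ("-", reverse)):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six protein strings would then be compared against the protein database, which is why translated searches take longer than a plain blastn or blastp search.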
METHOD:
1. Go to the NCBI home page.
2. Click on BLAST.
3. Click on blastx.
4. Paste a query EST sequence in FASTA format.
5. Choose non-redundant protein sequences (nr) as the database.
6. Run BLAST.
7. Select the most similar sequence, that is, the hit with the maximum identity percentage and the lowest E-value.
SOURCE: http://www.ncbi.nih.gov/blast/Blast.cgi
INPUT: ACCESSION NO: NM_001111544, chitinase of Zea mays
OUTPUT:
INTERPRETATION: Using blastx, the protein sequence with maximum similarity to the translated query was obtained. The accession number of the homologous sequence is: AAT40013
TBLASTN
Search a translated nucleotide database using a protein query
AIM: To search a translated nucleotide database for the sequences most similar to a protein query.
DESCRIPTION: BLAST is one of the most widely used bioinformatics programs[2], because it addresses a fundamental problem and the algorithm emphasizes speed over sensitivity.
Protein-nucleotide 6-frame translation (tblastn): This program compares a protein query against the six-frame translations of a nucleotide sequence database.
METHOD:
1. Go to the NCBI home page.
2. Click on BLAST.
3. Click on tblastn.
4. Paste a query sequence in FASTA format.
5. Choose nucleotide collection (nr/nt) as the database.
6. Run BLAST.
7. Select the most similar sequence, that is, the hit with the maximum identity percentage and the lowest E-value.
SOURCE: http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
INPUT: ACCESSION NO: AAA32641, chitinase protein
OUTPUT:
INTERPRETATION: Using tblastn, the translated nucleotide sequence with maximum similarity to the query was obtained. The accession number of the homologous sequence is: NM_1135932
TBLASTX
Search a translated nucleotide database using a translated nucleotide query
AIM: To search a translated nucleotide database for the sequences most similar to a translated nucleotide query.
DESCRIPTION: BLAST is one of the most widely used bioinformatics programs[2], because it addresses a fundamental problem and the algorithm emphasizes speed over sensitivity. To run, BLAST requires two inputs: a query sequence (also called the target sequence) and a sequence database. BLAST will find subsequences in the query that are similar to subsequences in the database.
Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx): This program is the slowest of the BLAST family. It translates the query nucleotide sequence in all six possible frames and compares it against the six-frame translations of a nucleotide sequence database. The purpose of tblastx is to find very distant relationships between nucleotide sequences.
METHOD:
1. Go to the NCBI home page.
2. Click on BLAST.
3. Click on tblastx.
4. Paste a query sequence in FASTA format.
5. Choose nucleotide collection (nr/nt) as the database.
6. Run BLAST.
7. Select the most similar sequence, that is, the hit with the maximum identity percentage and the lowest E-value.
SOURCE: http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
INPUT: ACCESSION NO: EF125543, Tribolium castaneum
OUTPUT:
INTERPRETATION: Using tblastx, the translated nucleotide sequence with maximum similarity to the query was obtained. The accession number of the homologous sequence is: NM_001080098
FASTA
FASTA stands for FAST-All, reflecting the fact that it can be used for a fast protein comparison or a fast nucleotide comparison. It is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985. This program achieves a high level of sensitivity for similarity searching at high speed. The sensitivity comes from performing optimized searches for local alignments using a substitution matrix. The high speed is achieved by using the observed pattern of word hits to identify potential matches before attempting the more time-consuming optimized search. The trade-off between speed and sensitivity is controlled by the ktup parameter, which specifies the size of the word. Increasing the ktup decreases the number of background hits. Not every word hit is investigated; instead, the program initially looks for segments containing several nearby hits.
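The word-hit step and the effect of ktup can be illustrated with a small Python sketch. It covers only the first stage of the method (finding identical words and grouping them by diagonal), not the later rescoring and optimized alignment, and the function name and sequences are made up for illustration.

```python
from collections import Counter, defaultdict


def word_hits_by_diagonal(query, subject, ktup=2):
    """Count identical ktup-length words shared by query and subject.

    Hits are grouped by diagonal (query position minus subject position);
    many hits on one diagonal suggest a region worth aligning in detail.
    """
    index = defaultdict(list)                  # word -> positions in subject
    for j in range(len(subject) - ktup + 1):
        index[subject[j:j + ktup]].append(j)
    diagonals = Counter()
    for i in range(len(query) - ktup + 1):
        for j in index.get(query[i:i + ktup], ()):
            diagonals[i - j] += 1
    return diagonals


if __name__ == "__main__":
    query, subject = "ACGTAC", "TTACGTAC"      # made-up sequences
    for k in (2, 4):
        hits = word_hits_by_diagonal(query, subject, ktup=k)
        print("ktup", k, "total hits", sum(hits.values()), dict(hits))
```

In this example the shorter word size gives more hits in total, including hits on diagonals other than the true match, while the longer word size leaves only the hits on the matching diagonal.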
General FASTA Programs:
Tool                 Description
FASTA-protein        Sequence similarity searching against protein databases using FASTA.
FASTA-nucleotide     Sequence similarity searching against nucleotide databases using FASTA.
FASTA-PROTEIN
AIM: To find protein sequences similar to the given query protein sequence, supplied in any format, using the FASTA-Protein tool.
DESCRIPTION: This tool performs sequence similarity searching against protein databases using FASTA. The FASTA service provides sequence similarity searching against nucleotide and protein databases using the FASTA programs. FASTA can be very specific when identifying long regions of low similarity, especially for highly diverged sequences. Sequence similarity searching can also be conducted against proteome or genome databases using the FASTA program.
SOURCE: http://www.ebi.ac.uk/fasta33/
METHOD:
1. Type EBI in Google search (www.ebi.ac.uk/fasta33).
2. Click on European Bioinformatics Institute.
3. Click on sequence similarity and analysis.
4. Click on FASTA.
5. Click on FASTA Protein.
6. Paste or browse a protein sequence in any format in the sequence submission box.
7. Click on Run FASTA3.
INPUT: ACCESSION NO: ABM86630, tumor protein p53 (Li-Fraumeni syndrome)
OUTPUT:
INTERPRETATION: The sequence most similar to the query protein sequence was obtained.
FASTA-NUCLEOTIDE
AIM: To find nucleotide sequences similar to the given query nucleotide sequence, supplied in any format, using the FASTA-Nucleotide tool.
DESCRIPTION: This tool performs sequence similarity searching against nucleotide databases using FASTA. The FASTA service provides sequence similarity searching against nucleotide and protein databases using the FASTA programs. FASTA can be very specific when identifying long regions of low similarity, especially for highly diverged sequences. Sequence similarity searching can also be conducted against complete proteome or genome databases using the FASTA program.
SOURCE: http://www.ebi.ac.uk/fasta33/
METHOD:
1. Type EBI in Google search (www.ebi.ac.uk/fasta33).
2. Click on European Bioinformatics Institute.
3. Click on sequence similarity and analysis.
4. Click on FASTA.
5. Click on FASTA Nucleotide.
6. Paste or browse a nucleotide sequence in any format in the sequence submission box.
7. Click on Run FASTA3.
INPUT: ACCESSION NO: gi|78286224, p53 of Homo sapiens
OUTPUT:
INTERPRETATION: The sequence most similar to the query nucleotide sequence was obtained.
MULTIPLE SEQUENCE ALIGNMENT
Biologists often find a protein with approximately the same sequence in different species, suggesting that the proteins have closely related biological functions and that the genes encoding these proteins have come from a common genetic source. If we align these genes, we find that some are alike and some are almost identical. As with aligning a pair of sequences, the difficulty in aligning a group of sequences varies considerably, being much greater as the degree of sequence similarity decreases. When the amount of sequence variation is great, it is difficult to find an optimal alignment of the sequences because so many combinations of substitutions, insertions and deletions, each predicting a different alignment, are possible. Three commonly used programs for multiple sequence alignment are:
Clustal W
T-Coffee
Multalin
MULTIPLE SEQUENCE ALIGNMENT USING CLUSTAL W
AIM: To align three sequences using Clustal W.
DESCRIPTION: Clustal W is a more recent version of Clustal, with the W standing for “weighting” to represent the ability of the program to provide weights to the sequence and program parameters. The program is designed to provide an adequate alignment of a large number of more closely related sequences and a reliable indication of the domain structure of the sequences. Once an alignment has been made, a phylogenetic tree can be made by the neighbour-joining method.
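The neighbour-joining method starts from a matrix of pairwise distances between the aligned sequences. A minimal sketch of one simple distance measure, the proportion of differing residues (p-distance), is given below. It is not Clustal W's own distance calculation, columns with a gap in either sequence are skipped by assumption, and the aligned sequences are made up.

```python
def p_distance(a, b):
    """Fraction of differing residues over columns without a gap in either row."""
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue                  # ignore gapped columns
        compared += 1
        differences += (x != y)
    return differences / compared if compared else 0.0


def distance_matrix(aligned):
    """Return {(name1, name2): p-distance} for every pair of aligned sequences."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n])
            for m in names for n in names}


if __name__ == "__main__":
    alignment = {"s1": "ACGT-A", "s2": "ACGTTA", "s3": "AGGT-C"}  # made up
    for pair, d in distance_matrix(alignment).items():
        print(pair, round(d, 3))
```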
METHOD:
1. Select more than two protein or nucleotide sequences from NCBI in FASTA format.
2. Copy the sequences and save them in FASTA format.
3. Type Clustal W in the Google search bar.
4. Click on multiple sequence alignment - Clustal W.
5. Submit the sequences in the ‘enter sequence’ box.
6. Click on execute multiple alignment.
7. Copy the result and save it in Notepad.
SOURCE: http://www.ebi.ac.uk/Tools/clustalw2/index.html
INPUT: ACCESSION NO: NM_146146, NM_001122899, NM_010704
OUTPUT:
RESULTS: Multiple sequence alignment of three insulin protein sequences was performed using Clustal W.
MULTIPLE SEQUENCE ALIGNMENT USING T-COFFEE
AIM: To align three sequences using T-Coffee.
DESCRIPTION: T-Coffee is an advanced multiple sequence alignment program that uses a system of sequence position weights to generate a multiple sequence alignment that is the most consistent with the pairwise alignments of all the component sequences (T-Coffee stands for Tree-based Consistency Objective Function for alignment Evaluation). T-Coffee is better than Clustal W at reproducing known alignments of related proteins but is much slower.
METHOD:
1. Select more than two protein or nucleotide sequences from NCBI in FASTA format.
2. Copy the sequences and save them in FASTA format.
3. Type T-Coffee in the Google search bar.
4. Click on multiple sequence alignment - T-Coffee.
5. Submit the sequences in the ‘enter sequence’ box.
6. Run the program.
7. Copy the result and save it in Notepad.
SOURCE: http://www.ch.embnet.org/software/TCoffee.html
INPUT: ACCESSION NO: NM_146146, NM_001122899, NM_010704
OUTPUT:
RESULTS: Multiple sequence alignment of three insulin protein sequences was performed using T-Coffee.
MULTIPLE SEQUENCE ALIGNMENT USING MULTALIGN
AIM: To align three sequences using Multalign.
DESCRIPTION: Multalign performs a simultaneous alignment of two or more DNA or protein sequences. It introduces a certain number of gaps into the pairwise aligned sequences to find the minimal global distance. The program is based on a generalization of the algorithm of Waterman, Smith and Beyer by Krüger and Osterburg.
METHOD:
1. Select more than two protein or nucleotide sequences from NCBI in FASTA format.
2. Copy the sequences and save them in FASTA format.
3. Type Multalign in the Google search bar.
4. Click on multiple sequence alignment - Multalign.
5. Submit the sequences in the ‘enter sequence’ box.
6. Run the program.
7. Copy the result and save it in Notepad.
SOURCE: http://bioinfo.genopole-toulouse.prd.fr/multalin/multalin.html
INPUT: ACCESSION NO: NM_146146, NM_001122899, NM_010704
OUTPUT:
INTERPRETATION: Multiple sequence alignment of three insulin protein sequences was performed using Multalign.
GENE PREDICTION
With the advent of whole genome sequencing projects, it has become routine to scan genomic DNA sequences to find genes, particularly those that encode proteins. Computational methods for gene prediction work by searching through sequences to locate the regions most likely to encode proteins. Predicting protein-encoding genes is generally easier in prokaryotes than in eukaryotic organisms, because prokaryotic genes generally lack introns and because several quite highly conserved sequences are found in the promoter region and around the start sites of transcription and translation. Three commonly used programs for gene prediction are:
• WebGene
• GeneMark
• GENSCAN
GENE PREDICTION USING WEBGENE
AIM: To predict the features of a eukaryotic gene using WebGene.
DESCRIPTION: WebGene is a web-based collection of tools for the analysis and prediction of gene structure in DNA sequences. The programs used in this exercise are Gene Builder, Repeat View, CpG island, Splice View, HC polyA, HCtata, Gen view2 and AUG_evaluator.
SOURCE: http://www.itb.cnr.it/sun/webgene/
METHOD:
1. The sequence of human insulin was retrieved from NCBI and saved in Notepad.
2. ‘webgene’ was typed in the Google search bar.
3. The WebGene home page was opened.
I. Gene Builder
The Gene Builder bar was clicked. The parameters shown on the web page were set. The sequence was pasted in the given box. The analysis was run. The results were saved.
II. Repeat View
The Repeat View bar was clicked. The parameters shown on the web page were set. The sequence was pasted in the given box. The analysis was run. The results were saved.
III. CpG island
The CpG bar was clicked. The parameters shown on the web page were set. The sequence was pasted in the given box. The analysis was run. The results were saved.
IV. Splice View
The Splice View bar was clicked. The parameters shown on the web page were set. The sequence was pasted in the given box. The analysis was run. The results were saved.
V. HC polyA
The HC polyA bar was clicked. The parameters shown on the web page were set. The sequence was pasted in the given box. The analysis was run. The results were saved.
VI. HCtata
The HCtata bar was clicked. The parameters shown on the web page were set. The sequence was pasted in the given box. The analysis was run. The results were saved.
VII. Gen view2
The Gen view2 bar was clicked. The parameters shown on the web page were set. The sequence was pasted in the given box. The analysis was run. The results were saved.
VIII. AUG_evaluator
The AUG_evaluator bar was clicked. The parameters shown on the web page were set. The sequence was pasted in the given box. The analysis was run. The results were saved.
OUTPUT: Gene Builder
OUTPUT: Repeat View
OUTPUT: CpG island
OUTPUT: Splice View
OUTPUT: HC polyA
OUTPUT: HCtata
OUTPUT: Gene view2
OUTPUT: AUG_evaluator
RESULTS AND INTERPRETATION: Eight WebGene programs were run on the human insulin gene to predict the following features:
Gene builder: protein-coding gene.
Repeat view: mapping of repeated elements.
CpG island: CpG islands.
Splice view: splicing signals.
HC polyA: poly-A signal.
HCtata: TATA signal.
Gene view2: protein-coding gene.
AUG_evaluator: start codon.
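To make the CpG island analysis more concrete, the sketch below computes the two quantities on which CpG island definitions are usually based: the G+C fraction and the observed/expected CpG ratio. The default thresholds (length of at least 200 bp, G+C above 50%, ratio above 0.6) follow the widely cited Gardiner-Garden and Frommer criteria; whether the WebGene CpG island program uses exactly these values is an assumption, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (gc_fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")                  # CpG dinucleotides on this strand
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / N.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply assumed Gardiner-Garden and Frommer style thresholds to one region."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_ratio


region = "CG" * 100   # hypothetical 200 bp toy region
print(cpg_stats(region), looks_like_cpg_island(region))
```

A real scan would slide this test along the sequence in windows and merge adjacent positive windows into islands.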
GENE PREDICTION USING GENEMARK
AIM:-To predict the genes of a prokaryotic sequence using GeneMark.
DESCRIPTION:- The GeneMark.hmm algorithm was designed to improve the quality of gene prediction in terms of finding exact gene boundaries, and it has been reported to give high gene-finding accuracy. The program also uses a specially derived ribosome binding site pattern to refine its predictions of translation initiation codons.
SOURCE:http://exon.gatech.edu/Genmark/genmark_prok_gms_plus.cgi
METHOD:
1. The sequence of a prokaryotic gene was retrieved from NCBI and saved in Notepad.
2. "GeneMark" was typed in the Google search bar.
3. The GeneMark home page was opened.
4. The sequence was pasted in the box.
5. The analysis was run.
6. The results were saved.
INPUT: ACCESSION NO - AY 813449 Schistosoma japonicum
OUTPUT:
INTERPRETATION: The GeneMark program was run to predict prokaryotic genes. The result was saved.
GENE PREDICTION USING GENSCAN
AIM:-To predict the features of a eukaryotic gene using GENSCAN.
DESCRIPTION: GENSCAN is an example of the approaches to gene prediction that integrate multiple types of information, including splice signal sensors, compositional properties of coding and non-coding DNA and, in some cases, database homology searching, in order to predict entire gene structures (sets of spliceable exons) in genomic sequences. GENSCAN uses distinct, explicit, empirically derived sets of model parameters to capture differences in gene structure and composition between distinct C+G compositional regions (isochores) of the human genome. It can also predict multiple genes in a sequence, deal with partial as well as complete genes, and predict consistent sets of genes occurring on either or both DNA strands.
SOURCE: http://genome.dkfz-heidelberg.de/cgi-bin/GENSCAN/genscan.call.cgi
METHOD:
1. The sequence of human insulin was retrieved from NCBI and saved in Notepad.
2. "GENSCAN" was typed in the Google search bar.
3. The GENSCAN home page was opened.
4. The sequence was pasted in the box.
5. The analysis was run.
6. The results were saved.
INPUT: ACCESSION NO – EF125543 Tribolium castaneum
OUTPUT:
INTERPRETATION: The GENSCAN program was run to predict eukaryotic genes. The result was saved.
PATTERNS AND PROFILE SEARCH OF PROTEINS
AIM: To search for patterns and profiles in given protein sequences using various ExPASy tools.
DESCRIPTION: ExPASy (Expert Protein Analysis System) is a proteomics server of the Swiss Institute of Bioinformatics (SIB) which analyzes protein sequences and structures and two-dimensional electrophoresis data. The server functions in collaboration with the EBI. ExPASy also provides access to the protein sequence knowledgebases UniProt and Swiss-Prot. For the prediction of patterns and profiles of proteins, ExPASy provides tools such as
1. ELM 2. FingerPRINTScan 3. Motif Scan 4. Proscan 5. PRATT
Profiles are numerical representations of a multiple sequence alignment. They capture the similarities between the aligned sequences and help in the identification and analysis of distantly related proteins. Patterns also represent the common characteristics of a protein family, but they do not contain any weighting information. In a pattern-discovery tool such as PRATT, the user can specify what kind of patterns should be searched for and how many sequences should match a pattern for it to be reported; there are options for pattern conservation, restrictions, number of pattern symbols, flexible spacers, etc.
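Because a pattern carries no weighting information, it behaves much like a regular expression. The sketch below translates a pattern written in PROSITE syntax into a Python regular expression and scans a sequence with it. It handles only the common syntax elements (x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, < and > for the sequence ends); the function names and the test sequence are hypothetical, and non-overlapping matches are reported, which is a simplification compared with a real PROSITE scan.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex = pattern.strip().rstrip(".")          # drop the trailing period
    regex = regex.replace("-", "")               # '-' only separates elements
    regex = regex.replace("x", ".")              # x = any amino acid
    regex = regex.replace("{", "[^").replace("}", "]")   # {P} = anything but P
    regex = regex.replace("(", "{").replace(")", "}")    # x(2,4) = .{2,4}
    regex = regex.replace("<", "^").replace(">", "$")    # sequence ends
    return regex


def scan(pattern, sequence):
    """Return (0-based start, matched text) for each non-overlapping match."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in compiled.finditer(sequence)]


# N-glycosylation site pattern as written in PROSITE syntax.
n_glyc = "N-{P}-[ST]-{P}."
print(prosite_to_regex(n_glyc))
print(scan(n_glyc, "MKNGTLNPSANYSV"))   # hypothetical toy sequence
```

A profile, by contrast, would assign a score to every residue at every position, so a near miss can still be detected; that is why profiles are better suited to distantly related proteins.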
Prosite
AIM: To perform profile and pattern search using the Prosite tool.
DESCRIPTION: PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as the associated patterns and profiles used to identify them.
SOURCE: http://www.expasy.ch/prosite/
METHOD:
1. A protein query sequence was retrieved from NCBI in FASTA format.
2. The retrieved protein sequence was pasted into the Prosite submission form.
3. The Scan button was clicked.
4. The Prosite tool was run, and the results were viewed by clicking on Rich view and saved.
INPUT: ACCESSION NO: AAA23226 Cellulase protein
OUTPUT:
INTERPRETATION: Using the Prosite tool, we are able to identify the patterns and functional sites of cellulase (Accession No. AAA23226). The tool has shown the number of disulphide bridges, active sites and other details of the protein structure.
ELM
AIM: To perform profile and pattern search using the ELM tool.
DESCRIPTION: ELM stands for Eukaryotic Linear Motif and is a resource for finding functional sites in proteins. It can report Pfam domains, signal peptides, coiled-coil predictions and transmembrane helices, as well as loop, helix and strand predictions.
SOURCE: http://elm.eu.org/
METHOD:
1. A protein query sequence was retrieved from NCBI in FASTA format.
2. The retrieved protein sequence was pasted into the ELM submission form.
3. The e-mail id was entered.
4. The ELM tool was run and the results were viewed and saved.
INPUT: ACCESSION NO – AAA32641 CHITINASE PROTEIN
OUTPUT:
INTERPRETATION: Using the ELM tool, we are able to find the number of helices, strands and loops present in the secondary structure of chitinase (Accession No. AAA32641).
FingerPRINTScan
AIM: To perform profile and pattern search using the FingerPRINTScan tool.
DESCRIPTION: The FingerPRINTScan tool scans a protein sequence against the PRINTS protein fingerprint database. It reports the number of motifs matched to the query sequence, together with their lengths and positions.
SOURCE: http://www.bioinf.man.ac.uk/fingerPRINTScan/
METHOD:
1. A protein query sequence was retrieved from NCBI in FASTA format.
2. The retrieved protein sequence was pasted into the FingerPRINTScan submission form.
3. The e-mail id was entered.
4. The FingerPRINTScan tool was run and the results were viewed and saved.
INPUT: ACCESSION NO – AAA32641 CHITINASE PROTEIN
OUTPUT:
INFERENCE: The FingerPRINTScan tool was used to find the number of motifs and their positions in the sequence.
Motif Scan
AIM: To perform profile and pattern search using the Motif Scan tool.
DESCRIPTION: Motif or family comparisons are more sensitive because motifs represent a higher-level generalization of the features that are important for a given structural or functional feature. This tool scans a sequence against protein profile databases (including PROSITE).
SOURCE: http://mybits.icb.sib.ch/cgi-bin/motif-scan
METHOD:
1. A protein query sequence was retrieved from NCBI in FASTA format.
2. The retrieved protein sequence was pasted into the Motif Scan submission form.
3. The e-mail id was entered.
4. The Motif Scan tool was run and the results were viewed and saved.
INPUT: ACCESSION NO – AAA32641 CHITINASE PROTEIN
OUTPUT:
INTERPRETATION: Using the Motif Scan tool, we are able to find the number of helices, strands and loops present in the secondary structure of chitinase (Accession No. AAA32641).
PROSCAN
AIM: To perform profile and pattern search using the PROSCAN tool.
DESCRIPTION: This tool, developed and run by PBIL at the University of Lyon, France, scans a sequence against PROSITE and allows mismatches as well. It can give information on phosphorylation, amidation or any other specific characteristic of the given sequence.
SOURCE: http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_prosite.html
METHOD:
1. A protein query sequence was retrieved from NCBI in FASTA format.
2. The retrieved protein sequence was pasted into the PROSCAN submission form.
3. The e-mail id was entered.
4. The PROSCAN tool was run and the results were viewed and saved.
INPUT: ACCESSION NO – 1H4P B CHAIN B, CRYSTAL PROTEIN
OUTPUT:
INTERPRETATION: Using the PROSCAN tool, the functional sites of a protein sequence can be found. The results were viewed and saved.
VISUALIZATION OF PROTEIN STRUCTURE BY USING RASMOL
AIM: To visualize the structure of a protein by using the visualization tool RasMol.
DESCRIPTION: RasMol 2 is a molecular graphics program intended for the visualization of proteins, nucleic acids and small molecules. The program is aimed at display, teaching and generation of publication-quality images. RasMol runs on Microsoft Windows, Apple Macintosh, UNIX and VMS systems. The UNIX and VMS versions require an 8, 24 or 32 bit colour X Windows display (X11R4 or later). The program reads in a molecule co-ordinate file and interactively displays the molecule on the screen in a variety of colour schemes and molecular representations. Currently available representations include depth-cued wireframes, 'Dreiding' sticks, spacefilling (CPK) spheres, ball and stick, solid and strand biomolecular ribbons, atom labels and dot surfaces.
SOURCES:
1. http://wbiomed.curtin.edu.au/teach/biochem/help/download.html
2. http://mc2.cchem.berkeley.edu/rasmol/v2.6/
Protein structure (.pdb): http://www.pdb.org/pdb/home/home.do
METHOD:
1. The NCBI website was opened.
2. The given accession number was entered and searched. The nucleotide sequence was obtained from the CoreNucleotide database.
3. The PDB ID for the given sequence was collected from the CDS section of the sequence record, e.g. 2MM1.
4. The PDB website was opened.
5. The PDB ID was entered and searched.
6. The .pdb.gz file was downloaded from the options on the left of the page.
7. The .pdb file was extracted from the .pdb.gz file.
8. The .pdb file was opened using RasMol.
9. The structure was viewed with the different Display options available in RasMol, such as Wireframe, Backbone, Sticks, Spacefill, Ball & Stick, Ribbons, Strands and Cartoons.
10. In the RasMol command line, commands such as "select helix" and "colour yellow" were used to view the helix structure in the molecule.
11. Several other commands can also be used, such as "set picking distance", "set picking angle" and "set picking torsion".
INPUT: ACCESSION NO – 1H4P B CHITINASE PROTEIN
OUTPUT:
INTERPRETATION:
1. This protein has a total of … atoms.
2. This protein has .. helix structure with … atoms.
3. This protein has no sheets or loops…
4. This protein has .. HOH molecules.
Picture with ligands
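The counts reported in the interpretation can be cross-checked directly from the co-ordinate file, which is plain text. The sketch below reads a .pdb or .pdb.gz file and counts ATOM records, HETATM records, water (HOH) records, HELIX records and SHEET records, relying on the fixed-column layout of the PDB format (record name at the start of the line, residue name in columns 18 to 20). The function names are hypothetical, and the counts are a rough check rather than a reproduction of what RasMol itself reports.

```python
import gzip


def read_pdb_lines(path):
    """Read a .pdb or gzip-compressed .pdb.gz file into a list of lines."""
    path = str(path)
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.read().splitlines()


def summarize_pdb(lines):
    """Count basic record types in the lines of a PDB co-ordinate file."""
    hetero = [line for line in lines if line.startswith("HETATM")]
    return {
        "atoms": sum(1 for line in lines if line.startswith("ATOM")),
        "hetero_atoms": len(hetero),
        # Residue name occupies columns 18-20, i.e. slice [17:20].
        "waters": sum(1 for line in hetero if line[17:20] == "HOH"),
        "helices": sum(1 for line in lines if line.startswith("HELIX")),
        # Each SHEET record describes one strand of a sheet.
        "strands": sum(1 for line in lines if line.startswith("SHEET")),
    }
```

For example, with a downloaded file saved under the hypothetical local name `2mm1.pdb.gz`, `summarize_pdb(read_pdb_lines("2mm1.pdb.gz"))` returns a dictionary of these counts, so the separate extraction step is not needed for this check.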
SECONDARY STRUCTURE PREDICTION
AIM: To predict the secondary structure of the given protein sequences using ExPASy tools.
DESCRIPTION: ExPASy (Expert Protein Analysis System) is a proteomics server of the Swiss Institute of Bioinformatics (SIB) which analyzes protein sequences and structures and two-dimensional electrophoresis data. The server functions in collaboration with the EBI. ExPASy also provides access to the protein sequence knowledgebases UniProt and Swiss-Prot. For the prediction of the secondary structure of proteins, ExPASy provides tools such as
1. GOR 2. HNN 3. SOPMA 4. JPred
GOR
AIM: To predict the secondary structure of a given protein using the GOR tool from Expasy.
DESCRIPTION: GOR predicts the secondary structure state of each amino acid by looking at a window of 8 amino acids before and 8 after the position of interest (17 residues in total). This program (named after Garnier, Osguthorpe and Robson) is in its fourth version.
SOURCE: http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html
METHOD:
1. A protein query sequence was retrieved from NCBI in FASTA format.
2. The retrieved protein sequence was pasted into the GOR4 submission form.
3. The e-mail id was entered.
4. The GOR4 tool was run and the results were viewed and saved.
INPUT: ACCESSION NO – 1H4P B CHITINASE PROTEIN
OUTPUT:
INTERPRETATION: Using the GOR tool, we are able to predict the secondary structure of chitinase (Accession No. 1H4P B). The tool has shown the number of helices, alpha helices and beta bridges and other details of the protein structure.
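The window described for GOR (8 residues on either side of the position of interest) can be illustrated with a short sketch that extracts the 17-residue window around every position of a sequence. This shows only the windowing step; the actual GOR method then scores each window with statistics derived from known structures, which is not reproduced here. The function name, the padding character and the toy sequence are assumptions made for illustration.

```python
def residue_windows(sequence, flank=8, pad="-"):
    """Return one window per residue, centred on that residue.

    Each window has length 2 * flank + 1; positions that fall outside the
    sequence are filled with `pad` so that terminal residues also get a window.
    """
    padded = pad * flank + sequence + pad * flank
    width = 2 * flank + 1
    return [padded[i:i + width] for i in range(len(sequence))]


toy = "MKTAYIAKQRQISFVKSHFSRQ"   # hypothetical toy sequence
for residue, window in zip(toy, residue_windows(toy)):
    print(residue, window)
```

The same function with a different `flank` value gives the 13 to 21 residue windows used by neural-network predictors.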
HNN
AIM: To predict the secondary structure of a given protein using the HNN tool from Expasy.
DESCRIPTION: Hierarchical Neural Networks (HNN) can be used to predict protein secondary structure. The protein sequence is translated into patterns by shifting a window of n adjacent residues (typical value of n = 13-21) through the protein.
SOURCE: http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_nn.html
METHOD:
1. A protein query sequence was retrieved from NCBI in FASTA format.
2. The retrieved protein sequence was pasted into the HNN submission form.
3. The e-mail id was entered.
4. The HNN tool was run and the results were viewed and saved.
INPUT: ACCESSION NO – 1H4P B CHITINASE PROTEIN
OUTPUT:
INTERPRETATION: Using the HNN tool, we are able to predict the secondary structure of chitinase (Accession No. 1H4P B). The tool has predicted the number of helices, alpha helices and beta bridges and other details of the protein structure.
SOPMA
AIM: To predict the secondary structure of a given protein using the SOPMA tool from Expasy.
DESCRIPTION: SOPMA (Self Optimized Prediction Method with Alignment) is a secondary structure prediction program that uses multiple alignments. SOPMA correctly predicts 69.57% of amino acids for a three-state description of secondary structure (alpha helix, beta sheet and coil).
SOURCE: http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html
METHOD:
1. A protein query sequence was retrieved from NCBI in FASTA format.
2. The retrieved protein sequence was pasted into the SOPMA submission form.
3. The e-mail id was entered.
4. The SOPMA tool was run and the results were viewed and saved.
INPUT: ACCESSION NO – 1H4P B CHITINASE PROTEIN
OUTPUT:
INTERPRETATION: Using the SOPMA tool, we are able to predict the secondary structure of chain B (Accession No. 1H4P B). The tool has predicted the number of helices, alpha helices and beta bridges and other details of the protein structure.
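The numbers of helices and other elements quoted in the interpretations can be derived from the per-residue prediction string that these servers return. The sketch below assumes a string with one letter per residue, where h stands for helix, e for extended strand and c for coil; this letter code and the toy string are assumptions, and the function name is hypothetical.

```python
from collections import Counter
from itertools import groupby


def summarize_states(states):
    """Return (percentage per state, list of (state, run length) segments)."""
    total = len(states)
    counts = Counter(states)
    percent = {s: 100.0 * n / total for s, n in counts.items()} if total else {}
    # Consecutive identical letters form one structural segment.
    segments = [(state, len(list(run))) for state, run in groupby(states)]
    return percent, segments


prediction = "cchhhheecc"   # hypothetical toy prediction string
percent, segments = summarize_states(prediction)
print(percent)
print(segments)
print("helix segments:", sum(1 for state, _ in segments if state == "h"))
```

Comparing these summaries across GOR, HNN and SOPMA for the same sequence shows where the methods agree and where they differ.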
JPred
AIM: To predict the secondary structure of a given protein using the JPred tool from Expasy.
DESCRIPTION: JPred is a consensus secondary structure prediction server developed at the University of Dundee.
SOURCE: http://www.compbio.dundee.ac.uk/~www-jpred/
METHOD:
1. A protein query sequence was retrieved from NCBI in FASTA format.
2. The retrieved protein sequence was pasted into the JPred submission form.
3. The e-mail id was entered.
4. The JPred tool was run and the results were viewed and saved.
INPUT: ACCESSION NO – AAA32641 CHITINASE PROTEIN
OUTPUT:
INTERPRETATION: Using the JPred tool, we are able to predict the secondary structure of chitinase (Accession No. AAA32641).
TO CONSTRUCT THE PHYLOGENETIC RELATIONSHIP BETWEEN DIFFERENT ORGANISMS
AIM: To draw the phylogenetic tree of the given sequences using the software PhyloDraw.
DESCRIPTION: The sequences whose phylogenetic relationship is to be determined are retrieved from NCBI by keyword search or by accession number. The PhyloDraw tool, available on the net, is used for drawing the phylogenetic tree. Its input is the Dialign output, which is obtained by performing a multiple sequence alignment with the Dialign tool. PhyloDraw and Dialign are thus the two tools used for drawing the phylogenetic tree.
SOURCES:
Dialign: http://bibiserve.techfak.uni-bielefeld.de/dialign/sumission.html
PhyloDraw: http://pearl.cs.pusan.ac.kr/phylodraw/
NCBI: www.ncbi.nlm.nih.gov
METHOD:
1. The sequences with the following accession numbers were retrieved from the NCBI biological database.
2. The sequences were used as the input to the Dialign tool for multiple sequence alignment.
3. The output of Dialign was used as the input to the PhyloDraw tool.
4. PhyloDraw is the tool used to draw phylogenetic trees. It supports the following types of trees:
a. Unrooted tree
b. Rooted tree
c. Radial tree
d. Slanted cladogram
e. Rectangular cladogram
f. Phylogram
5. The results were displayed in the Radial tree, Slanted cladogram, Rectangular cladogram and Phylogram formats.
INPUT: ACCESSION NO. - AAD28733 Triticum aestivum
PHYLODRAW INPUT:
OUTPUT:
The PhyloDraw tool is used to draw the phylogenetic tree of genetically related species. It can display the trees in various formats. The tree formats displayed here are:
a. Radial tree
b. Slanted cladogram
c. Rectangular cladogram
d. Phylogram
INTERPRETATION: The phylogenetic tree for the sequences has been drawn using the PhyloDraw tool with the result of Dialign as the input.
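A tree of this kind is ultimately built from how different the aligned sequences are from one another. As an illustration, the sketch below computes the simple p-distance (the fraction of compared alignment columns at which two sequences differ, ignoring columns with a gap) for every pair of sequences in an alignment. This is a generic measure chosen for the example and is not necessarily the quantity that Dialign or PhyloDraw computes internally; the function names and the toy alignment are hypothetical.

```python
def p_distance(a, b, gap="-"):
    """Fraction of gap-free aligned columns at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Return {(name1, name2): p-distance} for every ordered pair of names."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n]) for m in names for n in names}


alignment = {          # hypothetical toy alignment
    "seqA": "ACGTAC-T",
    "seqB": "ACGAACGT",
    "seqC": "TCGAACGA",
}
for pair, dist in distance_matrix(alignment).items():
    print(pair, round(dist, 3))
```

Sequences separated by small distances end up on neighbouring branches of the tree, whichever display format is chosen.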