Chapter 1
STRATEGIES FOR GENERATING EST AND FULL LENGTH INSERTS
Table of contents
1. Introduction
2. Strategies for generating ESTs
   2.1 Generation of cDNA
   2.2 Generation of ESTs
3. cDNA Libraries and ESTs
4. EST Application
5. References
1. Introduction: Expressed sequence tags (ESTs) are short DNA sequences (usually 200 to 500 nucleotides long) that are generated by sequencing one or both ends of a cDNA clone derived from an expressed gene. They may be used to identify gene transcripts, and are instrumental in gene discovery and gene sequence determination. The identification of ESTs has proceeded rapidly, with approximately 52 million ESTs now available in public databases (e.g. GenBank, 5/2008, all species). The idea is to sequence short stretches of DNA that represent genes expressed in certain cells, tissues, or organs from different organisms and to use these "tags" to fish a gene out of a portion of chromosomal DNA by matching base pairs. The difficulty of identifying genes in genomic sequence varies among organisms and depends on genome size as well as on the presence or absence of introns, the intervening DNA sequences that interrupt the protein-coding sequence of a gene. ESTs can be generated rapidly and at low cost from either the 5' or 3' end of a cDNA clone, in a high-throughput manner, from a particular cell, tissue or organism of interest, giving quick insight into transcriptionally active regions. ESTs accelerate gene discovery, complement genome annotation, aid gene structure identification, establish the viability of alternative transcripts, guide SNP characterization and facilitate proteome analysis.
Edited by: Pratiksha S., Asha G. et al. Published by: Anand M. B.
2. Strategies for Generating ESTs: 2.1 Generation of cDNA: Gene identification is very difficult in humans, because most of our genome is composed of non-coding DNA, including introns, interspersed with relatively few coding sequences, or genes. These genes are expressed as proteins in a complex process composed of two main steps. Each gene (DNA) must first be converted, or transcribed, into messenger RNA (mRNA), the RNA that serves as a template for protein synthesis. The resulting mRNA then guides the synthesis of a protein through a process called translation. Importantly, mature mRNAs in a cell contain neither sequences from the regions between genes nor the non-coding introns that are present within many genes. Therefore, isolating mRNA is key to finding expressed genes in the vast expanse of the human genome. The problem, however, is that mRNA is very unstable outside of a cell; therefore, scientists use special enzymes to convert it to complementary DNA (cDNA).
cDNA is a much more stable compound and, importantly, because it is generated from an mRNA from which the introns have been removed, cDNA represents only expressed DNA sequence.
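The relationship between a gene, its spliced mRNA and the resulting cDNA can be sketched with simple string operations. The following is a minimal illustration only, not a model of the laboratory protocol: the sequence, the exon coordinates and the function names (`splice`, `first_strand_cdna`) are hypothetical, and reverse transcription is reduced to taking the complementary DNA strand of the mRNA.

```python
# Illustrative sketch only: sequences, coordinates and names are hypothetical.

# Base pairing used when a DNA strand is copied from an RNA template.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def splice(pre_mrna, exons):
    """Join exon intervals (0-based, end-exclusive) and discard the introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def first_strand_cdna(mrna):
    """Return the DNA strand complementary to the mRNA, written 5'->3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


# A toy primary transcript: exon 1, one intron, exon 2.
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"
exons = [(0, 6), (18, 24)]

mrna = splice(pre_mrna, exons)   # intron removed
cdna = first_strand_cdna(mrna)   # intron-free DNA copy of the message

print("mRNA:", mrna)
print("cDNA:", cdna)
```

Because the cDNA is copied from the spliced message, the intron sequence never appears in it, which is the property the text relies on.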
2.2 Generation of ESTs: Once cDNA representing an expressed gene has been isolated, scientists can sequence a few hundred nucleotides from either end of the molecule to create two different kinds of ESTs. Sequencing only the beginning portion of the cDNA produces what is called a 5' EST. A 5' EST is obtained from the portion of a transcript that usually codes for a protein. These regions tend to be conserved across species and do not change much within a gene family. Sequencing the ending portion of the cDNA molecule produces what is called a 3' EST. Because these ESTs are generated from the 3' end of a transcript, they are likely to fall within non-coding, or untranslated, regions (UTRs), and therefore tend to exhibit less cross-species conservation than coding sequences do.
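The two kinds of ESTs described above can be illustrated as reads taken from opposite ends of one cDNA. This is a rough sketch under stated assumptions: the function names, the toy sequence and the default read length of 300 are hypothetical, and the 3' read is assumed to be sequenced from the opposite strand, so it is reported as the reverse complement of the cDNA's final bases.

```python
# Illustrative sketch only: names, sequence and read length are hypothetical.

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def est_reads(cdna_sense, read_length=300):
    """Return (5' EST, 3' EST) for a cDNA given as its sense strand, 5'->3'.

    The 5' EST is the first `read_length` bases of the sense strand.
    The 3' EST is assumed to be read from the other end on the opposite
    strand, so it is the reverse complement of the last `read_length` bases.
    """
    five_prime = cdna_sense[:read_length]
    three_prime = reverse_complement(cdna_sense[-read_length:])
    return five_prime, three_prime


# Toy cDNA; a short read length is used so the result is easy to inspect.
cdna = "ATGGCCAAAGGGTTTTAAGCGC"
five_prime_est, three_prime_est = est_reads(cdna, read_length=6)

print("5' EST:", five_prime_est)
print("3' EST:", three_prime_est)
```

For a cDNA shorter than twice the read length the two reads overlap; for a typical cDNA of more than a thousand bases they cover different regions, which is why the 5' read tends to fall in coding sequence and the 3' read in the UTR.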
3. cDNA Libraries and ESTs or Full-length inserts: The procedure for capturing an expression profile is straightforward. First, a sample of cells is obtained; then RNA is extracted from the cells and converted into a stable form by using reverse transcriptase to copy cDNA from the RNA template. The cDNA is then cloned to form a library suitable for use in rapid sequencing experiments. A sample of clones is selected from the library at random, e.g. 10,000 from a library with a complexity of 2 million clones. A substantial automated sequencing operation is required to produce 10,000 sequencing reactions and then to run these on automated sequencers. The resulting data are downloaded to computers for further analysis. The ideal result is a set of 10,000 sequences, each between 200 and 400 bases in length, representing part of the sequence of each of the 10,000 clones. In reality, some sequencing runs will fail altogether and some will fail to reach appropriate quality. The sequences that emerge successfully from this process are called ESTs.
It is important to understand the statistics of library production to be confident in handling EST data. The number of clones in the library reflects the efficiency of mRNA extraction from the source of cells. Good libraries contain at least 1 million clones, and probably substantially more. Some tissues and cell types are difficult to deal with, and the resulting libraries tend to be less representative. The actual number of distinct genes expressed in a cell may be a few thousand; the number varies according to cell type, with the most complex human cells expressing up to 2000. Thus we have a small number of different genes represented in a pool of 1 million clones. We then take a single, relatively small random sample of clones for sequencing, rather than multiple samples.
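The sampling argument above can be made concrete with a small calculation. The sketch below assumes that clones are drawn independently and that the library is much larger than the sample, so the chance of seeing a transcript at least once among n sequenced clones is 1 - (1 - f)^n, where f is the fraction of library clones that carry that transcript. The abundance classes are hypothetical values chosen for illustration; only the 10,000-clone sample and the 2-million-clone library come from the text.

```python
# Illustrative sketch only: the abundance classes below are hypothetical.


def prob_sampled_at_least_once(clone_fraction, sample_size):
    """Chance that a transcript appears at least once in the sample.

    Assumes independent draws from a library much larger than the sample.
    """
    return 1.0 - (1.0 - clone_fraction) ** sample_size


def expected_copies(clone_fraction, sample_size):
    """Expected number of sampled clones that carry the transcript."""
    return clone_fraction * sample_size


library_size = 2_000_000   # complexity of the library (from the text)
sample_size = 10_000       # clones picked for sequencing (from the text)
sampling_fraction = sample_size / library_size

print(f"Fraction of the library sequenced: {sampling_fraction:.3%}")

abundance_classes = [
    ("abundant (1 clone in 100)", 1e-2),
    ("moderate (1 clone in 10,000)", 1e-4),
    ("rare (1 clone in 1,000,000)", 1e-6),
]
for label, fraction in abundance_classes:
    p = prob_sampled_at_least_once(fraction, sample_size)
    copies = expected_copies(fraction, sample_size)
    print(f"{label}: P(seen at least once) = {p:.4f}, expected copies = {copies:.2f}")
```

Under these assumptions a single sample of this size will almost certainly capture abundant transcripts many times over while usually missing rare ones, which is why the redundancy and representativeness of the library matter when interpreting EST counts.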
4. EST Application:
1> ESTs are versatile and have multiple uses.
2> ESTs were first used to construct expression maps of the human genome, then to assess the gene coverage from EST sequencing alone and to develop and map gene-based site markers.
3> With the exponential rise in genomic data from global sequencing projects, databases of ESTs are used:
   a. in gene structure prediction,
   b. to investigate alternative splicing,
   c. to discriminate between genes exhibiting tissue-specific or disease-specific expression, and
   d. for the discovery and characterization of candidate single nucleotide polymorphisms (SNPs).
4> The usefulness of EST data has extended well beyond its original application in gene finding and in transcriptome analysis.
5> ESTs are also a useful resource for designing probes for DNA microarrays used to determine gene expression.
5. References:
5.1 Webliography:
1> http://www.ncbi.nlm.nih.gov/About/primer/est.html
2> http://www.biolinfo.org/EST/
3> http://en.mimi.hu/biology/expressed_sequence_tag.html
5.2 Bibliography:
1> Introduction to Bioinformatics, T K Attwood & D J Parry-Smith