Genome Expression Alick Mwambungu
[email protected]
Steps involved in Genome Expression:
transcription
translation
1. Accessing the genome
2. Assembly of the transcription initiation complex
3. Synthesis of RNA (RNA polymerase)
4. Processing of RNA
5. RNA degradation
6. Assembly of the translation initiation complex
7. Protein synthesis
8. Protein folding and protein processing
9. Protein degradation
2 processes that lead from genome to transcriptome:
Fig 9.1, Genomes, Brown, 2nd Ed
Accessing the Genome
Accessing the Genome • The DNA in nucleus of eukaryotic cell / nucleoid of prokaryote is attached to variety of proteins that are not directly involved in genome expression • These proteins must be displaced in order for RNA polymerase and other expression proteins to gain access to the genes • Little is known about events in prokaryotes (due to poor knowledge about the physical organization of the prokaryotic genome) • More is known about how the packaging of DNA into chromatin influences genome expression in eukaryotes
Eukaryotic Genome • Divided into ≥ 2 linear DNA molecules, each contained in a different chromosome: chromosome number can vary • Smaller (circular) mitochondrial genomes • Genome size range: smallest: less than 10 Mb, largest: over 100 000 Mb. • Nature of DNA packaging into chrs has influence on processes involved in expression of genes • Nuclear DNA is associated with DNA-binding proteins called histones • Chromatin = beads of protein on a string of DNA • Each bead, or nucleosome, contains eight histone protein molecules, these being two each of histones H2A, H2B, H3 and H4.
Packaging of chromatin into metaphase chromosome
Fig 10.24, MCB, Lodish, 5th Ed
Heterochromatin & Euchromatin • 2 types of heterochromatin are recognized: – Constitutive: permanent feature of all cells, DNA contains no genes: can always be retained in a compact organization – Facultative: not permanent feature, contains genes that are inactive in some cells or at some periods of the cell cycle. • Organization of heterochromatin so compact: proteins involved in gene expression cannot access the DNA • Light areas = Euchromatin: contains active genes: Exact organization of DNA within euchromatin not known: loops of DNA visible • Each loop between 40 and 100 kb in length, in the form of the 30 nm chromatin fiber.
Chromatin Modifications and Genome Expression
2 ways in which chromatin structure can influence genome expression:
1. The degree of chromatin packaging displayed by a segment of chromosome determines whether or not genes within that segment are expressed
2. If a gene is accessible, its transcription is influenced by the precise nature and positioning of the nucleosomes in the region where the transcription initiation complex will be assembled
2 ways in which chromatin structure can influence gene expression
Fig 8.8, Genomes, Brown, 2nd Ed
1. Degree of chromatin packaging • Degree of packaging displayed by a segment of chromatin is determined by the precise chemical structure of histone proteins contained within nucleosomes • Histone proteins can undergo various types of modification, best studied = histone acetylation • Histone acetylation: – N terminal regions of histone form tails that protrude from the nucleosome core octamer – acetyl groups are attached to lysine amino acids in these N-terminal regions by enzyme histone acetyltransferases (HATs) = histone acetylation
Histone N-terminal tail
1. Degree of chromatin packaging: Histone Acetylation
Histone acetylation:
– Reduces the affinity of the histones for DNA
– Possibly reduces the interaction between individual nucleosomes that forms the 30 nm chromatin fiber
Result: DNA becomes accessible to proteins involved in gene expression, so transcription can be initiated = activation of the genome
Histone deacetylation:
• Removal of acetyl groups from histone tails, carried out by the histone deacetylases (HDACs)
Result: reverses the transcription-activating effects of HATs = repression of genome activity
2. Re-positioning of Nucleosomes
• Often referred to as nucleosome remodeling
• Involves modification or repositioning of nucleosomes within a short region of the genome, so that DNA-binding proteins gain access
• Does not involve covalent alterations to histone molecules, but may occur in conjunction with histone acetylation
• Remodeling is induced by an energy-dependent process that weakens the contact between nucleosome and DNA
• 3 distinct types of change can occur:
  – Remodeling
  – Sliding or cis-displacement
  – Transfer or trans-displacement
2. Re-positioning of Nucleosomes
Fig 8.10, Genomes, Brown, 2nd Ed
RNA Polymerases
RNA Polymerases
• Transcription of eukaryotic nuclear genes requires 3 different RNA polymerases: RNA polymerase I, RNA polymerase II and RNA polymerase III
• Each is a multi-subunit protein (8-12 subunits) with a molecular mass in excess of 500 kDa
• Structurally the polymerases are quite similar, but they are functionally distinct: no interchangeability
• Bacteria possess a single RNA polymerase consisting of 5 subunits, α2ββ’σ (2 α subunits, one each of β and the related β’, and one of σ)
• α, β and β’ subunits are equivalent to the 3 largest subunits of the eukaryotic RNA polymerases, but the σ subunit has its own special properties in terms of structure & function
Eukaryotic RNA Polymerases
Polymerase | Genes transcribed
RNA polymerase I | 28S, 5.8S and 18S ribosomal RNA (rRNA) genes
RNA polymerase II | Protein-coding genes; most small nuclear RNA (snRNA) genes; miRNA genes
RNA polymerase III | Genes for transfer RNAs (tRNAs), 5S rRNA, U6-snRNA, small nucleolar (sno) RNAs
Table 9.3, Genomes, Brown, 2nd Ed
RNA polymerase binding to promoter (panels: Prokaryotes, Eukaryotes; +1 marks the transcription start site)
Fig 3.6, Genomes, Brown, 2nd Ed
Generalised events: Transcription Initiation • Bacterial RNA pol and 3 eukaryotic RNA pol all initiate transcription by attaching, directly or via accessory proteins, to promoter / core promoter seq • Closed promoter complex converted into open promoter complex by breakage of limited number of bps around transcription initiation site • RNA pol moves away from promoter: promoter clearance • True completion of initiation stage = establishment of stable transcription complex: actively transcribing gene to which it is attached
Initiation of Transcription in Prokaryotes (E. coli)
Recognition sequences for transcription initiation
• Essential that transcription initiation complexes are constructed at the correct positions on DNA molecules
• Positions are marked by target sequences that are recognized either by the RNA polymerase itself or by a DNA-binding protein
• Bacteria: target sequence for RNA polymerase attachment = promoter
• E. coli promoter consists of two segments, both of six nucleotides:
  – -35 box 5’-TTGACA-3’
  – -10 box 5’-TATAAT-3’
• These are consensus sequences: they describe the 'average' of all promoter sequences in E. coli; actual sequences might be slightly different
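Worked example (sketch only): because the -35 and -10 boxes are consensus sequences, a candidate promoter can be scored by counting how many positions agree with each hexamer. The Python below is a minimal illustration, not a real promoter-prediction tool. The function names and the allowed spacing of 15-19 nt between the two boxes are assumptions made for this example and are not taken from the slides.

```python
MINUS_35 = "TTGACA"  # -35 box consensus (from the slide)
MINUS_10 = "TATAAT"  # -10 box consensus (from the slide)


def matches(window: str, consensus: str) -> int:
    """Count positions where the window agrees with the consensus."""
    return sum(a == b for a, b in zip(window, consensus))


def best_promoter(seq: str, min_gap: int = 15, max_gap: int = 19):
    """Return (score, pos_35, pos_10) for the best-scoring pair of hexamers.

    The gap is the number of nucleotides between the two boxes.
    The 15-19 nt range is an assumption for illustration only.
    A score of 12 means both boxes match their consensus exactly.
    """
    seq = seq.upper()
    best = None
    for i in range(len(seq) - 5):
        score_35 = matches(seq[i:i + 6], MINUS_35)
        for gap in range(min_gap, max_gap + 1):
            j = i + 6 + gap
            if j + 6 > len(seq):
                break
            score = score_35 + matches(seq[j:j + 6], MINUS_10)
            if best is None or score > best[0]:
                best = (score, i, j)
    return best


# Synthetic test sequence (not a real E. coli promoter)
example = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
print(best_promoter(example))
```

A real promoter would usually score below 12, which is the point of a consensus: the 'average' sequence is rarely matched perfectly.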
Example: Promoter for Lactose Operon of E. coli
Fig 9.17, Genomes, Brown, 2nd Ed
Assembly of Transcription Initiation Complex in E. coli
• In E. coli, direct contact is formed between promoter and RNA pol: the sequence specificity of the pol resides in the σ subunit
• 'Core enzyme': lacks the σ subunit, makes only loose / non-specific attachments to DNA
• Recognition of the promoter occurs by interaction between the σ subunit and the -35 box, forming the closed promoter complex
• Followed by breaking of bps in the -10 box: open promoter complex
• Opening up of the helix involves contacts between the polymerase and the non-template strand
• σ subunit dissociates soon after initiation is complete; the core enzyme carries out the elongation phase
Initiation of transcription in E. coli
Fig 9.20, Genomes, Brown, 2nd Ed
Synthesis of Bacterial Transcripts • Bacterial mRNAs do not undergo significant processing: primary transcript synthesized = mature mRNA, translation usually begins before transcription is complete
Figure 10.1, Genomes, Brown, 2nd Ed
Termination of Transcription in Prokaryotes (E. coli)
Termination of Bacterial Transcripts
2 distinct strategies (act as termination signals):
• Intrinsic terminators: DNA sequence in the template contains an inverted palindrome followed by a run of deoxyadenosine nts. Transcription of this sequence promotes dissociation of RNA polymerase: the transcript forms a hairpin loop (RNA-RNA base-pairing), which destabilizes the attachment of the growing transcript to the template
• Rho-dependent terminators: require the activity of a protein called Rho = helicase. A hairpin loop still forms, but no run of deoxyadenosine nts is present; formation of the hairpin is the signal for Rho to actively break the bps between template and transcript
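Worked example (sketch only): an intrinsic terminator shows up in the transcript as two stretches that can base-pair (the hairpin stem), separated by a short loop and followed by a run of U (transcribed from the run of deoxyadenosines in the template). The Python below searches an RNA string for that pattern. The stem length, loop sizes and U-run length are arbitrary assumptions chosen for illustration, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the sequence that would base-pair with `rna` in a hairpin stem."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_intrinsic_terminator(rna: str, stem: int = 6,
                              loop_sizes=range(3, 9), u_run: int = 4):
    """Return the start index of the first hairpin followed by a U run, or None.

    stem, loop_sizes and u_run are illustrative assumptions, not measured values.
    """
    rna = rna.upper()
    for start in range(len(rna) - stem + 1):
        left = rna[start:start + stem]
        for loop in loop_sizes:
            right_start = start + stem + loop
            right = rna[right_start:right_start + stem]
            tail = rna[right_start + stem:right_start + stem + u_run]
            if (len(right) == stem
                    and right == reverse_complement(left)
                    and tail == "U" * u_run):
                return start
    return None


# Synthetic transcript: stem GCCGCC, loop UUAA, stem GGCGGC, then a U run
transcript = "AC" + "GCCGCC" + "UUAA" + "GGCGGC" + "UUUUUU" + "GA"
print(find_intrinsic_terminator(transcript))
```

Dropping the `tail` check would give a pattern closer to the Rho-dependent case, where the hairpin is present but the run of U is not.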
Termination at an intrinsic terminator
Figure 10.3, Genomes, Brown, 2nd Ed
Rho-dependent termination
Figure 10.4, Genomes, Brown, 2nd Ed
Control over choice between elongation and termination in bacteria • One mechanism = antitermination • RNA pol ignores termination signal, continues elongating its transcript until 2nd signal reached • Provides mechanism whereby one or more genes at end of operon can be switched off /on • Antitermination is controlled by antiterminator protein: attaches to DNA near beginning of operon: transfers to RNA pol as it moves past en route to 1st termination signal • Presence of antiterminator protein causes RNA pol to ignore the termination signal
Anti-termination (bacteria)
Initiation of Transcription in Eukaryotes
Eukaryotic promoters are more complex
• In eukaryotes, the term 'promoter' is used to describe all sequences important in initiation of transcription of a gene
• These include not only the core promoter (basal promoter) = site at which the initiation complex is assembled, but also one or more upstream promoter elements
• Each of the 3 types of eukaryotic RNA polymerase recognizes a different type of promoter sequence:
  – RNA pol I promoters: consist of a core promoter spanning the transcription start point (nts -45 to +20) and an upstream control element (UCE) ~100 bp further upstream
  – RNA pol II promoters: variable, can stretch for several kb upstream of the transcription start site
RNA pol II promoters contd…
• Core promoter consists of 2 segments:
  – -25 or TATA box (consensus 5’-TATAWAW-3’, where W is A or T)
  – Initiator (Inr) sequence (consensus 5’-YYCARR-3’, where Y is C or T, and R is A or G) located around nt +1
• Some genes transcribed by RNA pol II have only 1 of these 2 components
• As well as the core promoter, genes recognized by RNA pol II have various upstream promoter elements
RNA pol III promoters:
• Fall into 3 categories; the first 2 categories are unusual in being located within the genes whose transcription they promote
• Usually the core promoter spans 50-100 bp and comprises 2 conserved boxes
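Worked example (sketch only): the degenerate letters in the TATA box and Inr consensus sequences (W, Y, R) can be expanded into regular-expression character classes, which makes it easy to look for candidate sites in a DNA sequence. The helper names below are hypothetical and the test sequence is synthetic.

```python
import re

# Only the ambiguity codes used on the slide are listed here
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]",  # W = A or T
    "Y": "[CT]",  # Y = C or T
    "R": "[AG]",  # R = A or G
}


def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus such as TATAWAW into a regex string."""
    return "".join(IUPAC[symbol] for symbol in consensus)


def find_sites(seq: str, consensus: str) -> list:
    """Return the start index of every (possibly overlapping) match."""
    pattern = "(?=" + consensus_to_regex(consensus) + ")"
    return [m.start() for m in re.finditer(pattern, seq.upper())]


TATA_BOX = "TATAWAW"
INR = "YYCARR"

# Synthetic sequence (not a real promoter)
dna = "GGG" + "TATAAAA" + "GGGGG" + "CTCAGG" + "CC"
print(find_sites(dna, TATA_BOX))
print(find_sites(dna, INR))
```

A match to the consensus only marks a candidate: as the slide notes, some RNA pol II genes carry just one of the two elements.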
Structures of Eukaryotic promoters
Fig 9.18, Genomes, Brown, 2nd Ed
Transcription initiation in eukaryotes with RNA polymerase II • Eukaryotic polymerases do not directly recognize their core promoter sequences • General transcription factor (GTF) = protein or protein complex that is transient / permanent component of initiation complex formed during eukaryotic transcription • A GTF called TFIID = complex made up of TATA Binding Protein (TBP) & at least 12 TBP associated factors (TAFs) makes initial contact with gene being transcribed by RNA polymerase • TBP = sequence-specific protein: binds to DNA via unusual TBP domain: makes contact with minor groove in region of TATA box
Transcription initiation in eukaryotes with RNA polymerase II • After TFIID (TBP + TAFs) has attached to core promoter, the pre-initiation complex (PIC) is formed by attachment of remaining GTFs • In vitro experiments indicate that GTFs bind to complex in order: TFIIA, TFIIB, TFIIF/RNA pol II, TFIIE and TFIIH • Within the overall process, 3 events important: • Attachment of TBP induces formation of a bend in DNA in region of TATA box • Bend provides recognition structure for TFIIB, which ensures correct positioning of RNA pol II relative to transcription start site. • The disruption to base pairing needed to form open promoter complex is brought about by TFIIH
Figure 9.21, Genomes, Brown, 2nd Ed
Transcription initiation in eukaryotes with RNA polymerase II • Final step in assembly of initiation complex is addition of phosphate groups to C-terminal domain (CTD) of largest subunit of RNA pol II • Once phosphorylated, polymerase is able to leave pre-initiation complex and begin synthesizing RNA • Phosphorylation might be carried out by TFIIH • After departure of polymerase, at least some of GTFs detach from core promoter, but TFIID, TFIIA and TFIIH remain • Re-initiation is a more rapid process than primary initiation
Processing of mRNA in Eukaryotes: Capping & Polyadenylation
Processing of mRNA in Eukaryotes • Capping: All mRNAs have cap added to 5’ end • Polyadenylation: Most mRNAs: addition of series of adenosines to 3’ end • Splicing: many primary transcripts contain introns, that block translation and so undergo splicing • (Editing: Some mRNAs are subject to RNA editing) • mRNAs processed while being synthesized (elongation). Cap added as soon as transcription initiated; splicing & editing begin while transcript still being made, polyadenylation important part of termination mechanism for RNA pol II.
Capping of RNA pol II transcripts
• Occurs before the transcript reaches 30 nts in length, after successful promoter clearance
• 2-step process:
• Step 1: addition of an extra guanosine to the extreme 5’ end of the RNA
  – involves a reaction between the 5’ triphosphate of the terminal nucleotide of the RNA molecule and the triphosphate of a GTP nucleotide
  – γ-phosphate (outermost) of the terminal nt is removed
  – β and γ phosphates of the GTP are removed
  – 5’-5’ bond formed by the enzyme guanylyl transferase
• Step 2: new terminal guanosine is converted into 7-methylguanosine by attachment of a methyl group to N7 of the purine ring
  – modification catalyzed by guanine methyltransferase
Capping Reaction:
Figure10.9A+B, Genomes, Brown, 2nd Ed
Elongation of eukaryotic mRNAs
• Fundamental aspects of transcript elongation are the same in bacteria and eukaryotes
• 2 major distinctions:
1. Length of transcript that must be synthesized: eukaryotic transcripts are much longer
  – Extreme length of eukaryotic genes places demands on RNA pol II
  – Elongation factors stabilise RNA pol II: prevent it from pausing/stopping during transcription
2. RNA pol II must negotiate nucleosomes
  – Elongation factors are capable of modifying chromatin structure in some way
Polyadenylation of eukaryotic mRNA • Inherent part of the mechanism for termination of transcription by RNA pol II • Series of ≤ 250 adenosines at 3’ end added to transcript by template-independent RNA pol = poly (A) polymerase • Mammals: polyadenylation directed by poly (A) signal sequence = 5’ -AAUAAA-3’: located between 10 and 30 nts upstream of poly (A) site • Poly (A) site: immediately after dinucleotide 5’ -CA-3’ and is followed by a GU-rich region • Poly (A) signal sequence and GU-rich region = binding sites for multi-subunit protein complexes = cleavage factors
Polyadenylation of eukaryotic mRNA • Cleavage factors are: – Cleavage and polyadenylation specificity factor (CPSF) – Cleavage stimulation factor (CstF) • For polyadenylation to occur, Poly (A) polymerase must associate with: – Both cleavage factors (CPSF & CstF) – Polyadenylate binding protein (PADP) • When poly (A) signal sequence located, cleavage occurs at internal site, i.e. poly (A) site and poly (A) tail added • Properties of elongation complex altered: termination becomes favored over continued RNA synthesis
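Worked example (sketch only): the sequence rules for the mammalian poly(A) site can be written as a small search. The code looks for the AAUAAA signal, then for a CA dinucleotide whose downstream edge lies 10-30 nt after the end of the signal, cuts there and adds a tail of at most 250 adenosines. Measuring the 10-30 nt distance from the end of the signal is an assumption made here, the GU-rich region is not checked, and the function names are hypothetical.

```python
POLY_A_SIGNAL = "AAUAAA"  # mammalian poly(A) signal sequence (from the slide)


def find_poly_a_site(rna: str):
    """Return (signal_index, cut_index) or None.

    cut_index is the position immediately after a CA dinucleotide that
    lies 10-30 nt downstream of the end of the signal (assumed convention).
    """
    rna = rna.upper()
    sig = rna.find(POLY_A_SIGNAL)
    while sig != -1:
        signal_end = sig + len(POLY_A_SIGNAL)
        for dist in range(10, 31):
            cut = signal_end + dist
            if cut <= len(rna) and rna[cut - 2:cut] == "CA":
                return sig, cut
        sig = rna.find(POLY_A_SIGNAL, sig + 1)
    return None


def polyadenylate(rna: str, tail_length: int = 200) -> str:
    """Cleave at the poly(A) site and add the tail (no more than 250 A)."""
    if tail_length > 250:
        raise ValueError("the slide gives a tail of at most 250 adenosines")
    site = find_poly_a_site(rna)
    if site is None:
        raise ValueError("no poly(A) signal with a usable CA site found")
    _, cut = site
    return rna[:cut] + "A" * tail_length


# Synthetic transcript: signal, 13-nt spacer, CA, then a GU-rich stretch
transcript = "GGGC" + "AAUAAA" + "G" * 13 + "CA" + "GUGUGUUGUG"
print(find_poly_a_site(transcript))
print(polyadenylate(transcript, tail_length=5))
```

In the cell the cut is made by the cleavage factors, and poly(A) polymerase adds the tail without a template; the string operations above only mirror the outcome.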
Polyadenylation of eukaryotic mRNA
Figure 10.10, Genomes, Brown, 2nd Ed
Processing of mRNA in Eukaryotes: Splicing
Intron Splicing • Intron = non-coding region within a discontinuous gene • Introns less common in lower eukaryotes: – 6000 genes in yeast genome contain only 239 introns in total – Many single individual mammalian genes contain ≥ 50 introns
Figure 10.12, Genomes, Brown, 2nd Ed
GU-AG Introns
• In most pre-mRNA introns, the first 2 nts of the intron sequence are 5’-GU-3’ and the last 2 nts are 5’-AG-3’: therefore called 'GU-AG' introns
• GU-AG motifs are parts of longer consensus sequences that span the 5’ and 3’ splice sites
• 5’ splice site = donor site
• 3’ splice site = acceptor site
Adapted from Figure 10.13, Genomes, Brown, 2nd Ed
Splicing pathway: 2 major steps
1. Cleavage of 5’ splice site
• Transesterification reaction promoted by the hydroxyl group (hydroxyl attack) attached to the 2’ C of an adenosine nt located within the intron sequence (this A = branch point)
• Hydroxyl attack results in cleavage of the phosphodiester bond at the 5’ splice site, accompanied by formation of a new 5’-2’ phosphodiester bond linking the first nt of the intron with the internal A
• Intron is looped back on itself to create a lariat structure
2. Cleavage of 3’ splice site & joining of exons
• 2nd transesterification reaction, promoted by the 3’-OH group attached to the end of the upstream exon
• 3’-OH group attacks the phosphodiester bond at the 3’ splice site, cleaving it: the intron is released as a lariat structure, subsequently converted back to a linear RNA and degraded
• 3’ end of the upstream exon joins to the newly formed 5’ end of the downstream exon, completing the splicing process
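Worked example (sketch only): the outcome of the two transesterifications can be mimicked with string slicing. The function checks that the intron is a GU-AG intron with an A at the stated branch point, then separates the upstream exon (step 1) and joins it to the downstream exon (step 2). The chemistry itself is not modeled; the coordinates, the function name and the test sequence are assumptions made for illustration.

```python
def splice(pre_mrna: str, intron_start: int, intron_end: int, branch: int):
    """Remove one GU-AG intron and return (mature_rna, released_intron).

    intron_start/intron_end are Python slice coordinates of the intron;
    branch is the index of the branch-point adenosine inside the intron.
    """
    pre_mrna = pre_mrna.upper()
    intron = pre_mrna[intron_start:intron_end]
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("not a GU-AG intron")
    if not (intron_start < branch < intron_end) or pre_mrna[branch] != "A":
        raise ValueError("branch point must be an A inside the intron")

    # Step 1: cleavage at the 5' splice site frees the upstream exon;
    # the 5' end of the intron is linked to the branch-point A (lariat).
    upstream_exon = pre_mrna[:intron_start]

    # Step 2: the 3'-OH of the upstream exon attacks the 3' splice site;
    # the intron is released and the two exons are joined.
    downstream_exon = pre_mrna[intron_end:]
    return upstream_exon + downstream_exon, intron


# Synthetic pre-mRNA: exon, 18-nt intron, exon
pre = "AUGGCC" + "GUAAGUCUAACUUUUCAG" + "GGAUAA"
mature, intron = splice(pre, intron_start=6, intron_end=24, branch=15)
print(mature)
print(intron)
```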
Splicing pathway: 2 major steps
Figure 10.14, Genomes, Brown, 2nd Ed
snRNPs • Central components of splicing apparatus for GU-AG introns are snRNAs called U1, U2, U4, U5 & U6 • These are short molecules (between 106 nts and 185 nts) that associate with proteins to form small nuclear ribonucleoproteins (snRNPs) • Structure of U1-snRNP: – Comprises 165-nt U1-RNA plus ten proteins – 3 of these proteins including U1-70K and U1-A are specific to this snRNP – Remaining 7 proteins are Sm proteins that are found in all snRNPs involved in splicing – U1-RNA forms stem-loop base-paired structure: U1-70K and U1-A proteins attach to two of the major stem-loops – Sm proteins attach to the Sm site
Structure of U1-snRNP
Figure 10.16, Genomes, Brown, 2nd Ed
Formation of the Spliceosome
• snRNPs and other accessory proteins attach to the transcript and form a series of complexes, the last of which is the spliceosome = structure within which the actual splicing reactions occur
• Formation of the spliceosome brings the 3’ splice site into close proximity with the branch point & 5’ site, so the 2 transesterification reactions can occur as a linked reaction. Occurs in 3 steps:
1. Formation of Commitment Complex
2. Formation of Pre-Spliceosome Complex
3. Formation of Spliceosome
• Commitment Complex: initiates splicing activity: comprises U1-snRNP, which binds to the 5’ splice site, partly by RNA-RNA base-pairing, and additional protein factors (x3) which make protein-RNA contacts with the branch point, polypyrimidine tract and 3’ splice site
• Pre-spliceosome Complex: comprises the commitment complex plus U2-snRNP, which also attaches to the branch site. Association between U1-snRNP and U2-snRNP brings the 5’ splice site into close proximity with the branch point
• Spliceosome: formed when U4/U6-snRNP and U5-snRNP attach to the pre-spliceosome complex. This results in additional interactions that bring the 3’ splice site close to the 5’ site and branch point. All 3 key positions in the intron are now in close proximity and the 2 transesterifications occur as a linked reaction, possibly catalyzed by U6-snRNP, completing the splicing process
Formation of the Spliceosome
Figure 10.17, Genomes, Brown, 2nd Ed
SR Proteins • How correct splice sites are selected is still poorly understood – presence of intron-exon boundaries not sufficient to define splice sites • Set of splicing factors called SR proteins are important in splice-site selection • SR proteins interact with Exonic Splicing Enhancers (ESEs) and Exonic Splicing Silencers (ESSs) • SR proteins bound to ESEs stimulate spliceosome assembly and splice-site recognition • SR proteins also are involved in the establishment of a connection between bound U1-snRNP and bound U2AF in the commitment complex • Location of ESEs and ESSs indicates that assembly of spliceosome driven not simply by contacts within the intron but also by interactions with adjacent exons
Importance of ESEs • It is possible that an individual commitment complex is not assembled within an intron, but initially bridges an exon • Initial assembly of commitment complex across an exon might therefore be a less difficult task than assembly across a much longer intron • Importance of ESE and ESS in controlling splicing is clear from the discovery that several human diseases, including one type of muscular dystrophy, are caused by mutations in ESE sequences
Role of SR proteins in Commitment Complex Formation
Figure 10.18, Genomes, Brown, 2nd Ed
Degradation of Eukaryotic RNAs
• Presence or absence of mRNAs in the cell determines which proteins will be synthesized
• Degradation of specific mRNAs could be a powerful way of regulating genome expression
• Eukaryotic mRNAs are longer lived than their bacterial counterparts: half-lives of several hours for mammalian mRNAs
• 4 pathways of mRNA degradation in eukaryotes have been identified:
  – Exosome: degradation of transcripts in the 3’ -> 5’ direction
  – Deadenylation-dependent decapping: removal of poly(A) tail and 5’ cap
  – Nonsense-mediated RNA Decay (NMD)
  – RNA silencing/interference: protection from foreign RNAs, e.g. viral
Nonsense-mediated RNA Decay (NMD) • Results in specific degradation of mRNAs that have a termination codon at an incorrect position (due to mutation or as a result of incorrect splicing) • NMD therefore prevents nonsense mRNA from being continually translated and consequently producing potentially deleterious truncated polypeptides • Incorrect codon detected by a 'surveillance' mechanism that involves a complex of proteins which scans the mRNA and is able to distinguish between correct and incorrect termination codon • Identification of incorrect termination codon induces cap cleavage and 5’ ->3’ exonuclease degradation
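Worked example (sketch only): the idea of a termination codon "at an incorrect position" can be illustrated by reading an mRNA in frame from its first AUG and comparing the first stop codon found with the position where the stop is expected. This is a deliberate simplification: the cell's surveillance complex does not know an expected coordinate in advance, so `expected_stop` is a stand-in supplied by the caller, and the function names are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_in_frame_stop(mrna: str):
    """Return the index of the first in-frame stop codon after the first AUG."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def is_nmd_target(mrna: str, expected_stop: int) -> bool:
    """Flag an mRNA whose first in-frame stop lies before the expected stop.

    expected_stop is an assumption of this sketch (index of the correct
    termination codon); it is not how the surveillance mechanism works.
    """
    stop = first_in_frame_stop(mrna)
    return stop is not None and stop < expected_stop


# Synthetic mRNAs: the mutant has UAG in place of the second GCU codon
normal = "GG" + "AUG" + "GCU" * 4 + "UAA" + "CC"
mutant = "GG" + "AUG" + "GCU" + "UAG" + "GCU" * 2 + "UAA" + "CC"
print(first_in_frame_stop(normal), is_nmd_target(normal, expected_stop=17))
print(first_in_frame_stop(mutant), is_nmd_target(mutant, expected_stop=17))
```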
Alternative Splicing
• Splicing can be: – Constitutive: pre-mRNA always processed in the same way so that only 1 type mRNA produced and only 1 protein is generated from a given gene – Alternative: pre-mRNA can be processed in different ways to produce various mRNAs with different sequences, which give rise to variety of proteins that differ in peptide sequence • 2 modes of alternative splicing: – Intron retaining mode: instead of splicing out an intron, it is retained in the mRNA transcript. However, the intron's code must be properly expressible, otherwise a stop codon or a shift in reading frame will cause the protein to be nonfunctional. – Exon skipping mode: certain exons are spliced out to alter the sequence of amino acids in the expressed protein
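Worked example (sketch only): constitutive splicing, exon skipping and intron retention differ only in which segments of the pre-mRNA end up in the mature mRNA. The Python below represents a pre-mRNA as an ordered list of named exons and introns and assembles the mRNA for each mode. The segment names, sequences and the function name are hypothetical, chosen only for illustration.

```python
# Ordered segments of a synthetic pre-mRNA: (kind, name, sequence)
SEGMENTS = [
    ("exon", "E1", "AUGGCU"),
    ("intron", "I1", "GUAAGUCAG"),
    ("exon", "E2", "GAUCCA"),
    ("intron", "I2", "GUGAGUUAG"),
    ("exon", "E3", "UUCUAA"),
]


def build_mrna(segments, skipped_exons=frozenset(), retained_introns=frozenset()):
    """Join the segments that survive splicing.

    Exons are kept unless listed in skipped_exons;
    introns are removed unless listed in retained_introns.
    """
    parts = []
    for kind, name, seq in segments:
        if kind == "exon" and name not in skipped_exons:
            parts.append(seq)
        elif kind == "intron" and name in retained_introns:
            parts.append(seq)
    return "".join(parts)


print(build_mrna(SEGMENTS))                           # constitutive
print(build_mrna(SEGMENTS, skipped_exons={"E2"}))     # exon skipping mode
print(build_mrna(SEGMENTS, retained_introns={"I1"}))  # intron retaining mode
```

In this synthetic case the retained intron I1 is 9 nt long, so the reading frame is preserved; an intron whose length is not a multiple of 3 would shift the frame, which is the risk the slide describes.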
Figure 10.19, Genomes, Brown, 2nd Ed
Alternative Splicing
• Primary transcripts of some genes can follow 2 or more alternative splicing pathways, producing many different but related splice variants
• Protein products of the alternative splicing pathways may show different chemical and biological properties
• In some organisms alternative splicing is uncommon, only three examples being known in Saccharomyces cerevisiae
• In higher eukaryotes it is much more prevalent
• Human genome ~ 35 000 genes: 35% of these genes undergo alternative splicing
• Alternative splicing is modulated according to cell type, developmental stage, sex or in response to external stimuli; the precise regulation mechanism is poorly understood
Example of Alternative Splicing in Humans • Human slo gene codes for membrane protein that regulates entry and exit of K+ into and out of cells • Slo gene has 35 exons, 8 of which are involved in alternative splicing events which involve different combinations of the 8 exons • Alternative splicing results in over 500 distinct mRNAs, each specifying a membrane protein with slightly different functional properties • What are the biological consequences of this? • Human slo genes are active in inner ear: determine auditory properties of hair cells on basilar membrane of cochlea • Different hair cells respond to different sound frequencies between 20 and 20 000 Hz, their individual capabilities determined in part by the properties of their Slo proteins. • Alternative splicing of slo genes in cochlear hair cells therefore determines the auditory range of humans
Aberrant Splicing
• Alteration of the normal process of alternative splicing in cancer cells results in production of previously non-existent mRNAs, which may give rise to different protein isoforms with potentially tumorigenic properties = aberrant splicing
Taken from: Lancet Oncol 2007; 8: 439-57
2 Main Causes of Aberrant Splicing
1. Mutations in pre-mRNA sequences:
  – Can disrupt existing splice sites
  – Can create cryptic splice sites (a site within an intron or exon that has sequence similarity with the consensus motifs of real splice sites)
  – May be inherited (germline mutations) or acquired (somatic mutations)
  – The neurofibromatosis 1 (NF1) gene has one of the highest mutation rates for any human disorder: in 32% of NF1 patients the disease is caused by aberrant alternatively spliced transcripts
2. Changes in cellular factors involved in assembly of the spliceosome:
  – Alteration in the activity and composition of splicing factors, e.g. SR proteins, may modify the selection of splice sites
Splice Variants as Cancer Biomarkers
Splice Variants as Cancer Biomarkers
• The NMD pathway will eliminate aberrantly spliced mRNAs that could give rise to potentially oncogenic proteins
• However, in certain pathological conditions, aberrantly spliced mRNAs go unnoticed by the NMD pathway
• Such splice variants can give rise to proteins which can cause disease
• Discovery of cancer-specific alternatively spliced variants has prompted interest in their potential use as disease biomarkers (diagnostic, prognostic or predictive)
• Available detection methods must be optimally specific and sensitive: RT-PCR-based techniques (using variant-specific primers), microarrays and generation of Abs against protein variants are all possible approaches
Aberrant CD44 Variants are Candidate Biomarkers
• CD44 is a multifunctional receptor
• CD44 and products of its splice variants are involved in a number of cancer-related processes including apoptosis, cell differentiation, angiogenesis, cell migration and invasion
• Aberrant CD44 splice variants are found in the blood of patients with certain cancers including head & neck squamous cell carcinomas, gastric carcinomas and osteosarcomas
• CD44 is an attractive biomarker for routine clinical analysis by minimally invasive procedures
• However, data obtained by different research groups are conflicting, which may be due to differences in methodology