INTRODUCTION
A DNA microarray is a multiplex technology used in molecular biology and in medicine. It consists of an arrayed series of thousands of microscopic spots of DNA oligonucleotides, called features, each containing picomoles of a specific DNA sequence. Each sequence can be a short section of a gene or of another DNA element and is used as a probe to hybridize a cDNA or cRNA sample (called the target) under high-stringency conditions. Probe-target hybridization is usually detected and quantified by fluorescence-based detection of fluorophore-labeled targets, which determines the relative abundance of nucleic acid sequences in the target. In standard microarrays, the probes are attached to a solid surface by a covalent bond to a chemical matrix (via epoxy-silane, amino-silane, lysine, polyacrylamide or others). The solid surface can be glass or a silicon chip; such arrays are commonly known as gene chips, or colloquially as Affy chips when an Affymetrix chip is used. Other microarray platforms, such as Illumina, use microscopic beads instead of a large solid support. DNA arrays differ from other types of microarray only in that they either measure DNA or use DNA as part of their detection system. Microarray technology evolved from Southern blotting, in which fragmented DNA is attached to a substrate and then probed with a known gene or fragment. The use of a collection of distinct DNAs in arrays for expression profiling was first described in 1987, when the arrayed DNAs were used to identify genes whose expression is modulated by interferon. These early gene arrays were made by spotting cDNAs onto filter paper with a pin-spotting device. The use of miniaturized microarrays for gene expression profiling was first reported in 1995, and a complete eukaryotic genome (Saccharomyces cerevisiae) on a microarray was published in 1997.
PRINCIPLE
The microarray technology consists of spotting PCR products or long oligonucleotides (50-mer to 70-mer) on glass slides at densities of up to 6000 spots/cm². These slides are hybridised with fluorescent targets (cDNAs or genomic DNAs). The fluorescent molecules most commonly used are members of the cyanine family, Cy3 and Cy5. After hybridisation, the signals are detected using a fluorescence scanner. Using two different fluorochromes allows the hybridisation signals from two distinct strains to be measured in a single experiment. Once the fluorescence intensities have been obtained, most of the work lies in analysing the data to extract the biological information. The whole procedure can be divided into five steps: target preparation, hybridisation, slide scanning, data analysis and expression profile clustering.
MATERIALS
DNA sources
About 5200 human cDNA clones of the IMAGE library were obtained from the RZPD Resource Centre (Berlin, Germany). Some 21 000 random shotgun clones representing the genome of Trypanosoma brucei were provided by Najib El-Sayed of the Institute for Genomic Research (TIGR, Rockville, USA). Nearly 4550 shotgun clones covering the entire genome of Pseudomonas putida as a minimal tiling path were obtained from Helmut Hilbert of Qiagen (Hilden, Germany). PCR products for some 21 000 predicted open reading frames (ORFs) of Drosophila melanogaster were produced directly from genomic DNA. The template for some 7300 ORF-specific PCR products of Candida albicans was genomic DNA of strain SC5314 (Can14).
PCR amplification
PCR amplifications were performed in 384- or 96-well microtitre plates. For PCR on the cDNA and shotgun clones, 0.2 µM of the respective vector-specific primer pairs were used: d(TCACACAGGAAACAGCTATGAC) and d(GTAAAACGACGGCCAGTG) (human clones), d(TTGTAAAACGACGGCCAGTG) and d(GCGGATAACAATTTCACACAGGA) (T. brucei), or d(TCGGATCCACTAGTAACG) and d(GGCCGCCAGTGTGATG) (P. putida) (all from Interactiva, Ulm, Germany). The reactions were started by inoculating 25 or 100 µl of PCR mix (usually 10 mM Tris–HCl, pH 8.3, 2.25 mM MgCl2, 50 mM KCl, 0.2 mM each of dATP, dTTP, dGTP and dCTP, 1.5 M betaine, 0.1 mM cresol red and 2 U Taq polymerase) with a few Escherichia coli cells transferred from a growth culture using a plastic 384- or 96-pin replicator (Genetix, New Milton, UK). The plates were incubated for 3 min at 94°C, followed by 35 cycles of denaturation at 94°C for 30 s, annealing at 51°C for 30 s and elongation at 72°C for 90 s, and a final elongation phase at 72°C for 10 min. In some cases, the PCR was performed without betaine. The Drosophila ORFs were initially amplified from 100 ng genomic DNA with some 43 000 gene-specific primers, all of which carried one of several common 15-nt tag sequences at their 5'-ends. Subsequent re-amplification was carried out using the primer pair matching the respective tag sequences. PCR products of C. albicans ORFs were produced from 20 ng genomic DNA with 7300 specific primer pairs.
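The 51°C annealing temperature can be put in context with a quick melting-temperature (Tm) estimate for the vector-specific primers listed above. The sketch below uses the Wallace rule, Tm ≈ 2°C × (A+T) + 4°C × (G+C), which is our substitution for illustration and not the method the authors used to set their conditions. It is a rule of thumb for short oligonucleotides (roughly 14-20 nt) that ignores salt, primer concentration and nearest-neighbour effects, so the numbers are indicative only. The function name `wallace_tm` and the dictionary labels are hypothetical; the sequences are copied from the text with internal spaces removed.

```python
"""Rough melting-temperature estimates for the PCR primers listed in the text.

Wallace rule (rule of thumb, short oligos only):
    Tm ~ 2 °C x (A + T) + 4 °C x (G + C)
Salt, primer concentration and nearest-neighbour effects are ignored.
"""


def wallace_tm(seq: str) -> int:
    """Return the Wallace-rule Tm (°C) of a DNA oligonucleotide."""
    s = seq.upper().replace(" ", "")        # tolerate stray spaces in the sequence
    if not s or set(s) - set("ACGT"):
        raise ValueError(f"expected a non-empty A/C/G/T sequence, got {seq!r}")
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


# Primer pairs as given in the text (dictionary labels are hypothetical).
PRIMER_PAIRS = {
    "human clones": ("TCACACAGGAAACAGCTATGAC", "GTAAAACGACGGCCAGTG"),
    "T. brucei": ("TTGTAAAACGACGGCCAGTG", "GCGGATAACAATTTCACACAGGA"),
    "P. putida": ("TCGGATCCACTAGTAACG", "GGCCGCCAGTGTGATG"),
}

if __name__ == "__main__":
    for organism, pair in PRIMER_PAIRS.items():
        for primer in pair:
            print(f"{organism:13s} {primer:24s} {len(primer):2d} nt  "
                  f"Tm ~ {wallace_tm(primer)} °C")
```

The longer primers (22-23 nt) fall outside the range where the Wallace rule is usually applied; a nearest-neighbour model would be the more careful choice for them.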
Microarray production process: DNA fragments amplified by PCR are spotted onto a glass microscope slide that has been coated with polylysine before spotting. The polylysine coating fixes the DNA through electrostatic interactions. In our case, the PCR fragments are the coding regions (ORFs) of the 6200 genes of Saccharomyces cerevisiae (baker's yeast). Slide preparation is completed by blocking the polylysine that is not bound to DNA, to prevent the target from binding to it. Prior to hybridisation, the DNA on the microarray is denatured to obtain single-stranded DNA; this allows each probe to bind to the complementary strand from the target.
Target preparation: RNA is extracted from the two yeast cultures whose expression levels we want to compare. The messenger RNA is then converted into cDNA by reverse transcription. At this stage, the cDNA from the first culture is labelled with a green dye, whereas the cDNA from the second culture is labelled with a red dye. The available target-preparation methods can be divided into two groups: first-strand cDNA that is labeled or tagged with a capture sequence, or antisense RNA (aRNA) generated from double-stranded cDNA during an in vitro transcription (IVT) reaction. Labeled cDNA can be prepared either by direct incorporation of a fluorophore-labeled nucleotide or by incorporation of an aminoallyl-labeled nucleotide, followed by coupling of a fluorophore carrying an amine-reactive group to the aminoallyl nucleotide (Schena et al. 1995; for review, see Lockhart and Winzeler 2000). Alternatively, the first-strand cDNA can be tagged with a capture sequence that is used in subsequent detection steps (Stears et al. 2000). DNA microarrays containing short oligonucleotide probes (<35 nucleotides long) require more target for each hybridization, so an amplification method is needed when sample sizes are small. Typically, the generation of aRNA (also commonly called complementary RNA or cRNA) is preceded by first-strand cDNA synthesis using an oligonucleotide primer containing a bacteriophage T7 RNA polymerase promoter proximal to an oligo(dT) sequence (van Gelder et al. 1990; Eberwine et al. 1992; Lockhart et al. 1996). After second-strand cDNA synthesis and cDNA purification, an IVT reaction is performed using T7 RNA polymerase in the presence of labeled nucleotides. Alternatives to this labeling strategy produce unlabeled aRNA, followed by cDNA synthesis in the presence of a fluorophore-labeled nucleotide (Wang et al. 2000). Whatever the target preparation method, any amplification of the available transcripts must be linear for the product to remain representative of the transcript population.
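Why linearity matters can be illustrated with a toy model. Suppose two transcripts start at equal abundance but are copied with slightly different per-round efficiencies. In exponential (PCR-like) amplification every product is itself a template, so the efficiency difference compounds each round; in linear (IVT-like) amplification only the original template is copied, so the difference accumulates additively. The efficiencies, round count and function names below are hypothetical and chosen only to show the effect; real IVT kinetics are more complex than this model.

```python
"""Toy model: how amplification mode distorts a true 1:1 transcript ratio.

All efficiencies and round counts are hypothetical illustration values.
"""


def exponential_yield(template: float, efficiency: float, rounds: int) -> float:
    """PCR-like amplification: every product is copied again in each round."""
    return template * (1.0 + efficiency) ** rounds


def linear_yield(template: float, efficiency: float, rounds: int) -> float:
    """IVT-like amplification: only the original template is copied each round."""
    return template * (1.0 + efficiency * rounds)


def ratio_distortion(yield_fn, eff_a: float, eff_b: float, rounds: int) -> float:
    """Observed A/B ratio after amplification when the true A/B ratio is 1."""
    return yield_fn(1.0, eff_a, rounds) / yield_fn(1.0, eff_b, rounds)


if __name__ == "__main__":
    eff_a, eff_b, rounds = 0.95, 0.90, 20      # hypothetical values
    for name, fn in (("exponential", exponential_yield), ("linear", linear_yield)):
        print(f"{name:11s} observed A/B = {ratio_distortion(fn, eff_a, eff_b, rounds):.3f}")
```

With these values the exponential model inflates the 1:1 ratio far more than the linear model does, which is the intuition behind requiring linear amplification.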
Hybridisation: The green-labelled and red-labelled cDNAs are mixed together (this mixture is called the target) and applied to the matrix of spotted single-stranded DNA (called the probe). The chip is then incubated overnight at 60°C. At this temperature, a DNA strand that encounters its complementary strand pairs with it to form double-stranded DNA, so the fluorescent target DNA hybridises to the complementary spotted probes.
The discrepancies in microarray results are a consequence of differences in microarray measures, such as accuracy (i.e. ‘the degree of conformity of the measured quantity to its actual (true) value’), sensitivity (i.e. ‘the concentration range of target molecules in which accurate measurements can be made’), reproducibility (i.e. ‘the degree to which repeated measurements of the same quantity will show the same or similar results’) and specificity (i.e. ‘the ability of a probe to provide a signal that is influenced only by the presence of the target molecule’). Accuracy, sensitivity and reproducibility may be affected by several effectors. These measures and their effectors are discussed by Dufva and by Draghici et al., and will not be detailed here. One example of an effector of sensitivity, reproducibility and accuracy is the type of microarray platform: oligonucleotide arrays have been found to be more reproducible and sensitive than cDNA arrays, and some oligonucleotide arrays have been found to be more accurate than others. Sensitivity is also affected by probe density (i.e. the number of different probes fabricated in a given area), which has been shown to affect the availability of probes for hybridization; this availability may also be affected by the steric restrictions imposed by the solid microarray surface. A higher availability of probes for hybridization has been demonstrated to increase sensitivity. In addition, sensitivity is affected by the hybridization signal-to-noise ratio (i.e. the ratio between the spot signal and that of the background): a low background increases microarray hybridization sensitivity. Low specificity of microarray hybridizations has been suggested to be one of the prime measures affecting discrepancies in gene-expression profiles, both between different probes targeting the same region of a given transcript and between different microarray platforms. In the present review, we highlight microarray-hybridization specificity as a key measure that, once improved, may increase the validity of microarray results.
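The signal-to-noise ratio defined above (spot signal relative to background signal) can be computed per spot from the scanned pixel intensities. A minimal sketch, assuming spot and local-background pixels have already been segmented; the use of medians (to limit the influence of dust or saturated pixels) and the function name are our choices, and other definitions, for example ones based on the background standard deviation, are also in use.

```python
from statistics import median


def signal_to_background(spot_pixels, background_pixels) -> float:
    """Median spot intensity divided by median local-background intensity.

    Both arguments are iterables of pixel intensities from one channel;
    segmentation of spot vs. background pixels is assumed to be done already.
    """
    bg = median(background_pixels)
    if bg <= 0:
        raise ValueError("background median must be positive")
    return median(spot_pixels) / bg


if __name__ == "__main__":
    spot = [1200, 1300, 1250]                    # hypothetical pixel values
    low_bg, high_bg = [100, 110, 90], [400, 420, 380]
    print(signal_to_background(spot, low_bg))    # same spot, low background
    print(signal_to_background(spot, high_bg))   # same spot, high background
```

The same spot yields a much higher ratio against the low background, which is the sense in which a low background increases sensitivity.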
Microarrays consist of multiple probes. Hence, a prime key to specificity during microarray hybridization, for both short-oligomer and cDNA microarrays, is the ability of each probe to discriminate between different target molecules. Probes are designed to be complementary to the target molecule according to the Watson-Crick base-pairing rules. Therefore, a probe with high specificity to its target molecule should provide a signal influenced only by the presence of that target molecule. Nevertheless, a perfect match in terms of sequence-similarity-based complementarity between a probe and its target molecule does not guarantee specificity. This is because thousands of different target molecules are present during microarray hybridization, each composed of tens, hundreds or thousands of nucleotides, and because of different effectors of hybridization specificity (discussed subsequently), which may alter the ability of a probe to bind to a target molecule. Hence, there is often some degree of hybridization of a microarray probe to a target molecule that is not strictly complementary to it; conversely, a variable number of target molecules may hybridize to a microarray probe that is not exactly complementary to them.
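One simple way to look for potential cross-hybridization sites is to scan nontarget sequences for windows that are nearly complementary to a probe. A probe pairs (antiparallel, Watson-Crick) with a target window when that window equals the probe's reverse complement, so the sketch compares every window with the reverse complement and reports those within a mismatch budget. This assumes a plain mismatch count (Hamming distance, no gaps) is an acceptable first filter; real hybridization also depends on mismatch position, base composition and thermodynamics, which the sketch ignores. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_sites(probe: str, target: str, max_mismatches: int):
    """Yield (position, mismatches) for each window of `target` that could
    pair with `probe` with at most `max_mismatches` mismatches.

    The probe pairs with window w when w equals reverse_complement(probe),
    so windows are compared against that sequence.
    """
    site = reverse_complement(probe)
    target = target.upper()
    k = len(site)
    for i in range(len(target) - k + 1):
        mm = hamming(site, target[i:i + k])
        if mm <= max_mismatches:
            yield i, mm


if __name__ == "__main__":
    probe = "ACGTTG"                                       # hypothetical probe
    print(list(candidate_sites(probe, "GGCAACGTGG", 0)))   # perfect match
    print(list(candidate_sites(probe, "TTCAACTTAA", 1)))   # one mismatch
```

Raising `max_mismatches` widens the net; deciding which hits actually cross-hybridize needs the thermodynamic considerations discussed in the text.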
FOUR LEVELS OF HYBRIDIZATION SPECIFICITY
We define four levels of hybridization specificity in the context of microarray hybridization. The first is hybridization between a single probe molecule and a single target molecule. The two molecules may exhibit perfect hybridization, partial hybridization (i.e. the target molecule is only partially hybridized to the probe) or no hybridization. The second level of specificity is that of a spot. At this level, the multiple probe molecules that compose one spot are hybridized to multiple target molecules. The spot probes may exhibit perfect, partial or no hybridization with the target molecules. Notably, at this level, partial hybridization may take one or both of two forms: only some of the probes may be hybridized to the target molecule, or probes may be hybridized to only some of the target molecules. This partial hybridization at the spot level may be a result of cross-hybridization (i.e. hybridization between sequences that are not strictly complementary), due to the presence and hybridization of nontarget molecules with sequences similar to that of the spot probes. Since a spot is composed of multiple probes, a single spot may simultaneously exhibit any combination of one to four of the probe-target binding types presented above. The third level of specificity is that of a spot-set (or, in Affymetrix terminology, a ‘probe-set’), in which multiple spots represent different segments of the same reference sequence (e.g. different exons of a gene). At this level, different spots of a spot-set may exhibit perfect hybridization with the target molecule; partial hybridization with the target molecule, due to probes with mismatches to the target molecule resulting from, for example, an annotation error in the gene sequence, or from intended mismatches introduced to quantify nonspecific hybridization; no hybridization, due to, for example, alternative splicing of a transcript, leading to probes with no match to the target molecule; or cross-hybridization, due to, for example, a spot within a spot-set that represents an evolutionarily conserved gene segment and therefore hybridizes with nontarget molecules derived from various gene-family members. The fourth level of specificity is that of the microarray, in which a variable number of spot-sets may exhibit different forms of hybridization with target sequences: perfect hybridization (i.e. all target molecules are hybridized to their representative spot-sets and all spot-sets are hybridized to the target molecules they represent), partial hybridization in either direction, no hybridization (i.e. target molecules are not hybridized to any spot-set, or spot-sets do not match any target molecules) or cross-hybridization (e.g. target molecules of different genes hybridize to the same spot-set, or target molecules of a particular gene hybridize to the spot-sets of several different genes). These different forms may exist for a large number of different target molecules or spot-sets.
Slide scanning: A laser excites each spot, and the fluorescent emission is collected by a photomultiplier tube (PMT) coupled to a confocal microscope. We obtain two images in which grey levels represent the measured fluorescence intensities. If the grey scale is replaced by a green scale in the first image and by a red scale in the second, superimposing the two images gives a single image whose spots range from green (where only DNA from the first condition is bound) to red (where only DNA from the second condition is bound), passing through yellow (where DNA from the two conditions is bound in equal amounts).
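The superimposition described above amounts to placing the red-channel scan in the red component of an RGB image and the green-channel scan in the green component; where both channels are equally bright, the pixel appears yellow. A minimal sketch, assuming both scans are 16-bit grey-scale images already loaded as equally sized nested lists and mapped linearly to 8 bits; real viewers often apply other contrast stretches, and the function name is hypothetical.

```python
def false_colour(red_img, green_img, max_value=65535):
    """Combine two grey-scale scans into one RGB image (nested lists).

    red_img / green_img: equally sized 2-D lists of intensities from the
    red and green channels, assumed to lie in 0..max_value (16-bit by default).
    Returns a 2-D list of (R, G, B) tuples with 8-bit components; blue stays 0.
    """
    if len(red_img) != len(green_img) or any(
        len(r) != len(g) for r, g in zip(red_img, green_img)
    ):
        raise ValueError("channel images must have the same shape")

    def to8(v):
        v = min(max(v, 0), max_value)         # clip to the valid range
        return round(255 * v / max_value)     # linear map to 0..255

    return [
        [(to8(r), to8(g), 0) for r, g in zip(rrow, grow)]
        for rrow, grow in zip(red_img, green_img)
    ]


if __name__ == "__main__":
    red = [[65535, 0, 30000]]      # hypothetical one-row scans
    green = [[0, 65535, 30000]]
    print(false_colour(red, green))   # red spot, green spot, yellowish spot
```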
Data analysis: We now have two microarray images from which we have to estimate the amount of DNA bound in each experimental condition. To do so, we measure the signal at the green dye emission wavelength and the signal at the red dye emission wavelength. We then normalise these amounts according to various parameters (yeast amount in each culture condition, emission power of each dye, …). We assume that the amount of fluorescent DNA bound is proportional to the amount of mRNA present in the cells at the beginning, and we calculate the red/green fluorescence ratio. If this ratio is greater than 1 (red on the image), the gene is more highly expressed in the second experimental condition; if it is smaller than 1 (green on the image), the gene is more highly expressed in the first condition. These differences in expression can be visualized with software such as ArrayPlot, developed in our laboratory (see image below). From the list of spot intensities, this software plots the red intensity of each spot as a function of its green intensity.
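The ratio step can be sketched on a log scale, where a ratio above 1 becomes a positive value and below 1 a negative one. The text normalises by experiment-specific parameters; as a stand-in, this sketch uses global median normalisation, a common choice that assumes most genes are not differentially expressed, so the median log ratio is shifted to 0. The intensities are assumed to be background-subtracted and positive, and the two-fold threshold and function names are arbitrary illustrations, not part of the described method.

```python
import math
from statistics import median


def normalised_log_ratios(red, green):
    """Return log2(red/green) per spot, centred so the median log ratio is 0.

    Assumes background-subtracted, positive intensities and that most genes
    are unchanged between the two conditions (global median normalisation).
    """
    if len(red) != len(green):
        raise ValueError("red and green must have the same length")
    if any(v <= 0 for v in list(red) + list(green)):
        raise ValueError("intensities must be positive")
    raw = [math.log2(r / g) for r, g in zip(red, green)]
    centre = median(raw)
    return [m - centre for m in raw]


def call(m, threshold=1.0):
    """Classify a normalised log2 ratio (threshold 1.0 = two-fold, arbitrary)."""
    if m >= threshold:
        return "higher in condition 2 (red)"
    if m <= -threshold:
        return "higher in condition 1 (green)"
    return "unchanged"


if __name__ == "__main__":
    red = [200, 400, 800, 1600]      # hypothetical spot intensities
    green = [100, 200, 400, 200]
    for m in normalised_log_ratios(red, green):
        print(f"{m:+.2f}  {call(m)}")
```

Here the red channel is uniformly twice as bright for three spots, which the normalisation attributes to a dye or loading bias rather than to biology; only the fourth spot stands out.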
Fabrication
Microarrays can be manufactured in different ways, depending on the number of probes under examination, costs, customization requirements, and the type of scientific question being asked. Arrays from commercial vendors may have as few as 10 probes or as many as 2.1 million micrometre-scale probes.
Surface engineering
The first step of DNA microarray fabrication involves surface engineering of a substrate to obtain surface properties suited to the application of interest. Optimal surface properties are those that produce high signal-to-noise ratios for the DNA targets of interest. Generally, this involves maximizing probe surface density and activity while minimizing non-specific binding of the targets. Methods of surface engineering vary depending on the platform material, design, and application.
Spotted vs. oligonucleotide arrays
Microarrays can be fabricated using a variety of technologies, including printing with fine-pointed pins onto glass slides, photolithography using pre-made masks, photolithography using dynamic micromirror devices, ink-jet printing, or electrochemistry on microelectrode arrays.
In spotted microarrays, the probes are oligonucleotides, cDNA or small fragments of PCR products that correspond to mRNAs. The probes are synthesized prior to deposition on the array surface and are then "spotted" onto glass. A common approach uses an array of fine pins or needles controlled by a robotic arm, which dips the pins into wells containing DNA probes and then deposits each probe at a designated location on the array surface. The resulting "grid" of probes represents the nucleic acid profiles of the prepared probes and is ready to receive complementary cDNA or cRNA "targets" derived from experimental or clinical samples. Research scientists around the world use this technique to produce "in-house" printed microarrays in their own labs. These arrays may be easily customized for each experiment, because researchers can choose the probes and printing locations on the arrays, synthesize the probes in their own lab (or a collaborating facility), and spot the arrays. They can then generate their own labeled samples, hybridize the samples to the array, and finally scan the arrays with their own equipment. This provides a relatively low-cost microarray that may be customized for each study, and avoids the cost of purchasing often more expensive commercial arrays that may represent vast numbers of genes that are not of interest to the investigator. Some publications indicate that in-house spotted microarrays may not provide the same level of sensitivity as commercial oligonucleotide arrays, possibly owing to small batch sizes and reduced printing efficiencies compared with industrial manufacturers of oligonucleotide arrays. In oligonucleotide microarrays, the probes are short sequences designed to match parts of the sequence of known or predicted open reading frames. Although oligonucleotide probes are often used in "spotted" microarrays, the term "oligonucleotide array" most often refers to a specific manufacturing technique.
Oligonucleotide arrays are produced by printing short oligonucleotide sequences designed to represent a single gene or a family of gene splice variants, synthesizing each sequence directly on the array surface instead of depositing intact sequences. Sequences may be longer (60-mer probes, as in the Agilent design) or shorter (25-mer probes produced by Affymetrix) depending on the desired purpose; longer probes are more specific to individual target genes, whereas shorter probes may be spotted at higher density across the array and are cheaper to manufacture. One technique used to produce oligonucleotide arrays is photolithographic synthesis (Affymetrix) on a silica substrate, in which light and light-sensitive masking agents are used to "build" a sequence one nucleotide at a time across the entire array. Each applicable probe is selectively "unmasked" before the array is bathed in a solution of a single nucleotide; a masking reaction then takes place, and the next set of probes is unmasked in preparation for a different nucleotide exposure. After many repetitions, the sequence of every probe is fully constructed. More recently, Maskless Array Synthesis from NimbleGen Systems has combined flexibility with large numbers of probes.
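The "many repetitions" of photolithographic synthesis can be quantified with a simple model. If nucleotides are offered in a fixed repeating order, each cycle (one unmasking step plus one nucleotide exposure) extends only those probes whose next required base matches, so a set of N-mers needs at most 4N cycles. The sketch assumes the order A, C, G, T, which need not match any vendor's actual mask schedule, and ignores coupling failures; the function name is hypothetical.

```python
def synthesis_cycles(probes, order="ACGT"):
    """Count unmask-and-couple cycles needed to build all probes in situ.

    Each cycle offers the next nucleotide in `order` (repeated cyclically);
    every probe whose next required base equals it is unmasked and extended
    by one base. Returns the number of cycles until every probe is complete.
    """
    probes = [p.upper() for p in probes]
    if any(set(p) - set(order) for p in probes):
        raise ValueError("probe contains a base not in the synthesis order")
    built = [0] * len(probes)                 # bases synthesized so far, per probe
    cycles = 0
    while any(b < len(p) for b, p in zip(built, probes)):
        base = order[cycles % len(order)]     # nucleotide offered this cycle
        for i, p in enumerate(probes):
            if built[i] < len(p) and p[built[i]] == base:
                built[i] += 1                 # unmasked and extended this cycle
        cycles += 1
    return cycles


if __name__ == "__main__":
    print(synthesis_cycles(["ACGT" * 5]))            # follows the order exactly
    print(synthesis_cycles(["TTT"]))                 # worst case: 4 cycles per base
    print(synthesis_cycles(["AC", "CA"]))            # probes built in parallel
```

Because all probes share the same cycles, the total depends on the hardest probe rather than on the number of probes, which is what makes in situ synthesis of very large arrays practical.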
Two-channel vs. one-channel detection
Diagram of a typical dual-colour microarray experiment.
Two-color microarrays, or two-channel microarrays, are typically hybridized with cDNA prepared from two samples to be compared (e.g. diseased tissue versus healthy tissue) that are labeled with two different fluorophores. Fluorescent dyes commonly used for cDNA labelling include Cy3, which has a fluorescence emission wavelength of 570 nm (corresponding to the green part of the light spectrum), and Cy5, with a fluorescence emission wavelength of 670 nm (corresponding to the red part of the light spectrum). The two Cy-labelled cDNA samples are mixed and hybridized to a single microarray, which is then scanned in a microarray scanner to visualize the fluorescence of the two fluorophores after excitation with a laser beam of a defined wavelength. The relative intensities of the two fluorophores may then be used in ratio-based analysis to identify up-regulated and down-regulated genes. Oligonucleotide microarrays often contain control probes designed to hybridize with RNA spike-ins. The degree of hybridization between the spike-ins and the control probes is used to normalize the hybridization measurements for the target probes. Although absolute levels of gene expression may be determined with a two-color array, the preferred method of data analysis for the two-color system is based on the relative differences in expression among different spots within a sample and between samples. Examples of providers of such microarrays include Agilent with their Dual-Mode platform, Eppendorf with their DualChip platform for fluorescence labeling, and TeleChem International with Arrayit. In single-channel microarrays, or one-color microarrays, the arrays are designed to give estimates of the absolute levels of gene expression. Therefore, comparing two conditions requires two separate single-dye hybridizations. As only a single dye is used, the data collected represent absolute values of gene expression.
These may be compared to other genes within a sample or to reference "normalizing" probes used to calibrate data across the entire array and across multiple arrays. Three popular single-channel systems are the Affymetrix "Gene Chip", the Applied Microarrays "CodeLink" arrays, and the Eppendorf "DualChip & Silverquant". One strength of the single-dye system is that an aberrant sample cannot affect the raw data derived from other samples, because each array chip is exposed to only one sample (as opposed to a two-color system, in which a single low-quality sample may drastically impinge on overall data precision even if the other sample was of high quality). Another benefit is that data are more easily compared to arrays from different experiments; the absolute values of gene expression may be compared between studies conducted months or years apart. A drawback of the one-color system is that, compared with the two-color system, twice as many microarrays are needed to compare samples within an experiment.
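Calibrating single-channel data across multiple arrays can be done in several ways; the text does not name one. As an illustration we substitute quantile normalisation, a technique commonly applied to single-channel arrays, which forces every array to share the same intensity distribution: each value is replaced by the mean, across arrays, of the values holding the same rank. The sketch is a plain-Python version under simplifying assumptions (equal probe counts, ties broken by position rather than averaged), and the function name is hypothetical.

```python
def quantile_normalise(arrays):
    """Quantile-normalise a list of equally long intensity lists (one per array).

    Each value is replaced by the mean, across arrays, of the values sharing
    its rank, so all arrays end up with the same intensity distribution.
    Ties are broken by position (no tie averaging) for simplicity.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # For each array, probe indices sorted from lowest to highest intensity.
    orders = [sorted(range(n), key=a.__getitem__) for a in arrays]
    # Mean intensity at each rank across all arrays.
    rank_means = [
        sum(a[order[r]] for a, order in zip(arrays, orders)) / len(arrays)
        for r in range(n)
    ]
    result = []
    for order in orders:
        out = [0.0] * n
        for r, idx in enumerate(order):
            out[idx] = rank_means[r]          # put the rank mean back in place
        result.append(out)
    return result


if __name__ == "__main__":
    chip_a = [5, 2, 3]                        # hypothetical probe intensities
    chip_b = [4, 1, 6]
    print(quantile_normalise([chip_a, chip_b]))
```

After normalisation both arrays contain exactly the same set of values, so remaining differences between them reflect the ranking of probes rather than overall brightness.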
Expression profile clustering: We can then try to group genes that share the same expression profile across several experiments. This clustering can be done hierarchically, as in phylogenetic analysis, by calculating a similarity criterion between expression profiles and iteratively grouping the most similar ones. More complex techniques, such as principal component analysis or neural networks, can also be used. The result of hierarchical clustering is usually displayed as a matrix in which each column represents one experiment and each row a gene. Ratios are displayed using a colour scale ranging from green (repressed genes) to red (induced genes).
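The hierarchical procedure described above can be sketched as agglomerative clustering: compute a distance between every pair of expression profiles, then repeatedly merge the two closest clusters. The similarity criterion (1 minus the Pearson correlation) and the average-linkage rule are our assumptions for illustration, and the naive loop is only suitable for a handful of genes; optimized implementations exist in libraries such as SciPy. Function names and the example profiles are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long profiles (0.0 if one is flat)."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0                            # flat profile: treat as uncorrelated
    return sum(a * b for a, b in zip(dx, dy)) / denom


def average_linkage(profiles):
    """Agglomerative clustering of expression profiles.

    profiles: dict gene -> list of log ratios (one per experiment).
    Distance = 1 - Pearson correlation; clusters merge by average linkage.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    genes = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b])
            for i, a in enumerate(genes) for b in genes[i + 1:]}

    def d(g1, g2):
        return dist[(g1, g2)] if (g1, g2) in dist else dist[(g2, g1)]

    clusters = [(g,) for g in genes]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average distance over all gene pairs between the two clusters.
                avg = mean(d(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        avg, i, j = best
        merges.append((clusters[i], clusters[j], avg))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


if __name__ == "__main__":
    profiles = {                               # hypothetical log ratios
        "g1": [1, 2, 3, 4],
        "g2": [2, 4, 6, 8],                    # same shape as g1
        "g3": [4, 3, 2, 1],                    # opposite shape
    }
    for a, b, dist_ab in average_linkage(profiles):
        print(a, "+", b, f"at distance {dist_ab:.2f}")
```

The merge order is what a dendrogram draws; sorting the rows of the green-to-red matrix by this order places co-regulated genes next to each other.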
Uses and types
Arrays of DNA can be spatially arranged, as in the commonly known gene chip (also called genome chip, DNA chip or gene array), or can be specific DNA sequences labelled such that they can be independently identified in solution. The traditional solid-phase array is a collection of microscopic DNA spots attached to a solid surface, such as a glass, plastic or silicon biochip. The affixed DNA segments are known as probes (although some sources use different terms, such as reporters). Thousands of them can be placed in known locations on a single DNA microarray. DNA microarrays can be used to detect DNA (as in comparative genomic hybridization) or to detect RNA (most commonly as cDNA after reverse transcription) that may or may not be translated into proteins. The process of measuring gene expression via cDNA is called expression analysis or expression profiling. Since an array can contain tens of thousands of probes, a microarray experiment can accomplish that many genetic tests in parallel. Arrays have therefore dramatically accelerated many types of investigation.
Applications include the following (technology or application: synopsis).
Gene expression profiling: In an mRNA or gene expression profiling experiment, the expression levels of thousands of genes are simultaneously monitored to study the effects of certain treatments, diseases, and developmental stages on gene expression. For example, microarray-based gene expression profiling can be used to identify genes whose expression is changed in response to pathogens or other organisms by comparing gene expression in infected to that in uninfected cells or tissues.
Comparative genomic hybridization: Assessing genome content in different cells or closely related organisms.
Chromatin immunoprecipitation on chip (ChIP-on-chip): DNA sequences bound to a particular protein can be isolated by immunoprecipitating that protein (ChIP); these fragments can then be hybridized to a microarray (such as a tiling array), allowing the determination of protein binding site occupancy throughout the genome. Examples of targets to immunoprecipitate are histone modifications (H3K27me3, H3K4me2, H3K9me3, etc.), Polycomb-group proteins (PRC2: Suz12, PRC1: YY1) and trithorax-group proteins (Ash1), to study the epigenetic landscape, or RNA polymerase II, to study the transcription landscape.
SNP detection: Identifying single nucleotide polymorphisms among alleles within or between populations. Several applications of microarrays make use of SNP detection, including genotyping, forensic analysis, measuring predisposition to disease, identifying drug candidates, evaluating germline mutations in individuals or somatic mutations in cancers, assessing loss of heterozygosity, or genetic linkage analysis.
Alternative splicing detection: An exon junction array design uses probes specific to the expected or potential splice sites of predicted exons for a gene. It is of intermediate density, or coverage, between a typical gene expression array (with 1-3 probes per gene) and a genomic tiling array (with hundreds or thousands of probes per gene). It is used to assay the expression of alternative splice forms of a gene. Exon arrays have a different design, employing probes designed to detect each individual exon of known or predicted genes, and can be used to detect different splicing isoforms.
Tiling array: Genome tiling arrays consist of overlapping probes designed to densely represent a genomic region of interest, sometimes as large as an entire human chromosome. The purpose is to empirically detect expression of transcripts or alternative splice forms that may not have been previously known or predicted.