NEUROGENOMICS: APPLICATIONS AND ANALYSIS

Diego A. Forero, MD, PhD (c) 1,2,3,4,5

1 Applied Molecular Genomics Group, Department of Molecular Genetics, Flanders Institute for Biotechnology (VIB)
2 University of Antwerp, Antwerp, Belgium
3 Laboratory of Developmental Genetics, VIB
4 Catholic University of Leuven, Leuven, Belgium
5 Grupo de Neurociencias, Universidad Nacional de Colombia, Bogotá, Colombia

Email:
[email protected]
http://users.skynet.be/dforero/index.htm

I have consolidated a set of exercises in which you can apply different in silico approaches to common research problems in genetics and genomics. Applying these tools is expected to improve the design and analysis of neurogenomics experiments in terms of scope, precision and speed. All the bioinformatics tools required to solve these exercises are listed on my website: http://users.skynet.be/dforero/df9.htm

1. Identify the number of haplotype blocks found in the following human genes:
-CREM gene in European population
-GABRA6 gene in African population
-BDNF gene in Asian population
-LMNA gene in African population
-PRNP gene in European population

2. Identify the tagging SNPs for the following human genes:
-GRIA2 gene in European population
-PDE4B gene in African population
-HTR2C gene in Asian population
-KCNA2 gene in African population
-RIMS3 gene in European population

3. Find the top 10 candidate targets for each of the following human microRNAs:
-hsa-mir-132
-hsa-mir-134
-hsa-mir-7
-hsa-mir-135b
-hsa-let-7a

4. Identify the predicted secondary structures of the following human miRNAs:
-hsa-mir-132
-hsa-mir-134
-hsa-mir-7
-hsa-mir-135b
-hsa-let-7a

5. Retrieve the tissue with the highest expression in humans for each of these genes:
-APOE
-CREM
-BDNF
-PRNP
-BACE1

6. Retrieve the dbSNP identifiers for the following human variations:
-GCTGTAGGCCAGACCCTGGCA(A/C)GATCTGGGTGGATAATC
-AAATGAGGACTTCTGACCTC(A/G)AACGCTGCCCTTGTTCTT
-GCAGCCGGACAAACTTGCCCTCCTC(A/G)CCACCTCCTCCAC
-ACTATTAATGATAATACT(A/G)TCTCTCATTTATTGAGCATT
-CTGACACTTTCGAACAC(A/G)TGATAGAAGAGCTGTTGGATG
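
Most sequence search tools expect a plain nucleotide string, so the variations in exercise 6 may need to be expanded into one sequence per allele before searching. The following is a minimal sketch of that preprocessing step only; the function name `expand_variation` is hypothetical, and it assumes every variation follows the "flank(allele1/allele2)flank" pattern used above.

```python
import re

def expand_variation(variation):
    """Split a 'FLANK(A/B)FLANK' string into one full sequence per allele."""
    pattern = r"([ACGT]*)\(([ACGT])/([ACGT])\)([ACGT]*)"
    match = re.fullmatch(pattern, variation.upper())
    if match is None:
        raise ValueError(f"unexpected variation format: {variation!r}")
    left, allele1, allele2, right = match.groups()
    # Rebuild the full sequence once for each allele of the SNP.
    return {allele1: left + allele1 + right, allele2: left + allele2 + right}

# First variation from exercise 6.
alleles = expand_variation("GCTGTAGGCCAGACCCTGGCA(A/C)GATCTGGGTGGATAATC")
for allele, sequence in alleles.items():
    print(allele, sequence)
```

Either expanded sequence can then be pasted into a sequence search tool; the sketch does not query dbSNP itself.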
7. Identify the top 10 candidate genes for Alzheimer disease and the top 10 candidate genes for Parkinson disease (based on meta-analyses of published association studies).

8. Retrieve the list of known genes located in the following human genomic regions:
-9q34.3
-21q21.3
-17p13.1
-11q23.3
-1q23.2

9. Identify the repeat sequences that are present in the following human genomic regions:
-chr17:8279904-8312206
-chr2:86247142-86276108
-chr6:16846682-16869700
-chr1:40858939-40903911
-chr6:163755665-163914884

10. Identify the effects on transcription factor binding sites of the following SNPs:
-rs34706444
-rs12028379
-rs5774713
-rs12239355
-rs17129477
11. Identify the vector sequences that are present in the following DNA fragments:
-acacctttgaggtgaaagagtattcagtgaatatgatggtcatgatgatgtcaccttggatttaaggcattttcttaagatgtgtaaagtatgttcctttagccgccaccgcggtggagctcccagcttttgttcccttta
-tatctgggctttagtttctccatcattacaatgaagagatgtgctatccttttccaccctgttctaaaattgtgtaacttttttttttcttttttgagacatgcacgagtgggttacatcgaactggatctcaacagcggt
-gtagtcaggattctgctgacctgcttacagggcactaaatacctgaggaggcaggagcttgggggaaagctgagaggtatctatccccatctacctactgatggagttccgcgttacataacttacggtaaatggcccgcc
12. Identify the top candidate variations in the following human DNA sequence traces. You will use a file with the chromatograms of 96 subjects sequenced for a 500 bp region.

13. Retrieve the genomic lengths, protein lengths, chromosomal positions and number of exons for the following genes:
-PLXNA2
-NRG1
-MTHFR
-DTNBP1
-SLC6A4
14. Identify the homologues in mouse and Drosophila of the following human genes:
-SV2A
-PDE4B
-DRD1
-SYT1
-RGS4

15. Design overlapping PCR primers to sequence the following human genomic regions:
-chr1:40,879,177-40,883,673
-chrX:77,256,575-77,258,830
-chr8:26,530,136-26,532,811
-chr4:122,960,094-122,962,212
-chr5:161,054,462-161,056,347

16. Identify the differential GO and KEGG terms in the following two lists of human genes:
List 1. GPR51, GRIA2, KIF5C, MBP, MEF2C, NAP1L3, NCDN, NDRG4, NEFL, NRGN, NTRK2, OLFM1
List 2. AKAP6, BRF1, CCNA2, DST, MACF1, NBEA, RAB11A, RANBP5, SEC8L1, SYNE1, ZFYVE20, ZNF490

17. Identify the proteins encoded by the following RNA sequences:
-atggaaaaccccagcccggccgccgccctgggcaaggccctctgcgctctcctcctggccactctcggcgccgccggccagcctcttgggggagagtccatctgttccgccagagccccggccaaatacagcatcaccttcacg
-atggagctggaccaccggaccagcggcgggctccacgcctaccccgggccgcggggcgggcaggtggccaagcccaacgtgatcctgcagatcgggaagtgccgggccgagatgctggagcacgtgcggcggacgcaccggcac
-atgggcttgttagagtgctgtgcaagatgtctggtaggggccccctttgcttccctggtggccactggattgtgtttctttggggtggcactgttctgtggctgtggacatgaagccctcactggcaca
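
One possible first step for exercise 17 is to translate each sequence in its first reading frame and then search the resulting peptide against a protein database. The sketch below covers the translation only, using the standard genetic code; the function name `translate` is hypothetical, and it assumes each sequence starts in frame at its first base, as the leading "atg" suggests.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(sequence):
    """Translate frame 1 of a DNA or RNA sequence; '*' marks a stop codon."""
    # Remove whitespace, normalise case, and treat RNA 'U' as 'T'.
    seq = "".join(sequence.split()).upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        for i in range(0, usable, 3)
    )

# First sequence from exercise 17.
rna_1 = (
    "atggaaaaccccagcccggccgccgccctgggcaaggccctctgcgctctcctcctggccactctcggcgccgccggcc"
    "agcctcttgggggagagtccatctgttccgccagagccccggccaaatacagcatcaccttcacg"
)
protein_1 = translate(rna_1)
print(protein_1)
```

The printed peptide is only a fragment, since the sequences in the exercise are partial; identifying the protein still requires a database search.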
18. Identify the cDNAs encoding the following protein sequences:
-LCADARMYGVLPWNAFPGKVCGSNLLSICKTAEFQMTFHLFIAAFVGAAATLVSLLTFMIAATYNFAVLKLMGRGTKF
-EMMDLQHGSLFLRTPKIVSGKDYNVTANSKLVIITAGARQQEGESRLNLVQRNVNIFKFIIPNVVKYSPNCKLLIVSN
-MVDMMDLPRSRINAGMLAQFIDKPVCFVGRLEKIHPTGKMFILSDGEGKNGTIELMEPLDEEISGIVEVVGRVTAKAT
19. Find the hierarchical clustering of the following list of genes:

         Tiss1     Tiss2     Tiss3     Tiss4     Tiss5     Tiss6     Tiss7     Tiss8
Gene 1   0.052905  0.058392  0.06977   0.056961  0.074954  0.061005  0.050068  0.059917
Gene 2   0.0336    0.095512  0.061694  0.036708  0.050386  0.042539  0.030157  0.056136
Gene 3   0.021603  0.024434  0.021238  0.018759  0.01518   0.015751  0.012132  0.027813
Gene 4   0.01405   0.018037  0.008364  0.010938  0.017524  0.006858  0.005407  0.016314
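
A hierarchical clustering of these four genes can be sketched in a few lines. The example below is one possible approach, not the method any particular tool uses: it assumes Euclidean distance between expression profiles and average linkage, and the function name `average_linkage` is hypothetical. The values are those of exercise 19, written with decimal points and read as one profile per gene across Tiss1 to Tiss8.

```python
from itertools import combinations
from math import dist

# Expression profiles across Tiss1..Tiss8, one list per gene (exercise 19).
profiles = {
    "Gene 1": [0.052905, 0.058392, 0.06977, 0.056961, 0.074954, 0.061005, 0.050068, 0.059917],
    "Gene 2": [0.0336, 0.095512, 0.061694, 0.036708, 0.050386, 0.042539, 0.030157, 0.056136],
    "Gene 3": [0.021603, 0.024434, 0.021238, 0.018759, 0.01518, 0.015751, 0.012132, 0.027813],
    "Gene 4": [0.01405, 0.018037, 0.008364, 0.010938, 0.017524, 0.006858, 0.005407, 0.016314],
}

def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in profiles]
    merges = []

    def linkage(pair):
        a, b = pair
        # Mean Euclidean distance over all gene pairs drawn from the two clusters.
        total = sum(dist(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the two closest clusters and record the step.
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

merges = average_linkage(profiles)
for a, b, distance in merges:
    print(f"merge {a} + {b} at distance {distance:.4f}")
```

The order of the merges gives the tree topology. A correlation-based distance is a common alternative when the shape of the profile matters more than the absolute expression level, and it may produce a different tree.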
20. Find the genes that have their highest expression in prefrontal cortex (200-fold enrichment in comparison with other tissues); repeat for amygdala.

21. Identify the transcripts that are targeted by the following Affymetrix probes:
-204312_x_at
-207630_s_at
-210400_at
-212581_x_at
-201891_s_at

22. Identify the haplotypes that are present in the following dataset, including their frequencies, and calculate the LD values between SNPs.
         S1  S2  S3  S4  S5  S6  S7  S8  S9  S10 S11 S12 S13
subj1    CT  AG  AC  TT  AC  CT  GG  CC  CT  AG  CT  AA  AG
subj2    TT  GG  AA  TT  CC  CC  GG  CC  CT  AG  CT  AG  AG
subj3    CT  AG  AA  TT  CC  CC  GG  CC  CC  GG  TT  AA  AG
subj4    CT  AG  AA  CT  CC  CC  AG  CT  CT  GG  CT  AG  AG
subj5    CT  AG  AA  TT  CC  CC  GG  CC  CT  AG  CT  AG  AG
subj6    CT  AG  AA  TT  CC  CC  GG  CC  CC  GG  TT  AA  AA
subj7    CT  AG  AA  TT  CC  CC  GG  CT  CT  GG  CT  AG  AG
subj8    CC  AG  AC  TT  CC  CC  GG  TT  TT  GG  CC  GG  GG
subj9    TT  GG  CC  TT  AA  TT  GG  CT  CT  GG  CT  AG  AG
subj10   TT  GG  AA  TT  CC  CC  GG  CC  CT  AG  CT  AG  AG
subj11   CT  AG  AA  TT  CC  CC  GG  CC  CT  AG  CT  AA  AG
subj12   CT  GG  AA  TT  CC  CC  GG  TT  TT  GG  CC  GG  GG
subj13   CT  AG  AA  TT  CC  CC  GG  CC  CC  GG  TT  AA  AG
subj14   TT  GG  AA  CT  CC  CC  AG  CC  CT  AG  CT  AG  AG
subj15   CT  AG  AA  TT  CC  CC  GG  CC  CC  GG  TT  AA  AG
subj16   CT  AG  AA  TT  CC  CC  GG  CC  CT  AG  CT  AG  AG
subj17   CT  GG  AC  TT  AC  CT  GG  TT  TT  GG  CC  GG  GG
subj18   CC  AA  AA  TT  CC  CC  GG  CC  CT  AG  CT  AG  AG
subj19   CT  AG  AA  TT  CC  CC  GG  CC  CC  GG  TT  AA  AA
subj20   CC  AA  AC  TT  CC  CT  GG  CC  CC  GG  TT  AA  AA
subj21   CT  AG  AA  TT  CC  CC  GG  CC  CC  GG  TT  AA  AG
subj22   CC  AA  CC  TT  AA  TT  GG  CC  CC  GG  TT  AA  AG
subj23   TT  GG  AA  TT  CC  CC  GG  CC  CC  GG  TT  AA  AG
subj24   TT  GG  AA  TT  CC  CC  GG  CC  CC  GG  TT  AA  AA
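
For unphased genotypes such as these, a quick approximation of pairwise LD is the squared correlation between allele counts at two SNPs. The sketch below illustrates this for S1 and S2 only; it is not a substitute for haplotype estimation, which needs a phasing step, and the function names `allele_frequency`, `dosage` and `r_squared` are hypothetical. The choice of counted allele (T for S1, G for S2) is arbitrary and does not change the squared correlation.

```python
# Genotypes for S1 and S2 in subj1..subj24, copied from the table in exercise 22.
s1 = "CT TT CT CT CT CT CT CC TT TT CT CT CT TT CT CT CT CC CT CC CT CC TT TT".split()
s2 = "AG GG AG AG AG AG AG AG GG GG AG GG AG GG AG AG GG AA AG AA AG AA GG GG".split()

def allele_frequency(genotypes, allele):
    """Frequency of one allele among the 2N chromosomes."""
    return sum(g.count(allele) for g in genotypes) / (2 * len(genotypes))

def dosage(genotypes, allele):
    """Number of copies (0, 1 or 2) of the allele carried by each subject."""
    return [g.count(allele) for g in genotypes]

def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return sxy * sxy / (sxx * syy)

freq_t = allele_frequency(s1, "T")
ld_s1_s2 = r_squared(dosage(s1, "T"), dosage(s2, "G"))
print(f"S1 T frequency: {freq_t:.3f}")
print(f"approximate r^2 between S1 and S2: {ld_s1_s2:.3f}")
```

Repeating the calculation for every pair of the 13 SNPs gives an LD matrix that can be compared with the output of a dedicated haplotype analysis program.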
23. Identify the predicted functional effects of each of the following nsSNPs:
-rs28931579
-rs769452
-rs28931577
-rs11542040
-rs11542035

24. Retrieve the genomic sequence for all the exons (including 50 bp of flanking sequence) of the following genes:
-RGS4
-RIMS3
-RTN1
-SLC1A3
-SNAP25

25. Identify the interacting partners for each of the following genes:
-MEF2C
-NAP1L3
-NCDN
-NDRG4
-NEFL

26. Identify which of the following P values pass a False Discovery Rate of 0.05:
0.650106935, 0.308093469, 0.463145394, 0.19572116, 0.112681844, 0.493084372, 0.043017213, 0.515230709, 0.098477813, 0.276669253, 0.4536028, 0.927263525, 0.000763073, 0.391324056, 0.381511095, 0.003431856, 0.206671413, 0.354702281, 0.25477432
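
The exercise does not name an FDR procedure. A common choice is the Benjamini-Hochberg step-up method, sketched below as an assumption: sort the m P values, find the largest rank k whose P value is at most k × q / m, and accept every P value up to that rank. The function name `benjamini_hochberg` is hypothetical, and the P values are those of exercise 26 written with decimal points.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the P values accepted by the Benjamini-Hochberg procedure at level q."""
    m = len(p_values)
    ranked = sorted(p_values)
    cutoff_rank = 0
    for rank, p in enumerate(ranked, start=1):
        # Step-up rule: keep the largest rank whose P value is under its threshold.
        if p <= rank * q / m:
            cutoff_rank = rank
    return ranked[:cutoff_rank]

# P values from exercise 26.
p_values = [
    0.650106935, 0.308093469, 0.463145394, 0.19572116, 0.112681844,
    0.493084372, 0.043017213, 0.515230709, 0.098477813, 0.276669253,
    0.4536028, 0.927263525, 0.000763073, 0.391324056, 0.381511095,
    0.003431856, 0.206671413, 0.354702281, 0.25477432,
]

significant = benjamini_hochberg(p_values, q=0.05)
print(significant)
```

Note that a P value below 0.05 does not automatically pass: each one is compared with its own rank-dependent threshold, which is stricter for the smallest ranks.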
27. Identify the top 10 down-regulated genes in post-mortem schizophrenia brains; repeat for bipolar disorder.

28. Design PCR primers that allow the cloning of the following fragments:
-chrX:77256575-77256975; EcoRI and HindIII
-chr8:26530136-26530636; HindIII and XbaI
-chr6:16846682-16846982; EcoRI and XbaI
-chr1:40858939-40859339; HindIII and EcoRI

29. Identify the genomic regions that are amplified using the following PCR primer pairs:
-F-ATGGAGTGGCTAGAAGAGTCAG R-TGGATCATTTGCGATTTCCAGTT
-F-AGGGCTTCCTTATGTCCTCCA R-TACCCACGTACCATTAGGAGC
-F-AAAAGCAGGAGTGTGATGACG R-CGATCCCAAGTGTGTTACTGG

31. Identify the maximum LOD score simulated for the following pedigree:
32. Identify the nucleotide that is conserved in mouse and rat for the following SNPs:
-rs9817739
-rs1937690
-rs7973772
-rs278151
-rs10128858

33. Design primers to genotype the following SNPs by AS-PCR:
-rs974849
-rs246835
-rs12768718
-rs10185953
-rs5753220

34. Design primers to genotype the following SNPs by PCR-RFLP:
-rs16949418
-rs4979416
-rs4852259
-rs11593916
-rs10488140

35. Identify the number of citations for the papers with the following PMIDs:
-17173049
-16862116
-8895455
-818641
-17571346

36. Identify the predicted network of interactions for the following genes:
CAMK2B, DNER, DNM1, EEF1A2, ELAVL4, GFAP

37. Identify the best predicted drug compound that can modulate the activity of the following genes:
-CAMK2B
-NTRK2
-VDAC1
-CCNA2
-PDE4B

38. Design PCR primers to differentiate between cDNA and genomic DNA for the following genes:
-TF
-TU3A
-TUBB4
-UCHL1
-VSNL1

39. Identify the most suitable journal to publish a hypothetical paper with the following abstract:
Human memory is a polygenic trait. We performed a genome-wide screen to identify memory-related gene variants. A genomic locus encoding the brain protein KIBRA was significantly associated with memory performance in three independent, cognitively normal cohorts from Switzerland and the United States. Gene expression studies showed that KIBRA was expressed in memory-related brain structures. Functional magnetic resonance imaging detected KIBRA allele-dependent differences in hippocampal activations during memory retrieval. Evidence from these experiments suggests a role for KIBRA in human memory.

40. Identify the significant SNPs in a genome-wide association study and identify possible runs of homozygosity in the same dataset. You will download a publicly available dataset with results from about 500,000 SNPs.

41. Identify SNPs that are located in conserved transcription factor binding sites in chromosome 1; retrieve SNPs that are located in microRNA binding sites in chromosome 2.

42. Identify the Ensembl IDs for the genes listed in exercise 36.
DF, 03-2008

If you use these exercises for teaching purposes, please cite the original source; if you have comments or suggestions, please do not hesitate to contact me by email.