WORKSHOP: DATA MINING IN GENOMICS

Diego A. Forero, MD, PhD(c)
Applied Molecular Genomics Group, VIB Department of Molecular Genetics, University of Antwerp, Antwerp, Belgium
VIB Laboratory of Developmental Genetics, Catholic University of Leuven, Leuven, Belgium
Unit of Functional Genomics, University of Liège, Liège, Belgium
Grupo de Neurociencias, Facultad de Medicina e Instituto de Genética, Universidad Nacional de Colombia, Bogotá, Colombia
Editor, hum-molgen.org
Email: [email protected]
Website: http://www.daforerog.co.cc/

Data mining in genomics involves several useful approaches for extracting relevant information from large biological datasets. This workshop is mainly oriented to students and researchers with backgrounds in the biological sciences; people with backgrounds in computer science who are interested in biology are also welcome. Students are invited to bring their own biological questions to explore in the exercises.

Special emphasis will be given to freely available tools created in recent years to help experimentalists with little programming experience mine large biological datasets, including those available in public repositories and databases. A short discussion of bioinformatics extensions of classical programming languages will also be presented. A brief description of each topic will be followed by extensive hands-on exercises with these tools.

Number of students: Between 10 and 20.
Duration: 2 days of in-person sessions, preceded by several days of preparation reading the suggested literature.
Cost: Free for selected students. Selection will be based on academic background and a strong interest in genomics.
Level: Undergraduate, MSc, and PhD students.

Given the duration and approach of the course, its main expected goal is to introduce advanced tools for data mining in genomics at an intermediate level. The course is not intended to teach what genomics is, or to bring participants to an expert level in data mining in genomics.

Students are expected to read all the selected papers before the workshop, so that the sessions can focus on the advanced and practical aspects of these topics in an interactive and constructive way.

Elements:
-PC with an internet connection for each student
-Selected literature

Day 1.

BRIEF INTRODUCTION
-Generation of data vs. analysis of data
-Current trends in data mining in biology

BIOMART AND ADVANCED FEATURES OF THE ENSEMBL GENOME BROWSER
-Types of data available at BioMart
-Retrieval of data from BioMart
-Data available at the Ensembl browser
-Practical exercises

GALAXY AND ADVANCED FEATURES OF THE UCSC GENOME BROWSER
-Advanced use of the Tables feature of the UCSC browser
-Creation of user-defined tracks in the UCSC browser
-Practical exercises

TAVERNA, MYGRID AND MYEXPERIMENT
-Available services to use with Taverna
-Creation and execution of bioinformatics workflows
-Use of available workflows from myExperiment
-Practical exercises
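As a hedged illustration of the BioMart retrieval topic of Day 1, the following Python sketch builds a BioMart XML query and the corresponding request URL using only the standard library. It assumes Ensembl's public martservice endpoint (https://www.ensembl.org/biomart/martservice) and the dataset, attribute, and filter names hsapiens_gene_ensembl, ensembl_gene_id, external_gene_name, and chromosome_name; none of these are given in the workshop outline, and the names available for a given dataset should be checked in the BioMart web interface. The sketch only builds the request; sending it (for example with urllib.request.urlopen) needs network access and returns tab-separated text.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed endpoint: Ensembl's public BioMart web service (not named in the outline).
MARTSERVICE_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(dataset, attributes, filters=None):
    """Return a BioMart XML query string for a single dataset.

    dataset    -- dataset name, e.g. "hsapiens_gene_ensembl"
    attributes -- list of attribute (output column) names
    filters    -- optional dict mapping a filter name to its value
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "1",        # first row holds the column names
        "uniqueRows": "1",    # drop duplicate rows
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


def build_biomart_url(query_xml):
    """Encode the XML query as the 'query' parameter of a GET request."""
    return MARTSERVICE_URL + "?" + urllib.parse.urlencode({"query": query_xml})


if __name__ == "__main__":
    # Example request: gene IDs and names for human chromosome 21 (assumed names).
    xml_query = build_biomart_query(
        "hsapiens_gene_ensembl",
        ["ensembl_gene_id", "external_gene_name"],
        {"chromosome_name": "21"},
    )
    print(xml_query)
    print(build_biomart_url(xml_query))
```

Building the query as an XML tree rather than by string concatenation keeps attribute values properly escaped, and the same function can be reused for any dataset by changing only its arguments.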

Day 2.

iTOOLS, BIOWEKA AND BIOMOBY
-Features of iTools
-Features of BioWeka
-Features of BioMOBY
-Practical exercises

DATA MINING OF GENOME-WIDE EXPRESSION STUDIES
-Retrieval of data from NCBI GEO
-Retrieval of data from ArrayExpress
-General features of Bioconductor
-Practical exercises

DATA MINING OF THE SCIENTIFIC LITERATURE IN GENOMICS
-Tools for advanced mining and retrieval of the genomics literature
-Practical exercises

BIOPYTHON, BIOPERL AND BIOJAVA
-General features of Biopython
-General features of BioPerl
-General features of BioJava
-Practical exercises in Python and Biopython

PERSPECTIVES FOR DATA MINING AND GENOMICS
-The future of -omics approaches
-The future of data mining in genomics

EXTENDED SESSION OF SUPERVISED EXERCISES
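For the Python and Biopython exercises of Day 2, a typical first task is reading sequences from a FASTA file and computing a simple statistic such as GC content. The sketch below does this with the standard library only, so it runs without Biopython installed; in Biopython the parsing step is normally handled by Bio.SeqIO.parse(handle, "fasta"). The function names and the two-record input are hypothetical examples, not workshop material.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text handle.

    The identifier is the first whitespace-separated word of the header line,
    as Biopython does for SeqRecord.id. Lines before the first header are ignored.
    """
    identifier, chunks = None, []
    for raw_line in handle:
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)  # emit the previous record
            words = line[1:].split()
            identifier = words[0] if words else ""
            chunks = []
        elif identifier is not None:
            chunks.append(line)  # sequences may be wrapped over several lines
    if identifier is not None:
        yield identifier, "".join(chunks)  # emit the last record


def gc_content(sequence):
    """Return the fraction of G and C bases (case-insensitive); 0.0 if empty."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Hypothetical two-record input standing in for a FASTA file on disk.
    fasta_text = ">seq1 demo record\nATGC\nGGCC\n>seq2\nATAT\n"
    for name, seq in read_fasta(StringIO(fasta_text)):
        print(name, len(seq), round(gc_content(seq), 2))
    # seq1 8 0.75
    # seq2 4 0.0
```

Writing read_fasta as a generator means a large file is processed one record at a time instead of being loaded into memory at once, which is also how Bio.SeqIO.parse behaves.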

Selected References

Links to the full text of the papers are available at: http://www.daforerog.co.cc/insilico.htm

-Allison DB, Cui X, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 2006 Jan;7(1):55-65.
-Barabási AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004 Feb;5(2):101-13.
-Bassi S. A primer on Python for life science researchers. PLoS Comput Biol. 2007 Nov;3(11):e199.
-Brazma A, Krestyaninova M, Sarkans U. Standards for systems biology. Nat Rev Genet. 2006 Aug;7(8):593-605.
-Dinov ID, Rubin D, Lorensen W, Dugan J, Ma J, Murphy S, Kirschner B, Bug W, Sherman M, Floratos A, Kennedy D, Jagadish HV, Schmidt J, Athey B, Califano A, Musen M, Altman R, Kikinis R, Kohane I, Delp S, Parker DS, Toga AW. iTools: a framework for classification, categorization and integration of computational biology resources. PLoS ONE. 2008 May 28;3(5):e2265.
-Fernández-Suárez XM, Birney E. Advanced genomic data mining. PLoS Comput Biol. 2008 Sep 26;4(9):e1000121.
-Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5(10):R80.
-Gewehr JE, Szugat M, Zimmer R. BioWeka--extending the Weka framework for bioinformatics. Bioinformatics. 2007 Mar 1;23(5):651-3.
-Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A. Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 2005 Oct;15(10):1451-5.
-Hartwell LH, Hopfield JJ, Leibler S, Murray AW. From molecular to modular cell biology. Nature. 1999 Dec 2;402(6761 Suppl):C47-52.
-Holland RC, Down TA, Pocock M, Prlic A, Huen D, James K, Foisy S, Dräger A, Yates A, Heuer M, Schreiber MJ. BioJava: an open-source framework for bioinformatics. Bioinformatics. 2008 Sep 15;24(18):2096-7.
-Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4(1):44-57.
-Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T. Taverna: a tool for building and running workflows of services. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W729-32.
-Jensen LJ, Saric J, Bork P. Literature mining for the biologist: from information retrieval to biological discovery. Nat Rev Genet. 2006 Feb;7(2):119-29.
-Kitano H. Computational systems biology. Nature. 2002 Nov 14;420(6912):206-10.
-Lee GW, Kim S. Genome data mining for everyone. BMB Rep. 2008 Nov 30;41(11):757-64.
-Lee JK, Williams PD, Cheon S. Data mining in genomics. Clin Lab Med. 2008 Mar;28(1):145-66.
-Moore JH. Bioinformatics. J Cell Physiol. 2007 Nov;213(2):365-9.
-Schattner P. Automated querying of genome databases. PLoS Comput Biol. 2007 Jan 26;3(1):e1.
-Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, Lehväslaiho H, Matsalla C, Mungall CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD, Stupka E, Wilkinson MD, Birney E. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002 Oct;12(10):1611-8.
-Wilkinson MD, Links M. BioMOBY: an open source biological web services proposal. Brief Bioinform. 2002 Dec;3(4):331-41.

Current Version: March 20, 2009
