Workflow

  • November 2019
For Project X, we get reads and assemblies (contigs/genome scaffolds) from GenBank/Trace Archive, and CDS, proteins, and ncRNAs are predicted from all of these datasets. For download from the publication page, we ONLY provide these GenBank-derived datasets AND user-provided annotated data (if any), plus other custom datasets provided by the authors.

For the BLAST dbs, we produce:

  • (1) Project X: reads
  • (2) Project X: CDS from reads
  • (3) Project X: predicted proteins from reads
  • (4) Project X: assemblies
  • (5) Project X: CDS from assemblies
  • (6) Project X: predicted proteins from assemblies
  • (7) Project X: ncRNAs
  • We UPDATE: All Metagenomic Reads, All Metagenomic CDS, All Metagenomic Predicted Proteins, and All Metagenomic ncRNAs. (NOTE: I'm assuming that this new annotation pipeline is doing something more sophisticated and generating CDS and predicted proteins, as opposed to the 6-frame translation ORFs and peptides we have now for GOS and HOT. If so, we'll have to run the GOS and HOT data through this pipeline as well and replace those datasets with CDS and proteins.)
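To make the contrast in the note concrete, here is a minimal sketch of the 6-frame approach currently used for GOS and HOT. It assumes stop-to-stop ORFs and a hypothetical minimum peptide length of 60 residues; neither parameter is taken from the actual GOS/HOT pipelines, and `six_frame_peptides` is an illustrative name, not an existing CAMERA function.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int) -> str:
    """Translate seq starting at offset `frame`; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def six_frame_peptides(seq: str, min_len: int = 60) -> list[tuple[str, int, str]]:
    """Return (strand, frame, peptide) for every stop-to-stop peptide
    of at least min_len residues in all six reading frames.
    min_len=60 is an assumed cutoff, not a documented pipeline setting."""
    seq = seq.upper()
    peptides = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            # Splitting on stop codons yields stop-to-stop fragments.
            for fragment in translate(s, frame).split("*"):
                if len(fragment) >= min_len:
                    peptides.append((strand, frame, fragment))
    return peptides
```

For example, `six_frame_peptides("ATGGCCTAA", min_len=2)` includes `("+", 0, "MA")`. A CDS predictor would instead call genes with start sites and a coding model, which is the "more sophisticated" output the note expects from the new annotation pipeline.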

Annotation

  • All reads available (via GenBank or the Trace Archive) that are longer than 250 bp on average should be annotated. How well does the pipeline work for such short sequences? How well does it work for Sanger sequences, for that matter? What is the relative utility? What should we do with ESTs? GSS?
  • If annotation is available in GenBank, it should be retrieved and discussed. It is generally not available; one exception is the Leptospirillum assembly from AMD, where 8 genome scaffold sequences with predicted proteins have been deposited as a project separate from the metagenomic projects.
  • If an environmental dataset has scaffolds for organisms deposited in GenBank, should we treat it like an organism or like an environmental set? As an environmental set, I think. It depends on whether this data makes it into GenBank NR (and hence CAMERA nIAA) or not.
  • Can we annotate contigs/scaffolds via our metagenomic annotation pipeline, or should we be using the prok pipeline? Which predicted proteins should be included in clusters?
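The 250 bp criterion can be read as a dataset-level check on mean read length; it could also be applied per read, so treat this interpretation as an assumption. A minimal sketch under that reading, where `read_fasta` and `should_annotate` are hypothetical helpers rather than part of any CAMERA pipeline:

```python
from statistics import mean
from typing import Iterable, Iterator

# Threshold from the notes; applying it to the dataset mean is an assumption.
MIN_AVG_READ_LEN = 250


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # ignore sequence lines before the first header
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def should_annotate(lines: Iterable[str], min_avg_len: int = MIN_AVG_READ_LEN) -> bool:
    """True if the dataset's mean read length exceeds min_avg_len bp."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    return bool(lengths) and mean(lengths) > min_avg_len
```

An open FASTA file handle can be passed directly, since iterating over it yields lines. A per-read filter would instead keep only sequences with `len(seq) > 250`.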

Blastable Datasets

  • Reads should be added to All Metagenomic Reads; the same goes for ORFs, peptides, and ncRNAs.
  • Contigs and assemblies should be added to "All Metagenomic Assemblies". No such db exists at present, since GOS is the only project with assemblies; when/if we DO provide this, it should contain only "site-specific" assemblies.
  • If available, the mapping between reads and contigs should be incorporated as well.
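The routing rules above can be sketched as a lookup table plus one filter for non-site-specific assemblies. This is a minimal illustration under assumptions: the `Dataset` record, the `route` function, and the exact db spellings (which follow the notes' "All Metagenomic ..." pattern) are hypothetical.

```python
from dataclasses import dataclass

ASSEMBLY_DB = "All Metagenomic Assemblies"

# Hypothetical mapping from dataset kind to aggregate BLAST db; names follow
# the "All Metagenomic ..." pattern in the notes, exact spellings are assumed.
AGGREGATE_DB = {
    "reads": "All Metagenomic Reads",
    "orfs": "All Metagenomic ORFs",
    "peptides": "All Metagenomic Peptides",
    "ncrnas": "All Metagenomic ncRNAs",
    "contigs": ASSEMBLY_DB,
    "assemblies": ASSEMBLY_DB,
}


@dataclass
class Dataset:
    project: str
    kind: str  # one of the AGGREGATE_DB keys
    site_specific: bool = True  # only checked for contigs/assemblies


def route(datasets: list[Dataset]) -> dict[str, list[str]]:
    """Group per-project datasets by the aggregate db they should be added to."""
    targets: dict[str, list[str]] = {}
    for ds in datasets:
        db = AGGREGATE_DB[ds.kind]  # KeyError flags an unknown dataset kind
        if db == ASSEMBLY_DB and not ds.site_specific:
            continue  # the notes keep only site-specific assemblies
        targets.setdefault(db, []).append(f"{ds.project}: {ds.kind}")
    return targets
```

Keeping the rules in one table means a new kind (for example, CDS from the new annotation pipeline) can be added without touching the loop.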

New datasets:

Only one new (possibly CAMERA-relevant) project is available for update: Termite gut metagenome. No traces have been deposited; the data consist of 1337 fosmid clone sequences, 1 WGS entry (55,108 contigs), and 48 glycoside hydrolase family genes. Contacted JGI about the traces.

http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=19107
