Genome Informatics Alliance

The Genome Informatics Alliance (GIA) was an interdisciplinary thought leadership conference started in 2009 to uncover near-term challenges to the advancement of Next Generation Sequencing (NGS) as the technology improved in quality and output volume and fell in cost. The thesis was that bringing together commercial and academic experts in genomics and in related technological disciplines would accelerate the application of NGS to precision medicine, life sciences research, and any field in which some aspect of -omics data could extend understanding. The format was built around framing questions that guided short expert presentations, each followed by active Q&A with the presenter.

The basic format of the two-day meetings was to assemble leading experts, including competitors, to openly discuss pre-competitive areas of need for next generation sequencing. The organizing committee, and later the sole organizer, tried to look beyond current technical capabilities and anticipate the challenges that would arise as sequencing technology and its applications became more widespread.

The inaugural framing questions in 2009 were posed at a time when very few human whole genomes were available and when a sequencing run produced significantly less than 10 gigabases of output. At the time, sequencing a whole human genome required 30x coverage, or about 100 gigabases of data. With this in mind, the initial questions were:

What data models are required on which to build a foundation?
How will genomes need to be compared?
How will collaboration and data sharing evolve?
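
The 30x and 100G figures quoted above can be checked with a short calculation. The sketch below is illustrative only: it assumes that "100G" means gigabases, takes a haploid human genome size of about 3.1 gigabases (an approximate figure not given in the text), and treats 10 gigabases per run as the upper bound stated above. The function names are hypothetical.

```python
import math

# Assumption: approximate haploid human genome size in gigabases (not from the article).
GENOME_SIZE_GB = 3.1
# From the article: 30x representation of the genome.
COVERAGE = 30
# From the article: a run produced "significantly under" 10 gigabases, so this is an upper bound.
RUN_OUTPUT_GB = 10


def required_gigabases(genome_size_gb: float, coverage: int) -> float:
    """Total sequence data needed to cover the genome `coverage` times."""
    return genome_size_gb * coverage


def minimum_runs(total_gb: float, run_output_gb: float) -> int:
    """Smallest whole number of runs needed at the given per-run output."""
    return math.ceil(total_gb / run_output_gb)


total = required_gigabases(GENOME_SIZE_GB, COVERAGE)
runs = minimum_runs(total, RUN_OUTPUT_GB)
print(f"Data needed: about {round(total)} gigabases; at least {runs} runs")
```

Under these assumptions a single genome needs on the order of 100 gigabases, which at 2009 run yields meant at least ten sequencing runs per genome.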

All four GIA meetings were sponsored by Illumina, and were initially organized by committee.

The 2009 GIA meeting was held in Healdsburg, California at the Hotel Healdsburg on 17-18 March 2009; the organizing committee comprised David Dooling and Elaine Mardis of Washington University in St. Louis (WUStL) and Scott Kahn of Illumina.

Introduction Elaine Mardis (WUStL) Meeting Overview

Why is this important?
How will the concepts of a common reference evolve?
How does this impact -omics and research beyond the genomics lab?

Session 1 Challenges at Genome Centers Session Chair: Scott Kahn

What are the challenges that the large genome centers are currently facing that the typical researcher will be facing soon?

Tony Cox (Sanger) Challenges of Production Sequencing
Toby Bloom (Broad) Challenges of Scale: Implications for Infrastructure, Integration and Standards
David Dooling (WUStL) Challenges with data quality, sharing and versioning

Session 2 Challenges with Users and Analysts Session Chair: Dirk Evers

How is analysis being changed by the increased availability of data from multiple genomes and rapid turnaround of experiments?

David Craig (TGEN) The reality of privacy under the commoditization of genomics
John McPherson (OICR) Lessons about to be learned from the ICGC – worldwide planning, challenges and perspectives

Session 3 Challenges from a High Data Volume Perspective Session Chair: David Dooling

What can we learn from groups that handle high volumes of analytical data?

Guy Cochrane (EBI) Scaling a sequence repository: upwards and outwards
David Anderson (UC Berkeley) Using volunteer computing for data-intensive applications
Miron Livny (Open Science Grid) Data Intensive Science in a Shared (inter)-national Environment - the OSG Experience

Session 4 Challenges with Data Management Session Chair: Elaine Mardis

How do the platforms manage the current data load and what will data management look like if the platform yield increases by 10X?
What are the data structures and QC measures that will be needed for effective data transport and integration?

Jordan Stockton (Illumina)
Tim Hunkapiller (Life Technologies)
Jim Knight (Roche)

Round Table Discussion: Framing Issues Moderator: Jacques Retief

What is preventing the reduction in data storage and processing?
What are the most critical missing applications?
Sequence analysis challenges, specific to different applications
Action items?

Session 5 Framing Solutions Session Chair: Jordan Stockton

What potential solutions already exist and what is on the horizon?
What are the obstacles to their adoption?

Olof Barring (CERN) High throughput data reduction and storage
Andrew Hogue (Google) Genomics at Web Scale
Deepak Singh (Amazon) Leveraging infrastructure as a Service: From data sets to elastic computing
Allen Brown (Microsoft) SaaS strategies and challenges and commitment to LS (HealthVault)

Session 6 Framing Data Structure Solutions Session Chair: Jacques Retief

Do current ways of storing data make sense?
What challenges remain unsolved?

Martin Shumway (NCBI) Small Views on Large Datasets
Vivian Bonazzi (NHGRI) The View from NHGRI
Satnam Alag (NextBio) Experiences with Semantic Mining of Biological Data

Session 7 Framing the Tool and Service Solutions Session Chair: Gavin Sherlock

How can software help to transform genomic data into biological insight?

Steven Brenner (UC Berkeley) Running with the Red Queen: RNA-Seq for understanding splicing regulation, surveillance, and disease
Jimmy Lin (U Maryland) Sequence Alignment in the Clouds: Experiences with MapReduce at the University of Maryland
Gaddy Getz (Broad)
Darren Platt (Amyris Biotech) Biofuel genomics: Delivering an annotated genome from raw data in 4 hours
Raul Rabadan (Columbia University) Forgotten Scripts, High Throughput Sequencing and Emerging Viruses

Round Table Discussion Moderators: Elaine Mardis and Scott Kahn

How do we break free from old perspectives on data?
Universal data structures for sequence transfer and storage
How can the data analysis pipeline be streamlined?
Action items?

The 2010 GIA meeting was held in Woodinville, Washington at the Willows Lodge on 6-7 May 2010; the organizing committee comprised David Dooling of WUStL and Scott Kahn of Illumina.

The 2011 GIA meeting was held in Verona, Italy at the Byblos Art Hotel - Villa Amista on 9-10 June 2011; the meeting was organized by Scott Kahn of Illumina.

Introduction Scott Kahn (Illumina) Genomics and the Importance of Annotation

Session 1 Evolution of the human reference genome Session Chair: Scott Kahn

Guy Cochrane (EBI) Reference-Aware Sequence Archive Services
Richa Agarwala (NCBI) Augmenting Transcriptional Knowledge w/ RNAseq

Session 2 Tools, methods and best practices for genomic annotation Session Chair: Jacques Retief

Richard Durbin (Sanger) Importance of Data Cleaning
Carsten Daub (Riken) Annotation Methods from FANTOM
Vasily Borisov (Kadme AS) Non-Genomic Experiences in Complex Data Integration

Session 3 State-of-art in database curation and aggregation Session Chair: Gary Schroth

Frank Schacherer (BioBase) Challenges and Progress in Data Curation
Chris Mason (Cornell Medical) Transcriptome Standards and Massive Transcriptome Expansion with the SEQC Consortium
Len Pennacchio (JGI) Annotation Challenges in Metagenomics

Session 4 Communication and visualization of genomic annotations Session Chair: Jordan Stockton

Steven Jones (BCCA) Annotation, Analysis, and Visualization of Cancer Diagnostics
Andreas Hildebrandt (Saarland) Visualization of High Complexity Data and Uncertainty
Andrew Cardno (bis2) Alternative Visualizations of High Dimensional Data

Session 5 Tools, methods and best practices for genomic annotation (continued) Session Chair: Jordan Stockton

Jordan Stockton (Illumina) Introduction
Jean-Marc Neefs (JnJ) NGS and Annotation – A Pharmaceutical Perspective
David Caldwell (Monsanto) Unique Challenges with Plant Annotation
Jennifer Wortman (Broad) Leveraging RNASeq Data for Genome Annotation

Session 6 State-of-art in database curation and aggregation (continued) Session Chair: John McPherson

Shawn Dolley (Netezza) Challenges with aggregation of clinical data
Ilya Kupershmidt (NextBio) Translation of data into useful information
Doug Bassett (Ingenuity) Visualization & Interaction with Genomic Annotation

Session 7 Communication and visualization of genomic annotations (continued) Session Chair: Dirk Evers

Tobias Rausch (EMBL) Communications of annotation results at EMBL
Robert Kincaid (Agilent) Genome Visualization – Progress and Challenges
Gary Schroth (Illumina) Annotations Across Many Biological Perspectives
P.-Jean Letourneau (Wolfram) Practical Experiences with Human Genome Annotation

The 2012 GIA meeting was held in Newberg, Oregon at the Allison Inn and Spa on 29-30 March 2012; the meeting was organized by Scott Kahn of Illumina.

Scott Kahn (Illumina) Welcome
David Bentley (Illumina) Meeting Overview

The One Million Genome Challenge Session Chair: Jordan Stockton

What are the challenges in trying to aggregate one million human whole genome sequences?
Are there collaboration mechanisms that would support this?
Are there database frameworks that are applicable?
Would this change the definition of "raw data"?
Are there analysis and visualization tools that are needed?
How would it advance science?

Leonard D'Avolio (VA Boston Healthcare System) Barriers on the Road to Personalized Medicine
Toby Bloom (Broad Institute) What Breaks Next: Can we really handle a million genomes?
Jonathan Sheldon (Oracle) Building a scalable translational research infrastructure to enable genotype-phenotype 'big data' analysis
Folker Meyer (Argonne National Laboratory) Infrastructure challenges for the metagenomics community from the Earth Microbiome Project (EMP)

Tools, methods and best practices for genomic annotation Session Chair: Semyon Kruglyak

How is context included in annotation to support use and re-use?
How will transitioning from single genomes to multi-sample studies impact tools?
What are the content-centric interdependencies that must be resolved?
How do/will social media methods impact this area?

David Haussler (University of California, Santa Cruz) Cancer Genomics
Frank Schacherer (BioBase) A million monkeys with typewriters -- tackling the challenge of large scale annotation
Deepak Singh (Amazon) Most of the smartest people work for someone else

Sequencing unplugged Session Chair: Scott Kahn

If sequencing technology were not bounded by current sequencer form factors and technological limitations, what would be the ultimate application?
What technologies exist in other market segments that we can re-purpose?
What are interesting convergences that can be foreseen?

Jared Schwartz (Aperio) Imaging, Genomics, Proteomics, GKWN: How and Who in Medicine Will Integrate the Data into Actionable Information?
Andrew Girvin (Palantir Technologies) Palantir: Data Integration, Analysis and Secure Collaboration at Scale
Archana Ganapathi (Splunk) Splunk: A platform for analyzing machine generated data

Information communication challenges Session Chair: Jacques Retief

How are truly large amounts of data communicated in biology and outside of biology?
What will be needed to support research, clinical applications, applied technology applications (e.g., forensics), and ultimately personal use?

David Dooling (Washington University in St Louis) Know Your Audience
Robert Kincaid (Agilent Technologies) Implications of a Post-PC Era

Clinical applications Session Chair: David Bentley

How can the world-wide accumulated genomic information be used to improve the diagnosis of an individual patient?
Will there ever be an exhaustive library of cancer mutations? How big a cohort will it take to build it?
What will be required to build a comprehensive set of rules/mechanisms that can describe the acquisition of genomic aberrations and disease progression in cancer?

Paul Aldridge (Genomic Health) Improving cancer treatment decisions, one patient at a time
John McPherson (Ontario Institute for Cancer Research) Lost in Translation
Don Rule (Translational Software) The Importance of Being Structured - Evolving clinical records to incorporate genomic data
Steven Brenner (University of California, Berkeley) Progress in Diagnosis and Challenges Remaining to Deliver the Promise - TBD

More than DNA Session Chair: Gary Schroth

What are the unique opportunities and challenges presented by metagenomics, epigenomics, RNA, and their integration?
Will we ever achieve wholeomics, a completely integrated view of the genome, epigenome and biology?

Len Pennacchio (DOE Joint Genome Institute) The Growth of Functional Genomics at the JGI
Ilya Kupershmidt (NextBio) Organizing and making sense of world's data - issues, challenges and solutions
Martin McIntosh (Fred Hutchinson Cancer Research Center) Challenges and opportunities for proteome profiling using RNA sequencing

Round Table Discussion Moderator: Scott Kahn

Summary of meeting and discussion of where to explore next
Are there topics that could fruitfully be brought into scope for GIA?
Are there next steps that we should take away from this meeting?

References

This article, "Genome Informatics Alliance", is from Wikipedia. The list of its authors can be seen in its history and/or on the page Edithistory:Genome Informatics Alliance. Articles copied from the Draft namespace on Wikipedia can be seen in the Draft namespace of Wikipedia and not in the main one.