Research Consortium Announces Significant Progress to Close Gaps and Uncover Novel Genes in the Human Reference Genome Sequence
High-quality long-read sequencing technology is providing new insights into previously unsequenced regions of the Human Reference Genome
(PresseBox) (Penzberg, Germany and Marco Island, Florida, USA) Roche (SIX: RO, ROG; OTCQX: RHHBY) announced today that a consortium consisting of researchers from Penn State University, the National Center for Biotechnology Information, Children's Hospital Oakland Research Institute and Roche 454 Life Sciences is working on a new comprehensive de novo assembly of a human genome to augment and supplement the current human reference genome sequence. The team presented the latest results today at the Advances in Genome Biology and Technology (AGBT) congress in Marco Island, Florida.
Under the leadership of Stephan Schuster, Ph.D., Professor at Penn State University, the consortium is analyzing and assembling the RP11 human reference genome as part of new efforts to close gaps in the human reference assembly using Roche's 454 GS FLX+ Sequencer. To date, the draft assembly covers a significant number of the remaining human reference sequence gaps and has revealed 36 million bases of novel sequence, including novel genes with potential biological relevance.
"We are very proud to have been able to contribute to a project of such importance and potential impact on future genomic research with our unique long-read sequencing technology," said Dan Zabrowski, Head of Roche Applied Science. "This project also shows the power of combining different innovative sequencing analysis and assembly technologies."
This new de novo assembly is quickly becoming the most complete assembly of the Human Reference Genome produced with next-generation sequencing technology. The size and contiguity of the new assembly match those of previous Sanger-based assemblies, including the J. Craig Venter genome (HuRef) published in 2007. In total, the latest draft assembly fully spans 76 remaining gaps, extends into 13 additional repeat regions, and reveals a total of 36 million bases of novel genomic sequence.
"I am pleased with the overall progress of the project and the high quality of the assembly even at this early stage," said Stephan Schuster. "The 454 Sequencing technology has proven to sequence entire human genomes with even coverage and the long reads enable Sanger-like sequencing of reference genomes."
The current draft assembly was generated using a hybrid of 18X 454 GS FLX+ long-read sequence data and 7.5X short-read sequence data from the Illumina MiSeq and HiSeq platforms. De novo genome assembly was performed using the Roche 454 GS De Novo Assembler software (Newbler). Significant ongoing efforts to add sequence data and apply different bioinformatics strategies are expected to further improve the contiguity of the assembly and the quality of the results, which will be made publicly available to the research community.
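As a rough illustration of what the coverage figures above imply, the following sketch adds the two fold-coverage values and converts them into an approximate number of sequenced bases. The genome size of about 3.1 billion bases is an assumption made for this example and is not stated in the announcement; the function name is likewise illustrative only.

```python
# Assumed approximate haploid human genome size in bases.
# This value is NOT taken from the announcement; it is a common round figure.
GENOME_SIZE_BASES = 3.1e9


def bases_for_coverage(fold_coverage, genome_size=GENOME_SIZE_BASES):
    """Return the approximate number of sequenced bases for a fold coverage."""
    return fold_coverage * genome_size


# Fold-coverage values quoted in the announcement.
long_read_cov = 18.0   # 454 GS FLX+ long reads
short_read_cov = 7.5   # Illumina MiSeq and HiSeq short reads

# Combined coverage of the hybrid data set.
total_cov = long_read_cov + short_read_cov

# Fraction of the combined coverage contributed by long reads.
long_read_share = long_read_cov / total_cov

print(f"Combined coverage: {total_cov}X")
print(f"Long-read share of coverage: {long_read_share:.1%}")
print(f"Approximate total bases sequenced: {bases_for_coverage(total_cov):.2e}")
```

Under this assumption, the hybrid data set corresponds to a combined 25.5X coverage, with long reads contributing roughly seven tenths of it.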
On the RP11 genome and the Human Genome Reference Sequence
For the Human Genome Project (HGP), published in 2001, researchers collected DNA samples from a large number of male and female donors. Only a few of the many collected samples were processed as DNA resources, and donor identities were protected so that neither donors nor scientists could know whose DNA was sequenced. Due to quality considerations, most of the sequence (~70%) of the Human Genome Reference Sequence (currently GRCh37), initially produced by the HGP and now managed by the Genome Reference Consortium, was derived from a single anonymous male donor from Buffalo, New York, by shotgun sequencing clones from the "RP11" BAC library. Samples from this original DNA were used in this latest assembly project and were provided in collaboration with Pieter de Jong, Ph.D., Children's Hospital Oakland Research Institute.
Roche Diagnostics Deutschland GmbH, Sandhofer Str. 116, D-68305 Mannheim