Authors

Arang Rhie, National Human Genome Research Institute (NHGRI)
Shane A. McCarthy, University of Cambridge
Olivier Fedrigo, Rockefeller University
Joana Damas, University of California, Davis
Giulio Formenti, Rockefeller University
Sergey Koren, National Human Genome Research Institute (NHGRI)
Marcela Uliano-Silva, Leibniz-Institut für Zoo- und Wildtierforschung
William Chow, Wellcome Sanger Institute
Arkarachai Fungtammasan, DNAnexus
Juwan Kim, Seoul National University
Chul Lee, Seoul National University
Byung June Ko, Department of Food and Animal Biotechnology
Mark Chaisson, University of Southern California
Gregory L. Gedman, Rockefeller University
Lindsey J. Cantin, Rockefeller University
Francoise Thibaud-Nissen, National Center for Biotechnology Information (NCBI)
Leanne Haggerty, EMBL’s European Bioinformatics Institute
Iliana Bista, University of Cambridge
Michelle Smith, Wellcome Sanger Institute
Bettina Haase, Rockefeller University
Jacquelyn Mountcastle, Rockefeller University
Sylke Winkler, Max Planck Institute of Molecular Cell Biology and Genetics
Sadye Paez, Rockefeller University
Jason Howard, Novogene Co., Ltd.
Sonja C. Vernes, Max Planck Institute for Psycholinguistics
Tanya M. Lama, University of Massachusetts Amherst
Frank Grutzner, The University of Adelaide
Wesley C. Warren, University of Missouri
Christopher N. Balakrishnan, East Carolina University
Dave Burt, The University of Queensland
Julia M. George, Clemson University
Matthew T. Biegler, Rockefeller University

Document Type

Article

Publication Date

4-29-2021

Publication Title

Nature

Abstract

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

Volume

592

Issue

7856

First Page

737

Last Page

746

DOI

10.1038/s41586-021-03451-0

ISSN

00280836

Comments

Archived as published. Open Access Article

Included in

Biology Commons
