Author ORCID Identifier

Laura A. Katz: 0000-0002-9138-4702

Auden Cote-L'Heureux: 0000-0001-5793-7695

Document Type

Article

Publication Date

6-11-2025

Publication Title

mBio

Abstract

Eukaryotic diversity is largely microbial, with macroscopic lineages (plants, animals, and fungi) nesting among a plethora of diverse protists. Our understanding of the evolutionary relationships among eukaryotes is rapidly advancing through ’omics analyses, but phylogenomic analyses are challenging for microeukaryotes, particularly uncultivable lineages, as single-cell sequencing approaches generate a mixture of sequences from hosts, associated microbiomes, and contaminants. Moreover, many analyses of eukaryotic gene families and phylogenies rely on boutique data sets and methods that are challenging for other research groups to replicate. To address these challenges, we present EukPhylo v.1.0, a modular, user-friendly pipeline that enables effective data curation through phylogeny-informed contamination removal, estimation of homologous gene families (GFs), and generation of both multisequence alignments and gene trees. For the GF assignment, we provide the “Hook Database” of ~15,000 ancient GFs, which users can easily replace with a set of gene families of interest. We demonstrate the power of EukPhylo, including a suite of stand-alone utilities, through phylogenomic analyses of 500 conserved GFs sampled from 1,000 diverse species of eukaryotes, bacteria, and archaea. We show improvements in estimates of the eukaryotic tree of life, recovering clades that are well established in the literature, through successive rounds of curation using the EukPhylo contamination loop. The final trees corroborate numerous hypotheses in the literature (e.g., Opisthokonta, Rhizaria, Amoebozoa) while challenging others (e.g., CRuMs, Obazoa, Diaphoretickes). The flexibility and transparency of EukPhylo set new standards for curation of ’omics data for future studies.

Keywords

phylogenomic standards, single-cell transcriptomes, phylogenetics, protists, eukaryotic tree of life, microbiome, contamination

DOI

https://doi.org/10.1128/mbio.01770-25

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Rights

© 2025 Katz et al.

Version

Version of Record

Included in

Biology Commons
