Bioinformatics. 2021 Oct 20; doi: 10.1093/bioinformatics/btab723. Epub 2021 Oct 20.

AncestralClust: Clustering of Divergent Nucleotide Sequences by Ancestral Sequence Reconstruction using Phylogenetic Trees.

Bioinformatics (Oxford, England)

Lenore Pipes, Rasmus Nielsen

Affiliations

  1. Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94707, USA.
  2. Department of Statistics, University of California-Berkeley, Berkeley, CA 94707, USA.
  3. Globe Institute, University of Copenhagen, 1350 København K, Denmark.

PMID: 34668516 DOI: 10.1093/bioinformatics/btab723

Abstract

MOTIVATION: Clustering is a fundamental task in the analysis of nucleotide sequences. Despite the exponential increase in the size of sequence databases of homologous genes, few methods exist to cluster divergent sequences. Traditional clustering methods have mostly focused on optimizing the speed of clustering highly similar sequences. We develop a phylogenetic clustering method that infers ancestral sequences for a set of initial clusters and then uses a greedy algorithm to cluster sequences.
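The greedy step mentioned above can be illustrated with a minimal sketch. This is not the AncestralClust implementation: the function names (`p_distance`, `greedy_assign`), the use of a simple mismatch fraction on pre-aligned sequences, and the `max_dist` threshold are all assumptions made for illustration. The representative sequences are passed in directly as a stand-in for inferred ancestral sequences; tree building and ancestral reconstruction are not shown.

```python
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatched positions between two equal-length, aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_assign(
    seqs: Dict[str, str],
    representatives: List[str],
    max_dist: float,
) -> List[List[str]]:
    """Greedily assign each sequence to its closest representative.

    `representatives` is a hypothetical stand-in for the ancestral sequences
    of the initial clusters. A sequence farther than `max_dist` from every
    representative starts a new cluster and becomes its representative.
    """
    reps = list(representatives)
    clusters: List[List[str]] = [[] for _ in reps]
    for name, seq in seqs.items():
        best_idx = -1
        best_dist = float("inf")
        for i, rep in enumerate(reps):
            d = p_distance(seq, rep)
            if d < best_dist:
                best_idx, best_dist = i, d
        if best_idx >= 0 and best_dist <= max_dist:
            clusters[best_idx].append(name)
        else:
            # No representative is close enough: open a new cluster.
            reps.append(seq)
            clusters.append([name])
    return clusters


# Toy, made-up sequences purely for demonstration.
toy_seqs = {
    "s1": "AAAAAAAT",
    "s2": "CCCCCCGC",
    "s3": "AAAATAAA",
    "s4": "GGGGGGGG",
}
toy_clusters = greedy_assign(toy_seqs, ["AAAAAAAA", "CCCCCCCC"], max_dist=0.25)
print(toy_clusters)
```

In this toy run, `s1` and `s3` fall within the threshold of the first representative, `s2` within the second, and `s4` matches neither, so it opens a third cluster. A real tool would compute distances from alignments or a substitution model rather than a raw mismatch fraction.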

RESULTS: We describe AncestralClust, a clustering program developed for clustering divergent sequences. We compare this method with other state-of-the-art clustering methods using datasets of homologous sequences from different species. We show that, in divergent datasets, AncestralClust has higher accuracy and more even cluster sizes than current popular methods.

AVAILABILITY AND IMPLEMENTATION: AncestralClust is an Open Source program available at https://github.com/lpipes/ancestralclust.

SUPPLEMENTARY INFORMATION: Supplementary figures and table are available online.

© The Author(s) (2021). Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected].

Publication Types