Algorithms Mol Biol. 2016 May 03;11:10. doi: 10.1186/s13015-016-0075-7. eCollection 2016.

Jabba: hybrid error correction for long sequencing reads.

Algorithms for molecular biology : AMB

Giles Miclotte, Mahdi Heydari, Piet Demeester, Stephane Rombauts, Yves Van de Peer, Pieter Audenaert, Jan Fostier

Affiliations

  1. Department of Information Technology, Ghent University - iMinds, Ghent, Belgium; Bioinformatics Institute Ghent, Ghent, Belgium.
  2. Department of Plant Systems Biology, VIB, Ghent, Belgium; Department of Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium; Bioinformatics Institute Ghent, Ghent, Belgium.
  3. Department of Plant Systems Biology, VIB, Ghent, Belgium; Department of Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium; Bioinformatics Institute Ghent, Ghent, Belgium; Department of Genetics, Genome Research Institute, University of Pretoria, Pretoria, South Africa.

PMID: 27148393 PMCID: PMC4855726 DOI: 10.1186/s13015-016-0075-7

Abstract

BACKGROUND: Third-generation sequencing platforms produce longer reads with higher error rates than second-generation technologies. While the increased read length can provide useful information for downstream analysis, the underlying algorithms are challenged by the high error rate. Error correction methods that use accurate short reads to correct noisy long reads are an attractive way to generate high-quality long reads. Methods that align short reads to long reads do not make optimal use of the information contained in the second-generation data and suffer from long runtimes. Recently, a new hybrid error correction method has been proposed in which the second-generation data is first assembled into a de Bruijn graph, onto which the long reads are then aligned.
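
As a rough illustration of the de Bruijn graph idea mentioned above, the following minimal Python sketch builds a graph from a few short reads. It is not taken from Jabba or any of the cited tools: the function name `build_de_bruijn`, the toy reads, and the choice of k = 3 are all assumptions made for illustration, and real assemblers additionally handle reverse complements, k-mer coverage, and graph correction.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a dict mapping each (k-1)-mer to the set of (k-1)-mers that follow it.

    Hypothetical helper for illustration only: every k-mer found in a read
    contributes one edge from its (k-1)-prefix to its (k-1)-suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two overlapping toy "short reads" (assumed data, not from the paper).
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn(reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Because overlapping reads share k-mers, they collapse onto the same path in the graph, which is what lets a long noisy read be aligned against the combined short-read evidence rather than against individual short reads.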

RESULTS: In this context, we present Jabba, a hybrid method that corrects long third-generation reads by mapping them onto a corrected de Bruijn graph constructed from second-generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, we present theoretical results on the possibilities and limitations of using MEMs in the context of third-generation reads.
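
To make the notion of a maximal exact match concrete, here is a minimal brute-force Python sketch that finds MEMs between a noisy read and a single reference string. This is an assumption-laden illustration, not Jabba's implementation: the function name `maximal_exact_matches`, the `min_len` parameter, and the toy sequences are hypothetical, the reference is a plain string rather than a de Bruijn graph, and the quadratic scan stands in for the indexed search a practical tool would use.

```python
def maximal_exact_matches(read, ref, min_len):
    """Return (read_pos, ref_pos, length) for every MEM of at least min_len.

    A match is maximal when it cannot be extended to the left or to the
    right without hitting a mismatch or the end of either sequence.
    """
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if read[i] != ref[j]:
                continue
            # Skip starts that could be extended to the left: they lie
            # inside a longer match that begins earlier.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Extend to the right for as long as the characters agree.
            length = 0
            while (i + length < len(read) and j + length < len(ref)
                   and read[i + length] == ref[j + length]):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems


# Toy data (assumed): the read differs from the reference at positions 4 and 5.
ref = "ACGTAAGCA"
read = "ACGTTTGCA"
seeds = maximal_exact_matches(read, ref, min_len=3)
print(seeds)
```

In a seed-and-extend scheme, seeds such as these anchor the read on the reference, and the error-containing stretches between consecutive seeds are the parts that remain to be aligned and corrected.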

CONCLUSION: Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have very high identity. Many of the aligned reads are error-free. Additionally, Jabba corrects reads using very little CPU time. From this we conclude that pseudo alignment with MEMs is a fast and reliable method for mapping long, highly erroneous sequences onto a de Bruijn graph.

Keywords: Error correction; Maximal exact matches; Sequence analysis; de Bruijn graph

Publication Types