F1000Res. 2015 Sep 25;4:900. doi: 10.12688/f1000research.6924.1. eCollection 2015.

The khmer software package: enabling efficient nucleotide sequence analysis.

F1000Research

Michael R Crusoe, Hussien F Alameldin, Sherine Awad, Elmar Boucher, Adam Caldwell, Reed Cartwright, Amanda Charbonneau, Bede Constantinides, Greg Edvenson, Scott Fay, Jacob Fenton, Thomas Fenzl, Jordan Fish, Leonor Garcia-Gutierrez, Phillip Garland, Jonathan Gluck, Iván González, Sarah Guermond, Jiarong Guo, Aditi Gupta, Joshua R Herr, Adina Howe, Alex Hyer, Andreas Härpfer, Luiz Irber, Rhys Kidd, David Lin, Justin Lippi, Tamer Mansour, Pamela McA'Nulty, Eric McDonald, Jessica Mizzi, Kevin D Murray, Joshua R Nahum, Kaben Nanlohy, Alexander Johan Nederbragt, Humberto Ortiz-Zuazaga, Jeramia Ory, Jason Pell, Charles Pepe-Ranney, Zachary N Russ, Erich Schwarz, Camille Scott, Josiah Seaman, Scott Sievert, Jared Simpson, Connor T Skennerton, James Spencer, Ramakrishnan Srinivasan, Daniel Standage, James A Stapleton, Susan R Steinman, Joe Stein, Benjamin Taylor, Will Trimble, Heather L Wiencko, Michael Wright, Brian Wyss, Qingpeng Zhang, En Zyme, C Titus Brown

Affiliations

  1. Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, USA.
  2. Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, USA.
  3. Population Health and Reproduction, University of California, Davis, Davis, CA, USA.
  4. Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR, USA.
  5. Biology Department, San Jose State University, San Jose, CA, USA.
  6. School of Life Sciences and The Biodesign Institute, Arizona State University, Tempe, AZ, USA.
  7. Genetics, Michigan State University, East Lansing, MI, USA.
  8. Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Manchester, UK.
  9. Micron Technology, Seattle, WA, USA.
  10. Invitae, San Francisco, CA, USA.
  11. Computer Science and Engineering, Michigan State University, East Lansing, MI, USA.
  12. Independent Researcher, Munich, Germany.
  13. Mathematics Institute, University of Warwick, Warwick, UK.
  14. Eastlake Data, Seattle, WA, USA.
  15. Graduate Program, University of Maryland, College Park, MD, USA.
  16. Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA.
  17. Independent Researcher, Seattle, WA, USA.
  18. Center for Microbial Ecology, Michigan State University, East Lansing, MI, USA.
  19. Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, IA, USA.
  20. Department of Biology, University of Utah, Salt Lake City, UT, USA.
  21. ConSol Software GmbH, München, Germany.
  22. Independent Researcher, Sydney, Australia.
  23. Verdematics, Fremont, CA, USA.
  24. Independent Researcher, San Francisco, CA, USA.
  25. Population Health and Reproduction, University of California, Davis, Davis, CA, USA; Clinical Pathology, Mansoura University, Mansoura, Egypt.
  26. Addgene, Cambridge, MA, USA.
  27. Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA.
  28. ARC Centre of Excellence in Plant Energy Biology, The Australian National University, Canberra, ACT, Australia.
  29. BEACON Center, Michigan State University, East Lansing, MI, USA.
  30. Independent Researcher, New Orleans, LA, USA.
  31. Centre for Ecological and Evolutionary Synthesis, Dept. of Biosciences, University of Oslo, Oslo, Norway.
  32. Department of Computer Science, Rio Piedras Campus, University of Puerto Rico, San Juan, Puerto Rico.
  33. Biochemistry, St. Louis College of Pharmacy, St. Louis, MO, USA.
  34. Crop and Soil Sciences, Cornell University, Ithaca, NY, USA.
  35. Department of Bioengineering, UC Berkeley, Berkeley, CA, USA.
  36. Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA.
  37. Data Visualization, Newline Technical Innovations, Windsor, CO, USA.
  38. Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA.
  39. Ontario Institute for Cancer Research, Toronto, ON, Canada; Computer Science, University of Toronto, Toronto, ON, Canada.
  40. Division of Geological and Planetary Sciences, California Institute of Technology, Pasadena, CA, USA.
  41. Dept of Physics and Dept of Materials, Imperial College London, London, UK.
  42. Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
  43. Department of Biology, Indiana University, Bloomington, IN, USA; Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, IA, USA.
  44. Chemical Engineering & Materials Science, Michigan State University, East Lansing, MI, USA.
  45. The New York Eye and Ear Infirmary of Mount Sinai, New York, NY, USA.
  46. Independent Researcher, Providence, RI, USA.
  47. Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA.
  48. Department of Genetics, Smurfit Institute, Trinity College Dublin, Dublin, Ireland.
  49. Independent Researcher, Boston, MA, USA.
  50. Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, USA; Population Health and Reproduction, University of California, Davis, Davis, CA, USA; Computer Science and Engineering, Michigan State University, East Lansing, MI, USA.

PMID: 26535114 PMCID: PMC4608353 DOI: 10.12688/f1000research.6924.1

Abstract

The khmer package is a freely available software library for working efficiently with fixed-length DNA words, or k-mers. khmer provides implementations of a probabilistic k-mer counting data structure, a compressible De Bruijn graph representation, De Bruijn graph partitioning, and digital normalization. khmer is implemented in C++ and Python, and is freely available under the BSD license at https://github.com/dib-lab/khmer/.
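
To make two of the ideas named above concrete, here is a minimal pure-Python sketch of a probabilistic k-mer counter (in the style of a Count-Min Sketch) and a median-based read filter in the spirit of digital normalization. It does not use khmer and is not khmer's API or implementation: the names `KmerCounter` and `normalize`, the table sizes, the SHA-1 hashing, and the cutoff are all illustrative assumptions, and strand (reverse complement) handling is ignored for brevity.

```python
import hashlib
from statistics import median


class KmerCounter:
    """Approximate k-mer counter (hypothetical class, not part of khmer)."""

    def __init__(self, k, table_sizes=(1009, 1013, 1019)):
        self.k = k
        # One counter table per size; distinct prime sizes mean two k-mers
        # rarely collide in every table at once.
        self.tables = [[0] * size for size in table_sizes]

    def _slots(self, kmer):
        # Assumption: a SHA-1 digest reduced modulo each table size.
        digest = int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        return [digest % len(table) for table in self.tables]

    def kmers(self, seq):
        """Return every overlapping k-mer in seq, in order."""
        return [seq[i:i + self.k] for i in range(len(seq) - self.k + 1)]

    def add(self, seq):
        """Increment the counters of every k-mer in seq."""
        for kmer in self.kmers(seq):
            for table, slot in zip(self.tables, self._slots(kmer)):
                table[slot] += 1

    def count(self, kmer):
        # Collisions can only inflate a counter, so the minimum across
        # tables never undercounts and is the tightest estimate available.
        return min(table[slot]
                   for table, slot in zip(self.tables, self._slots(kmer)))


def normalize(reads, k=4, cutoff=2):
    """Keep a read only if its median k-mer count is still below cutoff."""
    counter = KmerCounter(k)
    kept = []
    for read in reads:
        kmers = counter.kmers(read)
        if not kmers:
            continue  # read shorter than k
        if median(counter.count(kmer) for kmer in kmers) < cutoff:
            counter.add(read)
            kept.append(read)
    return kept


reads = ["ACGTACGTAC"] * 5 + ["TTTTGGGGCC"]
kept = normalize(reads)
```

Because the filter makes a single pass and keeps only fixed-size tables in memory, it works on streamed reads; the price is that counts may be overestimated when k-mers collide.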

Keywords: bioinformatics; dna sequencing analysis; k-mer; khmer; kmer; low-memory; online; streaming
