Front Plant Sci. 2016 Jan 07;6:1171. doi: 10.3389/fpls.2015.01171. eCollection 2015.

CandiSSR: An Efficient Pipeline used for Identifying Candidate Polymorphic SSRs Based on Multiple Assembled Sequences.

Frontiers in plant science

En-Hua Xia, Qiu-Yang Yao, Hai-Bin Zhang, Jian-Jun Jiang, Li-Ping Zhang, Li-Zhi Gao

Affiliations

  1. Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China; University of Chinese Academy of Sciences, Beijing, China.
  2. Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China.

PMID: 26779212 PMCID: PMC4703815 DOI: 10.3389/fpls.2015.01171

Abstract

Simple sequence repeats (SSRs), also known as microsatellites, are short tandem duplications found ubiquitously in the genomes and transcriptomes of diverse organisms. They are among the most powerful molecular markers for genetic analysis and breeding programs because of their high mutation rate and neutral evolution. However, traditional experimental screening of SSR polymorphism, and of the subsequent applicability of SSRs to genetic studies, is extremely labor-intensive and time-consuming. The recent decrease in the cost of next-generation sequencing and the increasing availability of large genome and transcriptome sequences provide an excellent opportunity for large-scale mining of this type of molecular marker. However, current tools are limited. We therefore developed a new pipeline, CandiSSR, to identify candidate polymorphic SSRs (PolySSRs) from multiple assembled sequences. The pipeline allows users to identify putative PolySSRs not only from transcriptome datasets but also from multiple assembled genome sequences. In addition, two confidence metrics, the standard deviation and the missing rate of the SSR repetitions, are provided to systematically assess the suitability of the detected PolySSRs for subsequent genetic characterization. Primer pairs for each identified PolySSR are also designed automatically and further evaluated by the global sequence similarity of the primer-binding regions, which improves the success rate of marker development. Screening rice genomes with CandiSSR and subsequent experimental validation showed an accuracy rate of over 90%. CandiSSR also identified a large number of PolySSRs in Arabidopsis genomes and Camellia transcriptomes. CandiSSR and the PolySSR marker sources are publicly available at: http://www.plantkingdomgdb.com/CandiSSR/index.html.

Keywords: CandiSSR; microsatellites; multiple assembled genomes; multiple assembled transcriptomes; polymorphic SSR; transferability
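
The two confidence metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas CandiSSR uses, so everything here is an assumption made for illustration: the function name `polyssr_metrics` is hypothetical, a missing SSR in a sample is encoded as `None`, the missing rate is taken as the fraction of samples lacking the SSR, and the standard deviation is the population standard deviation of the observed repeat counts.

```python
import statistics
from typing import Optional, Sequence, Tuple


def polyssr_metrics(repeat_counts: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (standard deviation, missing rate) for one SSR locus.

    repeat_counts holds the number of repeat units found in each sample;
    None marks a sample in which the SSR was not detected (hypothetical
    encoding, not taken from CandiSSR).
    """
    if not repeat_counts:
        raise ValueError("at least one sample is required")

    # Keep only the samples in which the SSR was actually observed.
    observed = [count for count in repeat_counts if count is not None]

    # Assumed definition: fraction of samples with no SSR at this locus.
    missing_rate = 1 - len(observed) / len(repeat_counts)

    # Assumed definition: population standard deviation of observed counts.
    # With fewer than two observations there is no variation to measure.
    sd = statistics.pstdev(observed) if len(observed) > 1 else 0.0
    return sd, missing_rate


# One locus scored in four samples; the fourth sample lacks the SSR.
sd, missing = polyssr_metrics([8, 10, 12, None])
print(f"SD = {sd:.3f}, missing rate = {missing:.2f}")
```

Under these assumptions, a locus with a larger standard deviation varies more in repeat number across samples, while a lower missing rate means the locus was recovered in more of the assembled sequences.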

