PLoS One. 2015 Feb 26;10(2):e0117988. doi: 10.1371/journal.pone.0117988. eCollection 2015.

DWFS: a wrapper feature selection tool based on a parallel genetic algorithm.

Othman Soufan, Dimitrios Kleftogiannis, Panos Kalnis, Vladimir B Bajic

Affiliations

  1. King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal 23955-6900, Saudi Arabia.
  2. King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal 23955-6900, Saudi Arabia.

PMID: 25719748 PMCID: PMC4342225 DOI: 10.1371/journal.pone.0117988

Abstract

Many scientific problems can be formulated as classification tasks. Data that harbor relevant information are usually described by a large number of features, and many of these features are frequently irrelevant for class prediction. Efficient classification models therefore require the identification of suitable combinations of features. A smaller number of features reduces the problem's dimensionality and may result in higher classification performance. We developed DWFS, a web-based tool that allows efficient selection of features for a variety of problems. DWFS follows the wrapper paradigm and applies a search strategy based on Genetic Algorithms (GAs). A parallel GA implementation simultaneously examines and evaluates a large number of candidate feature subsets. DWFS also integrates various filtering methods that may be applied as a pre-processing step in the feature selection process. Furthermore, the weights and parameters in the fitness function of the GA can be adjusted according to the application requirements. Experiments using heterogeneous datasets from different biomedical applications demonstrate that DWFS is fast and significantly reduces the number of features without sacrificing performance, compared with several widely used existing methods. DWFS can be accessed online at www.cbrc.kaust.edu.sa/dwfs.
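
The abstract does not specify the classifier, fitness function, or GA operators used by DWFS, so the following is only a minimal sketch of the general wrapper + GA idea it describes, under stated assumptions. Each candidate feature subset is a bit mask; its fitness is computed by training and testing a classifier on only the selected features. The specific choices here are assumptions for illustration, not the DWFS method: a nearest-centroid classifier as the wrapped model, a fitness that linearly weights hold-out accuracy against subset size (the weights `w_perf` and `w_size` are hypothetical), tournament selection, one-point crossover, bit-flip mutation, and a thread pool standing in for parallel evaluation of the population. All function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_toy_data(n=120, n_features=10, informative=(0, 3), seed=0):
    """Synthetic two-class data (hypothetical): only the listed features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 if label else -2.0  # shift class means apart
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(mask, X, y, split=0.5):
    """Wrapper step (assumed classifier): nearest-centroid hold-out accuracy on selected features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    cut = int(len(X) * split)
    train, test = range(cut), range(cut, len(X))
    centroids = {}
    for c in (0, 1):
        rows = [X[i] for i in train if y[i] == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for i in test:
        dists = {c: sum((X[i][j] - m) ** 2 for j, m in zip(cols, cent))
                 for c, cent in centroids.items()}
        correct += min(dists, key=dists.get) == y[i]
    return correct / len(test)


def fitness(mask, X, y, w_perf=0.8, w_size=0.2):
    """Hypothetical weighted fitness: reward accuracy, reward using fewer features."""
    acc = centroid_accuracy(mask, X, y)
    return w_perf * acc + w_size * (1 - sum(mask) / len(mask))


def run_ga(X, y, pop_size=20, generations=30, p_mut=0.1, seed=1, workers=4):
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Evaluate every candidate subset concurrently; map() preserves order.
            fits = list(pool.map(lambda m: fitness(m, X, y), pop))
            for m, f in zip(pop, fits):
                if f > best_fit:
                    best, best_fit = m[:], f

            def tournament():
                a, b = rng.sample(range(pop_size), 2)
                return pop[a] if fits[a] >= fits[b] else pop[b]

            children = [best[:]]  # elitism: carry the best subset forward
            while len(children) < pop_size:
                p1, p2 = tournament(), tournament()
                cut = rng.randrange(1, n)  # one-point crossover
                child = p1[:cut] + p2[cut:]
                child = [(not g) if rng.random() < p_mut else g for g in child]  # bit-flip mutation
                children.append(child)
            pop = children
    return best, best_fit


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, score = run_ga(X, y)
    print("selected features:", [j for j, b in enumerate(mask) if b], "fitness:", round(score, 3))
```

Threads keep the sketch portable; for CPU-bound classifiers a `ProcessPoolExecutor` (or a cluster scheduler) would be the usual way to get real parallel speed-up. Raising `w_size` relative to `w_perf` pushes the search toward smaller subsets, which mirrors the abstract's point that fitness weights can be tuned to application requirements.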
