Stat Interface. 2009;2(2):153-159. doi: 10.4310/sii.2009.v2.n2.a5.

Weighted random subspace method for high dimensional data classification.

Statistics and Its Interface

Xiaoye Li, Hongyu Zhao

Affiliations

  1. Susquehanna International Group L.L.P., 401 City Avenue, Bala Cynwyd, PA 19004.

PMID: 21918713 PMCID: PMC3170928 DOI: 10.4310/sii.2009.v2.n2.a5

Abstract

High-dimensional data, especially those emerging from genomics and proteomics studies, pose significant challenges to traditional classification algorithms because the performance of these algorithms may deteriorate substantially due to high dimensionality and the existence of many noisy features in these data. To address these problems, pre-classification feature selection and aggregating algorithms have been proposed. However, most feature selection procedures either fail to consider potential interactions among the features or tend to overfit the data. The aggregating algorithms, e.g., the bagging predictor, the boosting algorithm, the random subspace method, and the Random Forests algorithm, are promising in handling high-dimensional data. However, there is a lack of attention to optimal weight assignments to individual classifiers, and this has prevented these algorithms from achieving better classification accuracy. In this article, we formulate the weight assignment problem and propose a heuristic optimization solution. We have applied the proposed weight assignment procedures to the random subspace method to develop a weighted random subspace method. Several public gene expression and mass spectrometry data sets at the Kent Ridge biomedical data repository have been analyzed by this novel method. We have found that significant improvement over the common equal weight assignment scheme may be achieved by our method.
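
To make the idea of weighting individual classifiers in a random subspace ensemble concrete, the following is a minimal Python sketch. It is not the authors' procedure: the abstract does not describe the heuristic optimization, so the weighting rule here (validation accuracy above chance, floored at zero) is a stand-in assumption. The nearest-centroid base classifier and all function names (`fit_centroids`, `predict_one`, `weighted_random_subspace`, `predict`) are likewise hypothetical choices made for illustration.

```python
import random


def fit_centroids(X, y, feats):
    """Fit a nearest-centroid classifier using only the features in `feats`."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    return centroids


def predict_one(centroids, feats, x):
    """Return the label of the closest centroid (ties go to the smallest label)."""
    def dist(c):
        return sum((x[f] - cf) ** 2 for f, cf in zip(feats, c))
    return min(sorted(centroids), key=lambda label: dist(centroids[label]))


def weighted_random_subspace(X_train, y_train, X_val, y_val,
                             n_models=25, subspace_size=2, seed=0):
    """Train base classifiers on random feature subsets and weight each one.

    ASSUMPTION: the weight is validation accuracy minus the chance level,
    floored at zero. This is a simple placeholder, not the heuristic
    optimization proposed in the paper.
    """
    rng = random.Random(seed)
    n_features = len(X_train[0])
    chance = 1.0 / len(set(y_train))
    ensemble = []
    for _ in range(n_models):
        feats = rng.sample(range(n_features), subspace_size)
        model = fit_centroids(X_train, y_train, feats)
        hits = sum(predict_one(model, feats, x) == t
                   for x, t in zip(X_val, y_val))
        weight = max(hits / len(y_val) - chance, 0.0)
        ensemble.append((feats, model, weight))
    return ensemble


def predict(ensemble, x, weighted=True):
    """Aggregate base-classifier votes, with learned weights or equal weights."""
    votes = {}
    for feats, model, weight in ensemble:
        label = predict_one(model, feats, x)
        votes[label] = votes.get(label, 0.0) + (weight if weighted else 1.0)
    return max(sorted(votes), key=votes.get)
```

Calling `predict(ensemble, x, weighted=False)` gives the equal-weight vote that the abstract refers to as the common scheme, so the two aggregation rules can be compared on the same set of base classifiers. When many subspaces contain only noisy features, down-weighting their classifiers keeps them from outvoting the few informative ones.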
