Cancer Inform. 2015 May 05;14:129-38. doi: 10.4137/CIN.S17284. eCollection 2015.

Prediction of early breast cancer metastasis from DNA microarray data using high-dimensional Cox regression models.

Cancer informatics

Christophe Zemmour, François Bertucci, Pascal Finetti, Bernard Chetrit, Daniel Birnbaum, Thomas Filleron, Jean-Marie Boher

Affiliations

  1. Département de la Recherche Clinique et de l'Innovation, Unité de Biostatistique et de Méthodologie, Institut Paoli-Calmettes, Marseille, France.
  2. Département d'Oncologie Moléculaire, Institut Paoli-Calmettes, Centre de Recherche en Cancérologie de Marseille, INSERM, CNRS, Marseille, France. ; Département d'Oncologie Médicale, Institut Paoli-Calmettes, Centre de Recherche en Cancérologie de Marseille, INSERM, CNRS, Marseille, France.
  3. Département d'Oncologie Moléculaire, Institut Paoli-Calmettes, Centre de Recherche en Cancérologie de Marseille, INSERM, CNRS, Marseille, France.
  4. Centre de Recherche en Cancérologie de Marseille, INSERM, CNRS, Marseille, France.
  5. Bureau des Essais Cliniques, Cellule Biostatistique, Institut Claudius Regaud, Institut Universitaire du Cancer Toulouse Oncopôle, Toulouse, France.

PMID: 25983547 PMCID: PMC4426954 DOI: 10.4137/CIN.S17284

Abstract

BACKGROUND: DNA microarray studies have identified gene expression signatures predictive of metastatic relapse in early breast cancer. Standard feature selection procedures applied to reduce the set of predictive genes did not take into account the correlation between genes. In this paper, we studied the performance of three high-dimensional regression methods - CoxBoost, LASSO (Least Absolute Shrinkage and Selection Operator), and Elastic net - in identifying prognostic signatures in patients with early breast cancer.

METHODS: We analyzed three public retrospective datasets, including a total of 384 patients with axillary lymph node-negative breast cancer. The Amsterdam van't Veer training set of 78 patients was used to determine the optimal gene sets and classifiers, using sensitivity thresholds that misclassified no more than 10% of the poor-prognosis group. To ensure comparability between methods, an automatic selection procedure was used to determine the number of genes included in each model. The van de Vijver and Desmedt datasets were used as separate validation sets to evaluate the prognostic performance of our classifiers. The results were compared to the original Amsterdam 70-gene classifier.
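The sensitivity threshold described above can be illustrated with a minimal sketch. It assumes each training patient already has a continuous risk score from a fitted model and that a patient is predicted poor-prognosis when the score is at or above the cutoff. The function name, the decision direction, and the toy scores are hypothetical and are not taken from the paper.

```python
import math

def sensitivity_cutoff(scores, is_poor, max_miss_rate=0.10):
    """Return the largest cutoff that misclassifies at most
    max_miss_rate of the poor-prognosis patients.

    Assumption: a patient is predicted poor-prognosis when score >= cutoff.
    """
    poor_scores = sorted(s for s, poor in zip(scores, is_poor) if poor)
    if not poor_scores:
        raise ValueError("no poor-prognosis patients in the training data")
    # Number of poor-prognosis patients allowed to fall below the cutoff.
    allowed_misses = math.floor(max_miss_rate * len(poor_scores) + 1e-9)
    return poor_scores[allowed_misses]

# Hypothetical training scores: ten poor-prognosis, three good-prognosis patients.
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90,
          0.15, 0.02, 0.35]
is_poor = [True] * 10 + [False] * 3

cutoff = sensitivity_cutoff(scores, is_poor)
predicted_poor = [s >= cutoff for s in scores]
```

Fixing the cutoff on the training set alone keeps the validation sets untouched, so sensitivity and specificity measured there are not tuned to the validation data.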

RESULTS: The automatic selection procedure reduced the number of predictive genes down to a minimum of six genes. In the two validation sets, the three models (Elastic net, LASSO, and CoxBoost) led to the definition of genomic classifiers predicting the 5-year metastatic status with similar performance, with respective 59, 56, and 54% accuracy, 83, 75, and 83% sensitivity, and 53, 52, and 48% specificity in the Desmedt dataset. In comparison, the Amsterdam 70-gene signature showed 45% accuracy, 97% sensitivity, and 34% specificity. The gene overlap and the classification concordance between the three classifiers were high. All the classifiers added significant prognostic information to that provided by the traditional prognostic factors and showed a very high overlap with respect to gene ontologies (GOs) associated with genes overexpressed in the predicted poor-prognosis vs. good-prognosis classes and centred on cell proliferation. Interestingly, all classifiers showed high sensitivity in predicting the 4-year status of metastatic disease.
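For readers unfamiliar with the three performance measures reported above, the following sketch shows how accuracy, sensitivity, and specificity are conventionally computed from predicted classes and observed metastatic status. The function name and the toy labels are hypothetical; they do not reproduce the study data or the authors' code.

```python
def classification_metrics(predicted_poor, observed_metastasis):
    """Compute accuracy, sensitivity, and specificity from boolean labels."""
    tp = fp = tn = fn = 0
    for pred, obs in zip(predicted_poor, observed_metastasis):
        if pred and obs:
            tp += 1  # predicted poor-prognosis, metastasis observed
        elif pred and not obs:
            fp += 1  # predicted poor-prognosis, no metastasis
        elif not pred and obs:
            fn += 1  # predicted good-prognosis, metastasis observed
        else:
            tn += 1  # predicted good-prognosis, no metastasis
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome classes must be present")
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical labels for eight patients (not study data).
predicted = [True, True, True, False, True, False, False, True]
observed = [True, True, True, True, False, False, False, False]

metrics = classification_metrics(predicted, observed)
```

A classifier tuned for high sensitivity, as in the training procedure described in the Methods, typically pays for it with lower specificity, which is the pattern visible in the figures reported for all four classifiers.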

CONCLUSIONS: High-dimensional regression methods are attractive in prognostic studies because finding a small subset of genes may facilitate the transfer to the clinic, and also because they strengthen the robustness of the model by limiting the selection of false-positive predictive genes. With only six genes, the CoxBoost classifier predicted the 4-year status of metastatic disease with 93% sensitivity. Selecting a few genes related to ontologies other than cell proliferation might further improve the overall sensitivity performance.

Keywords: boosting; breast cancer; cross-validation; genomics; metastasis
