Bioinformatics. 2021 Dec 14; doi: 10.1093/bioinformatics/btab839. Epub 2021 Dec 14.

swCAM: estimation of subtype-specific expressions in individual samples with unsupervised sample-wise deconvolution.

Bioinformatics (Oxford, England)

Lulu Chen, Chiung-Ting Wu, Chia-Hsiang Lin, Rujia Dai, Chunyu Liu, Robert Clarke, Guoqiang Yu, Jennifer E Van Eyk, David M Herrington, Yue Wang

Affiliations

  1. Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA.
  2. Department of Electrical Engineering, National Cheng Kung University, Tainan City, Taiwan 70101, ROC.
  3. Department of Psychiatry, SUNY Upstate Medical University, NY 13210, USA.
  4. The Hormel Institute, University of Minnesota, Austin, MN 55912, USA.
  5. Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA.
  6. Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA.

PMID: 34904628 DOI: 10.1093/bioinformatics/btab839

Abstract

MOTIVATION: Complex biological tissues are often a heterogeneous mixture of several molecularly distinct cell subtypes. Both subtype compositions and subtype-specific expressions can vary across biological conditions. Computational deconvolution aims to dissect patterns of bulk tissue data into subtype compositions and subtype-specific expressions. Existing deconvolution methods can only estimate averaged subtype-specific expressions in a population, while many downstream analyses such as inferring co-expression networks in particular subtypes require subtype expression estimates in individual samples. However, individual-level deconvolution is a mathematically underdetermined problem because there are more variables than observations.
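To make the "more variables than observations" point concrete, the short sketch below counts unknowns under a linear mixing model, which is an assumption made here for illustration. The matrix sizes, the variable names and the simulated distributions are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative sizes only (assumed, not from the paper).
n_samples, n_genes, n_subtypes = 50, 1000, 4

# Subtype proportions: one row per sample, non-negative, summing to 1.
A = rng.dirichlet(np.ones(n_subtypes), size=n_samples)            # (samples, subtypes)
# Population-averaged subtype-specific expression profiles.
S = rng.gamma(shape=2.0, scale=50.0, size=(n_subtypes, n_genes))  # (subtypes, genes)
# Bulk expression when every sample shares the same S (assumed linear mixing).
X = A @ S                                                         # (samples, genes)

# One observation per sample-gene entry of the bulk matrix.
observations = X.size
# Population-level deconvolution: estimate A and a single shared S.
unknowns_population = A.size + S.size
# Sample-wise deconvolution: estimate A plus one S-like matrix per sample.
unknowns_sample_wise = A.size + n_samples * n_subtypes * n_genes

print(observations, unknowns_population, unknowns_sample_wise)
```

With these sizes the sample-wise problem has roughly four times as many unknowns as observations, so additional structure such as regularization is needed to obtain a unique estimate.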

RESULTS: We report a sample-wise Convex Analysis of Mixtures (swCAM) method that can estimate subtype proportions and subtype-specific expressions in individual samples from bulk tissue transcriptomes. We extend our previous CAM framework to include a new term accounting for between-sample variations and formulate swCAM as a nuclear-norm and ℓ2,1-norm regularized matrix factorization problem. We determine hyperparameter values using cross-validation with random entry exclusion and obtain a swCAM solution using an efficient alternating direction method of multipliers (ADMM). Experimental results on realistic simulation data show that swCAM can accurately estimate subtype-specific expressions in individual samples and successfully extract co-expression networks in particular subtypes that are otherwise unobtainable using bulk data. In two real-world applications, swCAM analysis of bulk RNA-seq data from brain tissue of cases and controls with bipolar disorder or Alzheimer's disease identified significant changes in cell proportions, expression patterns and co-expression modules in patient neurons. Comparative evaluation of swCAM versus peer methods is also provided.
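The regularizers and the cross-validation scheme named above have standard generic building blocks, sketched below. This is not the swCAM implementation (the authors' code is in R): the function names are hypothetical, the ℓ2,1 norm is assumed here to be the sum of row-wise ℓ2 norms, and the hold-out fraction is arbitrary.

```python
import numpy as np

def prox_nuclear(M, tau):
    """Proximal operator of tau * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def prox_l21(M, tau):
    """Proximal operator of tau * l2,1 norm (rows as groups, an assumed convention):
    shrink each row toward zero by tau in l2 norm, zeroing rows with norm <= tau."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return M * scale

def random_entry_mask(shape, frac, rng):
    """Boolean mask marking entries held out for cross-validation (True = excluded)."""
    return rng.random(shape) < frac

# Usage: singular values 3 and 1 shrink to 1 and 0 when tau = 2.
low_rank = prox_nuclear(np.diag([3.0, 1.0]), tau=2.0)
# Usage: the row with norm 5 is scaled by 0.8; the row with norm 0.5 is zeroed.
row_sparse = prox_l21(np.array([[3.0, 4.0], [0.3, 0.4]]), tau=1.0)
# Usage: hold out about 20% of the entries of a 50 x 40 matrix.
held_out = random_entry_mask((50, 40), frac=0.2, rng=np.random.default_rng(0))
```

In an ADMM-style solver, operators of this kind are typically applied in alternation with a data-fitting update, and the held-out entries are used to score candidate hyperparameter values; how swCAM arranges these steps is described in the paper and its supplementary material, not here.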

AVAILABILITY: The R Scripts of swCAM are freely available at https://github.com/Lululuella/swCAM. A user's guide and a vignette are provided.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

© The Author(s) (2021). Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected].

Publication Types