
J Proteome Res. 2021 Apr 02;20(4):1902-1910. doi: 10.1021/acs.jproteome.0c00919. Epub 2021 Feb 09.

Simplified and Unified Access to Cancer Proteogenomic Data.

Journal of proteome research

Caleb M Lindgren, David W Adams, Benjamin Kimball, Hannah Boekweg, Sadie Tayler, Samuel L Pugh, Samuel H Payne

Affiliations

  1. Biology Department, Brigham Young University, Provo, Utah 84602, United States.

PMID: 33560848 PMCID: PMC8022323 DOI: 10.1021/acs.jproteome.0c00919

Abstract

Comprehensive cancer data sets recently generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) offer great potential for advancing our understanding of how to combat cancer. These data sets include DNA, RNA, protein, and clinical characterization for tumor and normal samples from large cohorts of many different cancer types. The raw data are publicly available at various Cancer Research Data Commons. However, widespread reuse of these data sets also benefits from easy access to the processed quantitative data tables. We have created a data application programming interface (API) to distribute these processed tables, implemented as a Python package called cptac. We implemented it so that users who prefer to work in R can easily use our package for data access and then transfer the data into R for analysis. Our package distributes the finalized processed CPTAC data sets in a consistent, up-to-date format. This consistency makes it easy to integrate the data with common graphing, statistical, and machine-learning packages for advanced analysis. Additionally, consistent formatting across all cancer types promotes the investigation of pan-cancer trends. The data API structure of streaming data directly within a programming environment enhances reproducibility. Finally, with the accompanying tutorials, this package provides a novel resource for cancer research education. View the software documentation at https://paynelab.github.io/cptac/. View the GitHub repository at https://github.com/PayneLab/cptac.
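The claim that a consistent table format across cancer types eases pan-cancer analysis can be illustrated with a minimal, library-free sketch. It assumes each cohort is held as a mapping of sample ID to {gene: value}; the names `shared_genes` and `to_long_csv` and the toy values are hypothetical illustrations, not part of the cptac API and not real CPTAC measurements. The sketch stacks the cohorts into one long-format CSV, restricted to genes measured in every cohort, which R could then read with `read.csv()`. This is not necessarily how cptac itself passes data to R.

```python
import csv
import io

# Hypothetical layout: sample ID -> {gene: abundance}. Not the cptac API.
Table = dict[str, dict[str, float]]


def shared_genes(tables: dict[str, Table]) -> list[str]:
    """Return, sorted, the genes present in every sample of every cohort."""
    gene_sets = [set(row) for table in tables.values() for row in table.values()]
    # Guard against empty input: set.intersection() needs at least one set.
    return sorted(set.intersection(*gene_sets)) if gene_sets else []


def to_long_csv(tables: dict[str, Table]) -> str:
    """Stack per-cancer tables into one long-format CSV string.

    Each data row is (cancer, sample, gene, value), limited to genes shared
    by all cohorts, so the result is a single table R can load directly.
    """
    genes = shared_genes(tables)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["cancer", "sample", "gene", "value"])
    for cancer in sorted(tables):
        for sample in sorted(tables[cancer]):
            for gene in genes:
                writer.writerow([cancer, sample, gene, tables[cancer][sample][gene]])
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up toy values for illustration only.
    toy = {
        "endometrial": {"S001": {"TP53": 1.2, "PTEN": -0.4},
                        "S002": {"TP53": 0.3, "PTEN": 0.9}},
        "ovarian": {"S101": {"TP53": -0.7, "PTEN": 0.1, "BRCA1": 0.5}},
    }
    print(to_long_csv(toy))
```

Long format keeps cohort membership as an explicit column, so cohorts with different sample counts combine without padding. The intersection step is also where inconsistent gene naming across cohorts would silently drop data, which is why a unified format matters for pan-cancer work.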

Keywords: CPTAC; Python; R; cancer; data access; data dissemination; genomics; mass spectrometry; proteogenomics; proteomics; reproducibility
