
J Proteome Res. 2015 Mar 06;14(3):1350-60. doi: 10.1021/pr500850u. Epub 2015 Feb 05.

Prediction of a missing protein expression map in the context of the human proteome project.

Journal of Proteome Research

Elizabeth Guruceaga, Manuel M Sanchez del Pino, Fernando J Corrales, Victor Segura

Affiliations

  1. Proteomics, Genomics and Bioinformatics Unit, Division of Hepatology and Gene Therapy, Center for Applied Medical Research, University of Navarra, Pamplona 31008, Spain.

PMID: 25612097 DOI: 10.1021/pr500850u

Abstract

The Human Proteome Project has catalogued the level of experimental evidence available for each human protein, and this information is publicly available in the neXtProt database. However, reliable experimental evidence still does not exist for some human proteins, and obtaining it has become one of the overriding objectives of the chromosome-centric study of the human proteome. With this aim, and given the complexity of protein detection by shotgun and targeted proteomics, the research community has turned to integrating the transcriptomic and proteomic landscapes. Here, we describe an analytical pipeline that predicts the probability of a missing protein being expressed in a biological sample based on (1) gene sequence characteristics, (2) the probability that an expressed gene in a given sample codes for a missing protein, and (3) the probability of a gene being expressed in a transcriptomic experiment. More than 3400 microarray experiments, covering three biological sources (cell lines, normal tissues, and cancer samples), were analyzed. A gene classification based on gene expression profiles distinguished among ubiquitous, nonubiquitous, nonexpressed, and missing-protein-coding genes. We also report a distinct tissue-specific expression pattern for the genes coding missing proteins. Our results underline the importance of selecting an appropriate sample for the detection of missing proteins and provide a comprehensive method to score their expression probability. Testis, brain, and skeletal muscle are the most promising normal tissues for detecting missing proteins.
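The keywords name a naive Bayes classifier as part of the scoring approach. The block below is a generic Gaussian naive Bayes written from scratch in Python to show how such a classifier turns per-feature likelihoods and a class prior into a posterior probability. It is not the authors' pipeline: the two features (log10 gene length and the fraction of microarray experiments in which a gene is called expressed), the class labels "detected" and "missing", and all toy values are hypothetical illustrations.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(X, y):
    """Estimate each class prior and per-feature Gaussian (mean, variance)."""
    model = {}
    n = len(y)
    for label in set(y):
        rows = [x for x, c in zip(X, y) if c == label]
        cols = list(zip(*rows))  # transpose: one tuple per feature
        model[label] = {
            "prior": len(rows) / n,
            # small epsilon keeps the variance positive for constant features
            "params": [(mean(col), pvariance(col) + 1e-9) for col in cols],
        }
    return model


def log_gaussian(x, mu, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict_proba(model, x):
    """Posterior P(class | x), assuming features are conditionally independent."""
    log_scores = {}
    for label, m in model.items():
        score = math.log(m["prior"])
        for xi, (mu, var) in zip(x, m["params"]):
            score += log_gaussian(xi, mu, var)
        log_scores[label] = score
    # log-sum-exp normalization avoids underflow when likelihoods are tiny
    top = max(log_scores.values())
    z = sum(math.exp(s - top) for s in log_scores.values())
    return {label: math.exp(s - top) / z for label, s in log_scores.items()}


# Hypothetical toy data: [log10 gene length, fraction of arrays expressed]
X = [
    [4.2, 0.95], [4.5, 0.90], [4.0, 0.85],  # "detected" genes
    [3.1, 0.10], [3.3, 0.05], [2.9, 0.20],  # "missing" genes
]
y = ["detected"] * 3 + ["missing"] * 3

model = fit_gaussian_nb(X, y)
posterior = predict_proba(model, [3.0, 0.12])
print(posterior)
```

The conditional-independence assumption is what makes the classifier "naive": each feature contributes its own log-likelihood term, so adding further evidence (for example, a sample-specific expression probability) only adds one more term to the sum.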

Keywords: C-HPP; missing proteins; naive Bayes classifier; protein expression profiles; transcriptome profiling
