BMC Med Inform Decis Mak. 2015;15:S6. doi: 10.1186/1472-6947-15-S1-S6. Epub 2015 May 20.

Identification of genomic features in the classification of loss- and gain-of-function mutation.

BMC medical informatics and decision making

Seunghwan Jung, Sejoon Lee, Sangwoo Kim, Hojung Nam

PMID: 26043747 PMCID: PMC4460711 DOI: 10.1186/1472-6947-15-S1-S6

Abstract

BACKGROUND: Alterations of a genome can lead to changes in protein function. Through such mutations, a protein can lose its native function (loss-of-function, LoF) or acquire a new one (gain-of-function, GoF). However, when a mutation occurs, it is difficult to determine whether it will result in LoF or GoF. In this paper, we therefore analyze the genomic features of LoF and GoF instances to find features that can be used to classify LoF and GoF mutations.

METHODS: To collect experimentally verified LoF and GoF mutations, we obtained 816 LoF mutations and 474 GoF mutations through literature text mining. After data preprocessing, 258 LoF and 129 GoF mutations remained for further analysis. We analyzed the properties of these mutations, selected the features that showed different tendencies between the two groups, and trained support vector machine, random forest, and linear logistic regression classifiers to test whether these features can distinguish LoF from GoF mutations.
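
The counts above fix the class balance of the final dataset, which is worth making explicit before reading any accuracy figure. The following is a minimal sketch that uses only the four counts reported in the abstract; the variable names are illustrative, and the baseline assumes that accuracy is measured on a set with the same 2:1 class ratio, which the abstract does not state.

```python
# Counts reported in the abstract.
collected = {"LoF": 816, "GoF": 474}
retained = {"LoF": 258, "GoF": 129}

total_retained = sum(retained.values())

# Fraction of each class that survived preprocessing.
retention = {label: retained[label] / collected[label] for label in collected}

# Share of each class in the final dataset.
class_share = {label: n / total_retained for label, n in retained.items()}

# Accuracy of a trivial classifier that always predicts the larger class
# (assumption: evaluation data has the same class ratio as the final dataset).
majority_baseline = max(class_share.values())

for label in collected:
    print(f"{label}: kept {retention[label]:.1%}, share {class_share[label]:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.2%}")
```

Under that assumption, a classifier that always predicts LoF would be correct for two thirds of the 387 retained mutations, which gives a reference point for accuracies reported on this dataset.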

RESULTS: We identified six features with discriminative power between LoF and GoF mutations: the reference allele, the substituted allele, mutation type, mutation impact, subcellular location, and protein domain. Using these six features, the random forest, support vector machine, and linear logistic regression classifiers achieved accuracies of 72.23%, 71.28%, and 70.19%, respectively.
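
All six features are categorical, so they would typically need a numeric encoding before being passed to a classifier such as a support vector machine or logistic regression. The sketch below shows one common option, one-hot encoding, in plain Python. It is not the authors' pipeline: the abstract names the features but not their possible values or the encoding used, so every category value, the `FEATURE_VALUES` table, and the `one_hot` helper are hypothetical.

```python
# Hypothetical category vocabularies; the abstract names the six features
# but does not list the values each one can take.
FEATURE_VALUES = {
    "reference_allele": ["A", "C", "G", "T"],
    "substituted_allele": ["A", "C", "G", "T"],
    "mutation_type": ["missense", "nonsense", "frameshift"],
    "mutation_impact": ["low", "moderate", "high"],
    "subcellular_location": ["nucleus", "cytoplasm", "membrane"],
    "protein_domain": ["in_domain", "outside_domain"],
}


def one_hot(mutation):
    """Return a flat 0/1 vector with one slot per (feature, value) pair."""
    vector = []
    for feature, values in FEATURE_VALUES.items():
        observed = mutation[feature]
        if observed not in values:
            raise ValueError(f"unknown {feature} value: {observed!r}")
        vector.extend(1 if observed == value else 0 for value in values)
    return vector


# A made-up mutation record, for illustration only.
example = {
    "reference_allele": "G",
    "substituted_allele": "A",
    "mutation_type": "missense",
    "mutation_impact": "moderate",
    "subcellular_location": "nucleus",
    "protein_domain": "in_domain",
}

encoded = one_hot(example)
print(len(encoded), encoded)
```

Each record yields a vector with exactly one active slot per feature, so the vector length equals the total number of category values across the six features.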

CONCLUSIONS: We analyzed LoF and GoF mutations and selected several properties that differ between the two classes. Classification with the selected features demonstrates that they have good discriminative power.
