Open Med Inform J. 2010;4:63-73. doi: 10.2174/1874431101004010063. Epub 2010 May 28.

Association rule based similarity measures for the clustering of gene expression data.

The open medical informatics journal

Prerna Sethi, Sathya Alagiriswamy

Affiliations

  1. Department of Health Informatics and Information Management and Biological Sciences, Ruston, USA.

PMID: 21603179 PMCID: PMC3096052 DOI: 10.2174/1874431101004010063

Abstract

In life-threatening diseases such as cancer, where effective diagnosis includes annotation, early detection, distinction, and prediction, data mining and statistical approaches offer the promise of precise, accurate, and functionally robust analysis of gene expression data. The computational extraction of derived patterns from microarray gene expression data is a non-trivial task that involves sophisticated algorithm design and analysis for specific domain discovery. In this paper, we propose a formal approach to feature extraction. We first apply feature selection heuristics based on three statistical impurity measures (the Gini Index, Max Minority, and the Twoing Rule) to obtain the top 100-400 genes. We then analyze the associative dependencies between the genes and assign weights to the genes based on their degree of participation in the rules. Next, we present weighted Jaccard and vector cosine similarity measures to compute the similarity between the discovered rules. Finally, we group the rules by applying hierarchical clustering. To demonstrate the usability and efficiency of our technique, we applied it to three publicly available, multiclass cancer gene expression datasets and performed a biomedical literature search to support the effectiveness of our results.
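The weighting and similarity steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption: a gene's weight is taken to be the fraction of rules it appears in, the weighted Jaccard score is the weight of the shared genes divided by the weight of all genes in either rule, and the cosine score treats each rule as a vector whose entry for a gene is that gene's weight. The rule and gene names are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical association rules, each reduced to the set of genes it mentions.
rules = {
    "r1": {"geneA", "geneB", "geneC"},
    "r2": {"geneA", "geneB"},
    "r3": {"geneC", "geneD"},
}


def gene_weights(rules):
    """Assumed weighting: fraction of rules in which each gene participates."""
    counts = Counter(g for genes in rules.values() for g in genes)
    return {g: c / len(rules) for g, c in counts.items()}


def weighted_jaccard(a, b, w):
    """Weight of shared genes divided by weight of all genes in either rule."""
    union = sum(w[g] for g in a | b)
    return sum(w[g] for g in a & b) / union if union else 0.0


def weighted_cosine(a, b, w):
    """Cosine of two rule vectors whose nonzero entries are the gene weights."""
    dot = sum(w[g] ** 2 for g in a & b)
    norm_a = sqrt(sum(w[g] ** 2 for g in a))
    norm_b = sqrt(sum(w[g] ** 2 for g in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


w = gene_weights(rules)
for x, y in [("r1", "r2"), ("r1", "r3"), ("r2", "r3")]:
    print(x, y,
          round(weighted_jaccard(rules[x], rules[y], w), 3),
          round(weighted_cosine(rules[x], rules[y], w), 3))
```

Under these assumptions, a distance such as one minus the similarity could serve as input to a hierarchical clustering routine that groups the rules. The paper's actual weight definition and linkage criterion may differ.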

Keywords: Microarray gene expression; association rules; clustering; similarity measure
