
Biol Direct. 2006 Sep 07;1:27. doi: 10.1186/1745-6150-1-27.

Generalization of DNA microarray dispersion properties: microarray equivalent of t-distribution.

Biology Direct

Jaroslav P Novak, Seon-Young Kim, Jun Xu, Olga Modlich, David J Volsky, David Honys, Joan L Slonczewski, Douglas A Bell, Fred R Blattner, Eduardo Blumwald, Marjan Boerma, Manuel Cosio, Zoran Gatalica, Marian Hajduch, Juan Hidalgo, Roderick R McInnes, Merrill C Miller, Milena Penkowa, Michael S Rolph, Jordan Sottosanto, Rene St-Arnaud, Michael J Szego, David Twell, Charles Wang

Affiliations

  1. McGill University and Genome Québec Innovation Centre, 740 Docteur Penfield Avenue, Montreal, Québec, H3A 1A4, Canada.

PMID: 16959036 PMCID: PMC1586001 DOI: 10.1186/1745-6150-1-27

Abstract

BACKGROUND: DNA microarrays are a powerful technology that can provide a wealth of gene expression data for disease studies, drug development, and a wide range of other investigations. Because of the large volume and inherent variability of DNA microarray data, many new statistical methods have been developed for evaluating the significance of observed differences in gene expression. Until now, however, little attention has been given to characterizing the dispersion of DNA microarray data.

RESULTS: Here we examine expression data obtained from 682 Affymetrix GeneChips of 22 different types and demonstrate that the Gaussian (normal) frequency distribution is characteristic of the variability of gene expression values. However, typically 5 to 15% of the samples deviate from normality. Furthermore, we show that the frequency distributions of the differences in expression within subsets of ordered, consecutive pairs of genes (consecutive samples) in pair-wise comparisons of replicate experiments are also normal. We describe a consecutive sampling method, which is used to calculate the characteristic function approximating the standard deviation, and show that the standard deviation derived from consecutive samples is equivalent to the standard deviation obtained from individual genes. Finally, we determine the boundaries of probability intervals and demonstrate that the coefficients defining these intervals are independent of sample characteristics, data variability, laboratory conditions and chip type. These coefficients correlate very closely with Student's t-distribution.
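The consecutive sampling idea can be illustrated with a minimal Python sketch. This is not the authors' published procedure: the function name `consecutive_sample_sd`, the window size, the ordering of genes by mean expression, and the division by sqrt(2) (which assumes two independent replicates with equal variance) are all assumptions made for illustration.

```python
import math
import random
import statistics


def consecutive_sample_sd(rep1, rep2, window=50):
    """Hypothetical sketch of a consecutive-sampling SD estimate.

    Genes are ordered by their mean expression across two replicate arrays
    and split into consecutive, non-overlapping windows of `window` genes.
    Within each window the standard deviation of the replicate differences
    is divided by sqrt(2), which assumes the two replicates are independent
    and share the same variance. A trailing partial window is dropped.

    Returns a list of (mean expression of the window, estimated SD) tuples.
    """
    if len(rep1) != len(rep2):
        raise ValueError("replicates must contain the same number of genes")
    if window < 2:
        raise ValueError("window must contain at least two genes")
    # Order gene pairs by their average expression level.
    pairs = sorted(zip(rep1, rep2), key=lambda p: (p[0] + p[1]) / 2)
    estimates = []
    for start in range(0, len(pairs) - window + 1, window):
        chunk = pairs[start:start + window]
        diffs = [a - b for a, b in chunk]
        level = statistics.fmean((a + b) / 2 for a, b in chunk)
        sd = statistics.stdev(diffs) / math.sqrt(2)
        estimates.append((level, sd))
    return estimates


if __name__ == "__main__":
    # Simulated log-scale data: true levels plus independent noise (SD 0.2).
    rng = random.Random(0)
    true_levels = [rng.uniform(4, 14) for _ in range(2000)]
    rep1 = [x + rng.gauss(0, 0.2) for x in true_levels]
    rep2 = [x + rng.gauss(0, 0.2) for x in true_levels]
    for level, sd in consecutive_sample_sd(rep1, rep2, window=200):
        print(f"mean level {level:6.2f}  estimated SD {sd:.3f}")
```

Under these assumptions each window yields one standard deviation estimate tied to an expression level, so the estimates can be plotted against intensity to approximate a characteristic standard deviation function.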

CONCLUSION: In this study we ascertained that non-systematic variations follow a Gaussian distribution, determined the probability intervals, and demonstrated that the K(alpha) coefficients defining these intervals are invariant; these coefficients offer a convenient, universal measure of data dispersion. The fact that the K(alpha) distributions are so close to the t-distribution and independent of conditions and array type suggests that the quantitative data provided by Affymetrix technology give a "true" representation of the physical processes involved in measuring RNA abundance.
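The K(alpha) coefficients can be illustrated under one plausible reading: the multiplier of the standard deviation that bounds a central (1 - alpha) probability interval. The Python sketch below is an assumption, not the authors' definition, and `empirical_k_alpha` and `normal_k_alpha` are hypothetical helper names. It compares the empirical multiplier with the two-sided normal quantile, which Student's t quantile approaches as the degrees of freedom grow; a t-based comparison could use `scipy.stats.t.ppf(1 - alpha / 2, df)` instead.

```python
import math
import random
import statistics


def empirical_k_alpha(values, alpha=0.05):
    """Hypothetical sketch of an empirical K(alpha) coefficient.

    Assumes K(alpha) is the multiplier such that roughly a fraction `alpha`
    of the observations lies outside mean +/- K(alpha) * SD, where SD is
    the sample standard deviation.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    # Absolute deviations from the mean, in units of the sample SD, sorted.
    scaled = sorted(abs(v - mean) / sd for v in values)
    # Empirical (1 - alpha) quantile of the scaled deviations.
    idx = min(len(scaled) - 1, max(0, math.ceil((1 - alpha) * len(scaled)) - 1))
    return scaled[idx]


def normal_k_alpha(alpha=0.05):
    """Two-sided normal multiplier for comparison (about 1.96 for alpha=0.05)."""
    return statistics.NormalDist().inv_cdf(1 - alpha / 2)


if __name__ == "__main__":
    # Large simulated normal sample, so the two multipliers should be close.
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    for a in (0.10, 0.05, 0.01):
        print(f"alpha={a:.2f}  empirical K={empirical_k_alpha(sample, a):.3f}  "
              f"normal K={normal_k_alpha(a):.3f}")
```

For a large normal sample the empirical and normal multipliers should agree closely; for small samples the t quantile, which is larger than the normal one, is the more appropriate reference.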

REVIEWERS: This article was reviewed by Yoav Gilad (nominated by Doron Lancet), Sach Mukherjee (nominated by Sandrine Dudoit) and Amir Niknejad and Shmuel Friedland (nominated by Neil Smalheiser).

