Bioinform Biol Insights. 2008 Mar 01;2:75-94. doi: 10.4137/bbi.s443.

Estimating the fraction of non-coding RNAs in mammalian transcriptomes.

Bioinformatics and biology insights

Yurong Xin, Giulio Quarta, Hin Hark Gan, Tamar Schlick

Affiliations

  1. Department of Chemistry, 251 Mercer Street, New York University, New York, NY 10012, USA.

PMID: 19812767 PMCID: PMC2735967 DOI: 10.4137/bbi.s443

Abstract

Recent studies of mammalian transcriptomes have identified numerous RNA transcripts that do not code for proteins; their identity, however, is largely unknown. Here we explore an approach based on sequence randomness patterns to discern different RNA classes. The relative z-score we use helps distinguish the known ncRNA class from the genome, intergene and intron classes. This leads us to a fractional ncRNA measure for putative ncRNA datasets, which we model as a mixture of genuine ncRNAs and other transcripts derived from genomic, intergenic and intronic sequences. We use this model to analyze six representative datasets identified by the FANTOM3 project and by two computational approaches based on comparative analysis (RNAz and EvoFold). Our analysis suggests fewer ncRNAs than estimated by DNA sequencing and comparative analysis, but verifying our approach and its predictions requires more extensive experimental RNA data.

Keywords: fraction model; putative non-coding RNA; randomness test

Publication Types