BMC Bioinformatics. 2018 Oct 10;19(1):368. doi: 10.1186/s12859-018-2341-9.
Genome-wide analysis of fitness data and its application to improve metabolic models.
Edward Vitkin, Oz Solomon, Sharon Sultan, Zohar Yakhini
Affiliations
- Department of Computer Science, Technion, Haifa, Israel.
- Faculty of Biotechnology and Food Engineering, Technion, Haifa, Israel.
- School of Computer Science, The Interdisciplinary Center, Herzliya, Israel.
PMID: 30305012
PMCID: PMC6180484 DOI: 10.1186/s12859-018-2341-9
Abstract
BACKGROUND: Synthetic biology and related techniques enable genome-scale, high-throughput investigation of how different gene knock-downs/knock-outs and other modifications of genomic sequence affect organism fitness.
RESULTS: We develop statistical and computational pipelines and frameworks for analyzing high-throughput fitness data over a genome-scale set of sequence variants. Analyzing data from a high-throughput knock-down/knock-out bacterial study, we investigate how the effect on fitness differs across conditions and what determines it. Comparing the fitness vectors of genes across tens of conditions, we observe that fitness consequences depend strongly on genomic location and more weakly on gene sequence similarity and functional relationships. By analyzing promoter sequences, we identified motifs associated with conditions studied in bacterial media, such as Casaminos, D-glucose, sucrose, and other sugar and amino-acid sources. We also use fitness data to infer genes associated with orphan metabolic reactions in the iJO1366 E. coli metabolic model. To this end, we developed a new computational method that integrates gene fitness and gene expression profiles within a given reaction's network neighborhood to associate that reaction with a set of genes that potentially encode the catalyzing proteins. We then apply this approach to predict candidate genes for 107 orphan reactions in iJO1366. Furthermore, we validate our methodology on known reactions using a leave-one-out approach. Specifically, using the top-20 candidates selected from combined fitness and expression data, we correctly reconstruct 39.7% of the reactions, compared with 33% using fitness data alone, 26% using expression data alone, and 4.02% for a random baseline. Our model-improvement results include a novel association of a gene with an orphan cytosine nucleosidation reaction.
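The abstract does not give the exact scoring function, so the following is only an illustrative sketch of the general idea (neighborhood-based co-fitness/co-expression ranking with leave-one-out top-k validation), not the authors' implementation. It assumes: (a) a candidate gene is scored by its mean Pearson correlation with the genes already annotated to neighboring reactions, computed separately on fitness and expression profiles and averaged with a hypothetical weight `w`; (b) a known reaction counts as "reconstructed" if at least one of its hidden annotated genes appears among the top-k candidates. The function names `pearson_to_set`, `rank_candidates`, and `leave_one_out_hit_rate`, and the dictionary-based neighborhood representation, are hypothetical.

```python
import numpy as np


def pearson_to_set(profiles, candidate_idx, reference_idx):
    """Mean Pearson correlation of each candidate gene with a set of reference genes.

    profiles: array of shape (n_genes, n_conditions), one row per gene.
    Hypothetical scoring rule for illustration; not taken from the paper.
    """
    profiles = np.asarray(profiles, dtype=float)
    # Center each gene's profile across conditions.
    z = profiles - profiles.mean(axis=1, keepdims=True)
    # Scale to unit length; constant profiles stay all-zero (correlation 0) instead of dividing by 0.
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    z = z / norms
    # Dot products of centered, unit-length rows are Pearson correlations.
    sims = z[candidate_idx] @ z[reference_idx].T
    return sims.mean(axis=1)


def rank_candidates(fitness, expression, reference_genes, candidates, w=0.5):
    """Rank candidate genes by a weighted mix of co-fitness and co-expression (assumed rule)."""
    co_fit = pearson_to_set(fitness, candidates, reference_genes)
    co_exp = pearson_to_set(expression, candidates, reference_genes)
    combined = w * co_fit + (1.0 - w) * co_exp
    order = np.argsort(-combined, kind="stable")  # highest combined score first
    return [candidates[i] for i in order]


def leave_one_out_hit_rate(fitness, expression, reaction_genes, neighbors, k=20):
    """Fraction of known reactions whose hidden genes reappear in the top-k candidates.

    reaction_genes: dict reaction -> set of annotated gene indices.
    neighbors: dict reaction -> list of neighboring reactions in the network.
    """
    n_genes = np.asarray(fitness).shape[0]
    hits = evaluated = 0
    for rxn, true_genes in reaction_genes.items():
        # Hide this reaction's own genes; genes of neighboring reactions form the reference set.
        reference = sorted(
            {g for nb in neighbors.get(rxn, []) for g in reaction_genes.get(nb, set())}
            - set(true_genes)
        )
        if not reference:
            continue  # no annotated neighborhood to compare against
        candidates = [g for g in range(n_genes) if g not in reference]
        top_k = rank_candidates(fitness, expression, reference, candidates)[:k]
        evaluated += 1
        hits += bool(set(top_k) & set(true_genes))
    return hits / evaluated if evaluated else float("nan")


if __name__ == "__main__":
    # Toy data: 4 genes x 4 conditions; the same matrix stands in for fitness and expression.
    toy = np.array([[1, 2, 3, 4], [1, 2, 3, 5], [4, 3, 2, 1], [1, 3, 2, 4]], dtype=float)
    reactions = {"R1": {1}, "R2": {0}, "R3": {2}}
    graph = {"R1": ["R2"], "R2": ["R1"], "R3": ["R1"]}
    print(rank_candidates(toy, toy, [0], [1, 2, 3]))  # → [1, 3, 2]
    print(leave_one_out_hit_rate(toy, toy, reactions, graph, k=1))  # 2 of 3 toy reactions recovered
```

Averaging two correlation scores is just one simple way to combine the data sources; rank aggregation or a learned weight `w` would be equally plausible choices, and the reported figures (for example 39.7% at top-20) come from the paper's own method, not from this sketch.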
CONCLUSION: Our metabolic-modeling pipeline shows a clear benefit of using fitness data to predict genes for orphan reactions. Together with the other analysis pipelines we developed, it can be applied to similar high-throughput data.
Keywords: Co-expression; Co-fitness; Fitness data; Flux balance analysis (FBA); Metabolic modelling; Orphan reactions