Genome-wide analysis of fitness data and its application to improve metabolic models.

BMC Bioinformatics. 2018 Oct 10;19(1):368. doi: 10.1186/s12859-018-2341-9.

Edward Vitkin, Oz Solomon, Sharon Sultan, Zohar Yakhini

Affiliations

Department of Computer Science, Technion, Haifa, Israel.
Faculty of Biotechnology and Food Engineering, Technion, Haifa, Israel.
School of Computer Science, The Interdisciplinary Center, Herzliya, Israel.
School of Computer Science, The Interdisciplinary Center, Herzliya, Israel.
Department of Computer Science, Technion, Haifa, Israel.
School of Computer Science, The Interdisciplinary Center, Herzliya, Israel.

PMID: 30305012 PMCID: PMC6180484 DOI: 10.1186/s12859-018-2341-9

Abstract

BACKGROUND: Synthetic biology and related techniques enable genome-scale, high-throughput investigation of how gene knock-downs, knock-outs, and other modifications of the genomic sequence affect organism fitness.

RESULTS: We develop statistical and computational pipelines and frameworks for analyzing high-throughput fitness data over a genome-scale set of sequence variants. Analyzing data from a high-throughput knock-down/knock-out bacterial study, we investigate how fitness effects differ across conditions and what determines them. Comparing the fitness vectors of genes across tens of conditions, we observe that fitness consequences depend strongly on genomic location and more weakly on gene sequence similarity and on functional relationships. In analyzing promoter sequences, we identify motifs associated with the conditions studied, in bacterial media such as Casaminos, D-glucose, Sucrose, and other sugars and amino-acid sources. We also use fitness data to infer genes associated with orphan metabolic reactions in the iJO1366 E. coli metabolic model. To do this, we developed a new computational method that integrates gene fitness and gene expression profiles within a given reaction's network neighborhood to associate that reaction with a set of genes that potentially encode the catalyzing proteins. We then apply this approach to predict candidate genes for 107 orphan reactions in iJO1366. Furthermore, we validate our methodology on known reactions using a leave-one-out approach. Specifically, using the top-20 candidates selected from the combined fitness and expression datasets, we correctly reconstruct 39.7% of the reactions, compared with 33% based on fitness alone, 26% based on expression alone, and 4.02% for a random baseline. Our model improvement results include a novel association of a gene with an orphan cytosine nucleosidation reaction.
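The idea of ranking candidate genes by how well their fitness and expression profiles track those of genes in a reaction's network neighborhood can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not state the scoring function, so the use of Pearson correlation, the linear mixing `weight`, the function names `pearson` and `rank_candidates`, and the toy gene names and values are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / sqrt(vx * vy)


def rank_candidates(neighbor_genes, fitness, expression, top_k=20, weight=0.5):
    """Return the top_k genes ranked by similarity to the neighborhood genes.

    neighbor_genes: genes assigned to reactions near the reaction of interest.
    fitness, expression: dict mapping gene -> list of values across conditions.
    weight: hypothetical mixing weight (1.0 = fitness only, 0.0 = expression only).
    """
    usable = [g for g in neighbor_genes if g in fitness and g in expression]
    if not usable:
        return []
    scores = {}
    for gene in fitness:
        if gene in usable or gene not in expression:
            continue
        # Mean co-fitness and co-expression against the neighborhood genes.
        cofit = sum(pearson(fitness[gene], fitness[n]) for n in usable) / len(usable)
        coexp = sum(pearson(expression[gene], expression[n]) for n in usable) / len(usable)
        scores[gene] = weight * cofit + (1 - weight) * coexp
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data (invented): two neighborhood genes and two candidate genes.
fitness = {
    "nbrA": [0.1, -1.2, 0.3, -0.9],
    "nbrB": [0.0, -1.0, 0.4, -1.1],
    "cand1": [0.2, -1.1, 0.2, -1.0],
    "cand2": [-0.8, 0.5, -0.6, 0.7],
}
expression = {
    "nbrA": [5.0, 2.0, 5.5, 2.5],
    "nbrB": [4.8, 2.2, 5.9, 2.1],
    "cand1": [5.1, 2.4, 5.2, 2.2],
    "cand2": [2.0, 5.0, 2.5, 5.5],
}

ranked = rank_candidates(["nbrA", "nbrB"], fitness, expression, top_k=2)
print(ranked)
```

In a leave-one-out check of the kind the abstract describes, one would hide the known gene of a reaction, rank all remaining genes this way, and count the reaction as reconstructed if the hidden gene appears in the returned top-k list. Setting `weight` to 1.0 or 0.0 gives the fitness-only and expression-only variants for comparison.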

CONCLUSION: Our pipeline for metabolic modeling shows a clear benefit of using fitness data for predicting genes of orphan reactions. Along with the analysis pipelines we developed, it can be used to analyze similar high-throughput data.

Keywords: Co-expression; Co-fitness; Fitness data; Flux balance analysis (FBA); Metabolic modelling; Orphan reactions

Publication Types

Journal Article