BMC Bioinformatics. 2016 Sep 15;17(1):379. doi: 10.1186/s12859-016-1231-2.

Pathogen metadata platform: software for accessing and analyzing pathogen strain information.

Wenling E Chang, Matthew W Peterson, Christopher D Garay, Tonia Korves

Affiliations

  1. Data Analytics Department, The MITRE Corporation, 2280 Historic Decatur Rd, San Diego, CA, 92106, USA.
  2. Data Analytics Department, The MITRE Corporation, 202 Burlington Rd, Bedford, MA, 01730, USA.
  3. Data Analytics Department, The MITRE Corporation, 202 Burlington Rd, Bedford, MA, 01730, USA.

PMID: 27634291 PMCID: PMC5025631 DOI: 10.1186/s12859-016-1231-2

Abstract

BACKGROUND: Pathogen metadata includes information about where and when a pathogen was collected and the type of environment it came from. Along with genomic nucleotide sequence data, this metadata is growing rapidly and becoming a valuable resource not only for research but for biosurveillance and public health. However, current freely available tools for analyzing this data are geared towards bioinformaticians and/or do not provide summaries and visualizations needed to readily interpret results.

RESULTS: We designed a platform to easily access and summarize data about pathogen samples. The software includes a PostgreSQL database that captures metadata useful for disease outbreak investigations, and scripts for downloading and parsing data from NCBI BioSample and BioProject into the database. The software provides a user interface to query metadata and obtain standardized results in an exportable, tab-delimited format. To visually summarize results, the user interface provides a 2D histogram for user-selected metadata types and mapping of geolocated entries. The software is built on the LabKey data platform, an open-source data management platform, which enables developers to add functionalities. We demonstrate the use of the software in querying for a pathogen serovar and for genome sequence identifiers.
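The 2D histogram over user-selected metadata types can be illustrated with a minimal sketch that tallies pairs of fields from a tab-delimited export. This is not the platform's code: the function name `histogram_2d`, the column names (`country`, `collection_year`, `isolation_source`), and the sample records are all hypothetical and chosen only for illustration.

```python
import csv
import io
from collections import Counter


def histogram_2d(tsv_text, x_field, y_field, missing="unknown"):
    """Count records for each (x_field, y_field) pair in tab-delimited text.

    Empty or absent values are grouped under the `missing` label so that
    incomplete metadata still appears in the summary.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        x = (row.get(x_field) or "").strip() or missing
        y = (row.get(y_field) or "").strip() or missing
        counts[(x, y)] += 1
    return counts


# Made-up records for illustration only; these are not real BioSample entries,
# and the column names are assumptions rather than the platform's schema.
SAMPLE_TSV = (
    "accession\tcountry\tcollection_year\tisolation_source\n"
    "EX0001\tUSA\t2012\tfood\n"
    "EX0002\tUSA\t2012\tclinical\n"
    "EX0003\tCanada\t2013\tfood\n"
    "EX0004\tUSA\t2013\t\n"
    "EX0005\tUSA\t2012\tfood\n"
)

counts = histogram_2d(SAMPLE_TSV, "country", "collection_year")
for (country, year), n in sorted(counts.items()):
    print(f"{country}\t{year}\t{n}")
```

Each key of the resulting counter corresponds to one cell of the histogram, so any two metadata columns can be paired by changing the two field arguments.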

CONCLUSIONS: This software enables users to create a local database for pathogen metadata, populate it with data from NCBI, easily query the data, and obtain visual summaries. Some of the components, such as the database, are modular and can be incorporated into other data platforms. The source code is freely available for download at https://github.com/wchangmitre/bioattribution .

Keywords: BioSample; Biosurveillance; Geocoding; Java; LabKey; Metadata; Pathogen; PostgreSQL
