
J Natl Cancer Inst Monogr. 2013 Dec;2013(47):140-6. doi: 10.1093/jncimonographs/lgt026.

Methodological considerations in analyzing Twitter data.

Journal of the National Cancer Institute. Monographs

Annice E Kim, Heather M Hansen, Joe Murphy, Ashley K Richards, Jennifer Duke, Jane A Allen

Affiliations

  1. RTI International, 3040 Cornwallis Rd, PO Box 12194, Research Triangle Park, NC 27709.

PMID: 24395983 DOI: 10.1093/jncimonographs/lgt026

Abstract

Twitter is an online microblogging tool that disseminates more than 400 million messages per day, including vast amounts of health information. Twitter represents an important data source for the cancer prevention and control community. This paper introduces investigators in cancer research to the logistics of Twitter analysis. It explores methodological challenges in extracting and analyzing Twitter data, including characteristics and representativeness of data; data sources, access, and cost; sampling approaches; data management and cleaning; standardizing metrics; and analysis. We briefly describe the key issues and provide examples from the literature and our studies using Twitter data to understand public health issues. For investigators considering Twitter-based cancer research, we recommend assessing whether research questions can be answered appropriately using Twitter, choosing search terms carefully to optimize precision and recall, using respected vendors that can provide access to the full Twitter data stream if possible, standardizing metrics to account for growth in the Twitter population over time, considering crowdsourcing for analysis of Twitter content, and documenting and publishing all methodological decisions to further the evidence base.
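As an illustration only, and not the authors' procedure, the precision and recall of a set of search terms can be estimated against a small hand-coded sample of tweets. Precision is the share of retrieved tweets that are relevant. Recall is the share of relevant tweets that are retrieved. The sketch below is written under stated assumptions. The helper names (`matches`, `precision_recall`, `per_million`), the case-insensitive substring matching rule, and the toy tweets are all hypothetical. The "per million tweets" rate is just one assumed way to standardize counts for growth in overall Twitter volume.

```python
# Hypothetical sketch: evaluate a keyword filter against hand-coded tweets,
# then standardize a topic count by total tweet volume. All names and data
# here are illustrative assumptions, not taken from the paper.
from __future__ import annotations

from typing import Iterable


def matches(text: str, terms: Iterable[str]) -> bool:
    """Assumed matching rule: any term occurs as a case-insensitive substring."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def precision_recall(tweets: list[tuple[str, bool]], terms: list[str]) -> tuple[float, float]:
    """Return (precision, recall) of `terms` on (text, is_relevant) pairs.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    true_pos = sum(1 for text, relevant in tweets if relevant and matches(text, terms))
    retrieved = sum(1 for text, _ in tweets if matches(text, terms))
    relevant_total = sum(1 for _, relevant in tweets if relevant)
    precision = true_pos / retrieved if retrieved else 0.0
    recall = true_pos / relevant_total if relevant_total else 0.0
    return precision, recall


def per_million(topic_count: int, total_count: int) -> float:
    """Assumed standardization: topic tweets per million tweets in the same period."""
    return 1_000_000 * topic_count / total_count


# Invented, hand-labeled toy sample (True = relevant to tobacco use).
sample = [
    ("Trying to quit smoking this week", True),
    ("New e-cig flavors are everywhere", True),
    ("Smoking hot deals on grills", False),   # retrieved, but off-topic
    ("Day 3 without a cigarette", True),      # relevant, but missed
    ("Nicotine patches helped me", True),     # relevant, but missed
]

if __name__ == "__main__":
    p, r = precision_recall(sample, ["smoking", "e-cig"])
    print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.500
    print(per_million(250, 500_000_000))        # 0.5
```

In this toy sample, adding a term such as "cigarette" would raise recall. Broad terms like "smoking" keep pulling in off-topic tweets, which lowers precision. This is the trade-off the abstract recommends weighing when choosing search terms.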
