Applying GIS and Machine Learning Methods to Twitter Data for Multiscale Surveillance of Influenza

Chris Allen; Ming-Hsiang Tsou; Anoshe Aslam; Anna Nagel; Jean-Mark Gawron

doi:10.1371/journal.pone.0157734

PLoS One. 2016 Jul 25;11(7):e0157734. doi: 10.1371/journal.pone.0157734. eCollection 2016.

Authors

Chris Allen¹, Ming-Hsiang Tsou¹, Anoshe Aslam², Anna Nagel², Jean-Mark Gawron³

Affiliations

¹ Department of Geography, San Diego State University, San Diego, California, United States of America.
² Graduate School of Public Health, San Diego State University, San Diego, California, United States of America.
³ Department of Linguistics, San Diego State University, San Diego, California, United States of America.

Abstract

Traditional methods for monitoring influenza are haphazard and lack fine-grained details regarding the spatial and temporal dynamics of outbreaks. Twitter gives researchers and public health officials an opportunity to examine the spread of influenza in real time and at multiple geographical scales. In this paper, we introduce an improved framework for monitoring influenza outbreaks using the social media platform Twitter. Relying upon techniques from geographic information science (GIS) and data mining, Twitter messages were collected, filtered, and analyzed for the thirty most populated cities in the United States during the 2013-2014 flu season. The results of this procedure are compared with national, regional, and local flu outbreak reports, revealing a statistically significant correlation between the two data sources. The main contribution of this paper is to introduce a comprehensive data mining process that enhances previous attempts to accurately identify tweets related to influenza. Additionally, geographical information systems allow us to target, filter, and normalize Twitter messages.

MeSH terms

Disease Outbreaks
Geographic Information Systems*
Geography, Medical
Humans
Influenza, Human / epidemiology*
Machine Learning*
Public Health Surveillance*
Social Media*
United States / epidemiology

Grants and funding

This research was conducted under a National Science Foundation Cyber-enabled Discovery and Innovation (CDI) grant (award #1028177). This research was also conducted under National Science Foundation grant #1416509. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.