A Data-Driven Method of Discovering Misspellings of Medication Names on Twitter

Keyuan Jiang; Tingyu Chen; Liyuan Huang; Ricardo A Calix; Gordon R Bernard

Stud Health Technol Inform. 2018:247:136-140.

Authors

Keyuan Jiang¹, Tingyu Chen¹, Liyuan Huang¹, Ricardo A Calix¹, Gordon R Bernard²

Affiliations

¹ Department of Computer Information Technology and Graphics, Purdue University Northwest, U.S.A.
² Department of Medicine, Vanderbilt University, U.S.A.

PMID: 29677938
PMCID: PMC6009827

Abstract

Data from Twitter, a microblogging social media platform, are increasingly used for pharmacovigilance, the monitoring and promotion of the safe use of pharmaceutical products. Medication names are typically used as keywords to query social media data. Medication names are known to be misspelled on social media, and finding the misspellings is challenging because there is no a priori knowledge of how people will misspell a given name. We developed a data-driven, relational similarity-based approach to discover misspellings of medication names. Our approach rests on the assumption that a medicine has the same (or a similar) association with its effects whether its name is spelled correctly or misspelled. Using distributed representations of the words in tweets posted over a recent 24-month period, we discovered a total of 54 misspellings of 6 medicines whose indications include headache. Our search results also show that Twitter posts containing misspellings of codeine and ibuprofen can account for more than 10% of all the tweets associated with each of these medicines. Compared with the phonetics-based approach, our method discovered more of the misspellings actually used on Twitter.
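
The relational-similarity idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring function, so the use of cosine similarity between offset vectors (token minus effect) is an assumption, and the three-dimensional vectors, the function names, and the token list are all hypothetical. In practice the vectors would come from a word-embedding model trained on tweets.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def offset(embeddings, word, effect):
    """Vector from the effect word to `word`: a stand-in for their relation."""
    return [a - b for a, b in zip(embeddings[word], embeddings[effect])]

def rank_candidates(embeddings, medicine, effect):
    """Rank every other token by how closely its relation to `effect`
    matches the relation between the correctly spelled medicine and `effect`."""
    target = offset(embeddings, medicine, effect)
    scores = []
    for token in embeddings:
        if token in (medicine, effect):
            continue
        scores.append((token, cosine(offset(embeddings, token, effect), target)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy vectors, made up for illustration only.
TOY_EMBEDDINGS = {
    "headache":   [0.90, 0.10, 0.00],
    "ibuprofen":  [0.20, 0.90, 0.10],
    "ibuprofin":  [0.25, 0.85, 0.12],
    "ibuprophen": [0.18, 0.88, 0.15],
    "pizza":      [0.00, 0.10, 0.95],
}

ranked = rank_candidates(TOY_EMBEDDINGS, "ibuprofen", "headache")
for token, score in ranked:
    print(f"{token}\t{score:.3f}")
```

In this toy data the two misspelled variants score far higher than the unrelated token, which is the behavior the assumption predicts. A real pipeline would still need a way to separate misspellings from other medicines that share the same effect, a step this sketch leaves out.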

Keywords: Distributed word representation; Information retrieval; Misspellings; Pharmacovigilance; Postmarketing surveillance; Relational similarity; Twitter.

MeSH terms

Humans
Pharmacovigilance*
Social Media*

Grants and funding

R15 LM011999/LM/NLM NIH HHS/United States