A Data-Driven Method of Discovering Misspellings of Medication Names on Twitter

Stud Health Technol Inform. 2018:247:136-140.

Abstract

Twitter, a microblogging social media platform, is increasingly used as a data source for pharmacovigilance, which aims to monitor and promote the safe use of pharmaceutical products. Medication names are typically used as keywords to query social media data. Medication names are known to be misspelled on social media, and finding the misspellings is challenging because there is no a priori knowledge of how people will misspell a given medication name. We developed a data-driven, relational similarity-based approach to discover misspellings of medication names. Our approach rests on the assumption that a medicine has an identical (or similar) association with its effects whether its name is spelled correctly or misspelled. Using distributed representations of the words in tweets posted over the most recent 24 months, we discovered a total of 54 misspellings of 6 medicines whose indications include headache. Our search results also show that Twitter posts containing misspellings of codeine and ibuprofen can account for more than 10% of all the tweets associated with each of these medicines. Compared with a phonetics-based approach, our method discovered more misspellings that are actually used on Twitter.
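The relational-similarity idea can be illustrated with a minimal sketch. It is one possible reading of the approach, not the authors' implementation: the abstract does not specify the embedding model or the scoring function. The vectors below are made-up toy values, and the helper names (`cosine`, `offset`, `rank_candidates`) are hypothetical. The sketch ranks candidate words by how closely their relation to an effect word ("headache") matches the relation between the correctly spelled medicine and that effect.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def offset(u, v):
    """Element-wise difference u - v, used here as a 'relation' vector."""
    return [a - b for a, b in zip(u, v)]

def rank_candidates(vectors, drug, effect, top_n=3):
    """Rank words by how similar their relation to `effect` is to the
    relation between the correctly spelled `drug` and `effect`."""
    reference = offset(vectors[effect], vectors[drug])
    scored = []
    for word, vec in vectors.items():
        if word in (drug, effect):
            continue
        relation = offset(vectors[effect], vec)
        scored.append((cosine(reference, relation), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_n]]

# Toy 3-dimensional embeddings, invented for illustration only.
# Real embeddings would be learned from a tweet corpus.
TOY_VECTORS = {
    "ibuprofen": [0.90, 0.10, 0.00],
    "ibuprofin": [0.88, 0.12, 0.02],
    "ibuprophen": [0.91, 0.08, 0.01],
    "headache": [0.10, 0.90, 0.00],
    "pizza": [0.00, 0.10, 0.95],
}

if __name__ == "__main__":
    print(rank_candidates(TOY_VECTORS, "ibuprofen", "headache"))
```

In this toy setup the two misspelled forms rank above the unrelated word because their relation to "headache" points in nearly the same direction as that of the correct spelling. A practical pipeline would presumably need a further step to separate misspellings from other medicines that share the same indication; the abstract does not describe how that is done.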

Keywords: Distributed word representation; Information retrieval; Misspellings; Pharmacovigilance; Postmarketing surveillance; Relational similarity; Twitter.

MeSH terms

  • Humans
  • Pharmacovigilance*
  • Social Media*