This Student’s Side Project Will Help Decide Musk vs. Twitter

In the battle over Twitter’s future, the number of bots on the platform is a key issue. Problem is, nobody knows how to count them.


August 5 was not a normal day for Kaicheng Yang. It was the day after a US court published Elon Musk’s argument on why he should no longer have to buy Twitter. And Yang, a PhD student at Indiana University, was shocked to discover that his bot detection software was at the center of a titanic legal battle.

Twitter sued Musk in July, after the Tesla CEO tried to retract his $44 billion offer to buy the platform. Musk, in turn, filed a countersuit accusing the social network of misrepresenting the number of fake accounts on the platform. Twitter has long maintained that spam bots represent less than 5 percent of its total number of “monetizable” users—or users that can see ads.

According to legal documents, Yang’s Botometer, a free tool that claims it can identify how likely a Twitter account is to be a bot, has been critical in helping Team Musk prove that figure is not true. “Contrary to Twitter’s representations that its business was minimally affected by false or spam accounts, the Musk Parties’ preliminary estimates show otherwise,” says Musk’s counterclaim.

But telling the difference between humans and bots is harder than it sounds, and one researcher has accused Botometer of “pseudoscience” for making it look easy. Twitter has been quick to point out that Musk used a tool with a history of making mistakes. In its legal filings, the platform reminded the court that Botometer defined Musk himself as likely to be a bot earlier this year.

Despite that, Botometer has become ubiquitous, especially among university researchers, thanks to the demand for tools that promise to distinguish bot accounts from humans. As a result, it will not only be Musk and Twitter on trial in October, but also the science behind bot detection.

Yang did not start Botometer; he inherited it. The project was set up around eight years ago. But as its founders graduated and moved on from university, responsibility for maintaining and updating the tool fell to Yang, who declines to confirm or deny whether he has been in contact with Elon Musk’s team. Botometer is not his full-time job; it’s more of a side project, he says. He works on the tool when he’s not carrying out research for his PhD project. “Currently, it’s just me and my adviser,” he says. “So I’m the person really doing the coding.”

Botometer is a supervised machine learning tool, meaning it has been trained to separate bots from humans using examples of accounts already labeled as one or the other. Yang says Botometer differentiates bots from humans by looking at more than 1,000 details associated with a single Twitter account—such as its name, profile picture, followers, and ratio of tweets to retweets—before giving it a score between zero and five. “The higher the score means it’s more likely to be a bot, the lower score means it’s more likely to be a human,” says Yang. “If an account has a score of 4.5, it means it’s really likely to be a bot. But if it’s 1.2, it’s more likely to be a human.”
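In rough outline, that pipeline resembles the sketch below: a classifier is fit to accounts that have already been labeled, and its bot probability is stretched onto the zero-to-five scale. The features, training data, and model here are invented for illustration and do not reflect Botometer’s actual internals.

```python
# A minimal sketch of a supervised bot classifier, assuming a toy feature set.
# Nothing here reflects Botometer's real model, features, or training data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-account features:
# [followers, accounts followed, total tweets, share of tweets that are retweets]
X_train = np.array([
    [12000,  300,   5400, 0.20],  # labeled human
    [   15, 4800,  90000, 0.98],  # labeled bot
    [  800,  600,   2100, 0.35],  # labeled human
    [    3, 9000, 140000, 0.95],  # labeled bot
])
y_train = np.array([0, 1, 0, 1])  # 0 = human, 1 = bot

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

def bot_score(features):
    """Map the classifier's bot probability onto a Botometer-style 0-5 scale."""
    p_bot = model.predict_proba([features])[0][1]
    return 5 * p_bot

# An account that follows thousands of users and mostly retweets scores high.
print(round(bot_score([40, 5200, 110000, 0.97]), 1))
```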

Crucially, however, Botometer does not give users a threshold: a definitive cutoff above which every account counts as a bot. Yang says the tool should not be used at all to decide whether individual accounts or groups of accounts are bots. He prefers that it be used comparatively, to understand whether one conversation topic is more polluted by bots than another.
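That comparative reading is simple to express in code. The sketch below uses invented scores; in practice they would come from Botometer’s API.

```python
# A sketch of the comparative use Yang describes: compare score distributions
# across two conversation topics instead of labeling any single account a bot.
# All scores below are hypothetical.
from statistics import mean

topic_a_scores = [0.4, 1.1, 0.9, 2.2, 1.5, 0.7]  # accounts tweeting about topic A
topic_b_scores = [1.8, 3.4, 2.9, 4.1, 2.6, 3.7]  # accounts tweeting about topic B

print(f"topic A mean score: {mean(topic_a_scores):.2f}")  # 1.13
print(f"topic B mean score: {mean(topic_b_scores):.2f}")  # 3.08

# A higher average suggests topic B's conversation is more bot-polluted,
# without declaring any individual account a bot.
```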

Still, some researchers continue to use the tool incorrectly, says Yang. And the lack of a threshold has created a gray area. Without one, there’s no consensus about how to define a bot. Researchers hoping to find more bots can choose a lower threshold than researchers hoping to find fewer. In pursuit of clarity, many disinformation researchers have defaulted to defining bots as any account that scores above the midpoint of Botometer’s scale, 2.5 out of 5, according to Florian Gallwitz, a computer science professor at Germany’s Nuremberg Institute of Technology.
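The sensitivity is easy to demonstrate: hold a set of scores fixed, and the share of accounts flagged as bots swings widely with the cutoff. The ten scores below are invented for illustration.

```python
# A sketch of why the threshold matters: the same Botometer-style scores
# yield very different "percent bots" depending on the cutoff chosen.
scores = [0.3, 0.8, 1.2, 1.9, 2.4, 2.6, 2.8, 3.1, 3.9, 4.6]  # hypothetical

for threshold in (2.0, 2.5, 3.0, 4.0):
    flagged = sum(score >= threshold for score in scores)
    print(f"threshold {threshold}: {100 * flagged // len(scores)}% flagged as bots")

# threshold 2.0: 60% flagged as bots
# threshold 2.5: 50% flagged as bots
# threshold 3.0: 30% flagged as bots
# threshold 4.0: 10% flagged as bots
```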

Gallwitz is an outspoken critic of Botometer, claiming it is polluting the way academics study disinformation on Twitter. In July, he published a paper claiming that out of hundreds of accounts scoring 2.5 and above, not a single one was a bot. “Many of these accounts are operated by people with impressive academic and professional credentials,” the paper reads.

One account that Botometer flags as suspicious using the 2.5 threshold is that of Annalena Baerbock, Germany’s foreign minister, who scores 2.8 (although Botometer warns in the results that “19 percent of accounts with a bot score above 2.8 are labeled as humans”). Baerbock’s team told WIRED that the foreign minister’s account is not automated in any way.

To Gallwitz, these types of false positives prove that Botometer doesn’t work. “It is a tool that everybody can use to produce pseudoscience,” he claims. Gallwitz is frustrated that researchers relying on Botometer do not share examples of the accounts they identified as bots so that others can verify their results. As an example, he points to an August 2022 study by researchers at the University of Adelaide, which used Botometer to claim that between 60 and 80 percent of accounts tweeting pro-Ukraine and pro-Russia hashtags are bots. “We avoid reporting individual-level data due to privacy and ethics,” says Joshua Watt, one of the study’s authors.

Yet Yang is clear: 2.5 should not be used as a threshold, because a score in the middle of the scale signals that the machine learning model is “not really confident.” The allegations in Gallwitz’s study are not new, Yang adds, noting that some people exploit Botometer’s limitations—inevitable for all supervised machine learning algorithms, he argues—to undermine the entire field of study devoted to social bots.

But the threshold is an important detail when assessing the use of Botometer by Musk’s legal team. “Musk’s team didn’t provide any detail on what threshold they used,” says Yang. “I’m not sure I’m convinced that the number they’ve provided is accurate. You can choose any threshold to get any number you want.”
