Are These the Hidden Deepfakes in the Anthony Bourdain Movie?

Pindrop, which makes software to identify synthetic audio, found three clips totaling 50 seconds in the nearly 2-hour movie.
Morgan Neville, director of the movie Roadrunner, said he used software in several places to mimic the voice of celebrity chef Anthony Bourdain, who died three years ago. Photograph: BFA/Alamy

When Roadrunner, a documentary about late TV chef and traveler Anthony Bourdain, opened in theaters last month, its director, Morgan Neville, spiced up promotional interviews with an unconventional disclosure for a documentarian. Some words viewers hear Bourdain speak in the film were faked by artificial intelligence software used to mimic the star’s voice.

Accusations from Bourdain fans that Neville had acted unethically quickly came to dominate coverage of the film. Despite that attention, how much of the fake Bourdain’s voice is in the two-hour movie, and what it said, has been unclear—until now.

In an interview that made his film infamous, Neville told The New Yorker that he had generated three fake Bourdain clips with the permission of Bourdain’s estate, all from words the chef had written or said but that were not available as audio. He revealed only one, an email Bourdain “reads” in the film’s trailer, but boasted that the other two clips would be undetectable. “If you watch the film,” The New Yorker quoted the Oscar-winning Neville saying, “you probably don’t know what the other lines are that were spoken by the AI, and you’re not going to know.”

Audio experts at Pindrop, a startup that helps banks and others fight phone fraud, think they do know. If the company’s analysis is correct, the deepfake Bourdain controversy is rooted in less than 50 seconds of audio in the 118-minute film.

Pindrop’s analysis flagged the email quote disclosed by Neville and also a clip early in the film apparently drawn from an essay Bourdain wrote about Vietnam titled “The Hungry American,” collected in his 2008 book, The Nasty Bits. It also highlighted audio midway through the film in which the chef observes that many chefs and writers have a “relentless instinct to fuck up a good thing.” The same sentences appear in an interview of Bourdain with food site First We Feast on the occasion of his 60th birthday in 2016, two years to the month before he died by suicide.

All three clips sound recognizably like Bourdain. On close listening, though, they appear to bear signatures of synthetic speech, including odd prosody and unnatural-sounding fricatives such as “s” and “f” sounds. One Reddit user independently flagged the same three clips as Pindrop, writing that they were easy to hear on watching the film for a second time. The film’s distributor, Focus Features, did not respond to requests for comment; Neville’s production company declined to comment.

The director of Roadrunner said this clip of the chef musing on happiness was synthesized using AI software.

Audio source: Pindrop

When Neville predicted that his use of AI-generated media, sometimes termed deepfakes, would be undetectable, he may have overestimated the sophistication of his own fakery. He likely did not anticipate the controversy or attention his use of the technique would draw from fans and audio experts. When the furor reached the ears of researchers at Pindrop, they saw the perfect test case for software they built to detect audio deepfakes; they set it to work when the movie debuted on streaming services earlier this month. “We’re always looking for ways to test our systems, especially in real conditions—this was a new way to validate our technology,” says Collin Davis, Pindrop’s chief technology officer.

Pindrop’s results may have resolved the mystery of Neville’s missing deepfakes, but the episode portends future controversies as deepfakes become more sophisticated and accessible for both creative and malicious projects.

Deepfake technology has become more convincing and easier to access in recent years. Some people have been victimized by pornographic deepfakes used for titillation or harassment. But very few in society have been directly touched, or deceived, by the technology. Despite fearful discussions in academia and Congress about the potential for mass deepfake deception, the threat has so far been largely hypothetical.

Neville’s project made deepfakes very real to Bourdain fans. Millions feel a personal connection with the chef, who could make raw authenticity crackle off the screen. The fake clips were a pointed reminder that those relationships were always filtered through technology and by media professionals like Neville. “If you learn that the technology you thought was enabling this authentic relationship is actually undermining it, that creates a crisis,” says William Little, a media studies professor at University of Virginia. He teaches a class on AI and film and will be adding Roadrunner to the syllabus as a case study in some questions raised by the technology.

Analysts at a fraud-detection startup believe this clip of Bourdain may have been synthesized using AI software.

Audio source: Pindrop

Neville, who never met Bourdain, told GQ that he turned to deepfake audio because he wanted to draw on the star’s thoughts that weren’t available on tape. “I wasn’t putting words into his mouth. I was just trying to make them come alive,” he said. It’s possible he also saw the technology as a way to win publicity for the film.

Deepfaking the subject of this particular film even has a certain logic: Roadrunner is about Bourdain’s different identities and the conflicting feelings they evoked in those around him and the star himself. Was Bourdain the unvarnished but goodhearted hero viewers came to love, or the “pain in the ass” friends say he could be off camera? An empathic explorer or just another white guy parachuting into foreign locales with a camera crew? And why did he struggle to be happy?

Neville’s use of deepfakes in pursuing those questions is in some ways not hugely different from more established and accepted documentary techniques that also have a degree of artifice. Some used in Roadrunner may have seemed deceptive in earlier times.

Neville has Bourdain narrate the film of his own life from beyond the grave in a tapestry of audio drawn from TV shows, audiobooks, radio, and podcasts. The deepfakes provide just a few tiny threads. And the film uses conventional sleights of editing that combine audio and video from different times and places in sometimes reality-bending ways. In one scene, a business associate of Bourdain's recounts a notable phone call, against early footage of the star talking on a flip phone. Did that clip from the archive capture his side of that same call? Likely not, but the illusion helps tell the story.

More than a century since the first motion pictures, audiences are used to such tricks. Media industry and audience expectations for deepfakes are still a work in progress. “This is something everyone is grappling with,” says Sam Gregory, who works on deepfakes policy at the nonprofit Witness and often talks with media producers and tech companies about disclosure. “People generally coalesce around the idea that you need to have some way to signify to consumers or viewers that there is some manipulation.”

The analysts believe this clip of the star talking about Vietnam may have been synthesized using AI software.

Audio source: Pindrop

Some directors have tried. In the 2020 documentary Welcome to Chechnya, about LGBTQ activists fleeing persecution, some subjects are digitally masked with synthetic faces that mimic their facial movements. The film’s producers intentionally stopped short of spoofing reality too closely, giving their digital masks an eerie blurriness they call a halo as a form of disclosure.

Audio provides less scope for such signals but it is still possible to inform listeners about the source of what they’re hearing. At one point in Roadrunner, a caption advises viewers they are hearing “VOICE OVER - OUTTAKE.” It’s not clear why Neville didn’t use a “synthetic audio” caption for his AI-generated clips—or if disclosing them in the film, not just interviews in which he boasted they were undetectable, would have softened the backlash.

Pindrop’s contribution to the Roadrunner controversy illustrates how deepfake detectors can help uncover deception but also that such technology is no panacea.

To scan for fake Bourdain, the company processed the film’s soundtrack to remove noise and to make speech more prominent, then ran the segments containing speech through a deepfakes detector based on machine learning that looks for signatures of synthetic voices. Elie Khoury, Pindrop’s director of research, says some of those artifacts can be perceived by the human ear, but others require technological help.

Pindrop’s system gave every four-second segment of speech in Roadrunner a deepfake score from 1 to 100; the company identified the two missing synthetic clips after reviewing the 30 segments that scored highest, which also included the fake clip disclosed by Neville. The results of that process show the power but also some limitations of deepfake detection. Some segments other than the three Pindrop ultimately homed in on also scored highly on the initial scan.
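The score-then-review process described above can be sketched roughly as follows. This is a minimal illustration, not Pindrop's software: the four-second segment length and the review of the 30 highest-scoring segments come from the company's description, while the function names, the segment representation, and the stand-in scoring function are assumptions made only for this example.

```python
from typing import Callable, List, Sequence, Tuple

SEGMENT_SECONDS = 4.0  # segment length reported by Pindrop
TOP_N = 30             # number of top-scoring segments reviewed by analysts

# A segment is (start_seconds, end_seconds). Hypothetical representation.
Segment = Tuple[float, float]


def split_into_segments(speech_spans: Sequence[Segment],
                        length: float = SEGMENT_SECONDS) -> List[Segment]:
    """Cut each span of speech into consecutive windows of at most `length` seconds."""
    segments: List[Segment] = []
    for start, end in speech_spans:
        t = start
        while t < end:
            segments.append((t, min(t + length, end)))
            t += length
    return segments


def rank_segments(segments: Sequence[Segment],
                  score_fn: Callable[[Segment], float],
                  top_n: int = TOP_N) -> List[Tuple[float, Segment]]:
    """Score every segment and return the `top_n` highest, best first.

    `score_fn` stands in for a real detector (assumed to return 1 to 100).
    The output is a shortlist for human review, not a verdict.
    """
    scored = [(score_fn(seg), seg) for seg in segments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Toy scores for illustration only; a real detector would analyze the audio.
    toy_scores = {(0.0, 4.0): 12, (4.0, 8.0): 97, (8.0, 10.0): 55}
    segments = split_into_segments([(0.0, 10.0)])
    for score, (start, end) in rank_segments(segments, toy_scores.get, top_n=2):
        print(f"{start:6.1f}s to {end:6.1f}s  score {score}")
```

Ranking and then reviewing a fixed shortlist, rather than applying a hard cutoff, reflects the point that high scores alone were not treated as proof: some high-scoring segments still had to be ruled out by hand.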

Most were easily eliminated as false positives, either because they matched visuals on screen, such as Bourdain’s lips moving, or because standard audio forensic techniques detected conventional sound processing, heavy music, or background noise. Davis of Pindrop says that when the company provides fraud detection in call centers, false positives can be checked by prompting a caller who triggered the system to provide extra security information. But not every example of alleged deepfake deception will allow easy verification or cross checking.

A disputed video of a politician detained in the military coup in Myanmar this year illustrates that problem. In the clip, the man claims to have given Burmese leader Aung San Suu Kyi corrupt payments in cash and gold. His voice and face appear distorted. Accusations it was synthetic surged after a screenshot from an online deepfake detector declaring the clip fake with 93 percent certainty was posted to Twitter. The case is far from closed, because there is no way to confirm that claim.

Deepfake detectors are a nascent art and different systems can produce wildly divergent results. Deep audio and video forensic expertise is needed to interpret or check the results from such tools. “If you’re not careful, putting detectors out there can make it more difficult to tell what is fake or not,” Gregory of Witness says. He still considers the Myanmar video’s authenticity unknown.

One remaining mystery about the Bourdain deepfakes suggests the controversy may still have more lessons to teach. Neville told GQ that he had deepfake Bourdains made by four different companies and chose the one that sounded best, but he has not identified any of them.

WIRED contacted 10 companies that advertise their ability to synthesize or clone voices, from small startups to Google and Microsoft—an exercise that highlighted how the technology is now widely available. All denied working with Neville on his project. A Pindrop analysis suggested that Bourdain was likely given posthumous voice using a version of a technique first published by Google’s DeepMind AI division in 2016 that has since been integrated into Google’s virtual assistant and widely reimplemented in open source software. A spokesperson for DeepMind said the company supports the idea that “no voices should be used without permission.”

