
When Is a Photo Not a Photo? The Looming Specter of Artificially Generated Photographs

Everyone’s talking about ChatGPT. But watch out for new photo-and-art-based AI systems, which may drown us in a sea of synthetic imagery.
This is a synthetic image, not a photograph, created recently by DALL-E (artificial intelligence software) and the author in response to a text prompt asking the system to come up with an “iconic photograph that is so horrible it would cause wars to stop” in the style of the great war photographer Robert Capa. It is meant as an homage to Capa, who died in 1954.

In 1984, when photographers were still using film, I began exploring the early use of computers to undetectably modify photographs. In an article in The New York Times Magazine I wrote that “in the not-too-distant future, realistic-looking images will probably have to be labeled, like words, as either fiction or nonfiction, because it may be impossible to tell them apart. We may have to rely on the image maker, and not the image, to tell us into which category certain pictures fall.”

This was two years after National Geographic, at the dawn of the digital image revolution, had modified a photograph of the pyramids of Giza so that it would better fit on its cover, using a computer to shift one pyramid closer to the other. The magazine’s editor defended the alteration, viewing it not as a falsification but, as I wrote then, “merely the establishment of a new point of view, as if the photographer had been retroactively moved a few feet to one side.” I was astonished. It seemed to me that the magazine had just introduced to photography a concept from science fiction—virtual time travel—as if revisiting a scene and photographing it again.

In a few short years, image manipulation software such as Photoshop began to transform the photograph from essentially being a visual record made at a specific moment in time to a malleable medium that could be modified anytime. Critic Susan Sontag’s earlier characterization of the photograph as “not only an image (as a painting is an image), an interpretation of the real; it is also a trace, something directly stenciled off the real, like a footprint or a death mask” would soon become nostalgic. And today, rather than advertisements promoting Kodak’s film-era slogan “let the memories begin,” the Google Pixel camera touts itself as having both a “magic eraser” and a “face unblur.”

We are now experiencing another quantum leap in visual technology. While much has been written about artificial intelligence (AI) systems such as ChatGPT, which process massive data sets to simulate writing “in the style of,” say, Shakespeare, or lyrics “in the manner of” Lizzo, the potential impact of synthetic imaging has received less attention. Systems such as OpenAI’s DALL-E, Stable Diffusion, and Midjourney make it possible to use text prompts to produce images, in seconds, that closely resemble photographs of people and places that never existed, or of events that never happened. Rather than search online for “actual” photographs of people and events, one might soon be encouraged to have similar images made to one’s specifications.

Anyone—without the use of a camera—can now create images inspired by the work of famous photographers or, for that matter, the work of painters, musicians, philosophers, scientists, and so on. This has provoked concerns as to whether it is necessary to recognize and compensate those, particularly artists and designers, who have provided, largely without their consent, some of the hundreds of millions of images used to train such systems, and whether the creators should be able to opt out of future involvement. (Others, such as fashion models and stylists, might also find their livelihoods at risk.) Among several lawsuits recently brought against companies with AI image generators, including some filed by artists, Getty Images sued Stability AI this month for a whopping $1.8 trillion, contending a “brazen infringement” of its “intellectual property on a staggering scale” due to, as Getty claims, Stability AI’s unauthorized use of more than 12 million of Getty’s photos for training purposes “as part of its efforts to build a competing business.” A spokesman for Stability AI told Vanity Fair: “Please note that we take these matters seriously. We are reviewing the documents and will respond accordingly.”

Photographer Stephen Shore asked one AI text-to-image system to “Photograph like Stephen Shore.” The result had “a kind of deadpan blankness that I liked,” he said. Courtesy of the artist.

ARTISTS EXPERIMENTING

In this period of limbo—between high-tech invention and cultural acceptance—I’ve been experimenting with OpenAI’s DALL-E, discovering that such systems can be extraordinarily inventive on their own. When I asked the program, for instance, to come up with an “iconic photograph that is so horrible it would cause wars to stop” in the style of war photographer Robert Capa, DALL-E devised an image of a woman pointing a camera and a young girl huddled against her, looking fearful and distraught, the camera itself bent as if impacted by the scene in front of it. Rather than expose the viewer to a potentially traumatizing scene, its horror was indicated by the response of the child, the onlooker; it was now up to the viewer to imagine what had occurred. Similarly, I was surprised when I solicited “a photograph of the greatest mothers in the world” and was provided with a photorealistic image of a verdant setting in which an ape-like animal tenderly holds her baby—not the human mother and child that I had expected.

Such experiments make one aware of how photography has been used to confirm the expected (vacations are fun, celebrities are glamorous), typically relying on stereotypes. Whereas a caption, for all of its merits, can often limit the meaning of a photograph, text prompts can at times lead to imagery that provokes a rethinking of one’s own biases. (That said, these systems can also come up with misogynistic or racist responses, given the preponderance of these kinds of images in the online data sets upon which they are based.)

These systems have other benefits beyond helping recalibrate bias. They allow for the exploration of arenas previously off-limits to photographs, such as a person’s own thoughts, dreams, and nightmares. Artificial intelligence can also generate useful photorealistic images that show would-be futures, such as depicting the potential effects of climate change on one’s community or the larger ecosystem, based on predictions by scientists. Such pictures, foreshadowing possible outcomes rather than showing a devastating aftermath, could encourage proactive responses.

Some artists are experimenting with AI in provocative ways. Photographer Stephen Shore, for instance, recently asked one text-to-image system to “photograph like Stephen Shore,” and posted on Instagram the resulting frame of a solitary pole in the center of what appears to be an industrial lot, finding that it had “a kind of deadpan blankness that I liked.” Another artist, Adam Broomberg, whose mother recently died, asked for “an image of my mother dying in [the] hospital,” and posted the result on Instagram: a hauntingly intimate close-up of an older woman lying in bed. It could easily have been mistaken for a photograph. Broomberg remarked, “I wonder if there is such a thing as a grief bot?”


But as with deepfake videos that show falsified events, synthetic imagery has the potential not only to destabilize society, skew discussions, and damage individuals, but also to obscure our sense of what is real. As millions of such images proliferate, they undermine the eyewitness evidence of the photographs they resemble. Contemporary occurrences might be more easily disregarded, family albums become increasingly suspect, histories more amorphous. In Ashkelon, Israel, an exhibition created in collaboration with elderly Holocaust survivors features synthetic imagery representing their memories of certain long-ago traumatic events, creating a new kind of visual record that can compete with the photographic. And certain boundaries may be crossed: A South Korean firm just announced an AI system that facilitates ongoing conversations of a sort with dead loved ones (after synthesizing their photos and videos); another technology allowed a grandmother in Nottingham, England, to respond to questions from mourners at her own funeral.

In a news cycle often dominated by conspiracy theories and fake news, legitimate, unaltered photographs—instead of confronting us with realities from which we cannot look away (as happened during the Vietnam War and the Civil Rights Movement)—will more easily and automatically be rejected out of hand. It is not a coincidence, for example, that no single iconic image emerged to initiate or sustain a societal discussion about the 20-year war in Afghanistan, the longest in US history. Or that Western support for Ukraine, in its response to Russia’s brutal and ongoing invasion, was galvanized largely by the persistent video and online dispatches of Ukraine’s media-savvy president, Volodymyr Zelenskyy, rather than by a series of iconic photographs.

Certainly, other factors have contributed to the photograph’s reduced role as witness: the disappearance of the newspaper front page, the billions of competing images on social media. But now, rather than the photograph, it is the occasional amateur video, posted online along with a fuller background narrative—such as the footage of the murder of George Floyd by a Minneapolis police officer—that manages to mobilize meaningful, broad-based response.

STEMMING THE TIDE

So what can we do in the face of this coming AI wave? Forensic scientists are seeking ways to identify undoctored photographs and differentiate them from manipulated or synthesized imagery. The European Union may soon be taking the lead in formulating regulations that would call for more transparency as to how these generative models work, with potential fines for companies that refuse to comply.

In another initiative, the Coalition for Content Provenance and Authenticity (C2PA), a consortium of large tech and media companies, is working to develop “standards for certifying the source and history (or provenance) of media content.” However, much of the burden is placed on the viewer: “…rather than attempt to determine the veracity of an asset, it enables users to make their own judgment by presenting the most salient and/or comprehensive provenance information.” More alarmingly, the group declares, “Detecting whether or not digital content is fake is currently impossible at internet scale and speed because manipulation software is increasingly more sophisticated, metadata can easily be manipulated and provides no proof of its origins.” Meanwhile, the watchdog group NewsGuard, run by veteran journalists Steven Brill and Gordon Crovitz, has tried to establish an objective scale to rate the reliability of news outlets as a way of combating misinformation.

In 1994 I led an effort proposing a “not a lens” icon, consisting of a small square with a circle in it, crossed by a diagonal line, to label photographs so that viewers knew immediately which pictures had been heavily altered. While the industry was not receptive then, it might now be useful to propose another such symbol—a small square with “AI” printed in the center, perhaps—to accompany synthetic images. Or photorealistic images could be presented online with different kinds of frames that clearly designate how they have been created or modified.

The Four Corners Project, an open-source software tool that I initially proposed in 2004, might also be useful; it has recently been utilized by the Starling Lab for Data Integrity, a collaboration between Stanford and the USC Shoah Foundation that’s trying to help historians, legal experts, and news organizations navigate such digital shoals. The software contextualizes a photograph by providing a template that can embed the photo’s corners with certain kinds of information: from the photographer’s code of ethics, to the account of a bystander, to related images that might amplify what the photograph depicts. As I suggested in that 1984 article, professional photographers may need to assume larger roles as authors of their images, providing as much context as possible.

We have gone from an era in which “the camera never lies” to one in which photorealistic images can be conjured up without the use of a camera. It will take a concerted effort by the public, companies, and governments, as well as by journalists, educators, and technologists, to understand the implications of this profound change in imagery, to work quickly to limit its potential destructiveness, and to attempt to restore the visual record.

Much is at stake, including an informed citizenry. Otherwise, in the coming decade or so, it will be as if we managed to kill the goose that laid the golden egg: the credible witness.


Fred Ritchin has written several books on the future of imaging, including After Photography and Bending the Frame. He is dean emeritus of the school at New York’s International Center of Photography and former picture editor of The New York Times Magazine.