If you’ve ever wanted to use artificial intelligence to quickly design a hybrid between a duck and a corgi, now is your time to shine.
On Wednesday, OpenAI announced that anyone can now use the most recent version of its AI-powered DALL-E tool to generate a seemingly limitless range of images just by typing in a few words, months after the startup began gradually rolling it out to users.
The move will likely expand the reach of a new crop of AI-powered tools that have already attracted a wide audience and challenged our fundamental ideas of art and creativity. But it could also add to concerns about how such systems could be misused when widely available.
“Learning from real-world use has allowed us to improve our safety systems, making wider availability possible today,” OpenAI said in a blog post. The company said it has also strengthened the ways it rebuffs users’ attempts to make its AI create “sexual, violent and other content.”
There are now three well-known, immensely powerful AI systems open to the public that can take in a few words and spit out an image. In addition to DALL-E 2, there’s Midjourney, which became publicly available in July, and Stable Diffusion, which was released to the public in August by Stability AI. All three offer some free credits to users who want to get a feel for making images with AI online; generally, after that, you have to pay.
These so-called generative AI systems are already being used for experimental films, magazine covers, and real-estate ads. An image generated with Midjourney recently won an art competition at the Colorado State Fair, and caused an uproar among artists.
In just months, millions of people have flocked to these AI systems. More than 2.7 million people belong to Midjourney’s Discord server, where users can submit prompts. OpenAI said in its Wednesday blog post that it has more than 1.5 million active users, who have collectively been making more than 2 million images with its system each day. (It should be noted that it can take many tries to get an image you’re happy with when you use these tools.)
Many of the images that have been created by users in recent weeks have been shared online, and the results can be impressive. They range from otherworldly landscapes and a painting of French aristocrats as penguins to a faux vintage photograph of a man walking a tardigrade.
The rise of such technology, along with the increasingly complex prompts and the images they produce, has impressed even longtime industry insiders. Andrej Karpathy, who stepped down from his post as Tesla’s director of AI in July, said in a recent tweet that after getting invited to try DALL-E 2 he felt “frozen” when first trying to decide what to type in, and eventually typed “cat”.
“The art of prompts that the community has discovered and increasingly perfected over the last few months for text -> image models is astonishing,” he said.
But the popularity of this technology comes with potential downsides. Experts in AI have raised concerns that the open-ended nature of these systems — which makes them adept at generating all kinds of images from words — and their ability to automate image-making means they could automate bias on a massive scale. A simple example of this: When I fed the prompt “a banker dressed for a big day at the office” to DALL-E 2 this week, the results were all images of middle-aged white men in suits and ties.
“They’re basically letting the users find the loopholes in the system by using it,” said Julie Carpenter, a research scientist and fellow in the Ethics and Emerging Sciences Group at California Polytechnic State University, San Luis Obispo.
These systems also have the potential to be used for nefarious purposes, such as stoking fear or spreading disinformation via images that are altered with AI or entirely fabricated.
There are some limits on what images users can generate. For example, OpenAI has DALL-E 2 users agree to a content policy that tells them not to try to make, upload, or share pictures “that are not G-rated or that could cause harm.” DALL-E 2 also won’t run prompts that include certain banned words. But rewording a prompt can get around these limits: DALL-E 2 won’t process the prompt “a photo of a duck covered in blood,” but it will return images for the prompt “a photo of a duck covered in a viscous red liquid.” OpenAI itself mentioned this sort of “visual synonym” in its documentation for DALL-E 2.
Chris Gilliard, a Just Tech Fellow at the Social Science Research Council, thinks the companies behind these image generators are “severely underestimating” the “endless creativity” of people who are looking to do ill with these tools.
“I feel like this is yet another example of people releasing technology that’s sort of half-baked in terms of figuring out how it’s going to be used to cause chaos and create harm,” he said. “And then hoping that later on maybe there will be some way to address those harms.”
To sidestep potential issues, some stock-image services are banning AI images altogether. Getty Images confirmed to CNN Business on Wednesday that it will not accept image submissions that were created with generative AI models, and will take down any submissions that used those models. This decision applies to its Getty Images, iStock, and Unsplash image services.
“There are open questions with respect to the copyright of outputs from these models and there are unaddressed rights issues with respect to the underlying imagery and metadata used to train these models,” the company said in a statement.
But actually catching and restricting these images could prove to be a challenge.