Last year, I wrote an article for the Keynote about DALL-E 2 (“AI Art in a Nutshell”), an image generation AI that seemed to kick off a huge wave of development in the field of AI. DALL-E 2 was developed by OpenAI, and recently the same company released ChatGPT. Now on to the topic of this article: the fusion of DALL-E 2 and ChatGPT in DALL-E 3.
In case you haven’t heard of it, DALL-E 3 is an image generation AI that can take a prompt as input, such as “a realistic photo of a robot making art,” and output an image of, you guessed it, a robot making art.
Now, take a look at an image created by DALL-E 2 from the same prompt.
Definitely not as good. So what changed? Mainly, it was the addition of ChatGPT. The first image by itself is actually not so revolutionary, since images of a similar caliber have been possible for months now. However, those images rely on models that have been fine-tuned for a specific purpose, such as imitating a certain design or character, and sometimes several of these models are then combined. DALL-E 3, by contrast, seems to produce realistic images without fine-tuning, meaning it can almost serve as a silver bullet that makes AI image generation more accessible.
This can possibly be attributed to ChatGPT’s greater ability to understand what you want. That understanding also lets prompts become much less complex. The whole goal of AI image generation is to make the consumer’s life easier, which can’t really be done when prompts look like a wall of text. ChatGPT seems to reduce this problem, giving everyone the ability to make beautiful images.
Another bonus of ChatGPT is that you also get the same character in each image, making the model more consistent. While at first glance this might not seem so important, consistency across images is key to making AI-generated videos. Additionally, we get a feature that has been long awaited in the world of AI image generation: legible text in the image itself.
Finally, DALL-E 3 seems to be quite creative. Take, for example, this visualization of the saying “laughter is the best medicine” by @Randomized_AI on Twitter/X (I couldn’t recreate it myself). As a side note, if you’re wondering who’s more creative, a human or an AI, on average it’s the AI, according to a study published in Scientific Reports, a Nature Portfolio journal. You’ll be comforted to know, however, that the most creative humans still outperformed the AIs.
OpenAI’s release announcement on its YouTube channel is a good example of many of these new features, including how you can go back and forth with DALL-E 3 through ChatGPT: https://youtu.be/sqQrN0iZBs0?si=Oe7XL3E32lrk3vJ0.