In a previous post we looked at the similarity between a selection of images produced by different users employing Midjourney.
As stated in that post, while it is not rare for some users to spot a trend or a prompt on the #general Midjourney servers on Discord or on communities such as Reddit, and reuse it to generate their own images, quite often the similarities between pictures generated with Artificial Intelligence text-to-image applications are extremely striking even when they do not depend on users employing other people's prompts. It is indeed not uncommon to stumble upon images generated by users who haven't seen each other's prompts but that look extremely similar to one another in the clothes sported by the AI-generated model, in the angle or in the colour palette.
As stated in a previous post, this may occur because of Mode Collapse, a common problem in Generative Adversarial Networks (GANs). Mode collapse occurs when GANs produce a limited set of outputs instead of exploring the full range of training data. Two main causes of mode collapse are catastrophic forgetting and discriminator overfitting.
Catastrophic forgetting happens when a GAN forgets knowledge learned from previous tasks while adapting to new ones; discriminator overfitting, on the other hand, undermines the balance between the two networks that normally produces diverse and high-quality outputs. The discriminator is responsible for distinguishing between real data (i.e., actual samples from the training dataset) and fake data (i.e., data generated by the GAN's generator), while the generator aims to produce realistic data that can deceive the discriminator. Discriminator overfitting happens when the discriminator becomes too specialized or too good at identifying real data, to the extent that it assigns extremely high confidence scores to real data points, approaching 100% certainty that they are real.
This might sound like a good thing, as the discriminator is effectively doing its job well, but it creates a problem for the GAN's overall performance. When the discriminator becomes too good at distinguishing real from fake data, it becomes nearly impossible for the generator to deceive it, and the discriminator stops providing useful gradients to help the generator improve. The generator essentially loses its learning signal, fails to produce diverse and realistic samples and gets stuck in a limited set of output patterns, leading to mode collapse: the GAN generates only a few specific outputs repeatedly, rather than exploring the entire range of possible outputs that represent the diversity of the training data.
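For readers curious about where that learning signal actually lives, here is a minimal, purely illustrative PyTorch sketch of a single GAN training step; the tiny networks and random data are placeholders of my own, not Midjourney's actual architecture, and are only meant to show which gradient the generator depends on.

```python
# Toy GAN training step (illustrative only; not Midjourney's model).
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
discriminator = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
criterion = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

real = torch.randn(64, 2)                  # stand-in for real training samples
fake = generator(torch.randn(64, 16))      # samples produced from random noise

# Discriminator step: learn to label real samples 1 and generated samples 0.
d_loss = criterion(discriminator(real), torch.ones(64, 1)) + \
         criterion(discriminator(fake.detach()), torch.zeros(64, 1))
d_opt.zero_grad()
d_loss.backward()
d_opt.step()

# Generator step: its only feedback is the gradient that flows back through
# the discriminator's judgement of its fakes. If the discriminator overfits
# and classifies everything with near-total confidence, that gradient stops
# carrying useful information and the generator tends to collapse onto a few
# repeated outputs, i.e. mode collapse.
g_loss = criterion(discriminator(fake), torch.ones(64, 1))
g_opt.zero_grad()
g_loss.backward()
g_opt.step()
```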
Understanding these causes helps in developing more effective GANs capable of generating diverse and realistic outputs. By addressing mode collapse, GANs can remain powerful tools in deep learning for generating a wide range of data types.
But as users, is there a way to tame Mode Collapse? Well, one obvious tip is to keep challenging Artificial Intelligence with more advanced prompts, playing around with parameters and creating elaborate images; another option is reworking the first image the system generates after elaborating your prompt. After all, in most cases you may have generated something cool, but not necessarily unique. You can do this by opting for the variation function, but also by experimenting with the /describe function on Midjourney.
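If you want a rough, quantitative idea of how close your render sits to someone else's before reworking it, one option, entirely outside of Midjourney, is to compare image embeddings with the open-source CLIP model. The sketch below is only an illustration: the file names are hypothetical and the threshold for "too similar" is a matter of taste.

```python
# Compare two downloaded renders with CLIP embeddings (illustrative sketch).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical file names: swap in your own downloaded Midjourney renders.
images = [Image.open("my_render.png"), Image.open("other_users_render.png")]
inputs = processor(images=images, return_tensors="pt")

with torch.no_grad():
    embeddings = model.get_image_features(**inputs)

# A cosine similarity close to 1.0 suggests the two renders share composition,
# palette and styling, even if the prompts were written independently.
score = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"CLIP similarity: {score.item():.3f}")
```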
So, here's an exercise for fashion design students who may want to play around with Midjourney for inspiration. Let's start from two images: I generated the first one with a prompt that asked the Midjourney Bot to come up with an image of a woman wearing a Mexican wrestler's mask. I wanted it to be hyperreal, photographic and extra detailed, and I specified the parameters for the quality and stylize functions (--q .5 --s 250).
The Bot didn't perform well, but generated an image of a male model in a feathered headdress and a 15th-16th century costume (think of a crossover between a jester and a conquistador and you get the idea).
Digital artist Fatima Travassos used Midjourney and came up instead with an image of a male model wearing elaborate horns, a sort of golden mask with long hanging wooden beads and a costume with some feather elements.
I'm not sure about her prompt, but, while the two images differ in the costumes donned by the model, they also look strikingly similar, as if we had both booked the same model, who wore the same black make-up around his eyes for the occasion.
Can we avoid producing such similar images? Well, let's try altering them with the /describe function: this command asks the bot to write four prompts describing what it is "seeing" in the image. You can then pick one (or all) of them and start generating new images.
The first description generated a prompt stating "a male costumed in a feathered costume, in the style of detailed portraiture, light red and light cyan, aztec art, porcelain, intense chiaroscuro portraits, historical reimagining, hand-painted details".
The results show a sort of muscular warrior wearing a feather headdress and a very different expression from the model in the original image. The colours are also brighter in this interpretation, verging more towards vibrant turquoise and bright coral than towards the pastel tones of these shades in the original image.
The images generated by the second prompt – "a man dressed in feathered costumes for a masquerade, in the style of pre-columbian art, dark white and light red, detailed portraiture, porcelain, organically inspired body art, vibrant coloration, historical documentation" – also depict a sort of warrior, or maybe a man in a ceremonial costume. The colour palette here settles on red and white.
The bot stuck to the same red and white palette for the image of the third prompt, which includes an important variation, as it refers to "a woman in a red and white henna mask, in the style of idealized native americans, detailed costumes, sacha goldberger, dayak art, made of feathers, polychrome terracotta, renaissance chiaroscuro".
The fourth prompt also revolves around the same palette and refers to "a woman dressed as an indian with feather, in the style of moche art, dark white and light red, 8k resolution, manticore, ritualistic masks, 15th century, hand-painted details".
One disappointing fact is that, left unprompted, Midjourney still generates mainly white models in most of these cases, showing an intrinsic latent racism.
While it is possible at this stage to take one of these prompts and alter it manually, let's pretend to be lazy and see where Midjourney takes us. So let's pick the first image and the third one from the fourth prompt (in the latter, the young woman portrayed wears an intriguing armour made of scales) and Upscale them.
After this stage, let's opt for the "Vary Strong" function to get another two sets of four women with more intricate face and body painting, a feathered headdress and elaborate jewellery/earrings.
As you can see, we have radically moved away from the first image we had, but you can obviously continue experimenting beyond this point.
Now let's move on to the second image generated by Travassos and ask the bot to provide us with four descriptions of that picture.
The first description Midjourney came up with states: "the new era of the viking masks with feathers on top of the head has been invading the contemporary culture, yet many artists are def, in the style of schlieren photography, white background, light gold and beige, futuristic fantasy, manticore, light gold and cyan, contest winner."
In this first prompt Midjourney focused on the actual mask of the model portrayed in the image, providing variations in the shapes of the horns and the facial details.
The second prompt offers another variation - "a woman wearing a gold mask, in the style of hybrid creature compositions, white background, norwegian nature, idealized native americans, fawncore, contest winner, even mehl amundsen".
The horn theme remains, but this time we have a woman in more elaborate make-up, verging more towards the eerie and the horrific.
Feathers reappear instead in the third prompt, hinting at "a woman wearing a feather and feather embellished outfit and headdress, in the style of light gold and dark beige, masks and totems, white background, norwegian nature, edgy, symbolic use of animals, pop-culture-infused".
In this case the attire donned by the women in the images seems to be a combination of traditional tribal garments and film costumes.
The idea of a costume (for a film? a masked ball? Carnival? The Met Gala? the World of Wearable Art show?) returns in the fourth prompt.
Here the Midjourney Bot conjures up a mysterious figure, a sort of bird superhero dressed in elegant attire (maybe going to a wedding?); the description states: "a man with a head sculpted from feathers on a white sheet, in the style of masks and totems, light gold and beige, wildlife photography, women designers, lunarpunk, elaborate costumes, rudolph belarski".
Let's pick the third image and Upscale it, then opt for "Vary Strong" as we did in the other case. Once we get the new set of four images, let's Upscale the first and then opt again for the "Vary Strong" function. The results are obviously very different from the image we started from.
Now, while there is no guarantee that Midjourney will not at some point recreate an image similar to these, this exercise has allowed us to come up with a series of variations. The best thing about the process, though, is that, while writing the descriptions for its prompts, the Midjourney Bot added references we may not have thought about or may not have been aware of.
In these cases the Bot mentioned the late American graphic artist Rudolph Belarski, but also contemporary artists such as the Norwegian Even Amundsen and the French Sacha Goldberger. Actually, this last reference is a key to understanding some of the moods in these pictures: Goldberger is indeed famous for combining Flemish moods with superheroes, which sort of explains the reference Midjourney was drawing on when it first generated my image (even though in my case I never mentioned the words "Flemish" or "Sacha Goldberger").
Bizarrely, we may have generated enough ground for the artists mentioned in these descriptions to sue Midjourney, because we have shown that the system knows them and uses their works to generate new artworks; but this wasn't the main aim of this post (though the point does allow us to make some interesting legal considerations).
This post was indeed a way to remind those fashion design students playing around with Midjourney that they should never stop at the first image they generate (or they could upload their own image, ask Midjourney to describe it and then play around with the prompts they get…). They should instead keep on researching and educating themselves, also using the references in the prompts generated by the Midjourney Bot with the /describe function to discover other artists, movements and moods, and then understand how they can turn them into starting points for their works without incurring copyright infringement.
In a nutshell, always take the road less traveled, as Robert Frost states in the poem "The Road Not Taken", because that "will make all the difference".