image generation Archives - AI News

Midjourney V7: Faster AI image generation

Ryan Daws — Fri, 04 Apr 2025 16:34:12 +0000

Midjourney has announced the alpha release of its V7 image generation model for testing by the AI community. The new model packs improvements in text prompt understanding, image quality, and feature coherence.

“V7 is an amazing model. It’s much smarter with text prompts, image prompts look fantastic, image quality is noticeably higher with beautiful textures, and bodies, hands, and objects of all kinds have significantly better coherence on all details,” Midjourney explained.

A key innovation in V7 is the default activation of model personalisation. Users must initially unlock this feature, a process that takes approximately five minutes. This personalisation can be toggled on or off at any time and is intended to significantly improve the AI’s ability to interpret user desires and aesthetic preferences. Midjourney believes this feature sets a new benchmark for understanding user intent.

Midjourney is also introducing a feature alongside the V7 image generation model called ‘Draft Mode,’ which promises to generate images ten times faster and at half the cost.

This increased speed has enabled Midjourney to implement a unique “conversational mode” on its web interface. Users can now instruct the system to make changes, such as replacing a cat with an owl or altering the time of day to nighttime, and the AI will automatically adjust the prompt and initiate a new image generation task.

Draft Mode also incorporates voice input functionality. By pressing the microphone button, users can verbally articulate their ideas and observe the images as they are generated in near real-time.

Midjourney believes that Draft Mode offers an unprecedented method for refining creative concepts. If a generated image is appealing, users can select the ‘enhance’ or ‘vary’ options to re-render it at full quality. While draft images are of a lower quality compared to the standard mode, their behaviour and aesthetic characteristics remain consistent.

The V7 image generation model from Midjourney will initially be available in two speed modes: Turbo and Relax. The standard speed mode is currently undergoing further optimisation and is expected to be released shortly. Midjourney has clarified that Turbo jobs will cost twice as much as a standard job, while draft jobs will cost half as much as a standard job.
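To make the pricing concrete, the relative costs can be worked through in a few lines of Python. This is only a sketch: the mode names, the notion of a "standard-job unit", and the assumption that re-rendering a draft is billed as one full job are illustrative and not confirmed by Midjourney. Relax mode is left out because the announcement does not state its cost.

```python
# Relative job costs as described in the article, with a standard job as 1.0.
# The dictionary keys and the "unit" concept are hypothetical, not a Midjourney API.
RELATIVE_COST = {
    "standard": 1.0,
    "turbo": 2.0,  # twice the cost of a standard job
    "draft": 0.5,  # half the cost of a standard job
}


def session_cost(jobs):
    """Return the total cost, in standard-job units, for {mode: job count}."""
    return sum(RELATIVE_COST[mode] * count for mode, count in jobs.items())


# Explore 20 ideas in Draft Mode, then re-render the best 2 in Turbo
# (assumes a re-render is billed as one full Turbo job).
draft_first = session_cost({"draft": 20, "turbo": 2})

# Render all 20 ideas directly in Turbo.
turbo_only = session_cost({"turbo": 20})

print(draft_first, turbo_only)
```

Under these assumptions, the draft-first workflow costs 14 standard-job units against 40 for rendering everything in Turbo.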

The company also provided updates on other functionalities. Features such as upscaling, editing, and retexturing will initially revert to using the V6 model, with updates planned for the future. Functionality for mood boards and SREF (style references) is currently operational and performance is expected to improve with subsequent updates.

Looking to the near future, Midjourney has outlined an active development schedule. Users can expect new features every one to two weeks for the next 60 days. A significant upcoming feature will be a new V7 character and object reference capability.

Finally, Midjourney has advised users that V7 is an entirely new model with its own unique strengths and potential weaknesses. They encourage experimentation and feedback on its capabilities, reminding users that it may require different prompting techniques compared to previous versions.

(Image credit: Midjourney)

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Midjourney V7: Faster AI image generation appeared first on AI News.

OpenAI pulls free GPT-4o image generator after one day

Ryan Daws — Thu, 27 Mar 2025 12:24:39 +0000

OpenAI has pulled its upgraded image generation feature, powered by the multimodal GPT-4o model, from the free tier of ChatGPT.

The decision comes just a day after the update was launched, following an unforeseen surge in users creating images in the distinctive style of renowned Japanese animation house, Studio Ghibli.

The update, which promised to deliver enhanced realism in both AI-generated images and text, was intended to showcase the capabilities of GPT-4o.

This new model employs an “autoregressive approach” to image creation, building visuals from left to right and top to bottom, a method that contrasts with the simultaneous generation employed by older models. This technique is designed to improve the accuracy and lifelike quality of the imagery produced.
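The left-to-right, top-to-bottom ordering can be illustrated with a toy sampler. This is not OpenAI's method: the function name, the grid of small integers standing in for pixels, and the neighbour-averaging rule standing in for a learned network are all assumptions made for illustration. The point is only that each value is produced after, and conditioned on, the values already generated.

```python
import random


def generate_raster(width, height, levels=4, seed=0):
    """Toy autoregressive sampler: fill a grid left to right, top to bottom.

    Each value is sampled conditioned on the already-generated values to its
    left and above it. A real model would use a learned network here; this
    stand-in rule simply keeps each value close to its neighbours.
    """
    rng = random.Random(seed)
    image = [[None] * width for _ in range(height)]
    order = []  # records the sequence in which positions are generated
    for y in range(height):      # top to bottom
        for x in range(width):   # left to right
            context = []
            if x > 0:
                context.append(image[y][x - 1])
            if y > 0:
                context.append(image[y - 1][x])
            if context:
                base = round(sum(context) / len(context))
                value = min(levels - 1, max(0, base + rng.choice([-1, 0, 1])))
            else:
                value = rng.randrange(levels)  # first position has no context
            image[y][x] = value
            order.append((x, y))
    return image, order


image, order = generate_raster(width=4, height=3)
for row in image:
    print(row)
```

By contrast, the "simultaneous" approach mentioned above refines every position of the image at once rather than committing to positions one after another.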

Furthermore, the new model generates sharper and more coherent text within images, addressing a common shortcoming of previous AI models which often resulted in blurry or nonsensical text.

OpenAI also conducted post-launch training, guided by human feedback, to identify and rectify common errors in both text and image outputs.

However, the public response to the image generation upgrade took an unexpected turn almost immediately after its release on ChatGPT.

Users embraced the ability to create images in the iconic style of Studio Ghibli, sharing their imaginative creations across various social media platforms. These included reimagined scenes from classic films like “The Godfather” and “Star Wars,” as well as popular internet memes such as “distracted boyfriend” and “disaster girl,” all rendered with the aesthetic of the beloved animation studio.

Even OpenAI CEO Sam Altman joined in on the fun, changing his X profile picture to a Studio Ghibli-esque rendition of himself.

However, later that day, Altman posted on X announcing a temporary delay in the rollout of the image generator update for free ChatGPT users.

While paid subscribers to ChatGPT Plus, Pro, and Team continue to have access to the feature, Altman provided no specific timeframe for when the functionality would return to the free tier.

images in chatgpt are wayyyy more popular than we expected (and we had pretty high expectations).

rollout to our free tier is unfortunately going to be delayed for awhile.
— Sam Altman (@sama) March 26, 2025

The virality of the Studio Ghibli-style images seemingly prompted OpenAI to reconsider its rollout strategy. While the company had attempted to address ethical and legal considerations surrounding AI image generation, the sheer volume and nature of the user-generated content appear to have caught them off-guard.

The intersection of AI-generated art and intellectual property rights is a complex and often debated area. Historically, style has not been considered protected by copyright law in the same way that specific works are.

Despite this legal nuance, OpenAI’s swift decision to withdraw the GPT-4o image generation feature from its free tier suggests a cautious approach. The company appears to be taking a step back to evaluate the situation and determine its next course of action in light of the unexpected popularity of Ghibli-inspired AI art.

OpenAI’s decision to roll back the deployment of its latest image generation feature underscores the ongoing uncertainty around not just copyright law, but also the ethical implications of using AI to replicate human creativity.

(Photo by Kai Pilger)


Stability AI releases most powerful image generation models to date

Ryan Daws — Tue, 22 Oct 2024 16:28:28 +0000

Stability AI has announced the release of Stable Diffusion 3.5, marking a leap forward in open-source AI image generation models.

The latest models from Stability AI include multiple variants designed to cater to different user needs, from hobbyists to enterprise-level applications.

The announcement follows June’s Stable Diffusion 3 Medium release, which the company acknowledges didn’t meet expectations.

“This release didn’t fully meet our standards or our communities’ expectations,” Stability AI stated.

Rather than rushing a quick fix, Stability AI says it invested time in developing a more robust solution.

Introducing Stable Diffusion 3.5, our most powerful models yet.

This open release includes multiple variants that are highly customizable for their size, run on consumer hardware, and are free for both commercial and non-commercial use under the permissive Stability AI Community… pic.twitter.com/KlyE8OjrxN
— Stability AI (@StabilityAI) October 22, 2024

The flagship model, Stable Diffusion 3.5 Large, boasts 8 billion parameters and operates at 1 megapixel resolution—making it the most powerful in the Stable Diffusion family. Alongside it, the Large Turbo variant offers comparable quality but generates images in just four steps, significantly reducing processing time.

A Medium version, scheduled for release on 29th October, will feature 2.5 billion parameters and support image generation between 0.25 and 2 megapixel resolution. This variant is specifically optimised for consumer hardware.

The models incorporate Query-Key Normalisation in transformer blocks, enhancing training stability and simplifying fine-tuning processes. However, this flexibility comes with trade-offs, including greater variation in outputs from identical prompts with different seeds.
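As a rough illustration of the idea, the sketch below normalises the query and key vectors before computing attention weights. It uses plain L2 normalisation, which is one common formulation; the article does not describe the exact normalisation layer used in Stable Diffusion 3.5, and the function names and the `scale` parameter here are hypothetical.

```python
import math


def l2_normalize(vec, eps=1e-6):
    """Scale a vector to (approximately) unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def attention_weights(query, keys, scale=1.0, qk_norm=True):
    """Softmax attention weights for one query over a list of keys.

    With qk_norm=True the query and keys are normalised first, so each
    logit is bounded to the range [-scale, scale].
    """
    if qk_norm:
        query = l2_normalize(query)
        keys = [l2_normalize(k) for k in keys]
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


query = [100.0, 0.0]
keys = [[100.0, 0.0], [0.0, 100.0]]
print(attention_weights(query, keys, qk_norm=False))  # nearly one-hot
print(attention_weights(query, keys, qk_norm=True))   # much softer distribution
```

Without normalisation, large activations push the softmax towards a one-hot distribution. Bounding the logits avoids that saturation, which is the usual argument for why this kind of normalisation helps training stability.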

Stability AI has released the models under a notably permissive community licence. They are free for non-commercial use, and free for commercial use by businesses with annual revenues under $1 million. Enterprises exceeding this threshold must secure separate licensing arrangements.

The company emphasised its commitment to responsible AI development, implementing safety measures from the early stages. Additional features, including ControlNets for advanced control features, are planned for release following the Medium model’s launch.

Stability AI’s latest image generation models are currently available via Hugging Face and GitHub, with additional access through platforms including the Stability AI API, Replicate, ComfyUI, and DeepInfra.

(Image Credit: Stability AI)


Google pledges to fix Gemini’s inaccurate and biased image generation

Ryan Daws — Thu, 22 Feb 2024 15:11:11 +0000

Google’s Gemini model has come under fire for its production of historically-inaccurate and racially-skewed images, reigniting concerns about bias in AI systems.

The controversy arose as users on social media platforms flooded feeds with examples of Gemini generating pictures depicting racially-diverse Nazis, black medieval English kings, and other improbable scenarios.

Google Gemini Image generation model receives criticism for being 'Woke'.

Gemini generated diverse images for historically specific prompts, sparking debates on accuracy versus inclusivity. pic.twitter.com/YKTt2YY265
— Darosham (@Darosham_) February 22, 2024

Critics also pointed out Gemini’s refusal to depict Caucasians, its refusal to depict churches in San Francisco (out of respect for indigenous sensitivities), and its refusal to depict sensitive historical events like Tiananmen Square in 1989.

In response to the backlash, Jack Krawczyk, the product lead for Google’s Gemini Experiences, acknowledged the issue and pledged to rectify it. Krawczyk took to social media platform X to reassure users:

https://twitter.com/JackK/status/1760334258722250785

For now, Google says it is pausing the image generation of people:

We're already working to address recent issues with Gemini's image generation feature. While we do this, we're going to pause the image generation of people and will re-release an improved version soon. https://t.co/SLxYPGoqOZ
— News from Google (@NewsFromGoogle) February 22, 2024

While acknowledging the need to address diversity in AI-generated content, some argue that Google’s response has been an overcorrection.

Marc Andreessen, the co-founder of Netscape and a16z, recently created an “outrageously safe” parody AI model called Goody-2 LLM that refuses to answer questions deemed problematic. Andreessen warns of a broader trend towards censorship and bias in commercial AI systems, emphasising the potential consequences of such developments.

Addressing the broader implications, experts highlight the centralisation of AI models under a few major corporations and advocate for the development of open-source AI models to promote diversity and mitigate bias.

Yann LeCun, Meta’s chief AI scientist, has stressed the importance of fostering a diverse ecosystem of AI models akin to the need for a free and diverse press:

We need open source AI foundation models so that a highly diverse set of specialized models can be built on top of them.
We need a free and diverse set of AI assistants for the same reasons we need a free and diverse press.
They must reflect the diversity of languages, culture,… https://t.co/9WuEy8EPG5
— Yann LeCun (@ylecun) February 21, 2024

Bindu Reddy, CEO of Abacus.AI, has similar concerns about the concentration of power without a healthy ecosystem of open-source models:

If we don't have open-source LLMs, history will be completely distorted and obfuscated by proprietary LLMs

We already live in a very dangerous and censored world where you are not allowed to speak your mind.

Censorship and concentration of power is the very definition of an…
— Bindu Reddy (@bindureddy) February 21, 2024

As discussions around the ethical and practical implications of AI continue, the need for transparent and inclusive AI development frameworks becomes increasingly apparent.

(Photo by Matt Artz on Unsplash)


OpenAI’s latest neural network creates images from written descriptions

Ryan Daws — Wed, 06 Jan 2021 18:28:28 +0000

OpenAI has debuted its latest jaw-dropping innovation, an image-generating neural network called DALL·E.

DALL·E is a 12-billion parameter version of GPT-3 which is trained to generate images from text descriptions.

“We find that DALL·E is able to create plausible images for a great variety of sentences that explore the compositional structure of language,” OpenAI explains.

Generated images can range from drawings, to objects, and even manipulated real-world photos. OpenAI has provided examples of each.

OpenAI’s GPT-3 text generator caused alarm about implications such as helping to create fake news for the kinds of disinformation campaigns recently seen around COVID-19, 5G, and attempts to influence various democratic processes. Similar concerns will be raised about the company’s latest innovation.

People are increasingly aware of fake news and know not to believe everything they read, especially from unknown sources without good track records. However, as humans, we’re still used to believing what we can see with our eyes. Fake news with fake supporting imagery is a rather convincing combination.

Much like it argued with GPT-3, OpenAI essentially says that – by putting the technology out there as responsibly as possible – it helps to raise awareness and drives research into how the implications can be tackled before such neural networks are inevitably created and used by malicious parties.

“We recognise that work involving generative models has the potential for significant, broad societal impacts,” OpenAI said.

“In the future, we plan to analyse how models like DALL·E relate to societal issues like economic impact on certain work processes and professions, the potential for bias in the model outputs, and the longer-term ethical challenges implied by this technology.”

Technological advancements will almost always be used for damaging purposes—but often the benefits outweigh the risks. I’d wager you could write pages about the good and bad sides of the internet, but overall it’s a pretty fantastic thing.

When it comes down to it: If the “good guys” don’t build it, you can be sure the bad ones will.

(Image Credit: Justin Jay Wang/OpenAI)

Interested in hearing industry leaders discuss subjects like this? Attend the co-located 5G Expo, IoT Tech Expo, Blockchain Expo, AI & Big Data Expo, and Cyber Security & Cloud Expo World Series with upcoming events in Silicon Valley, London, and Amsterdam.
