gpt-4o Archives - AI News
https://www.artificialintelligence-news.com/news/tag/gpt-4o/

OpenAI pulls free GPT-4o image generator after one day
https://www.artificialintelligence-news.com/news/openai-pulls-free-gpt-4o-image-generator-after-one-day/
Thu, 27 Mar 2025

OpenAI has pulled its upgraded image generation feature, powered by its multimodal GPT-4o model, from the free tier of ChatGPT.

The decision comes just a day after the update was launched, following an unforeseen surge in users creating images in the distinctive style of renowned Japanese animation house, Studio Ghibli.

The update, which promised to deliver enhanced realism in both AI-generated images and text, was intended to showcase the capabilities of GPT-4o. 

This new model employs an “autoregressive approach” to image creation, building visuals from left to right and top to bottom, a method that contrasts with the simultaneous generation employed by older models. This technique is designed to improve the accuracy and lifelike quality of the imagery produced.
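As a rough sketch of what that ordering means (a toy illustration only, not OpenAI's actual implementation), each cell of the image is predicted conditioned on everything generated before it, in raster order. The predictor below is a stand-in function, not a neural network:

```python
# Toy sketch of raster-order autoregressive generation. A real system
# predicts image tokens with a large neural network; this only
# illustrates the left-to-right, top-to-bottom ordering.

def generate_raster(height, width, predict):
    """Fill a height x width grid left-to-right, top-to-bottom,
    conditioning each cell on all cells generated before it."""
    grid = [[0.0] * width for _ in range(height)]
    history = []
    for y in range(height):
        for x in range(width):
            value = predict(history)  # sees only earlier cells
            grid[y][x] = value
            history.append(value)
    return grid

def mean_predictor(history):
    # Stand-in "model": the mean of everything generated so far,
    # seeded with 1.0 for the very first cell.
    return sum(history) / len(history) if history else 1.0

image = generate_raster(2, 2, mean_predictor)  # [[1.0, 1.0], [1.0, 1.0]]
```

A diffusion-style model, by contrast, refines every cell of the image simultaneously across several denoising steps rather than committing to cells one at a time.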

Furthermore, the new model generates sharper and more coherent text within images, addressing a common shortcoming of previous AI models which often resulted in blurry or nonsensical text. 

OpenAI also conducted post-launch training, guided by human feedback, to identify and rectify common errors in both text and image outputs.

However, the public response to the image generation upgrade took an unexpected turn almost immediately after its release on ChatGPT. 

Users embraced the ability to create images in the iconic style of Studio Ghibli, sharing their imaginative creations across various social media platforms. These included reimagined scenes from classic films like “The Godfather” and “Star Wars,” as well as popular internet memes such as “distracted boyfriend” and “disaster girl,” all rendered with the aesthetic of the beloved animation studio.

Even OpenAI CEO Sam Altman joined in on the fun, changing his profile picture on X to a Studio Ghibli-esque rendition of himself.

[Image: screenshot of Sam Altman's X profile]

However, later that day, Altman posted on X announcing a temporary delay in the rollout of the image generator update for free ChatGPT users.

While paid subscribers to ChatGPT Plus, Pro, and Team continue to have access to the feature, Altman provided no specific timeframe for when the functionality would return to the free tier.

The virality of the Studio Ghibli-style images seemingly prompted OpenAI to reconsider its rollout strategy. While the company had attempted to address ethical and legal considerations surrounding AI image generation, the sheer volume and nature of the user-generated content appear to have caught them off-guard.

The intersection of AI-generated art and intellectual property rights is a complex and often debated area. Artistic style, as opposed to specific works, has historically not been protected by copyright law.

Despite this legal nuance, OpenAI’s swift decision to withdraw the GPT-4o image generation feature from its free tier suggests a cautious approach. The company appears to be taking a step back to evaluate the situation and determine its next course of action in light of the unexpected popularity of Ghibli-inspired AI art.

OpenAI’s decision to roll back the deployment of its latest image generation feature underscores the ongoing uncertainty around not just copyright law, but also the ethical implications of using AI to replicate human creativity.

(Photo by Kai Pilger)

See also: Gemini 2.5: Google cooks up its ‘most intelligent’ AI model to date

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

OpenAI delivers GPT-4o fine-tuning
https://www.artificialintelligence-news.com/news/openai-delivers-gpt-4o-fine-tuning/
Wed, 21 Aug 2024

OpenAI has announced the release of fine-tuning capabilities for its GPT-4o model, a feature eagerly awaited by developers. To sweeten the deal, OpenAI is providing one million free training tokens per day for every organisation until 23rd September.

Tailoring GPT-4o using custom datasets can result in enhanced performance and reduced costs for specific applications. Fine-tuning enables granular control over the model’s responses, allowing for customisation of structure, tone, and even the ability to follow intricate, domain-specific instructions.

Developers can achieve impressive results with training datasets comprising just a few dozen examples. This accessibility paves the way for improvements across various domains, from complex coding challenges to nuanced creative writing.
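A fine-tuning dataset for chat models is a JSONL file of example conversations. The sketch below writes a tiny, illustrative dataset in the format the OpenAI fine-tuning API expects (the examples and file name are invented; the model snapshot named in the comment is the fine-tunable GPT-4o release at the time of the announcement):

```python
import json

# Each line is one training example: a complete conversation in chat
# format. A real dataset needs at least a few dozen such examples.
examples = [
    {"messages": [
        {"role": "system", "content": "You answer in formal British English."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Kindly select 'Forgotten password' on the sign-in page."},
    ]},
    {"messages": [
        {"role": "system", "content": "You answer in formal British English."},
        {"role": "user", "content": "Where can I find my invoice?"},
        {"role": "assistant", "content": "Your invoices may be found under 'Billing' in your account settings."},
    ]},
]

with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

# The file is then uploaded and a job started via the SDK, e.g.:
#   f = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
#   client.fine_tuning.jobs.create(training_file=f.id, model="gpt-4o-2024-08-06")
```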

“This is just the start,” assures OpenAI, highlighting their commitment to continuously expand model customisation options for developers.

GPT-4o fine-tuning is available immediately to all developers across all paid usage tiers. Training costs are set at $25 per million tokens, with inference priced at $3.75 per million input tokens and $15 per million output tokens.
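At those rates, estimating the cost of a fine-tuning run is simple arithmetic. The token counts in the example below are invented for illustration, and the free daily training tokens on offer would offset part of the training figure:

```python
# Prices from the announcement, in USD per million tokens.
TRAINING_PER_M = 25.00
INPUT_PER_M = 3.75
OUTPUT_PER_M = 15.00

def finetune_cost(training_tokens, input_tokens, output_tokens):
    """Estimated USD cost for training plus subsequent inference."""
    return (training_tokens / 1e6 * TRAINING_PER_M
            + input_tokens / 1e6 * INPUT_PER_M
            + output_tokens / 1e6 * OUTPUT_PER_M)

# Example: 2M training tokens, then 4M input and 1M output tokens
# of inference: 50 + 15 + 15 = 80 USD.
cost = finetune_cost(2_000_000, 4_000_000, 1_000_000)  # 80.0
```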

OpenAI is also making GPT-4o mini fine-tuning accessible with two million free daily training tokens until 23rd September. To access this, select “gpt-4o-mini-2024-07-18” from the base model dropdown on the fine-tuning dashboard.

The company has collaborated with select partners to test and explore the potential of GPT-4o fine-tuning:

  • Cosine’s Genie, an AI-powered software engineering assistant, leverages a fine-tuned GPT-4o model to autonomously identify and resolve bugs, build features, and refactor code alongside human developers. By training on real-world software engineering examples, Genie has achieved a state-of-the-art score of 43.8% on the new SWE-bench Verified benchmark, marking the largest improvement ever recorded on this benchmark.
  • Distyl, an AI solutions provider, achieved first place on the BIRD-SQL benchmark after fine-tuning GPT-4o. This benchmark, widely regarded as the leading text-to-SQL test, saw Distyl’s model achieve an execution accuracy of 71.83%, demonstrating superior performance across demanding tasks such as query reformulation and SQL generation.

OpenAI reassures users that fine-tuned models remain entirely under their control, with complete ownership and privacy of all business data. This means no data sharing or utilisation for training other models.

Stringent safety measures have been implemented to prevent misuse of fine-tuned models. Continuous automated safety evaluations are conducted, alongside usage monitoring, to ensure adherence to OpenAI’s robust usage policies.

(Photo by Matt Artz)

See also: Primate Labs launches Geekbench AI benchmarking tool


GPT-4o delivers human-like AI interaction with text, audio, and vision integration
https://www.artificialintelligence-news.com/news/gpt-4o-human-like-ai-interaction-text-audio-vision-integration/
Tue, 14 May 2024

OpenAI has launched its new flagship model, GPT-4o, which seamlessly integrates text, audio, and visual inputs and outputs, promising to enhance the naturalness of machine interactions.

GPT-4o, where the “o” stands for “omni,” is designed to cater to a broader spectrum of input and output modalities. “It accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs,” OpenAI announced.
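In API terms, "any combination" means a single message can mix content parts of different types. A minimal request body combining text and an image might look like the sketch below (the image URL is a placeholder); it would be sent with `client.chat.completions.create(**request)`:

```python
# A chat message mixing text and image content, as accepted by GPT-4o
# in the Chat Completions API. The image URL is a placeholder.
request = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
}
```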

Users can expect audio response times as quick as 232 milliseconds and averaging 320 milliseconds, similar to human conversational response times.

Pioneering capabilities

The introduction of GPT-4o marks a leap from its predecessors by processing all inputs and outputs through a single neural network. This approach enables the model to retain critical information and context that were previously lost in the separate model pipeline used in earlier versions.

Prior to GPT-4o, ‘Voice Mode’ could handle audio interactions with latencies of 2.8 seconds for GPT-3.5 and 5.4 seconds for GPT-4. The previous setup involved three distinct models: one for transcribing audio to text, another for textual responses, and a third for converting text back to audio. This segmentation led to loss of nuances such as tone, multiple speakers, and background noise.
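That three-stage pipeline can be sketched as three functions chained through plain text (the functions below are stubs standing in for the real models): the middle stage only ever sees text, which is exactly why tone, speaker identity, and background noise were lost.

```python
# Stub three-stage voice pipeline: audio -> text -> text -> audio.
# Each stage is a placeholder; the point is the plain-text bottleneck
# between stages.

def transcribe(audio: bytes) -> str:
    # Stage 1: speech-to-text. Tone and speaker cues are dropped here.
    return "hello there"

def respond(text: str) -> str:
    # Stage 2: the language model, which sees only the transcript.
    return f"You said: {text}"

def synthesise(text: str) -> bytes:
    # Stage 3: text-to-speech on the model's text reply.
    return text.encode("utf-8")

def voice_mode(audio: bytes) -> bytes:
    return synthesise(respond(transcribe(audio)))

reply = voice_mode(b"...audio bytes...")  # b"You said: hello there"
```

GPT-4o collapses all three stages into one network trained end-to-end across modalities, which is where the drop from multi-second latencies to roughly 320 milliseconds comes from.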

As an integrated solution, GPT-4o boasts notable improvements in vision and audio understanding. It can perform more complex tasks such as harmonising songs, providing real-time translations, and even generating outputs with expressive elements like laughter and singing. Examples of its broad capabilities include preparing for interviews, translating languages on the fly, and generating customer service responses.

Nathaniel Whittemore, Founder and CEO of Superintelligent, commented: “Product announcements are going to inherently be more divisive than technology announcements because it’s harder to tell if a product is going to be truly different until you actually interact with it. And especially when it comes to a different mode of human-computer interaction, there is even more room for diverse beliefs about how useful it’s going to be.

“That said, the fact that there wasn’t a GPT-4.5 or GPT-5 announced is also distracting people from the technological advancement that this is a natively multimodal model. It’s not a text model with a voice or image addition; it is a multimodal token in, multimodal token out. This opens up a huge array of use cases that are going to take some time to filter into the consciousness.”

Performance and safety

GPT-4o matches GPT-4 Turbo performance levels in English text and coding tasks but significantly outperforms it in non-English languages, making it a more inclusive and versatile model. It sets a new benchmark in reasoning with a high score of 88.7% on 0-shot CoT MMLU (general knowledge questions) and 87.2% on 5-shot no-CoT MMLU.

The model also excels in audio and translation benchmarks, surpassing previous state-of-the-art models like Whisper-v3. In multilingual and vision evaluations, it demonstrates superior performance, enhancing OpenAI’s multilingual, audio, and vision capabilities.

OpenAI has built robust safety measures into GPT-4o by design, incorporating techniques to filter training data and refining behaviour through post-training safeguards. The model has been assessed through a Preparedness Framework and complies with OpenAI’s voluntary commitments. Evaluations in areas like cybersecurity, persuasion, and model autonomy indicate that GPT-4o does not exceed a ‘Medium’ risk level in any category.

Further safety assessments involved extensive external red teaming with over 70 experts in various domains, including social psychology, bias, fairness, and misinformation. This comprehensive scrutiny aims to mitigate risks introduced by the new modalities of GPT-4o.

Availability and future integration

Starting today, GPT-4o’s text and image capabilities are available in ChatGPT, including the free tier, with extended features for Plus users. A new Voice Mode powered by GPT-4o will enter alpha testing within ChatGPT Plus in the coming weeks.

Developers can access GPT-4o through the API for text and vision tasks, benefiting from its doubled speed, halved price, and enhanced rate limits compared to GPT-4 Turbo.

OpenAI plans to expand GPT-4o’s audio and video functionalities to a select group of trusted partners via the API, with broader rollout expected in the near future. This phased release strategy aims to ensure thorough safety and usability testing before making the full range of capabilities publicly available.

“It’s hugely significant that they’ve made this model available for free to everyone, as well as making the API 50% cheaper. That is a massive increase in accessibility,” explained Whittemore.

OpenAI invites community feedback to continuously refine GPT-4o, emphasising the importance of user input in identifying and closing gaps where GPT-4 Turbo might still outperform.

(Image Credit: OpenAI)

See also: OpenAI takes steps to boost AI-generated content transparency

