1. What happened after the 2024-12-05 update?
The 2024-12-05 update for the ChatGPT o1 model changed the nature of its responses, even though "advanced reasoning" remained in the description. Some users have perceived a noticeable decrease in response quality compared to the initial o1-preview-2024-09-12 version.
Discussions on platforms like Reddit highlight that many community members feel the model's depth and accuracy have been compromised, indicating that this is not an isolated opinion but a broader sentiment within the user base.
"Advanced reasoning" refers to the model's ability to break down complex problems into logical steps, conduct intermediate calculations or verifications, and provide a well-structured explanation of how it arrives at its conclusions.
It involves not merely stating an answer, but illustrating the thought process behind it.
In practice, however, many users note that while o1 now responds almost instantly, this apparent efficiency comes at the expense of deep reasoning and thoroughness, suggesting a deliberate trade-off between speed and complexity.
2. Changes introduced in the 2024-12-05 update
2.1. Initial o1-preview version
The original o1-preview-2024-09-12 model prioritized deep reasoning.
It employed a "chain-of-thought" approach for step-by-step problem-solving in areas like mathematics, science, and programming.
This method demanded substantial computational resources, which supported detailed, high-quality explanations.
2.2. Post-update adjustments
OpenAI's developers introduced several adjustments that streamlined the model's reasoning process, which may have reduced the depth and accuracy of its answers.
Users who appreciated the earlier version's detailed reasoning have noted a decline in the complexity and thoroughness of responses.
After these adjustments, the model may generate shorter reasoning chains, provide fewer detailed intermediate steps in mathematical solutions, and offer more superficial explanations in scientific or programming contexts.
Additionally, some users suspect that the model was specifically tuned to perform well on certain benchmarks (such as advanced math tests) at the expense of more general, real-world reasoning tasks.
This overemphasis on particular metrics may explain why formal test results have improved, while everyday user scenarios feel degraded.
3. Quality changes after the update
Although the description still mentions "advanced reasoning," in practice, the model's capacity for deep, sequential reasoning appears more limited.
Some users also express concern that the higher-quality features are effectively locked behind the $200-per-month ChatGPT Pro paywall, making in-depth reasoning less accessible, especially for those in regions with lower incomes or limited resources.
This economic barrier has generated frustration and criticism, as it restricts better output quality to wealthier users.
A number of commentators note that o1 pro mode now seems to embody what o1-preview was originally supposed to represent.
This reframing of the product tiers suggests that o1 pro mode has effectively taken over the mantle of deep reasoning, relegating standard o1 to a lesser role.
Paying subscribers at the $20 level express particular disappointment, feeling that the shift from o1-preview to o1 left them with inferior quality answers.
Many expected some form of compensation or improvement but instead found themselves with a downgraded experience that fails to justify their expenditure.
Additionally, some users describe the updated o1 model's behavior as "lazy," implying that it often gives minimalistic answers and avoids diving deeper into logical steps unless explicitly prompted to do so.
There is a sense that the model is exerting less effort by default, placing more responsibility on the user to push for detail.
There are also suggestions that OpenAI may have simplified o1's reasoning mechanisms to reduce computational expenses.
By trimming down the complexity of the model's reasoning process, the company could be lowering operating costs, albeit at the expense of providing robust and thorough explanations.
This trade-off underscores the tension between cost optimization and maintaining the level of reasoning users previously enjoyed.
Some users note that the model now often resorts to overly simplistic fixes instead of optimizing code or reasoning through complex constraints.
For instance, when dealing with asynchronous code, o1 might suggest running it sequentially, effectively removing the core advantage of the original approach.
This shift reflects a move toward safe but less functional solutions.
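To make this concrete, here is a minimal sketch of the difference between the two designs; the fetch functions and one-second delays are illustrative stand-ins, not taken from any reported conversation:

```python
import asyncio
import time

# Hypothetical I/O-bound tasks; the names and delays are invented for the example.
async def fetch_user(uid: int) -> str:
    await asyncio.sleep(1)  # stands in for a network call
    return f"user-{uid}"

async def fetch_orders(uid: int) -> list[str]:
    await asyncio.sleep(1)  # stands in for an independent network call
    return [f"order-{uid}-1"]

async def concurrent():
    # Original design: the two awaits overlap, so the total wait is ~1 s.
    return await asyncio.gather(fetch_user(1), fetch_orders(1))

async def sequential():
    # The kind of rewrite users report o1 suggesting: each call is awaited
    # in turn, so the total wait is ~2 s, negating the benefit of asyncio.
    return [await fetch_user(1), await fetch_orders(1)]

for coro in (concurrent, sequential):
    start = time.perf_counter()
    asyncio.run(coro())
    print(coro.__name__, f"{time.perf_counter() - start:.1f}s")
```

Run back to back, the concurrent version finishes in roughly one second and the sequential rewrite in roughly two, which is precisely the advantage users say the suggested fix throws away.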
Another consequence is that developers increasingly rely on external tools like diff utilities to verify and merge code provided by o1.
Its inaccuracies and omissions mean that output from what was once a seamless coding assistant now requires extra verification steps, turning collaboration with the model into additional overhead.
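As an illustration of that verification step, here is a small sketch using Python's standard difflib module to review a model-provided rewrite before merging it; the file names (app.py, app_o1.py) are assumptions for the example:

```python
import difflib
from pathlib import Path

# Hypothetical files: the original source and the rewrite pasted back from o1.
original = Path("app.py").read_text().splitlines(keepends=True)
proposed = Path("app_o1.py").read_text().splitlines(keepends=True)

# A unified diff makes silent deletions or alterations visible before merging.
for line in difflib.unified_diff(original, proposed,
                                 fromfile="app.py", tofile="app_o1.py"):
    print(line, end="")
```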
Prompt engineering, already essential, has now become even more critical.
Users must vigilantly guide the model with strict instructions and repeated clarifications to ensure that it doesn't remove or alter key parts of their code.
Instead of a fluid dialogue partner, o1 acts more like a system that needs constant supervision.
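For illustration, the kind of guardrail preamble users describe adding might look like the following; the wording is a hypothetical example, not an official template:

```
Return the COMPLETE updated file, not a fragment.
Do not delete, rename, or stub out any existing function.
Do not replace working code with placeholders such as "# rest unchanged".
If you are unsure about a section, leave it exactly as it is and say so.
```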
Furthermore, the model's unpredictability over longer sessions has intensified.
It may comply with instructions initially but then gradually veer off, ignoring context or previous constraints as the session continues, making it unreliable for extensive, multi-step projects.
3.1. Comparison of different versions
Version | Reasoning Depth | Image Upload Support | Usage Limits
---|---|---|---
o1-preview-2024-09-12 | Deep, step-by-step | No | Limited
o1 (post-2024-12-05) | Reduced complexity | No | Limited
o1 pro mode (ChatGPT Pro) | Similar to o1-preview | Yes (images only) | Unlimited
3.2. Additional user perspectives
Some community members mention that the o1-mini variant can sometimes produce better answers than the updated o1 model, suggesting that not all changes have led to uniformly improved or even consistent quality.
Others speculate that the model's quality might shift again over time.
They recall previous instances where initial disappointments were followed by improvements, implying that further fine-tuning or updates could restore or enhance reasoning capabilities.
Users also criticize the unchanged usage limits, feeling that no benefit or compensation was offered to those who stuck with the service.
The absence of increased quotas or more generous allocations has further fueled dissatisfaction.
In addition, several other important observations have emerged from user discussions:
- Loss of transparency in reasoning: Many users feel that the model no longer visibly demonstrates its intermediate thought process, giving final answers without revealing the logical steps behind them.
- Partial or conditional code answers: Code responses now frequently contain placeholders or incomplete elements, requiring multiple follow-up requests for a fully formed solution.
- Reduced amount of code and explanations: Where o1-preview provided lengthy, detailed code samples and in-depth reasoning, o1 often truncates explanations or provides far less code.
- Necessity of repeated clarifications: To achieve a level of detail once offered by o1-preview, users must iterate prompts multiple times.
- Increased caution and adherence to rules: The model is more conservative and less inventive.
- More template-like answers: Responses feel formulaic and lack the nuance and originality that once distinguished o1-preview.
- Comparison with older models: Some users liken the new o1 to earlier-generation models, suggesting regression in capabilities.
- Discrepancy between claims and reality: The mention of "advanced reasoning" despite the decline in depth erodes trust in OpenAI's branding.
Beyond these points, there is also a noted decrease in originality and willingness to experiment.
Users miss the model's previously bolder approach, where it would offer more creative or riskier solutions.
Now o1 tends to default to safe, conventional answers, further diminishing its appeal to those who valued its innovative reasoning style.
Finally, the diminished complexity and creativity, combined with persistent quality issues, have led many to feel that the "wow factor" once associated with o1-preview is gone.
While the model may still outperform older-generation systems, the sense of genuine progression and impressive novelty has given way to disappointment and a feeling of regression.
The overall emotional tenor of community feedback ranges from deep frustration, with some canceling subscriptions or migrating to competitor platforms, to a more optimistic view held by others, who note that even the reduced-depth version of o1 remains impressive by historical standards.
Yet this optimism is increasingly overshadowed by the sense that the model no longer provides the inspiring, high-level reasoning that initially captivated users.
4. ChatGPT Pro features
OpenAI offers a premium subscription called ChatGPT Pro at a cost of $200 per month.
This subscription provides unlimited access to the o1 model as well as an exclusive o1 pro mode.
According to OpenAI, o1 pro mode operates approximately at the level of the o1-preview-2024-09-12 version, but also supports the upload of images (though not other file types).
With this upgraded tier, users can enjoy unlimited usage of the model, achieving a depth and complexity of reasoning similar to the earlier o1-preview experience, while also benefiting from the new image-upload capability.
For example, researchers analyzing complex datasets, engineers running intricate simulations, and programmers debugging multi-step logical processes may find o1 pro mode particularly valuable.
The added computational resources help restore the depth of reasoning and handle more specialized and nuanced challenges, making this tier especially appealing for professional or academic use.