Google introduces Gemini 2.0 Flash Thinking to rival OpenAI o1

In its latest push to redefine the AI landscape, Google announced Gemini 2.0 Flash Thinking, a multimodal reasoning model capable of tackling complex problems with speed and transparency.
In a post on the social network X, Google CEO Sundar Pichai called it “Our most thoughtful model ever :)”
In its developer documentation, Google explains that “Thinking Mode is capable of stronger reasoning capabilities in its responses than the base Gemini 2.0 Flash model,” which was previously Google’s latest and greatest, released just eight days ago.
The new model supports only 32,000 input tokens (approximately 50 to 60 pages of text) and can produce up to 8,000 tokens per response. In a side panel in Google AI Studio, the company claims it is best suited for “multimodal understanding, reasoning” and “coding.”
Full details of the model’s training process, architecture, licensing and costs have yet to be released. For now, Google AI Studio lists its cost as zero per token.
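The limits stated above can be turned into a simple pre-flight check before sending a request. This is a minimal sketch that uses only the figures reported in this article; `fits_limits` and `implied_tokens_per_page` are hypothetical helpers, and counting the tokens in a real prompt requires the model's own tokenizer, which this sketch does not attempt.

```python
# Limits as reported in this article (assumed values, not official API constants).
MAX_INPUT_TOKENS = 32_000
MAX_OUTPUT_TOKENS = 8_000


def fits_limits(input_tokens: int, max_output_tokens: int) -> bool:
    """Return True if a request stays within the stated input and output limits."""
    if input_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return input_tokens <= MAX_INPUT_TOKENS and max_output_tokens <= MAX_OUTPUT_TOKENS


def implied_tokens_per_page(pages: int) -> float:
    """Tokens per page implied by fitting `pages` pages into the input limit."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return MAX_INPUT_TOKENS / pages


if __name__ == "__main__":
    print(fits_limits(30_000, 8_000))   # within both limits
    print(fits_limits(40_000, 1_000))   # input too long
    # The "50 to 60 pages" estimate implies roughly 533 to 640 tokens per page.
    print(implied_tokens_per_page(50), implied_tokens_per_page(60))
```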
Accessible, more transparent reasoning
Unlike OpenAI’s competing reasoning models o1 and o1-mini, Gemini 2.0 Flash Thinking lets users see its step-by-step reasoning through a drop-down menu, offering a clearer and more transparent view of how the model reaches its conclusions.

By letting users see how decisions are made, Gemini 2.0 Flash Thinking addresses long-standing concerns about AI operating as a “black box,” and it puts this model (whose licensing terms are still unclear) on par with open-source models released by competitors.
In my first simple tests, the model correctly and quickly (in one to three seconds) answered some questions that have been notoriously difficult for other AI models, such as counting the number of Rs in the word “Strawberry.” (See screenshot above.)
In another test, when comparing two decimal numbers (9.9 and 9.11), the model systematically broke the problem into smaller steps, from analyzing the whole-number parts to comparing the decimal parts.
These results are supported by independent third-party analysis from LM Arena, which ranked Gemini 2.0 Flash Thinking the number one model across all LLM categories.
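For reference, the correct answer to the “Strawberry” question is easy to compute deterministically. This short sketch (the helper names are illustrative, not anything from Google) counts a letter case-insensitively and lists the 1-based positions where it occurs, the same breakdown a model has to get right when it reasons letter by letter:

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    return word.lower().count(letter.lower())


def letter_positions(word: str, letter: str) -> list[int]:
    """Return the 1-based positions of `letter` in `word`, ignoring case."""
    target = letter.lower()
    return [i for i, ch in enumerate(word.lower(), start=1) if ch == target]


if __name__ == "__main__":
    print(count_letter("Strawberry", "r"))      # 3
    print(letter_positions("Strawberry", "r"))  # [3, 8, 9]
```

Language models often miss this because they process text as multi-character tokens rather than individual letters, so an explicit per-character pass like the one above is what a correct answer requires.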
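The step-by-step procedure described above can be written out explicitly. This is a minimal sketch assuming non-negative inputs written as plain decimal strings; `compare_decimals` is a hypothetical helper that illustrates the procedure, not a description of how Gemini reasons internally.

```python
def compare_decimals(a: str, b: str) -> int:
    """Compare two non-negative decimal strings step by step.

    Returns 1 if a > b, -1 if a < b, and 0 if they are equal.
    """
    a_whole, _, a_frac = a.partition(".")
    b_whole, _, b_frac = b.partition(".")

    # Step 1: compare the whole-number parts as integers.
    if int(a_whole) != int(b_whole):
        return 1 if int(a_whole) > int(b_whole) else -1

    # Step 2: pad the fractional parts to the same length, so "9"
    # (nine tenths) becomes "90" and is compared fairly against "11".
    width = max(len(a_frac), len(b_frac))
    a_frac = a_frac.ljust(width, "0")
    b_frac = b_frac.ljust(width, "0")

    # Step 3: equal-length digit strings compare correctly as integers.
    if a_frac == b_frac:
        return 0
    return 1 if int(a_frac) > int(b_frac) else -1


if __name__ == "__main__":
    print(compare_decimals("9.9", "9.11"))  # 1, so 9.9 is larger
```

In everyday code, Python's `decimal.Decimal` handles this comparison directly (`Decimal("9.9") > Decimal("9.11")` is `True`); the manual version only exists to expose the intermediate steps.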
Native support for image uploads and analysis
In a further improvement over the rival OpenAI o1 family, Gemini 2.0 Flash Thinking was designed to process images from the start.
o1 launched as a text-only model but has since expanded to include image and file analysis. For now, both models return only text.
According to the developer documentation, Gemini 2.0 Flash Thinking also does not currently support grounding with Google Search or integration with other Google apps and external third-party tools.
The multimodal capability of Gemini 2.0 Flash Thinking expands its possible use cases, allowing it to address scenarios that combine different types of data.
For example, in one test the model solved a puzzle that required analyzing both textual and visual elements, demonstrating its ability to integrate and reason across formats.
Developers can leverage these features via Google AI Studio and Vertex AI, where the model is available for experimentation.
As the AI landscape grows ever more competitive, Gemini 2.0 Flash Thinking could mark the beginning of a new era for problem-solving models. Its ability to handle different types of data, show its reasoning, and perform at scale positions it as a serious competitor in the market for reasoning AI, rivaling OpenAI’s o1 family and beyond.
2024-12-19 18:04:34