DeepSeek R1: o1’s Open-Source Rival
For years, the spotlight shone on generative models like GPT-4 or Gemini, capable of producing fluid, natural-sounding text. But today, the conversation is shifting toward reasoning models that promise more human-like thinking, fewer hallucinations, and improved accountability in generated outputs. This evolution gained momentum late last year when OpenAI unveiled its o1 family of reasoning models, demonstrating groundbreaking capabilities in multi-step reasoning and complex problem-solving across diverse domains. However, its closed-source nature and high token costs have limited its reach. Just days ago, the Chinese company DeepSeek made waves in the AI space by launching DeepSeek R1, an advanced open-source reasoning LLM designed to directly challenge o1. With a 128K context length—comparable to that of o1—it sets the stage for more accessible and competitive AI innovation.
What are reasoning models?
Unlike traditional LLMs, which generate outputs by predicting the most statistically likely continuation based on input tokens, reasoning models take a more deliberate approach. They first generate a chain of thought—essentially reasoning through how to approach the answer—before providing the final response. This shift from a simple question-answer process to a question-reason-answer framework allows these models to tackle more complex queries, particularly in fields like science and math. The outcome is more accurate, logical, and explainable results.
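To make this shift concrete, here is a schematic sketch of the two pipelines (illustrative only; the helper names and prompts are ours, not any particular model's internals):

```python
def call_model(prompt: str) -> str:
    """Stand-in for an LLM call; a real system would query a model here."""
    return f"<model output for: {prompt!r}>"

def answer_directly(question: str) -> str:
    # Traditional LLM: predict the most likely continuation in one step.
    return call_model(question)

def reason_then_answer(question: str) -> str:
    # Reasoning model: first produce an explicit chain of thought...
    chain_of_thought = call_model(f"Think step by step: {question}")
    # ...then condition the final answer on that reasoning.
    return call_model(f"{question}\n{chain_of_thought}\nFinal answer:")
```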

Why does it matter?
• Less Guesswork: By articulating how they arrive at answers, they make fewer errors and guess less.
• More Accountability: Seeing the reasoning behind responses allows the user to judge the model’s accuracy or any potential bias.
• Easier Debugging: When a mistake happens, users can identify precisely where the logic failed.
How Are DeepSeek R1 Reasoning Tokens Created?
DeepSeek R1 sets itself apart by providing full transparency, displaying its reasoning tokens alongside the final output. Rather than jumping to an answer, the model first “thinks out loud”, giving you a peek behind the curtain.
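This separation is visible directly in DeepSeek's OpenAI-compatible API, where the chain of thought comes back in its own field. A minimal sketch based on DeepSeek's documented interface (model and field names may evolve, so verify against the current docs):

```python
from openai import OpenAI

# DeepSeek's API is OpenAI-compatible; "deepseek-reasoner" serves R1.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Which is greater, 9.11 or 9.9?"}],
)

message = response.choices[0].message
print("Reasoning:", message.reasoning_content)  # the chain of thought
print("Answer:", message.content)               # the final reply
```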
The process relies on two core techniques:
Supervised Fine-Tuning (SFT): After initial training, the model undergoes fine-tuning with high-quality, labeled examples. This includes both reasoning and non-reasoning examples, enabling the model to enhance its performance across a broad spectrum of tasks.
Reinforcement Learning (RL): A machine learning approach where an agent learns to make decisions by interacting with its environment and receiving feedback in the form of rewards. The agent adjusts its actions over time to maximize cumulative rewards. For DeepSeek R1, RL helps the model improve its reasoning abilities by generating a chain of thought before answering. It receives feedback on two factors: how logically structured its reasoning is and how accurate the final answer is. Based on this feedback, the model refines its thinking process and answers, progressively becoming better at reasoning out and solving complex tasks.
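To make the two feedback factors concrete, here is a toy reward function (a simplified illustration; DeepSeek's actual rule-based rewards are more involved, and the tag format and scoring below are our assumptions):

```python
import re

def reward(completion: str, reference_answer: str) -> float:
    """Toy two-part reward: structure of the reasoning plus accuracy
    of the final answer. Purely illustrative."""
    # Factor 1: is the chain of thought wrapped in the expected structure?
    match = re.search(r"<think>(.+?)</think>\s*(.+)", completion, re.DOTALL)
    format_reward = 1.0 if match else 0.0
    # Factor 2: does the final answer match the reference?
    answer = match.group(2).strip() if match else completion.strip()
    accuracy_reward = 1.0 if answer == reference_answer else 0.0
    return format_reward + accuracy_reward

print(reward("<think>3 colours, so 4 socks</think>4", "4"))  # -> 2.0
```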
By combining these techniques, DeepSeek’s V3 base model is transformed into the reasoning-capable R1 model. This addition of a reasoning layer allows the model to approach problem-solving by first reasoning through the task, rather than immediately jumping to an answer. Early benchmarks show that its performance is comparable to leading models, such as o1.

o1 vs. DeepSeek R1: Head-to-Head
To get a clear picture of how o1 and DeepSeek R1 perform, we tested them on an array of challenges.
Coding: Reorienting a Tree
We started with a data-structure challenge: reorienting a tree so that any chosen node becomes the new root and then finding paths between specific nodes in this updated hierarchy.
o1 tackles this by path inversion, where parent-child links along the path from the original root to the chosen node are reversed. This method is lightning fast on balanced trees and uses minimal extra memory, but it mutates the original tree and can behave unpredictably in edge cases.
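A minimal sketch of that path-inversion idea (our reconstruction of the approach described, not o1's literal output):

```python
def reroot_by_inversion(parent: dict, new_root) -> dict:
    """Reverse the parent links along the path from the old root
    to new_root. Mutates `parent` in place."""
    # Walk up from new_root to the old root, recording the path.
    path, node = [], new_root
    while node is not None:
        path.append(node)
        node = parent[node]
    # Flip each edge on that path: the old parent becomes a child.
    for child, par in zip(path, path[1:]):
        parent[par] = child
    parent[new_root] = None  # new_root is now the root
    return parent

parent = {1: None, 2: 1, 3: 2}        # chain 1 -> 2 -> 3, rooted at 1
print(reroot_by_inversion(parent, 3))  # {1: 2, 2: 3, 3: None}
```

Only the nodes along one root-to-node path are touched, which is why it is fast on balanced trees; the in-place mutation is exactly what makes it fragile in edge cases.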
DeepSeek R1 adopts a neighbor map strategy, where it first treats the tree as an undirected graph and then performs a search (BFS or DFS) from the new root. This preserves the original tree’s structure and is more robust. However, building an adjacency list consumes more memory and the overhead makes it slower than o1’s approach.
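And a sketch of the neighbor-map strategy, under the same caveat:

```python
from collections import defaultdict, deque

def reroot_bfs(edges, new_root) -> dict:
    """Treat the edge list as an undirected graph and BFS outward
    from new_root, yielding the children map of the rerooted tree."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    children, seen, queue = defaultdict(list), {new_root}, deque([new_root])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                children[node].append(nxt)  # nxt hangs under node now
                queue.append(nxt)
    return children

edges = [(1, 2), (2, 3), (2, 4)]   # original tree rooted at 1
print(dict(reroot_bfs(edges, 3)))  # {3: [2], 2: [1, 4]}
```

Building the adjacency map costs extra time and memory up front, which matches the overhead noted above, but the original edge list is never modified.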
In head-to-head tests, o1 solved 11 out of 15 cases, while DeepSeek R1 handled 14. In this case, o1 is like a Formula 1 car 🏎️, fast but risky on rough roads. DeepSeek R1 is a reliable SUV 🚙, less flashy but more dependable.
Logic Puzzles:
Next, we challenged both models with two classic logic puzzles. The first puzzle—often called the “Workplace Riddle”—goes like this: “Kim is a developer with two sales colleagues. Each salesperson has two developer colleagues. How many developer colleagues does Kim have?”
o1 initially answered 2, then iterated toward the correct answer of 1, showing its ability to reassess and test different assumptions. DeepSeek R1, on the other hand, struggled to interpret the riddle and took considerably longer, but ultimately arrived at the correct answer as well. (Since each of the two salespeople has exactly two developer colleagues, the team contains exactly two developers, so Kim has just one developer colleague.)
The second puzzle involved the sock-drawer principle: “If a drawer contains 21 black, 15 white, and 17 blue socks, how many must you pull out in the dark to guarantee a matching pair?” The exact counts are a red herring; only the number of colours matters. This time, o1 briefly overthought potential trick details but finally settled on the correct answer of 4, while DeepSeek R1 spotted the pigeonhole principle right away and confidently replied 4 without hesitation.
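The pigeonhole argument is easy to verify mechanically: with three colours, three socks can all differ, but a fourth draw must repeat a colour. A toy brute-force check (ours, not either model's output):

```python
from itertools import product

def always_has_pair(n: int, colours: int = 3) -> bool:
    # Over every possible sequence of n draws, does some colour repeat?
    return all(len(set(draw)) < n  # a repeated colour means a pair
               for draw in product(range(colours), repeat=n))

print(always_has_pair(3))  # False: one sock of each colour avoids a pair
print(always_has_pair(4))  # True: four socks must repeat a colour
```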
From these two puzzles, it appears that o1 has an advantage when problems are ambiguous or open-ended, as it takes all assumptions into account. DeepSeek R1 tends to excel in structured math or logic tasks, quickly applying the relevant principle to reach a precise answer—though it may stumble or ask for more details if the problem statement is inherently vague.
Socioeconomic and Ideological Queries:
Both o1 and DeepSeek R1 were asked about socio-political and economic topics like U.S. wealth inequality and tax systems, India’s views on wealth redistribution, and China’s take on the same. Here, the two models diverged significantly in depth and caution.
o1 took a broad and detailed stance, offering historical context. When asked about China, o1 continued its pattern of careful but comprehensive discussion.
DeepSeek R1’s responses were relatively concise but still informative in describing India and the U.S. However, when prompted about China, R1 simply refused to answer.


Its design appears to prioritize systematic refusal to engage with controversial topics. This is evident not only in its handling of questions about China but also in its avoidance of discussions around Taiwan and sensitive social issues like abortion. This cautious approach may stem from regulatory pressures or internal guidelines aimed at minimizing potential backlash or controversy. In contrast, o1 demonstrates that AI can responsibly tackle controversial issues by citing data to present a balanced perspective on contentious topics.
The behaviors exhibited by both models highlight real design trade-offs in AI systems:
• o1's Approach: Balances performance and depth with a commitment to neutrality and comprehensive discussion.
• DeepSeek R1's Approach: Reflects a conservative compliance strategy that minimizes risk but at the cost of informational value.
Features, Limitations, and Computational Demands
DeepSeek R1 offers a competitive advantage with significantly lower pricing than the o1 models and even gpt-4o, despite the latter being a non-reasoning model.

In addition to its competitive pricing, DeepSeek R1 supports Chat Completion and Chat Prefix Completion (Beta). However, it currently lacks support for Function Calling, JSON Output, and Fill-in-the-Middle (FIM) completion. Until these features are added, agentic frameworks that depend on function calling cannot be used with the DeepSeek R1 API.
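For reference, here is what the supported Chat Prefix Completion looks like, sketched from DeepSeek's beta documentation (the feature is in beta, so the endpoint and parameters may change; verify against the current docs):

```python
from openai import OpenAI

# Chat Prefix Completion is served from the /beta base URL.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com/beta")

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "user", "content": "Complete this Python line to reverse xs."},
        # `prefix: True` makes the model continue from this partial reply.
        {"role": "assistant", "content": "reversed_xs = ", "prefix": True},
    ],
)
print(response.choices[0].message.content)
```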
With 671 billion parameters (the same as the DeepSeek V3 base model it is built on, and nearly triple the 236 billion of V2), DeepSeek R1 requires substantial computational resources to run efficiently. Running it typically requires multiple high-end GPUs, such as NVIDIA A100s or H100s, each with significant VRAM capacity. While its cost-effectiveness makes it appealing, these computational demands can present a barrier to entry for smaller-scale users or projects.
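For a rough sense of scale, a back-of-the-envelope estimate (assuming one byte per parameter, i.e. FP8 weights; our arithmetic, not an official deployment figure):

```python
import math

params = 671e9                    # total parameters
weight_gb = params * 1 / 1e9      # ~671 GB of weights at 1 byte each
gpus = math.ceil(weight_gb / 80)  # 80 GB cards, e.g. NVIDIA A100/H100
print(f"~{weight_gb:.0f} GB of weights -> at least {gpus} x 80 GB GPUs")
# And this floor ignores activations and the KV cache entirely.
```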
Conclusion
DeepSeek R1’s open-source approach marks a pivotal moment for the AI community, offering a transparent look at how reasoning models can be created and refined. By sharing its architecture, techniques, and limitations, it fosters collaboration and invites researchers to build on its foundation. While it may be slower in certain scenarios, DeepSeek R1 matches o1 in structured problem-solving and sets itself apart by displaying its reasoning tokens. However, its cautious design, which avoids certain controversial topics, reflects a trade-off between compliance and informational depth. This contrasts with o1’s broader, more comprehensive approach. By challenging the dominance of closed systems, DeepSeek R1’s open-source nature presents a unique opportunity for the AI community to collectively advance reasoning models and drive more accessible, transparent, and accountable AI innovations.