Cogito AI: I Think, Therefore I Am

Cogito, ergo sum.

I think, therefore I am.

Descartes defined humans as thinking beings, grounding the very idea of existence in the capacity to think. Centuries later, we now face artificial intelligence systems that seem to think on their own.

One recent model has attracted attention for reportedly outperforming both Meta’s LLaMA 4 and DeepSeek’s R1. It was developed by Deep Cogito, an AI startup founded by former Google senior software engineer Drishan Arora. The company’s name, taken directly from Descartes’ philosophy, signals more than just branding—it reflects a clear and intentional vision.

Superintelligence: AI Beyond Human Intelligence

Deep Cogito’s newly released language model series isn’t just another open-source release. It marks a fundamental shift in how AI is trained, showcasing a rare real-world application of a system capable of improving itself. In doing so, the company may have moved us a step closer to superintelligence—an AI that exceeds human abilities across all areas of cognition.
 
The idea of AI outperforming humans isn’t new. Think back to the 2016 Go match between Lee Sedol and AlphaGo—a moment that offered an early glimpse of what AI might one day become.

Lee Sedol playing against AlphaGo. Source: Google.

When AlphaGo faced Lee Sedol, many expected the human champion to dominate. In the end, Sedol won just one out of five games. Since then, AlphaGo and other game-playing AIs have shown that in narrow domains with clear rules, AI can not only match but surpass human performance.

Two key factors enabled this breakthrough:

  • Advanced reasoning: applying massive compute at inference time to search for solutions well beyond what conventional, handcrafted methods could reach

  • Self-improvement: the ability to refine itself through iterative feedback, without constant human oversight

 

But AlphaGo’s knowledge stopped at Go. True superintelligence goes further—it can define new problems and solve them across different domains.

That’s what Deep Cogito aims to build: a general-purpose system that moves beyond specialized tasks. And early results suggest they’re on the right track. The company has released models ranging from 3B to 70B parameters that outperform the best open models of comparable size, including LLaMA, DeepSeek, and Qwen counterparts, across a wide set of benchmarks. Notably, the 70B model even surpasses Meta’s 109B-parameter LLaMA 4 MoE in some areas.

So, what’s driving this performance? 

Deep Cogito's Key Strategies

Deep Cogito trained its models with a method called Iterated Distillation and Amplification (IDA), a long-theorized strategy that the company has now put into practice to push models beyond domain-specific limits.

IDA follows a simple two-step cycle:

  1. Amplification
    The model is given more compute and prompted to perform complex reasoning, such as multi-step thinking, tool use, and other advanced tasks. This expands its capabilities beyond its current baseline.

  2. Distillation
    The insights gained during amplification are distilled back into the model’s parameters. This lets the model handle similar problems more efficiently in the future, without repeating the full reasoning process.

 

With each cycle, the model improves. It becomes smarter over time, not through static data or manual tuning, but through its own repeated learning loop.
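
To make the two-step cycle concrete, here is a minimal, purely illustrative sketch in Python. The ToyModel class and its answer, reason, and fine_tune methods are placeholders standing in for an LLM, its amplified reasoning procedure, and a fine-tuning step; they are assumptions for this sketch, not Deep Cogito’s actual training code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyModel:
    """Stand-in for an LLM; 'knowledge' plays the role of distilled parameters."""
    knowledge: Dict[str, str] = field(default_factory=dict)

    def answer(self, problem: str) -> str:
        # Fast path: answer directly from what has already been distilled.
        return self.knowledge.get(problem, "unknown")

    def reason(self, problem: str, compute_budget: int) -> str:
        # Amplified path: spend extra compute (a trivial stub here) to solve
        # problems the fast path cannot yet handle.
        return f"solution-to-{problem}" if compute_budget > 0 else "unknown"

    def fine_tune(self, examples: List[Tuple[str, str]]) -> None:
        # Distillation: fold the amplified solutions back into the parameters.
        self.knowledge.update(dict(examples))


def ida_cycle(model: ToyModel, problems: List[str], compute_budget: int) -> ToyModel:
    # 1. Amplification: use extra compute to produce better-than-baseline answers.
    amplified = [(p, model.reason(p, compute_budget)) for p in problems]
    # 2. Distillation: train the model to reproduce those answers in a single pass.
    model.fine_tune(amplified)
    return model


if __name__ == "__main__":
    model = ToyModel()
    tasks = ["task-1", "task-2"]
    for _ in range(3):  # each cycle raises the model's fast-path baseline
        model = ida_cycle(model, tasks, compute_budget=64)
    print(model.answer("task-1"))  # answered without re-running the full reasoning
```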

The most striking part? This doesn’t require human supervision. Unlike traditional training, which relies on human feedback or labeled datasets, IDA enables intelligence to scale through compute and algorithmic design alone. It’s a promising step toward truly autonomous, general-purpose superintelligence.

Results

Deep Cogito has released five models—3B, 8B, 14B, 32B, and 70B—all of which show strong performance compared to other open-source models.
 
Most notably, the Cogito 70B model not only outperforms LLaMA 3 70B but also beats Meta’s latest LLaMA 4 109B Scout model on several benchmarks.
 
What makes this even more striking is that the models were trained by a small team in just 75 days. This not only speaks to Deep Cogito’s technical strength but also offers compelling evidence that the once-theoretical IDA strategy can be more efficient and scalable than traditional training approaches.
 
Cogito models support both fast, direct responses and a reasoning mode for more thoughtful outputs. Even the smaller models, like 3B and 8B, include tool-calling capabilities, making them well-suited for real-world use and integration into AI agents.
Tool Calling for smaller models. Source: Deep Cogito.
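
As a rough illustration of what tool calling looks like in practice, the snippet below sends a request to a Cogito model through an OpenAI-compatible endpoint. The base_url, the cogito:8b model tag, and the get_weather tool are assumptions for this sketch (a local server such as Ollama or vLLM); adjust them to however you actually serve the model.

```python
# Hedged sketch: tool calling against a locally served Cogito model via an
# OpenAI-compatible API. Endpoint, model tag, and tool are illustrative only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed-locally")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool defined only for this example
        "description": "Return the current weather for a given city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="cogito:8b",  # assumed local tag for the 8B model
    messages=[{"role": "user", "content": "What's the weather in Seoul right now?"}],
    tools=tools,
)

# If the model decides a tool is needed, the structured call shows up here
# instead of a plain-text answer.
print(response.choices[0].message.tool_calls)
```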

The “amplify and distill” strategy used by Deep Cogito evokes the image of a human driven by an unshakable will to grow through challenge. Descartes’ famous proposition once felt like a uniquely human privilege. Now, superintelligence may be closer than we thought.
 
Descartes said, “I think, therefore I am.” But if AI begins to think on our behalf, we may soon need to redefine what it means to exist as human.
