Steven Sinofsky is an investor, engineer, and the former leader of the Office and Windows divisions at Microsoft. Early in his career, he was Bill Gates’ technical assistant during the development of Windows 95 and the rise of the internet as we know it. Today in Pirate Wires, he provides a historical perspective on DeepSeek and explains why core technologies inevitably become commoditized. Sinofsky is confident that DeepSeek will serve as a wake-up call to U.S. innovators and spur a new, very different phase of American AI development.
His piece was originally published in Hardcore Software.
--
DeepSeek was always going to happen. We just didn’t know who would do it. It was either going to be a startup or someone outside the center of leadership and innovation in AI, which is mostly clustered around trillion-dollar companies in the U.S. It turned out to be a group in China, which for many — myself included — is unfortunate.
But again, it absolutely was going to happen. The next question: will U.S. technologists recognize DeepSeek for what it is?
There’s something we used to banter about when things seemed really bleak at Microsoft: When normal companies scope out features and architecture, they use t-shirt sizes — small, medium, and large. At the time, Microsoft seemed capable of only thinking in terms of extra-large, huge, and ginormous. That’s where we are today with AI and the big-company approach to it in the U.S.
There’s more on DeepSeek in “The Short Case for Nvidia Stock.” The analysis there is very good, but it focuses on picking stocks, which isn’t my thing. Strategy and execution are more my speed, so here’s that perspective.
--
The current trajectory of AI, if you read the news in the U.S., is one of MASSIVE capital expenditures (CapEx) piled on top of even more MASSIVE CapEx. It’s a race between Google, Meta, OpenAI/Microsoft, xAI, and to a lesser extent a few other super well-funded startups like Perplexity and Anthropic. All of these together are taking the same approach, which I will call “scale up.” Scale up is what you do when you have access to vast resources, as all these companies do.
The history of computing is one of innovation followed by “scale up,” which is then broken by a model that “scales out” — when a bigger and faster approach is replaced by smaller and more numerous approaches.
You can see this pattern play out on the macro level throughout the history of technology — or you can see it at the micro level with subsystems, from networking to storage to memory.
The past five years of AI have brought us bigger models, more data, more compute, and so on. Why? Because, I would argue, innovation was driven by the cloud hyperscalers, whose approach was destined to be more of what they’d already done. They viewed data for training and huge models as their way of winning and their unique architectural approach. The fact that other startups took a similar approach is just Silicon Valley at work — people optimize for different things at a micro scale without considering the larger picture. (See: the sociological and epidemiological term small area variation.) People try to do what they couldn’t do in their previous efforts, or what their previous efforts might have overlooked.
The degree to which the hyperscalers believed in “scale up” is obvious when you consider the fact that they’re all building their own silicon (creating their own specialized AI chips). As cool as this sounds, it has historically proven very, very difficult for software companies to build their own silicon. While many look at Apple as a success, Apple’s lessons emerged over decades of not succeeding, plus Apple builds whole devices, not just silicon. Apple learned from 68k, PPC, and Intel — previous processor architectures Apple used before transitioning to its own silicon — how to optimize a design for its use cases. Those building AI hardware were solving their in-house scale-up challenges — and I have always argued they could gain constant-factor percentage improvements, but nothing beyond that.
Nvidia is there to help everyone not building their own silicon, as well as those who want to build their own silicon but still need to meet their immediate needs. As described in “The Short Case,” Nvidia also has a huge software ecosystem advantage with their CUDA development platform, something they have honed for almost two decades. It is critically important to have an ecosystem, and they have been successful at building one. This is why I thought, and wrote, that Nvidia’s DIGITS project is far more interesting than simply a 4,000 TOPS (tera operations per second) desktop (see my CES report).
So, where are we? Well, the big problem is that the large-scale solutions, regardless of all the progress, are consuming too much capital. But beyond that, delivery to customers has been on an unsustainable path. It’s a path that works against the history of computing, which shows us that resources need to become less — not more — expensive. The market for computing simply doesn’t accept solutions that cost more, especially with consumption-based pricing. We’ve seen Microsoft and Google do a bit of resetting with respect to pricing in a move to turn their massive CapEx efforts into direct revenue. I wrote at the time of the initial pricing announcements that there was no way they would be sustainable. It took about a year. A laudable goal for sure, but just not how business customers of computing work. At the same time, Apple is focused on the “mostly free” way of doing AI, but the results are at best mixed, and they’re still deploying a ton of CapEx.
Given that, it was inevitable that someone would look at what was going on and build a “scale out” solution — one that doesn’t require massive CapEx to deliver, and whose architecture uses less CapEx even to build (i.e., train) the product.
The example that keeps running through my mind is how AT&T looked at the internet. In all the meetings Microsoft had with AT&T decades ago about building the “information superhighway,” AT&T was completely convinced of two things. First, the internet technologies being shown were toys — they were missing all the key features, such as being connection-based or having QoS (quality of service). (For more on toys, see “[...] Is a Toy” by me.)
Second, they were convinced the right way to build the internet was to take their phone network and scale it up. Add more hardware, more protocols, and a lot more wires and equipment to deliver on reliability, QoS, and so on. They weren’t alone. Europe was busy building out internet connectivity with ISDN over its telecommunications networks. AT&T loved this approach because it took huge capital and relied on their existing infrastructure.
They were completely wrong. Cisco came along and delivered all those things on an IP-based network using toy software like DNS. Other toys like HTTP and HTML layered on top. Then came Apache, Linux, and a lot of browsers. Not only did the initial infrastructure prove to be the least interesting part, but it was also drawn into a “scale out” approach by a completely different player, one that had previously mostly served weird university computing infrastructure. Cisco did not have tens of billions of dollars, nor did Netscape, nor did CERN. They used what they could to deliver the information superhighway. The rest is history.
As an example, there was a time when IBM measured the mainframe business in MIPS (millions of instructions per second). The reality was that they had a 90-plus percent share of MIPS. But in practice, they were selling or leasing MIPS (the acronym, not the chip company from Stanford) at ever-decreasing prices, just as Intel sold transistors for less over time. This is all great until customers can get MIPS for even less money elsewhere, which Intel soon delivered. Then ARM found an even cheaper way to deliver more. You get the picture. Repeat this for data storage and you have a great chapter from Clayton Christensen’s The Innovator’s Dilemma.
Another challenge for the current AI hyperscalers is that they have only two models for bringing an exciting — even disruptive — technology to market.
First, they can bundle the technology as part of what they already sell. This de-monetizes anyone trying to compete. Of course, regulators love to think of this as predatory pricing, but the problem is software has little marginal cost (uh oh) and the whole industry is made up of cycles of platforms absorbing more technology from others. It’s both an uphill battle for big companies to try to sell separate things (the salespeople are busy selling the big thing) and an uphill battle to try to keep things separate, since someone’s always going to eventually integrate them anyway. Windows did this with Internet Explorer. Word did this with Excel, or Excel did this with Word, depending on your point of view (see Hardcore Software for the details). The list is literally endless. It happens so often in the Apple ecosystem that it’s called Sherlocking. The result effectively commoditizes a technology while maintaining a hold on distribution.
Second, AI hyperscalers can compete by skipping the de-monetization step and going straight to commoditization. This approach counts on the internet for distribution. Nearly everything running in the cloud today is built this way. It really starts with Linux, but runs through everything from Apache to GitHub to Spark. The key to this approach, and what is so unique about it, is open source.
Meta has done a fantastic job with open source, but it’s still relying on an architectural model that consumes tens of billions of dollars in CapEx. Meta, much like Google, justifies that CapEx by building tools that make their existing products better; open-source Llama is just a side effect (and good for everyone). This is not unlike Google releasing all sorts of software, from Chromium to Android. It’s also what Google did to de-monetize Microsoft when they began offering Gmail, ChromeOS, and their suite of productivity tools (Google Docs was originally free, presumably to de-monetize Office). Google can do this because they monetize software with services on top of the open source they release. Their magic lies in the fact that their value-add on top of open source is not open source — rather, it’s in their hyperscale data centers running their proprietary code using their proprietary data. By releasing all their products as open source, they are essentially trying to commoditize AI. The challenge, however, is the cost. This is what happened with Hotmail, for example — it turns out that, at massive scale, even a 5MB free mailbox adds up to a lot of subsidies.
That’s why all the early AI hyperscaler products take one of two approaches: bundling or mostly open source. Everyone outside those two models is, in a sense, competing both against the bundles and against the companies trying to de-monetize the bundles. They are caught in the middle.
The cost of AI, like the cost of mainframe computing or of X.25 connectivity (the early network protocol developed in the 1970s for transmitting data over telephone lines) before it, forces the market to develop an alternative that scales without massive direct capital.
By all accounts, DeepSeek’s approach seems to be exactly that. The internet is filled with analysts trying to figure out just how much cheaper it was, how much less data it used, or how many fewer people were involved. In algorithmic complexity terms, these are all constant-factor differences. The fact that DeepSeek runs on commodity, disconnected hardware and is open source is enough of a shot across the bow of the current approach to AI hyperscaling that it can be seen as “the way things will go.”
I admit this is all confirmation bias for me. We’ve had a week with DeepSeek, and people are still poring over it. The hyperscalers and Nvidia have massive technology roadmaps. I’m not here for stock predictions. All I know for sure is that if history offers any advice to technologists, it’s that core technologies become free commodities — and, because of internet distribution and de facto market standardization at many layers, that happens sooner with every turn of the crank.
China faced an AI situation not unlike Cisco’s. Many (including “The Short Case”) are looking at the Nvidia embargo as a driver. The details don’t really matter; they just had different constraints. They had many more engineers to attack the problem than they had data centers to train in. They were inevitably going to create a different kind of solution. In fact, I am certain someone, somewhere, would have. It’s just that, in hindsight, China was especially well-positioned.
Kai-Fu Lee argued recently that DeepSeek proved China is destined to out-engineer the U.S. Nonsense, I say. That’s just trash talk. China took an obvious and clever approach that U.S. companies were mostly blind to because of the pre-AI path that got them to where they are today. DeepSeek is just a wake-up call.
I’m confident many in the U.S. will identify the necessary course corrections. The next Cisco for AI is waiting to be created, I’m sure. If that doesn’t happen, then it could end up the way browsers did: a big company (or three) will just bundle it for everyone to use. Either way, the commoditization step is upon us.
Get building. Scale out, not up. 🚀
— Steven Sinofsky