Our Vanishing Internet: An Interview with Dr. Larry Sanger

Wikipedia co-founder Dr. Larry Sanger on the establishment takeover of Wikipedia, corporate control of online knowledge, why information disappears from the internet, and more
Mike Solana

Millennials were raised to fear the internet’s permanence, which evolved into a kind of truism after Facebook: think before you share, or forty years from now we’ll all be talking about those drunk photos at your Supreme Court hearing. Our assumption was that every piece of information online was immortal, and freely accessible by everyone. Which was awesome, actually. Embarrassment was simply the price that we would have to pay for the greatest invention in human history: a decentralized, democratized, liberated Library of Alexandria at our fingertips — forever. Well, turns out we were wrong about pretty much everything.

Today, the shifting, morphing, vanishing internet, with entire Millennial subcultures, and whole historical records degraded into the abyss, is probably the single most important thing almost nobody is talking about. The internet is not only impermanent, but malleable, and that combination of qualities inherent to the information ecosystem our entire world is built upon is driving our civilization into a state of chaos. While I’ve written about these subjects with interest for a few years, Dr. Larry Sanger has dedicated most of his life to the architecture of human knowledge. Speaking with him was an honor.

Dr. Sanger is a co-founder of Wikipedia. Before his work with Jimmy Wales on Nupedia, the predecessor to Wikipedia, he was a working academic philosopher. Today, he’s the President of the Knowledge Standards Foundation (KSF), a 501(c)(3) nonprofit focused on creating the standards and tools for the future encyclosphere.

We first connected after I wrote Encyclopedia Titanica, at the very beginning of Pirate Wires’ entrance into the Wikipedia conversation (most recently carried on in a great new piece by Ashley Rindsberg, How the Regime Captured Wikipedia). We discussed why the original vision of Wikipedia — and the safeguards the founding team put in place — weren’t enough to prevent it from becoming biased toward the establishment’s point of view; how AI will kill Wikipedia, but is just as vulnerable to state censorship and establishment bias; and Sanger’s work creating a truly decentralized encyclopedia that’s resistant to censorship and memoryholing.

--

Larry Sanger

“It feels like the internet is vanishing,” I said. “Am I right to be concerned?”

DR. LARRY SANGER: Yes, I think you're very right to be worried about that problem. I mean, if the Internet Archive doesn't keep up, or if they are shut down, or if they selectively delete certain things, then there might not be any other place.

There are other archiving companies. One of our partners is a cryptocurrency company called DARA that is trying to do the same thing, but it's an enormous problem. They can start to create the technology, but to actually make it happen, to redo essentially everything the Internet Archive has done, is a massive, massive undertaking.

So you are right. Partly because of the sheer technical difficulty of archiving everything, first of all. And partly because some people have a vested interest in taking some information down. Like if there's some big, faceless corporation hosting a gamer community full of extensive chats, and the gamers made relationships, and there's like a whole history they care about — nobody else cares about it, but they care about it — there's just some people in a boardroom somewhere who will say, “Oh, well, we gotta shut down this website, it's not making any money anymore.” Usually for legal reasons, they'll just take everything down entirely. And, right, when you were a kid, you weren't thinking about that — that maybe the only thing that would keep information online is the people who originally put it there.

And if it's not them, then it has to be something like archive.org. And unfortunately, with archive.org, although I think they're relatively good, they're allied with Big Tech to a certain extent, and they've already demonstrated that they are not entirely on board with the free speech maximalism of the 70s and 80s.

MIKE SOLANA: What was the vision for Wikipedia when you founded it? How did you think about the internet back then, and how has that changed over the last couple of decades?

The antecedent to Wikipedia was Nupedia, which started in 2000. It was my job to get that started. About a year in, after it had made slow progress, a friend of mine told me about wikis, and I was thinking, well, we could actually apply wiki technology to the problems we were having with Nupedia. So, in early 2001, Wikipedia was born. I managed the site in its first 14 months or so of life.

At the time, my notion of the internet was pretty positive. I thought it would gradually connect everyone. Major corporations were barely online and not investing large amounts of money into it. And I was thinking — I guess I was pretty naive, but I was younger than you are, at that time — that when the big corporations got involved, all kinds of new things would be possible. There would be new kinds of conversations that would be able to take place, and something like Wikipedia would just explode and blossom with money thrown at it by all these big players.

That did happen, but it didn’t have the positive effect I thought it would. I believed that in the long run, everyone would get smarter by having everything available all the time. You could ask any question you have and get an answer right away. And, well, that has actually been true for a while now. But what I didn’t anticipate being as important as it was: when all the corporations got online, and when governments started getting highly interested in what was going on online, the internet became not just a medium of education but a locus of control. I won’t say that it was a terrible surprise to me, but it was a great disappointment.

Around 2012 to 2014, I started having bad experiences with social media, when they actually started imposing degrees of...


authoritarianism?

Exactly. Unfairly at times. They started cracking the whip, actually.

You mentioned free speech before. I remember having a discussion with people in my old college alumni group in 2012 or 2013, and I actually saw the wokeness revolution, or whatever you want to call it, happen right before my eyes while I was there. When I joined that group, it was kind of like how things were in the 80s and 90s. People were just talking, it was a free for all. Then some younger people got involved, and it became very clear, due to the things they said, that they did not actually support free speech as a principle.

Yeah, it was a pretty radical and rapid shift. Even when I was in college — so, 2007 — these ideas were not popular. But it almost doesn't matter what the ideology is, right? This just happened to be the first ideology that came around with significant support among elite people and institutions that hold power in the country. And because it captured that small group of people, the ideology was able to exert tremendous control over the entire country in a pretty rapid way. The next one could be worse.

How do we build systems that defend against this? Have you seen these ideas shape Wikipedia? And where do you see the architecture of knowledge going? You've written recently about AI as a kind of Wiki-killer.

So, first of all, I think it’s a cultural problem. I don’t think it’s first and foremost a technical problem. The roots of the cultural revolution we’re experiencing now lie in critical theory in literature and philosophy departments back in the 1970s and 1980s. The kind of language that people started using back then has become mainstream. I’m amazed that it became mainstream.

When Wikipedia started, it seemed decentralized because anybody could get involved and represent various points of view. We had a robust neutrality policy that allowed people to work together and represent a global array of perspectives on every topic.

Today, that is no longer the case. I won’t get into all the reasons or descriptions of the problem, but the basic issue is that Wikipedia now represents an establishment point of view. It’s not neutral; it’s the establishment’s view. If the establishment allows a narrow range of controversies on some topics, then that’s okay. You can debate those things, but anything outside that Overton window is not going to be represented fairly — if at all — on Wikipedia.

Now, Wikipedia is just an example of a broader phenomenon found in the media generally. There was a time when I thought maybe Wikipedia might resist the drift towards media bias, but it didn’t. It just followed in lockstep.

I would love to know how that happened. Wasn’t Wikipedia designed to avert this problem?

I mean, at least policy-wise, it was designed that way. The whole idea is that these are people who don’t have to declare who they are. They could be from anywhere in the world, and can work on the website at any time. It’s a free-for-all. There’s a policy that says you have to let other people have a say and make an effort to represent their points of view fairly — the neutrality policy.

So, at least as far as that design is concerned, you would think it would be resistant, but it hasn’t worked out that way. From almost the beginning, there have been people who have given lip service to the neutrality policy or have creatively interpreted it to allow themselves to be biased as hell. More and more of such people have basically taken over. It’s been one of the institutions that the left has marched through.

You've written about artificial intelligence as it applies to searching for information and cataloging information. I think it could be a Google killer as much as it could be a Wikipedia killer. How do you see it? And how do you see the way that we disseminate information and consume information changing?

Well, people use encyclopedias in two different ways. One is to get a particular fact, to get a piece of information that they don’t have. They have a question, they just need an answer to the question, and then they can move on. The other way is when they want a general introduction to a topic.

AI is going to be, within a few years, really, really good at both. It’s already pretty good — it’s already pretty useful at answering a lot of our factual questions. But when you start asking hard questions, like interpretive questions about literature, AI can badly summarize the sorts of answers it finds and is right only about 80% of the time.

Going forward, it’s going to become increasingly sophisticated at answering increasingly sophisticated questions and not just sort of blather. I think within a few years — it’s going to happen very fast — we will be able to ask it things like, “Compare the concepts of civil freedom and free will in all of the writings of John Stuart Mill, and give me quotations, and also throw in some academic references from the last 50 years,” or something like that, and it will actually give a really good answer. It will be able to answer those questions really well.

But the thing is, it’s still going to tell you what it wants you to believe about that question. It can subtly influence the direction of your intellectual development in precisely the same way that an indoctrinating college professor would do.

Yeah, I try to be a discerning person, but I often catch myself default-trusting people I know and like. I think this is how we're designed — to sit around a campfire and share stories with people we like, and to listen to them. To trust them. I think AI has begun to fill this role, as a likable, trusted source of information. Just by being friendly.

Well, I’ll put on my philosopher’s hat. In the branch of epistemology called social epistemology, one of the things they discuss is what they call testimony, which is one of the basic ways we learn about the world. Most social epistemologists believe it cannot be reduced to anything else. In other words, it’s a fundamental assumption that if someone says something to you, then you have at least a little reason to believe what they say.

So, you’re absolutely right that an AI can influence you just by mentioning something. Even if the AI has been hallucinating, you’ll think, “Okay, yeah, that sounds possibly wrong, I’ll go and investigate that,” but you’ll take for granted other things it said that were actually wrong.

Right, if it sounds like it knows what it's talking about... This is a strange quality: something that sounds true. I mean, any number of facts that we encounter in a day, we don't judge or even think twice about.

That’s really important.

The reason why something sounds true is that it has some degree of fit with your background assumptions. You have a certain set of beliefs, and if something is fully consistent with those beliefs, then you’re much more likely to believe it. Whereas if it collides a little bit with some of your beliefs, you’ll be less likely to believe it.

If your beliefs are wholly cut off from the real world, and from a truly diverse set of opinions from many different people who each have connections to the real world, then you can be indoctrinated. I actually think this is what happens with people in cults, and I think the worst ideologues online are like this. They don’t talk to other people; they live in their own silos. Those people are going to be the ones writing our AI chatbots and editing the settings of those chatbots.

In our conversation last week, you said AI is potentially a Wikipedia killer not just because it's better at analyzing information and has access to more information. It's also because it’s just the way people prefer to learn. They prefer to ask a question and receive an answer, rather than do research.

Well, I want to acknowledge that AI is very useful, just like Wikipedia. Wikipedia is still useful; I have to give it that — especially for particular topics. AI is potentially even more useful. It’s easy and fast to ask it a question and get an instant answer, whereas before you might take 15 minutes or a full hour to research something thoroughly.

That being said, there is a lot to worry about. In the same way that there was institutional capture of Wikipedia, the AI chatbots are being built by institutions that are already captured. So, of course, that’s why when I was trying to ask for a full account of the bombing of the Nord Stream pipeline recently, the chatbot I was using just refused to give me an answer.

I kept feeding it little bits of information: “Okay, well, the name of the journalist was this, and he said this,” and I started asking specific questions about what he said. At a certain point, it actually started giving me some answers, but only as much as it had to in order to answer the question, and then it just cut it off.

What you're describing is definitely dangerous, but it's going to become... 'Oh, well, whatever, I can find this information online. It doesn't matter.' But as AI takes off, and we become increasingly reliant on AI to answer our questions —

You might try to be skeptical, but if you don’t actually think of certain questions to ask, you won’t ask them. You will just find yourself with a new belief that is convenient for the establishment to give you.

It’s going to be like that with regard to all the questions we would ask an encyclopedia or a search engine. We will ask those questions of a chatbot. Unless we know to follow up on certain aspects, our opinions will be directed in a certain way. If our background assumptions have been molded and pushed in a certain direction, it will be harder for us to arrive at a fully nuanced picture of reality.

How do we solve the problem of our impermanent internet record? This is something that you're working on now. You mentioned your work on Oldpedia. If you could re-architect our entire system of record, what would that look like?

Let's start with the Internet Archive.

That's a hard question. Because like I said, I don't envy them. The problems that they have are very hard...

What problems are they facing?

I mean, they’re just technical problems. It’s massive, massive amounts of data, right? As soon as you gather that much data together, it needs to be mirrored and made available, hopefully, rather than kept all in one place. That’s another thing: there are only two mirrors of the Internet Archive that I know of. One is in Amsterdam, the second one is in Egypt.

Then there are all the problems associated with metadata. In order to make that data, whether it be a book or a webpage or a snapshot of a webpage, properly searchable and represented, certain kinds of metadata need to be captured. So there are all those technical problems.

And then, of course, there are also the managerial and legal problems. Increasingly, I think there will be people putting pressure on the Internet Archive to get rid of certain things. Even now, that sounds ridiculous — even the most far-left leftist isn’t going to say, “We’re going to delete Mein Kampf from the Internet Archive,” because it is useful as a historical document to learn from.

But people will want to erase history, right? Not just the most offensive parts, but also stuff that is not in line with certain views. It happens, and it has happened throughout history. How do you insulate a non-profit that depends on millions of dollars from giant corporations that are getting woke? How do you insulate that organization from those sorts of pressures? You can’t.

So the only solution is to make what they are trying to do decentralized. But good luck with that, because then in addition to having metadata, you actually have to have a lot of different people agree on the metadata and agree on the sort of network structure that enables them to share the files in lots of different places. But we are making a start, at least, at the Knowledge Standards Foundation.

What is your approach?

We are crawling lots of different encyclopedias, beginning mostly with open content encyclopedias. There are quite a few of those, maybe not as many as you might think, but on the order of many hundreds. Then there are also some others. The idea is to create a file format for individual encyclopedia articles. That’s what we’ve done.

The file format is basically a kind of ZIP file. So, if there’s an article about George Washington from, say, Citizendium, we will have the HTML, a text-only version, and a metadata file with things like the title, author, publication date, and other details. There will also be a digital signature, which is a technical way of proving that the file originated from a certain server. If a file claims to be from citizendium.org and is signed by citizendium.org, you can prove that because the signature was added to that file. This is important because, in a decentralized system, unless you digitally sign your files, someone could change a file on a server, and nobody would be the wiser.
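
For readers who want a concrete picture of the bundle Sanger is describing, here is a minimal sketch in Python. The file names (article.html, article.txt, metadata.json, signature.bin), the metadata fields, and the choice of Ed25519 signatures are illustrative assumptions on our part, not the Knowledge Standards Foundation’s actual ZWI specification.

```python
# Minimal sketch: package one encyclopedia article as a signed, ZIP-based
# bundle, roughly as Sanger describes. File names, metadata fields, and the
# Ed25519 scheme are assumptions for illustration, not the real ZWI spec.
import io
import json
import zipfile
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def build_signed_bundle(html: str, text: str, metadata: dict,
                        signing_key: Ed25519PrivateKey) -> bytes:
    """Return the bytes of a ZIP containing the article plus a signature."""
    canonical_meta = json.dumps(metadata, sort_keys=True)
    # Sign a digest over all three payloads so any later edit on a mirror
    # is detectable against the publisher's public key.
    digest = hashlib.sha256((html + text + canonical_meta).encode()).digest()
    signature = signing_key.sign(digest)

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("article.html", html)        # full rendered article
        z.writestr("article.txt", text)         # text-only version
        z.writestr("metadata.json", canonical_meta)
        z.writestr("signature.bin", signature)  # proves origin of the bundle
    return buf.getvalue()

# Hypothetical example: an article claiming to come from citizendium.org.
publisher_key = Ed25519PrivateKey.generate()
bundle = build_signed_bundle(
    "<h1>George Washington</h1><p>First president...</p>",
    "George Washington. First president...",
    {"title": "George Washington", "publisher": "citizendium.org",
     "author": "Citizendium contributors", "published": "2007-04-30"},
    publisher_key,
)
```

The point of signing a digest of the contents is the one Sanger makes: a mirror can host the file, but it cannot quietly alter it without breaking the signature.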

Another thing we are doing is settling on the actual directory structure so it’s easy to exchange files between different aggregators. We’ve got one aggregator called EncycloReader, run by a CERN developer, and another called EncycloSearch, run by my son. They have built everything independently, with independent readers, search engines, and directories, and they are constantly iterating on how they exchange files.

At this point, it’s possible to do it automatically. If one person makes a new ZWI file from a new encyclopedia, the other can automatically import it, and it can propagate to others. The cryptocurrency company I mentioned, DARA, has also made an aggregator and imported some of our stuff, making new files themselves. When an article is up for deletion on Wikipedia, they’ll now make a snapshot of it and keep a copy.
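
On the receiving side, an aggregator importing such a bundle would check the signature against the publisher’s public key before re-serving or propagating it. Again, this is a sketch under the same illustrative assumptions as above, not the actual exchange protocol used by EncycloReader or EncycloSearch.

```python
# Sketch of the aggregator side: verify a downloaded bundle before importing
# it. Same illustrative file layout and Ed25519 assumption as the sketch above.
import io
import zipfile
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_bundle(bundle: bytes, publisher_public_key: Ed25519PublicKey) -> bool:
    """Return True only if the bundle's contents match its signature."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as z:
        html = z.read("article.html").decode()
        text = z.read("article.txt").decode()
        meta = z.read("metadata.json").decode()
        signature = z.read("signature.bin")
    digest = hashlib.sha256((html + text + meta).encode()).digest()
    try:
        publisher_public_key.verify(signature, digest)
        return True   # safe to import and propagate to other nodes
    except InvalidSignature:
        return False  # the file was altered somewhere along the way

# Usage, continuing the hypothetical example above:
# ok = verify_bundle(bundle, publisher_key.public_key())
```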

So to be clear, this is going to get us to a world where the record of the internet at least is — I don't want to say permanent — but it's more permanent than it is today?

Yeah, well, it is shared across lots of different computers. Another thing we are working on, and my son hasn’t actually started yet but will shortly, is a desktop app. He’s already created a plugin for Chrome and Brave that will seed articles into a decentralized format. This will be like a desktop server. It will turn your computer into a server where you can allocate a gigabyte or whatever amount of space and become another node or aggregator, automatically seeding articles.

Then, anyone will be able to download articles from the nearest node. This is an example of the sort of decentralized content network that needs to exist. If we can build it for encyclopedias, then we ought to be able to build it for other kinds of content as well. But we’re starting with encyclopedias.

Last question: the average person is not nearly as steeped in all of this as you are — specifically, the topic of how we know what we know, and how the internet is designed to help us know what we know. What is something about this subject the average person doesn't think much about, which you wish they would?

The average person gets their beliefs from the mainstream media and their education. I would want the average person to know just how completely controlled, manipulated, and captured — to use that word again — they are if they do not escape their indoctrination from mainstream education and media. There is so much more information out there, and I’m not just talking about conspiracy theories, although a lot of those are more important and plausible than you might think.

But no, I’m not talking about that. I’m talking about, for example, who is the most important black intellectual alive today. It’s Thomas Sowell, but how many people would know that, who are regular viewers of CNN and graduates of your local public high school and places like Ohio State University? They wouldn’t know that. I’m biased because I’m ideologically aligned with him to a certain extent — not entirely — but, speaking as a philosopher who has actually read a half dozen of Sowell’s books, I can tell you, he is a master. He’s one of the greatest intellectuals that America has produced, period.

And that’s something you wouldn’t know. That’s not a conspiracy theory. It’s not even a conspiracy theory to say that that is a fact. It’s just an opinion, but it’s an unpopular opinion and a perspective you wouldn’t learn if you relied entirely on what the talking heads say is really credible and important.

Also, I would say, read the Bible.

Well, thank you very much for your time. This has been great.

— SOLANA

This conversation has been edited for clarity and brevity.
