Section 230 Redux

The debate over Section 230 is one of those unfortunately common cases where an issue has been sucked into the great consumer packaged ideology wars despite having almost no ideological content. The left and the right each have their ordained opinions, though neither makes any sense or is particularly related to broader principles of political thought.

To understand why Section 230 exists, why it was necessary, why it is now somewhat flawed, and what should be done about it, it’s first necessary to set aside any preconceived notions you may have about why Section 230 matters. The issues at stake here are not free speech or censorship and, contrary to what some in the cultocracy of pundits like to assert, neither is the fundamental working of the internet.

Section 230 was designed to solve a specific problem. In the early days of the internet, it became apparent that one of the best uses of the web was to allow people to communicate directly. The very first internet applications were essentially bulletin board systems, and the history of the internet and the web can be traced in the increasing scale and sophistication of these types of applications.

AOL was, at heart, a bulletin board, a direct message service, and a publisher as well as your ISP. A kind of one-stop shop for the Web. And it’s quite conceivable that the world could have gone in the direction of a few closed AOL-like ecosystems rather like the one Apple currently runs in iOS. That isn’t the way it went down. Instead, ISPs became mere gateways, content services fragmented, and a few large messaging services filled various niches with Google search holding everything together.

What immediately became clear given the scale and nature of these connection services is that they could not be treated as publications. When I pick up my phone and call you, AT&T is not responsible for the words I speak just because they created the connection. Similarly, when I post a message to a friend on Facebook, Facebook can’t and shouldn’t be involved in editing it or deciding whether or not to make it available.

Yet there’s the beginning of a grey area here, because while content on messaging applications is always point-to-point between consenting parties, bulletin board services (and their modern counterparts) post messages publicly. This comes closer to something like publishing, but with huge and fundamental differences. It seems clear (and Section 230 mostly just codified this understanding) that bulletin board systems are merely providing a platform. Indeed, if you removed bulletin boards, it would still be possible for someone to build a public web site and display their messages on it. If they did so, one could suggest (and, in fact, some have suggested) that the web hosting services and the DNS naming services would be functioning as publishers. And, in fact, bulletin board services are at least somewhat responsible for ensuring that certain kinds of plainly illegal content are not pushed on their platform. A billboard painter is, in this sense, similar. One cannot put up a billboard displaying child pornography even though, in the general sense of the word, it seems clear that a billboard painter is not a publisher.

This is the world that Section 230 was written for. It recognized that the web provided a technological infrastructure that could be used to massively scale point-to-point communications and to provide for an essentially infinite billboarding capability. It also recognized that the platform companies making those services available could not reasonably function if they were responsible for monitoring and censoring point-to-point communications or were liable for whatever libels or misinformation were publicly displayed. Not only is it asking the impossible of companies to make these judgements, there is no reason to think that bestowing that power would be beneficial. Given the scale of user-generated content, companies would have to massively constrain speech that might actually be useful to protect themselves adequately.

In other words, Section 230 was an obvious, straightforward, necessary and very successful regulation. What changed all this was an evolution in the way those already very successful online platforms worked.

In their early days, platform communities worked largely by allowing people who already knew each other to connect and by creating specific destination points where people who had a specific interest could find others who shared that interest. Let’s call this a “pull” strategy. People could search to find a person they know or they could go to a “place” (many different online terms capture this concept including room, channel, board, and thread) for those with a specific interest.

In a similar vein, search became the de facto means of navigating the vast amounts of content on the web. Within a site, one might search for a place, and on the web in general, you searched for sites. Search is always a pull strategy, yet it too contains a kind of grey area. The problem with search in the online world isn’t finding a result, it’s finding too many. For a search to be useful, it must try to decide which of the tens, hundreds, thousands and even billions of results that could be returned are the most useful.

Enter “World War 2” into the Google search engine and it will return about 4.5 billion results (in a little under a second). The top result (for me, anyway) is a Wikipedia entry, followed by a Britannica entry, two entries and an entry for the (excellent) WW2 history museum in New Orleans. Good choices? Who knows. The Google search engine doesn’t know if I’m looking for deep research, a quick overview, good videos, or an educational place to visit with my kids. For really broad (stupid) searches like “World War 2”, the quality of results returned is pretty terrible because it has to be.

Now type “best electric sedans” into Google. The first thing I get is a bunch of ads. Then I get articles by MotorTrend (“Top Rated Electric Car Models”), Edmunds (“Best Electric Cars of 2023”), and (“Top Rated EVS”). In this search, about half of what Google returns on the first page is advertising. It’s there because companies paid for it to be there. And payment for that position is what makes Google one of the most valuable companies in the world. The rest of the results are there because Google’s algorithm judged them relevant to your specific request. That relevancy is determined by the words you used, what other people have clicked on before, rankings Google gives to specific publications (there’s a reason Wikipedia shows up all the time), and your past click behavior.

What’s important to understand about search is that establishing this relevancy cannot be done even remotely well without a number of judgements, some of which have undeniable biases. For instance, Google rewards sites that people tend to click through to by ranking their content higher. This has a conservative (in a non-political sense) impact on traffic flow. Established brands will win over newcomers. Not because they are better or more relevant, but because they are more recognizable. There are other, even more arcane biases, including a very common one: people who can afford to spend money on expertise in creating better Google rankings will nearly always get traffic at the expense of people who can’t.

In other words, even though search is a pull mechanism, the order in which results are returned contains considerable (and inevitable) biases. These biases really do matter because the order in which search results are returned is critical.

It’s widely acknowledged that only a tiny fraction of searchers will ever go to the second page of search results. On the other hand, it’s not unusual for 40-50% of searchers to click on the first result returned. Nor is this simply a matter of Google being very good at what it does. In a great many cases, there is no real logic to the ordering of top search results. For queries like WW2, the idea that the Wiki article is somehow 20x better than the History Place article or the BBC History page is nonsense. Search optimization experts generally consider that each search position ranking will generate about 2x traffic compared to the position below it. And their business is charging people to improve that ranking. Keep in mind that in doing that, these experts almost never do anything to improve the actual content of the website. They aren’t like great editors, making your online site content better. Usually, in fact, their business is making your content worse for humans but better to Google or creating the illusion that it is authoritative with various tricks.
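
To see what that 2x rule of thumb implies, here is a back-of-the-envelope sketch in Python. It assumes a hypothetical first page of ten results and assumes every click lands on one of them; the function name and the ten-result page are my own illustration, not anything Google publishes.

```python
def click_shares(positions: int = 10, ratio: float = 2.0) -> list[float]:
    """Share of clicks per position if each rank gets `ratio` times the rank below it."""
    # Weight the last position as 1, the one above it as `ratio`, and so on up the page.
    weights = [ratio ** (positions - rank) for rank in range(1, positions + 1)]
    total = sum(weights)
    return [w / total for w in weights]


shares = click_shares()
for rank, share in enumerate(shares[:3], start=1):
    print(f"position {rank}: {share:.1%}")
```

Under those assumptions the first position takes roughly half of all clicks, which is at least in the same neighborhood as the 40-50% figure, and it gets 512 times the traffic of the tenth result.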

It’s easy to see how relevancy might become a manipulable tool for bias. Suppose that Google’s CEO decided he didn’t like Joe Biden and ordered his engineers to change the algorithm so that for any search involving Joe Biden, only the most unflattering, negative and hostile results would be surfaced first. In this scenario, the search engine isn’t generating any content, but it seems to have become something quite different than a billboard painter. In the extreme “Biden” case, it would be hard to argue that the search technology is a “platform” not a publisher.

There’s no evidence that broad internet “pull” mechanisms have ever been this self-consciously manipulated. What’s more, because search is a utility function, a version that consistently responded in less relevant ways would be vulnerable to competition.[1]

In any case, it isn’t “pull” mechanisms and search relevancy that have created the real problem around Section 230. It’s the evolution of social media companies and their growing adoption of push mechanisms.

The rationale behind the adoption of push mechanisms for content is easy to understand. It’s all about monetization. Sites like Facebook, TikTok, Twitter, and YouTube are free to use. They make money by selling advertising. For these companies, there are two ways to make more money. The first is to get more viewership. The second is to increase the value of each view. One way to get more views is to steer more people to your site. That’s why Google owns YouTube, and it’s why a video on YouTube is more likely to show up in search results than a video on Vimeo. Another potential monetization lever is creating more content and better content. The more content you have and the better it is, the more people will want to watch it. Finally, you can find technology-based ways to get people to spend more time on your site and consume more content.

There are natural limits to a tool’s ability to steer or commandeer search results. Indeed, pull mechanisms are great for building an audience but very poor for holding and monetizing one. In terms of the quantity of content, crowd sourcing largely solved the creation problem for companies like YouTube and Twitter. They don’t need to hire writers to turn out articles and they aren’t limited by staff costs. YouTube hosts billions of videos (500 new hours are uploaded every minute) which it paid nothing to create.

Of course, most of those videos aren’t the equivalent of a Spielberg movie or a NY Times article – and I know because like everybody else, I’ve uploaded my share. So the hard part for YouTube is how to steer people to the good stuff. Because without that, all those billions of videos don’t really matter very much.

Search, of course, can do this. But with search people must ACT to see what’s next. If I go onto YouTube and search for a video about Nuclear SMRs, I’ll likely consume a video or two and then leave. From YouTube’s perspective, that’s sub-optimal.

What social media platforms discovered was that pull mechanisms, while necessary, are very inefficient for full monetization. By suggesting additional content, these platforms can increase consumption. How do they suggest additional content? Mostly, they started out doing it in a manner fairly similar to generating search results. If you viewed X content, they looked at what other people who viewed X content also viewed. Then they suggested more of that.
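
The “people who viewed X also viewed” idea can be sketched in a few lines of Python. This is a minimal illustration, not how any real platform implements it; the viewing histories, video names, and the also_viewed function are all hypothetical.

```python
from collections import Counter

# Hypothetical viewing histories: user -> set of videos watched.
histories = {
    "ann": {"smr_explainer", "fusion_update", "grid_storage"},
    "bob": {"smr_explainer", "fusion_update"},
    "cat": {"smr_explainer", "cat_video"},
    "dan": {"cat_video", "dog_video"},
}


def also_viewed(video: str, histories: dict[str, set[str]], top_n: int = 3) -> list[str]:
    """Rank other videos by how many viewers of `video` also watched them."""
    counts = Counter()
    for watched in histories.values():
        if video in watched:
            counts.update(watched - {video})
    # Sort by count (descending), then by name so ties come out in a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


print(also_viewed("smr_explainer", histories))
# ['fusion_update', 'cat_video', 'grid_storage']
```

Notice that nothing in this sketch knows what any video is about. It only counts what other viewers did, which is why it resembles the click-driven side of search ranking.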

This strategy works but has its own limitations. It isn’t exploratory enough and it’s very poor at finding other content types you might watch. Increasingly, social media companies have developed algorithmic feeds that engage in a constant process of testing and supervised learning. They show you a carefully calibrated mix of very popular (proven) content and more exploratory content, then measure your reactions. These measurements may be obvious (read or watched) or very subtle (tiny variations in scroll speed on TikTok). Based on your reactions, they give you more or less of different kinds of content. The more content you consume, the more they can measure your reactions and give you what you will consume. And the more content you consume, the more they can explore what other kinds of content you might spend time on.
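
The show, measure, adjust loop can be illustrated with a toy explore-and-exploit feed. To be clear about what is assumed: the sketch below uses a simple epsilon-greedy rule, which is a textbook stand-in and not what any real platform is known to run, and the FeedSketch class, the category names, and the “seconds watched” signal are all hypothetical.

```python
import random


class FeedSketch:
    """Toy explore/exploit feed: mostly show what has worked, sometimes try something else."""

    def __init__(self, categories, explore_rate=0.2, seed=0):
        self.explore_rate = explore_rate
        self.rng = random.Random(seed)
        # Per category: how many times shown and total attention (e.g. seconds watched).
        self.shown = {c: 0 for c in categories}
        self.attention = {c: 0.0 for c in categories}

    def average(self, category):
        shown = self.shown[category]
        return self.attention[category] / shown if shown else 0.0

    def next_category(self):
        untried = [c for c in self.shown if self.shown[c] == 0]
        if untried:
            return untried[0]  # show everything once before comparing
        if self.rng.random() < self.explore_rate:
            return self.rng.choice(list(self.shown))  # explore: try anything
        return max(self.shown, key=self.average)  # exploit: best average so far

    def record(self, category, seconds_watched):
        self.shown[category] += 1
        self.attention[category] += seconds_watched


# Hypothetical viewer who lingers on cooking clips and skips the rest.
true_interest = {"news": 2.0, "cooking": 9.0, "sports": 1.0}
feed = FeedSketch(true_interest)
for _ in range(200):
    category = feed.next_category()
    feed.record(category, true_interest[category])

print(feed.shown)
```

After a couple hundred impressions the feed is showing mostly cooking clips, with a steady trickle of other categories kept in as probes. The point of the sketch is only that the loop optimizes for measured attention; nobody ever asked the viewer what they wanted.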

Algorithmic feeds are a “push” mechanism. They are surfacing content to you that the platform wants you to engage with. It would be easy to re-write that last sentence as “thinks you’ll want”, but that’s not exactly what’s going on. First, while all algorithmic feeds are designed to understand what you might respond to, they aren’t necessarily optimized to give you what you want. Amazon, for example, uses algorithmic feeds to suggest products to buy. This certainly implies that the feed thinks there’s a chance you’ll buy them, but it’s not necessarily because the feed thinks it’s something you’ll like. It may just be that Amazon has too much of that product and there’s at least some chance – however small – that you might buy it. Second, it’s important to recognize that the underlying metric being optimized isn’t your satisfaction, it’s your attention. The headlines on clickbait articles aren’t designed to steer you to content that you’ll admire or even enjoy. They are just designed to get you to click so an ad can be displayed and billed. In an attention economy, some strategies are optimized to build loyalty (TikTok) and others to capture a constant flow of random traffic (Huffington Post). Most companies use some mix of both these strategies (YouTube). Similarly, if there are two videos on how to assemble a piece of furniture and one is long enough to fit two ads and the other only one ad, then the algorithm will steer you to the video that has the most ads (or the most remunerative ads).
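
The furniture example comes down to which quantity the ranking maximizes. Here is a minimal sketch with made-up numbers; the Video fields, the helpfulness score, and the revenue figures are hypothetical and only there to show the two objectives pulling apart.

```python
from dataclasses import dataclass


@dataclass
class Video:
    title: str
    helpfulness: float     # hypothetical 0-1 score for how well it serves the viewer
    ad_slots: int          # number of ads that fit in the video
    revenue_per_ad: float  # hypothetical dollars earned per ad shown


def expected_revenue(video: Video) -> float:
    return video.ad_slots * video.revenue_per_ad


videos = [
    Video("Bookshelf assembly in 4 minutes", helpfulness=0.9, ad_slots=1, revenue_per_ad=0.02),
    Video("Bookshelf assembly, full walkthrough", helpfulness=0.7, ad_slots=2, revenue_per_ad=0.02),
]

best_for_viewer = max(videos, key=lambda v: v.helpfulness)
best_for_platform = max(videos, key=expected_revenue)
print("Viewer would pick:  ", best_for_viewer.title)
print("Platform would pick:", best_for_platform.title)
```

With these numbers the two choices differ: the shorter video serves the viewer better, while the longer one earns twice as much per view.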

Algorithmic feeds transformed the nature of online platforms. Facebook went from a place where people connected with friends to a dominant news site. Sites like Twitter and Instagram became much less of a way to connect with other people and much more of a way to consume content. TikTok is almost a pure publishing platform – with very little of the original “social” structure that first evolved within these platforms.

It’s obvious that with an algorithmic feed, tools like Facebook are no longer simply connection platforms. Indeed, it would be hard to find a way to distinguish what Facebook does in its newsfeed from what the NY Times does in its editorial room. A publisher like the NY Times is mostly deciding that certain content is better than other content (and God knows they are right). They are packaging up the content they like – content produced by writer X – and putting it in a single place where it can be consumed. It’s certainly true that the Times is paying for the content to be created but note that this is hardly essential to being a publisher. Surely nobody thinks poetry magazines (printing largely unpaid submissions from people who aren’t their employees) aren’t publishers or somehow have protection from printing libelous materials!

Nor is the fact that, for the NY Times, a human editor is providing the algorithmic judgement a very compelling difference. It’s hard to see what the editor could be doing that would distinguish a human from an algorithmic editor. An editor may or may not have political biases, may or may not be consciously trying to boost circulation, and may or may not care much about good writing.

Suppose (and this is far from fanciful) the NY Times replaced its editorial staff with a ChatGPT editor trained heavily on past NY Times content. This machine editor could take content from any submitter, decide whether it was of the type, message, and quality of content that fit the Times, and then choose to print or reject it. In blind tests, imagine this system made the same decisions as existing editorial staff in 99.9% of all articles. In this case, the Times might no longer have a human editor. Would it magically become a platform not a publisher?

The heart of the problem for Section 230, then, is that the widespread use of push algorithms has transformed platforms that were originally one thing into something quite different. It’s hard to see how the algorithmic promotion of content by Facebook is different than the human (or algorithmic) selection of content by the NY Times or the Huffington Post.

Nor would removing algorithmic feeds break the internet. Yes, it would break TikTok, and it would reduce the monetization opportunities for YouTube, Instagram, Twitter and Facebook. But these platforms survived and thrived without algorithmic feeds; they just weren’t quite as profitable. What’s more, a very strong case can be made that algorithmic feeds, while great for monetization, are damaging to every other aspect of the value these online tools provide. Does anyone really think Facebook is better as a news platform than as a way to connect friends?

If removing Section 230 protections for companies engaging in push feeds resulted in the removal of all algorithmic feeds, the internet would be less profitable, but it would likely be better in almost every other respect. If it didn’t force that change, then it’s hard to see how it would make things worse than they currently are.

What would break the internet as we know it would be the removal of Section 230 protections for pull mechanisms like Google search. On the one hand, without search the web could not function as it does. Yet it’s clear that with sufficient manipulation, an internet search engine could become very much like a publisher.

If the line between promoting content (algorithmic push) and providing a path to it seems both intelligible and clear enough to be the basis of sound regulation, is there a similar distinction to be made in pull mechanisms like search? If the Biden example above crosses a line, is there a satisfactory way to define where that line should be?

For pull mechanisms, it seems likely that “relevance” to the query is a reasonable standard that we should expect a search engine algorithm to adhere to. Many different concepts of relevance are potentially reasonable, including differing ideas about what to count as authoritative. Drawing a very generous line like this would not protect search from its tendency to favor established brands or be gameable by those with the money to hire appropriate expertise. But such problems are everywhere in society and are demonstrably hard to regulate.

This isn’t an irretrievably slippery slope, however. Most legal arguments simply require that cases far on one side of the line or the other be distinguishable – not that cases right on the boundary line be easily or consistently decidable. That’s what legal argument is for. It would be hard to make the case that a search engine that ranked results by their hostility to Biden was using relevance. It would be hard to argue that a search engine that used keyword matching and all previous searchers’ click preferences wasn’t at least attempting to provide relevance.

The argument over Section 230 is real. The evolution of technology platforms and algorithmic feeds has made the initial regulatory setup obsolete. It happens. In none of this could an objective observer see much that would lend itself to ideological interpretation. Is there anything in progressive, classical liberal, or conservative principles that is tied into this debate? If so, it’s elusive. There are no free speech issues here. No real issues about government regulation or control. There is simply a classification question arising from the growth of a new technology (algorithmic feeds). Unlike so many other ideological battlegrounds, the issues around Section 230 are substantive, real and kind of interesting. They are also at least somewhat important. That being said, these arguments aren’t derived from or dependent on any set of philosophical or political principles, and there is a reasonably straightforward path for adapting the existing regulatory structure to the new realities of the technology platforms.

[1] That being said, search has huge cost barriers to entry, and online search bears many characteristics of a natural monopoly. There is considerable merit in comparing Google to a utility.