YouTube remains a juggernaut of online spaces. Recently, it crossed the threshold of 2 billion logged-in users per month (Saima, 2019). Perhaps even more important for this research project is the time spent by users within this environment. Users spend around 250 million hours on the video sharing platform every day (Saima, 2019). The time “inhabiting” YouTube marks it out as distinct from Facebook, and suggests a different kind of influence over time, something slower and more subtle. Indeed, as will be discussed, radicalized individuals have noted how influential YouTube was in shifting their worldview over longer periods of time, a medial pathway that nudged them towards an angrier and more extremist stance (Roose, 2019). While this is just one highly politicized facet of YouTube, it signals the stakes involved here—not only the anger available to be tapped into, but the influence such an environment might have in shaping the ideologies of its vast population.
One key focus of recent critiques of YouTube has been its recommendation engine (Regner, 2014; Schmitt et al., 2018; Ribeiro et al., 2020). The design of the recommendation system is central to YouTube’s user experience for two reasons. Firstly, it determines the content of each user’s homepage. Upon arriving on the site, each user is presented with rows of recommended videos, with each row representing an interest (e.g. gaming), channel (e.g. the Joe Rogan Experience), or an affiliation (“users who watched X enjoyed Y”). As with similar designs such as Netflix, the YouTube homepage is the first thing that users interact with, and the primary “jumping off” point for determining what to watch.
Secondly, the YouTube recommendation system is crucial because it also determines the related videos appearing in the sidebar next to the currently playing video. By default, the Autoplay feature is on, meaning that these sidebar videos are queued to play automatically after the current video. This design feature means that, even if the user does nothing further, the next video in this queue will play. Even if the Autoplay feature has been manually turned off, this sidebar, with its dozens of large thumbnails, presents the most obvious gateway to further content. With a single click, a user can move onto a video which is related to the one they are currently viewing.
From a design perspective, the homepage and the sidebar form the crucial interfaces into content consumption. Search, while possible, is a manual process that requires more effort and has been deemphasized. Browsing recommended results, with its scrolling and tapping, provides a more frictionless user experience. It is unsurprising then, that “we’re now seeing more browsing than searching behavior”, stated one YouTube designer (Lewandowski, 2018), “people are choosing to do less work and let us serve them”. This shift has meant an even greater role for the recommendation engine. In theory, users can watch any video on the vast platform; in practice, they are encouraged towards a very specific subset of content. Indeed, YouTube’s Chief Product Officer revealed that recommended videos account for over 70% of watching time on the platform (Solsman, 2018). This is a single algorithmic system that exerts enormous force in determining what kinds of content users are exposed to and what paths they are steered down.
How is this recommendation system designed? In a paper on its high-level workings, YouTube engineers explain that it comprises two stages. In the first stage, “the enormous YouTube corpus is winnowed down to hundreds of videos” that are termed candidates (Covington et al., 2016, p. 192). These candidates are then ranked by a second neural network, and the highest ranked videos presented to the user. In this way, the engineers can be “certain that the small number of videos appearing on the device are personalized and engaging for the user” (Covington et al., 2016, p. 192). Based on hundreds of signals, users are presented with content that is attractive by design: hooking into their interests, goals, and beliefs. This recommendation engine is not static, but rather highly dynamic and updated in real-time. Your profile incorporates your history, but also whatever you just watched. As YouTube’s engineers (Covington et al., 2016, p. 191) explain, it must be “responsive enough to model newly uploaded content as well as the latest actions taken by the user”. As content is consumed, an individual’s interests and ideologies in turn are shaped (Fig. 4).
Of course, these technical explanations remain at a high-level. The recommendation system, as a proprietary technology owned and operated by YouTube, will always remain to some extent a black box. Yet even these general principles provide insight into the system’s design. First, the system is designed to promote “engaging” videos. Which videos are most engaging? As one former developer (Chaslot, 2019) explains:
We know that misinformation, rumors, and salacious or divisive content drives significant engagement. Even if a user notices the deceptive nature of the content and flags it, that often happens only after they’ve engaged with it. By then, it’s too late; they have given a positive signal to the algorithm. Now that this content has been favored in some way, it gets boosted, which causes creators to upload more of it. Driven by AI algorithms incentivized to reinforce traits that are positive for engagement, more of that content filters into the recommendation systems. Moreover, as soon as the AI learns how it engaged one person, it can reproduce the same mechanism on thousands of users.
Recommending content based on engagement, then, often means promoting incendiary, controversial, or polarizing content. The closer a video gets to the edge of what’s allowed under YouTube’s policy, the more engagement it gets (Maack, 2019). In other words, as even Zuckerberg (2018) has admitted, borderline content is more engaging. Because of this dynamic, designing for engagement goes beyond mere customer satisfaction to deeply influence the kind of content that promoted. As the developer quote above suggests, the system’s design establishes a series of powerful feedback loops. Creators create more of this toxic yet high-performing content and the system recommends it more often to users, not only individually, but at scale.
Secondly, the system is designed to be responsive, to be dynamic enough to generate new recommendations based on what was last viewed. The design challenge, as the engineers explain (Covington et al., 2016, p. 194), is to predict “the next watched video”. While again high-level, this creates a design with a degree of self-similarity, promoting more of the same kind of content. And yet if this content stays within the same topic, it is typically more intense, more extreme. “However extreme your views, you’re never hardcore enough for YouTube” attests one article (Naughton, 2018). Based on the strong performance of borderline content discussed earlier, YouTube’s recommendations often move from mainstream content to more incendiary media, or politically from more centrist views to right and even far-right ideologies.
The dynamism designed into the recommendation system establishes a vector, a gradual movement as each video is completed. Based on the current values designed into the system, users can be suggested material that progressively becomes more controversial, more political, more outrage-inducing, and in some cases, more explicitly racist, sexist, or xenophobic (O’Callaghan et al., 2015). Indeed, one analysis (Munn, 2019) suggests that YouTube can form a key part of an “alt-right pipeline”: users are incrementally nudged down a medial pathway towards more far-right content, from anti-SJW videos which demean so-called “social justice warriors” to gaming related misogyny, conspiracy theories, the white supremacism of “racial realism” and thinly veiled anti-Semitism. In a recent paper analyzing approximately 330,925 videos across 349 channels, a study found that “users consistently migrate from milder to more extreme content”, shifting from viewing so-called Alt-Lite material to more strident Alt-Right channels (Ribeiro et al., 2020, p. 131).
What is particularly powerful about this design is its automatic and step-wise quality. Users do not consciously have to select the next video, nor jump suddenly into extreme material. Instead, there is a slow progression, allowing users to acclimate to these views before smoothly progressing onto the next step into their journey. Recommendations are “the computational exploitation of a natural human desire: to look ‘behind the curtain’, to dig deeper into something that engages us”, observes sociologist Zeynep Tufekci (2018): “As we click and click, we are carried along by the exciting sensation of uncovering more secrets and deeper truths. YouTube leads viewers down a rabbit hole of extremism, while Google racks up the ad sales”. At the far end of this journey is an angry and radicalized individual, a figure that has increasingly emerged over the last few years, from Christchurch in New Zealand to El Paso, Texas and Poway, California in the United States. Yet along with these extreme examples, equally troubling is the thought of a broader, more unseen population of users who are gradually being exposed to more hateful material.
The result of these design choices is that the recommendation system emerges as a hate-inducing architecture. From a metrics point of view, the system is successful, delivering “engaging” content while ramping up view counts and watch time on the platform. And yet to do so, the system appears to consistently suggest divisive, untrue, or generally incendiary content. “YouTube drives people to the Internet’s darkest corners” warns one article (Nicas, 2018). In this sense, the design of the current recommendation serves the company well, but not necessarily individual users or online communities, particularly those that are already marginalized (Fig. 5).
It should be noted that one recent working paper has questioned the role of the recommendation system in hate speech and far-right indoctrination. Munger and Phillips (2019) argue that the central role given to the recommendation engine is overplayed, and suggests instead a supply and demand explanation. For the duo, YouTube lowers the barriers of media production to almost zero, it offers easy distribution online through hosting and sharing, and it incentivizes content creation via monetization. These conditions have led to a diversification of channels that politically stretch beyond the mainstream center-left/center-right poles. As Munger and Phillips argue (2019, p. 6): “these aspects of YouTube allow new communities that cater increasingly well to audiences’ ideas to form”. The YouTube platform allows for the proliferation of niche media and a greater variety of alt-right and far-right material. The duo essentially argue that a radicalized audience already existed, it was simply constrained by too little supply of radical material.
On the one hand, the report is a productive reminder that social media is a sociotechnical system. Technologies are never purely determinist and any analysis should strive to account for the political and cultural background of users, their relations to others in the world, and the racial and gendered worldviews that “link” content together, even without an engine or automated system. As Rebecca Lewis (2018) has shown, the network of alt-right influencers on YouTube is a social network in the conventional sense—a web of individuals who share particular ideologies, use common phrases, and even recommend each other’s channels organically through formats like the talk show.
On the other hand, however, Munger and Phillips are using a rather conventional economic model to understand online environments. Their analysis presupposes an offline, radicalized audience with their minds already made up. In doing so, it fails to register the psychological and cognitive force exerted by platform environments, a force potentially magnified both by time spent consuming media and by the young age of particular users. Contrary to the duo’s straw man caricature of such influence as a “zombie bite”, this force is not an instant contagion, but something far more drawn out and subtle, a quiet influence that alters individuals as they inhabit online spaces over the months and years. As Wendy Chun (2017, p. x) observes, media exerts force over a “creepier, slower, more unnerving time”, effectively “disappearing from consciousness”. Media derives its power precisely by catering to the curiosities and desires of the user rather than overpowering them.
Along with the recommendation engine, another problematic design element identified in this analysis is YouTube’s comment system. For years, YouTube has consistently held a reputation for being an environment with some of the most toxic and vitriolic comments online (Tait, 2016). Even those used to online antagonism admitted that “you will see racist, sexist, homophobic, ignorant, and/or horrible comments on virtually every popular post” and yet the same post from 2013 naively claims that the problem will soon be solved with new technical features (Rose, 2013). Far from being solved, the years since have seen toxic communication on the platform proliferate and take on concerning new forms. While regarded as a “cesspool” for over a decade, the latest indictment has been a large number of predatory and sexual comments on the videos of minors (Alexander, 2018).
Why is YouTube so toxic, so angry? One common explanation is that YouTube is simply one of the largest platforms. For some, its extremely broad demographic explains its trend towards the lowest common denominator in terms of intelligent, relevant commentary. Yet while the platform certainly has a massive user-base, there also seem to be clear design decisions exacerbating these toxic comments. “Comments are surely affected by who writes them”, admits one analysis (Polymatter, 2016), “but how a comment system is designed greatly affects what is written”. For instance, YouTube comments can be upvoted or downvoted, but downvoting doesn’t lower the number of upvotes. This suggests a design logic that favors any kind of engagement, whether positive or negative. The result is that provocative, controversial, or generally polarizing comments seem to appear towards the top of the page on every video (Fig. 6).
The design choices built into both YouTube’s recommendation engine and its comment system might be understood as natural outcomes of an overarching set of company values. As recent articles have shown (Bergen, 2019), YouTube as purposefully ignored warnings of its toxicity for years—even from its own employees—in its pursuit of one value: engagement. Of course, this should come as no surprise for a publicly listed company driven by shareholder values and the broader dictates of capitalism. However it opens the question into what values are prioritized within online environments and how design supports them. Rather than grand vision statements or aspirational company values, what are the incentives built into platforms at the level of design: features, metrics, interfaces, and affordances?
Echoing this low-level design influence, the community manager I interviewed underlined how the typical all-consuming focus on likes and shares could be damaging. A key part of a community manager’s role is to foster healthy relations between members, to encourage beneficial content, and to block, delete or demote toxic posts—in short, to facilitate “more of the good and less of the corrosive”. But her fellow community managers often speak of “algorithm chasing”, where they attempt to combat or counteract the features built into the systems they use. There are often “competing logics” on a platform, she explained, an opposition between the value of creating a cohesive and civil community, and the values seen as necessary for platform growth and revenue such as expanding a user-base, extending use times, and attracting advertisers. Social media and community are often an awkward fit, and “marketing efficiencies are not social efficiencies”. On YouTube specifically, these designs privilege engagement above all else, resulting in a community that can be toxic and angry. Yet design might be rethought to prioritize an alternative set of values.
How might design contribute to a calmer, more considerate and more inclusive environment? One concrete intervention would be a redesigned recommendation system. Programmer and activist Francis Irving (2018) has found that the current system described earlier is both populist, prioritizing the popular, and short-term, using criteria to find videos that you’ll watch the longest. What kind of design interventions would make it more conducive to user well-being? For one, the system could be intentionally broadened, breaking its hyper-focused bubble and instead providing access points into a range of communities and a diversity of political views—even those that run counter to the user. Of course, other possibilities abound. Irving (2018) suggests one playful alternative: ask whether a YouTube user is more or less happy 6 months later, and use this signal as a way to improve video recommendations. As another option, Irving (2018) speculates about removing automated recommendations altogether, and moving to a more user-centered recommendation model. Like film or music, such a model would elevate taste makers who could curate great “playlists” of content.
Secondly, the comment system might be rethought entirely. It is clear that the current upvote/downvote binary is not working, rewarding quick immediate comments that are provocative—at best flippant, at worst, hateful or degrading. It also seems apparent that the relative anonymity of commenters and lack of any concept of reputation means that there is no real disincentive for consistently generating toxic comments. As one analysis noted (Polymatter, 2016): “Each comment stands on its own, attached to nothing, bringing out the worst in every commenter”. Introducing a reputation system into this environment would be one concrete design intervention. Reddit, for example, features a Karma system that rewards high quality comments while docking points for comments against community guidelines. Such a system, while naturally not perfect, significantly “thickens” the identity of a user. Each user has a history of contributions and comments that persists over time. Based on this past behavior, they have a combined score that signals whether or not the community has found their contributions helpful or beneficial. Even if this score is mainly symbolic, these reputation systems hook into offline conventions of social standing within a community, introducing a degree of accountability.