No echo in the chambers of political interactions on Reddit
We gather data from Reddit, through the Pushshift collection33. Reddit is organized in communities, called subreddits, each of which shares a common topic and a specific set of rules. Users subscribe to subreddits, which contribute new posts to the user's news feed (their home). Inside each subreddit, a user can post, or comment on other posts and comments. Thus, the overall discussion under each post evolves as a tree structure, growing over time. In addition, users can upvote posts and comments to show approval, or downvote them to show disapproval. Each message is therefore associated with a score, which is the number of upvotes minus the number of downvotes it has received.
Given the two-party nature of the US political system and the polarized state of its political discourse, we approach the problem by modeling the interactions between groups of users labeled by their political leaning, specifically according to which candidate they support in the 2016 presidential elections. We then model such interactions as a weighted, directed network, where nodes represent users and links represent comments between them. On top of political leaning, we also characterize the users in terms of their activity, i.e., their propensity to engage in interactions with other peers, and popularity, as given by the score assigned to their comments. In the remainder of this section we explain these three steps in more detail.
Political leaning of Reddit users
We identify the political leaning of Reddit users by looking at their posting behavior. With respect to the 2016 US presidential elections, users can be characterized as supporting the Democratic candidate, Hillary Clinton, or the Republican candidate, Donald Trump. On Reddit, we identify specific subreddits dedicated to supporting the main presidential candidates. For Donald Trump, we select the subreddit r/The_Donald; for Hillary Clinton, we choose the subreddits r/hillaryclinton and r/HillaryForAmerica.
The subreddit r/The_Donald was created in June 2015, at the beginning of Donald Trump’s campaign for the Republican party nomination. It has been one of the largest online communities of Trump supporters, with 269,904 users in November 2016. Participation in this subreddit is a valid proxy to study Donald Trump support, as the rules of this subreddit explicitly state that the community is for “Trump Supporters Only”, and that dissenting users will be removed. As such, it has been previously used in the literature to analyze the behavior of Trump supporters34,35. r/hillaryclinton and r/HillaryForAmerica are the main communities that supported Hillary Clinton’s campaign in 2016. The former was created in 2015, while the latter was created in 2016 specifically to support her presidential bid. In November 2016, they had attracted 35,142 and 3025 Reddit users, respectively. Since the stated goal of these communities is to support her presidential campaign, and they forbid the use of the subreddit to campaign for other candidates, we consider active participation in these communities as a good proxy for support for the Democratic party candidate. We call these subreddits the home communities for each candidate. We identify 117,011 users who actively posted on r/The_Donald in 2016, and 13,821 on r/hillaryclinton and r/HillaryForAmerica. Given the massive use of Reddit as a political tool by Trump’s campaign23, the difference in size between the two communities is not surprising.
Although these subreddits are dedicated to supporters of the candidate, we find that 3702 users post in both home communities (2.9%). In order to disambiguate the leaning for these users, we retrieve the Reddit score of their comments in the home communities. The score represents the difference between the number of upvotes and the number of downvotes assigned by other users visiting the same subreddit. Upvotes are generally understood to encode approval, appreciation, or agreement; downvotes encode their opposites. Thus, a user with a higher score on Clinton and a lower score on Trump is most likely a Democratic supporter. Following this reasoning, among all users who posted on both home communities, we consider a user a Clinton supporter if the average score of their comments on the Clinton home community is larger than their average score on the Trump home community, and vice versa. Users with tied scores are discarded, as they represent only 5% of the set of users who posted on both home communities (0.145% of the overall set of users).
Therefore, we define the political leaning of a user u as a binary label \(L_u\), assigned as Clinton supporter (\(L_u = C\)) if they post only on Clinton’s home community, or if they post on both communities and have a larger average score on Clinton’s community, and as Trump supporter otherwise (\(L_u = T\)). Our method identifies 10,240 users as Clinton supporters and 110,806 users as Trump supporters.
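The labeling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_leaning` and its two input lists are hypothetical, and the handling of users absent from both home communities is our own choice rather than something the text specifies.

```python
from statistics import mean


def assign_leaning(clinton_scores, trump_scores):
    """Return 'C', 'T', or None (user discarded) for a single user.

    clinton_scores / trump_scores are the scores of the user's comments in
    the Clinton and Trump home communities (empty list if they never
    posted there). Both names are hypothetical.
    """
    if clinton_scores and not trump_scores:
        return "C"  # posted only in the Clinton home communities
    if trump_scores and not clinton_scores:
        return "T"  # posted only in the Trump home community
    if not clinton_scores and not trump_scores:
        return None  # assumption: users outside both communities get no label
    avg_c, avg_t = mean(clinton_scores), mean(trump_scores)
    if avg_c == avg_t:
        return None  # tied average scores: the user is discarded
    return "C" if avg_c > avg_t else "T"
```

For example, a user with comment scores `[4, 6]` on the Clinton side and `[1, 2]` on the Trump side would be labeled `"C"`, while a user with identical averages on both sides would be dropped.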
Network of interactions on Politics
To study the interactions between the two sides, we need a community that is visited regularly by both groups, but which is still topically related to politics and popular enough. The best candidate for such a role is r/politics, since it is the largest political subreddit. We collect all submissions and comments in the year 2016. From the collected comments, we reconstruct the network of political interactions among the users we previously identified. Among these users, 31,218 authored a message on r/politics in 2016 and thus appear as nodes V in the graph (\(N_T = 27{,}012\) Trump supporters, \(N_C = 4206\) Clinton supporters). Nodes correspond to users with known political leaning, while a weighted, directed link (u, v) corresponds to user u posting a comment as a response to user v. The weight \(w_{uv}\) corresponds to the number of such interactions from u to v. Note that the link direction represents the interaction, and is opposite to the information flow (user u should have read what v wrote to answer, but it is not guaranteed that v will read u’s reply).
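As a rough sketch of this construction, the edge weights can be accumulated from a list of reply pairs. The input format (one `(author, parent_author)` pair per reply), the function name `build_network`, and the toy users are all hypothetical; the source does not describe its data pipeline at this level of detail.

```python
from collections import Counter


def build_network(replies, leaning):
    """Return a Counter mapping (u, v) -> w_uv.

    replies: iterable of (author, parent_author) pairs, one per comment
             written by `author` in response to `parent_author`.
    leaning: dict mapping user -> 'C' or 'T'; users without a label are
             not nodes of the network, so their replies are skipped.
    """
    weights = Counter()
    for u, v in replies:
        if u in leaning and v in leaning:
            # Self-replies are kept as-is; the source does not say how
            # (or whether) they are treated.
            weights[(u, v)] += 1
    return weights


# Toy data, purely illustrative.
toy_leaning = {"alice": "C", "bob": "T", "carol": "T"}
toy_replies = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "dave"),  # dave has no known leaning, so this is dropped
    ("bob", "carol"),
]
toy_weights = build_network(toy_replies, toy_leaning)
```

Here the link from `alice` to `bob` gets weight 2, and the total weight of the toy network is 4.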
In the Politics network, the probability to find a node labelled as \(X \in \{C,T\}\) (henceforth, X node for brevity) in the network is \(P(X) = N_X/N\), corresponding to \(P(T) \simeq 0.87\) for Trump and \(P(C) \simeq 0.13\) for Clinton. The main properties of the Politics network are reported in Table 1. The joint probability to observe an interaction from an X node to a Y node reads
$$\begin{aligned} P(X \rightarrow Y) = \frac{W_{XY}}{W}, \qquad X, Y \in \{C,T\}, \end{aligned} \tag{1}$$
where the rows of the matrix indicate the leaning of the author of a comment, and the columns that of the target, \(W = 716{,}765\) is the total weight of the links in the network (that is, the number of interactions between all considered nodes), and \(W_{XY}\) is the weight of directed links from X nodes to Y nodes:
$$\begin{aligned} W_{XY} = \sum \limits _{u,v \in V \mid L_u = X \wedge L_v = Y} w_{uv}. \end{aligned}$$
We denote with \(W_{\rightarrow X} = \sum_Y W_{YX}\) the number of interactions received by X nodes (\(\sum_Y\) denotes the sum over all possible label assignments to Y), and with \(W_{X \rightarrow}\) the ones originated by X nodes. It follows that \(\sum_{XY} W_{YX} = \sum_X W_{X \rightarrow} = \sum_X W_{\rightarrow X} = W\).
Diagonal elements of the matrix in Eq. (1) correspond to the interactions within political groups, off-diagonal elements to those across groups. The sum by rows (columns) of the matrix in Eq. (1) corresponds to the probability that an X node initiates (receives) an interaction, \(P(X \rightarrow) = \frac{W_{X \rightarrow}}{W}\) (\(P(\rightarrow X) = \frac{W_{\rightarrow X}}{W}\)). From Eq. (1), interactions across communities, or cross-interactions, look symmetric between the Clinton and Trump communities. However, joint probabilities do not take into account the difference in size between the two groups. The apparent symmetry stems from the fact that the probability that Clinton nodes initiate an interaction, \(P(C \rightarrow) = W_{C \rightarrow}/W \simeq 0.35\), is much larger than the fraction of Clinton supporters in the network, \(N_C/N \simeq 0.13\), which implies that Clinton supporters have a much larger weighted out-degree than Trump ones.
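The group-level quantities defined above (the weights \(W_{XY}\), the joint probabilities, and their row and column sums) can be computed as in the following minimal sketch. The function names and the toy edge weights are hypothetical and do not reproduce the values of the Politics network.

```python
from collections import defaultdict


def group_weights(weights, leaning):
    """Aggregate user-level weights w_uv into group-level weights W_XY."""
    W_xy = defaultdict(int)
    for (u, v), w in weights.items():
        W_xy[(leaning[u], leaning[v])] += w
    return dict(W_xy)


def joint_probabilities(W_xy):
    """Joint probability of an X -> Y interaction: W_XY / W."""
    total = sum(W_xy.values())
    return {pair: w / total for pair, w in W_xy.items()}


# Toy network (illustrative values only).
leaning = {"a": "C", "b": "T", "c": "T"}
weights = {("a", "b"): 3, ("b", "a"): 1, ("b", "c"): 4, ("c", "a"): 2}

W_xy = group_weights(weights, leaning)
joint = joint_probabilities(W_xy)

# Row sums: probability that an X node initiates an interaction.
p_out = {x: sum(p for (src, _), p in joint.items() if src == x) for x in "CT"}
# Column sums: probability that an X node receives an interaction.
p_in = {x: sum(p for (_, dst), p in joint.items() if dst == x) for x in "CT"}
```

In this toy case the two cross-group entries happen to be equal (3 interactions each out of 10), even though the single Clinton node initiates 30% of all interactions while making up only a third of the nodes; comparing `p_out` with the group sizes is what reveals differences in activity.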
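These characteristics can be further inspected by considering the conditional probability to observe an interaction from an X node to a Y node, given that the first node has leaning X,
$$\begin{aligned} P(X \rightarrow Y \mid X \rightarrow) = \frac{W_{XY}}{W_{X \rightarrow}}. \end{aligned} \tag{2}$$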
By looking at the columns of Eq. (2), in the absence of homophilic or heterophilic effects, one would expect the elements of each column to be equal: regardless of the author of a comment, the probability to interact with a given group would be the same, determined only by the size of that group. Instead, we observe that Clinton supporters tend to interact more with Trump supporters (72% of their interactions) than Trump supporters do among themselves (62%). The same effect is visible for Trump supporters, who are more likely to interact with Clinton supporters (38% of their interactions) than Clinton supporters are among themselves (28%). These intuitions will be solidified in Section 3, by comparing these values to a null model of random social interactions.
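The row normalization behind these percentages can be illustrated as follows. The group-level counts in `toy_W` are hypothetical: they were chosen only so that the resulting conditional probabilities match the percentages quoted in the text, and they are not the actual weights of the Politics network.

```python
def conditional_probabilities(W_xy, labels=("C", "T")):
    """Probability that an interaction targets Y, given that its author is X.

    Each row of the group-level weight matrix is divided by its row sum,
    i.e. by the total number of interactions originated by X nodes.
    """
    cond = {}
    for x in labels:
        out_x = sum(W_xy.get((x, y), 0) for y in labels)
        for y in labels:
            cond[(x, y)] = W_xy.get((x, y), 0) / out_x if out_x else float("nan")
    return cond


# Hypothetical counts, picked to reproduce the quoted percentages.
toy_W = {("C", "C"): 28, ("C", "T"): 72, ("T", "C"): 76, ("T", "T"): 124}
cond = conditional_probabilities(toy_W)

# Column-wise comparison: who targets Trump supporters more often?
clinton_to_trump = cond[("C", "T")]
trump_to_trump = cond[("T", "T")]
```

Without homophily or heterophily the two entries of a column would coincide; here `clinton_to_trump` (0.72) exceeds `trump_to_trump` (0.62), which is the heterophilic pattern discussed above.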
Finally, we compare the average sentiment polarity of each type of interaction. To do so, first we measure the sentiment polarity (ranging from \(-1\) to 1) of the textual content of each interaction according to VADER36; then, we compute the average values according to the possible pairs of labels. In this way, we obtain:
First, we observe that interactions within Trump supporters are more negative than interactions within Clinton supporters (average sentiment of 0.0126 vs 0.0575). In addition, cross-cutting interactions between groups have on average a more negative sentiment than interactions within groups. That is, Clinton supporters commenting on Trump supporters have an average sentiment of 0.0110, while when commenting on other Clinton supporters the average sentiment is 0.0575. The same is true for Trump supporters. This difference is consistent with the hypothesis that cross-cutting interactions are a potential expression of conflict.
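The averaging step can be sketched as below, assuming the polarity of each comment has already been computed (for instance as the compound score returned by a VADER implementation). The input format, the function name, and the toy polarity values are hypothetical.

```python
from collections import defaultdict


def mean_sentiment_by_pair(interactions):
    """Average polarity for each (author leaning, target leaning) pair.

    interactions: iterable of (author_label, target_label, polarity),
                  one entry per comment, with polarity in [-1, 1].
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, polarity in interactions:
        sums[(x, y)] += polarity
        counts[(x, y)] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}


# Toy polarity values, for illustration only.
toy_interactions = [
    ("C", "C", 0.5), ("C", "C", -0.1),
    ("C", "T", -0.2),
    ("T", "T", 0.0),
    ("T", "C", -0.4), ("T", "C", 0.2),
]
avg_sentiment = mean_sentiment_by_pair(toy_interactions)
```

Each comment counts once in this average, so a link of weight \(w_{uv}\) contributes \(w_{uv}\) polarity values to its pair of labels.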
Reddit score and activity of users
Political interactions on Reddit can be further characterized in terms of the score assigned to each comment or submission, and the activity of users, i.e., their propensity to engage in interactions with other peers.
In network terms, the activity of a user u, \(a_u\), can be measured by the total weight of out-going links from node u, which corresponds to the out-strength of node u: \(a_u = \sum_v w_{uv}\). Figure 1a shows the activity distribution P(a) in the Politics network, plotted separately for Clinton and Trump supporters, both with typical heavy-tailed behavior. The activity distribution of Trump supporters decays more rapidly than that of Clinton supporters, indicating that Clinton supporters have a propensity to engage in a larger number of interactions.
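A minimal sketch of the activity computation, together with its empirical distribution, is given below. The function names and the toy edge weights are hypothetical; users who never reply to anyone have zero out-strength and simply do not appear in the output of this sketch.

```python
from collections import Counter


def activity(weights):
    """Out-strength of each user: the sum of w_uv over all targets v."""
    a = Counter()
    for (u, _v), w in weights.items():
        a[u] += w
    return a


def activity_distribution(a):
    """Empirical P(a): fraction of users having each activity value."""
    counts = Counter(a.values())
    n_users = len(a)
    return {value: c / n_users for value, c in sorted(counts.items())}


# Toy edge weights, for illustration only.
toy_edges = {("u1", "u2"): 3, ("u1", "u3"): 1, ("u2", "u1"): 1, ("u3", "u1"): 1}
toy_activity = activity(toy_edges)
toy_p_a = activity_distribution(toy_activity)
```

Computing `activity_distribution` separately on the Clinton and Trump subsets of users would give the two curves compared in the text.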
The Reddit score of a comment is a measure of its popularity and, as such, it strongly depends on the subreddit where the comment is posted: popular comments posted on the subreddit r/The_Donald will likely be unpopular in subreddits where opposite political views dominate, such as Clinton-oriented subreddits. We define the popularity of a user u on a subreddit as the average score of their comments on that subreddit, \(s_u\), which thus depends on the subreddit under consideration. Figure 1b shows the popularity distribution P(s) of users in the Politics network, separately for Clinton and Trump supporters. While the functional form of the P(s) distribution is similar for Clinton and Trump supporters, comments by Clinton supporters have much larger scores on average, while the scores of Trump supporters span a larger interval of values. This observation implies that the overall attitude on the politics subreddit is more favorable to comments from Clinton supporters than from Trump supporters, although users classified as Trump supporters are a much larger set than Clinton supporters.
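The per-subreddit popularity defined above amounts to a grouped average, as in this short sketch. The tuple layout `(author, subreddit, score)`, the function name, and the toy records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def popularity(comments, subreddit):
    """Average comment score of each user on the given subreddit.

    comments: iterable of (author, subreddit, score) tuples.
    """
    scores = defaultdict(list)
    for author, sub, score in comments:
        if sub == subreddit:
            scores[author].append(score)
    return {user: mean(values) for user, values in scores.items()}


# Toy records, for illustration only.
toy_comments = [
    ("u1", "politics", 10),
    ("u1", "politics", 2),
    ("u1", "The_Donald", 50),
    ("u2", "politics", -3),
]
s_politics = popularity(toy_comments, "politics")
s_donald = popularity(toy_comments, "The_Donald")
```

The same user can therefore have very different popularity values depending on the subreddit: in the toy data, `u1` averages 6 on `politics` and 50 on `The_Donald`.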
This liberal bias in the general opinion of r/politics, however, does not seem to discourage Trump supporters from commenting in large numbers. Therefore, since we wish to study the two communities and how they interact, r/politics is the best arena in which to observe such interactions. Our set of users of interest is not representative of r/politics users. Nevertheless, we are not interested in studying the typical behavior of users in this subreddit, but in analyzing how these two polarized communities interact in this arena. The fact that the two communities are not representative of the politics subreddit is therefore of no consequence.