In my PyCon talk, I sort of jokingly presented a social network graph of the characters in Othello. In this post, I’m going to show you how to make that graph with Python.
To build it, we’ll be using a few things:
- PlayShakespeare.com’s XML-marked-up Othello
- NetworkX, a Python library for creating and drawing graphs
- An IPython Notebook. If you’re doing data analysis/graphing in Python, use one of these!
A Word on Graphs
Graphs consist of nodes and edges: nodes are points in the graph, and edges are the connections between those points. So in a social network graph, the nodes are the people in the social network, while the edges are the connections between those people.
While each major social media platform generally agrees on what constitutes a node—a person, namely—each has a different definition of what constitutes an edge. On Facebook, users are linked by being “friends” with each other. On Twitter, users follow one another. Edges on Facebook are symmetric: I’m friends with you if and only if you’re friends with me. That’s not the case with Twitter; I can follow you without you following me, and vice versa.
Our Shakespeare social network will more closely resemble Facebook’s than Twitter’s, since our edges will be similarly directionless. We’ll say two characters are connected if they appear in the same scene together. We’re also going to add a further dimension, specifying the weight of an edge. The weight will be determined by how many scenes two characters appear in together. This way we can distinguish the relationships between characters that appear rarely together from those between characters that interact more often.
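To make the Facebook/Twitter distinction concrete, here is a small sketch comparing NetworkX's undirected Graph with its directed DiGraph. It also shows an edge carrying a "weight" attribute. The character pairing and the weight of 3 are made-up values for illustration only.

```python
import networkx as nx

# Undirected graph: like a Facebook friendship, an edge has no direction.
# The weight (3, an invented value) stands in for "number of shared scenes".
friends = nx.Graph()
friends.add_edge("Othello", "Iago", weight=3)
print(friends.has_edge("Iago", "Othello"))  # the edge works both ways

# Directed graph: like a Twitter follow, an edge points one way only.
follows = nx.DiGraph()
follows.add_edge("Othello", "Iago")
print(follows.has_edge("Iago", "Othello"))  # no edge back from Iago

# Edge attributes can be read from either direction in an undirected graph.
print(friends.edges["Iago", "Othello"]["weight"])
```

Our Othello graph uses the undirected, weighted kind.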
So here’s some data we’ll want to associate with our nodes and edges:
- Node: character name, gender, number of lines
- Edge: whether, and how many times, two characters appear together in the same scene
Now let’s take a look at our raw data.
All the information we could want about a character (name, gender, number of lines, and the scenes they appear in) can be found in the <persona/> XML tag:
With a few lines of Python (using lxml), we can write a straightforward function to extract the data we need to construct our nodes and edges:
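The original markup isn't reproduced here, so the sketch below assumes a simplified, hypothetical <persona> layout. Each persona has a gender attribute plus <persname>, <lines>, and <scenes> children. The real PlayShakespeare.com schema may name or nest these differently, so treat the element names as placeholders. The line counts come from this post; the scene IDs are invented toy values. If lxml isn't installed, the sketch falls back to the standard library's ElementTree, which supports the same fromstring/iter/findtext/get calls used here.

```python
try:
    from lxml import etree  # the library used in the post
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same calls used below

# Hypothetical, simplified markup; element names and scene IDs are assumptions.
SAMPLE_XML = b"""<play>
  <personae>
    <persona gender="male">
      <persname>Othello</persname>
      <lines>848</lines>
      <scenes>1.2 1.3 2.1 3.3</scenes>
    </persona>
    <persona gender="female">
      <persname>Desdemona</persname>
      <lines>365</lines>
      <scenes>1.3 2.1 3.3 4.2</scenes>
    </persona>
  </personae>
</play>"""


def extract_personae(xml_bytes):
    """Return one dict per <persona>: name, gender, line count, and scene IDs."""
    root = etree.fromstring(xml_bytes)
    personae = []
    for persona in root.iter("persona"):
        personae.append({
            "name": persona.findtext("persname").strip(),
            "gender": persona.get("gender"),
            "lines": int(persona.findtext("lines", default="0")),
            # Scenes are assumed to be whitespace-separated "act.scene" IDs.
            "scenes": (persona.findtext("scenes") or "").split(),
        })
    return personae


for p in extract_personae(SAMPLE_XML):
    print(p["name"], p["gender"], p["lines"], p["scenes"])
```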
Creating our Graph
Now that we know what metadata we’re looking for and how to get it, we can construct our nodes:
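Here is a minimal sketch of the node step. It assumes each character has already been extracted into a record with name, gender, lines, and scenes fields. The toy records use the line counts quoted in this post; the scene IDs are invented. Each character becomes a node keyed by name, with its metadata stored as node attributes.

```python
import networkx as nx

# Toy records: line counts are from this post, scene IDs are invented.
personae = [
    {"name": "Othello", "gender": "male", "lines": 848,
     "scenes": ["1.2", "1.3", "2.1", "3.3"]},
    {"name": "Iago", "gender": "male", "lines": 895,
     "scenes": ["1.1", "1.2", "1.3", "2.1", "3.3"]},
    {"name": "Desdemona", "gender": "female", "lines": 365,
     "scenes": ["1.3", "2.1", "3.3", "4.2"]},
]

G = nx.Graph()
for p in personae:
    # One node per character; gender and line count ride along as attributes.
    G.add_node(p["name"], gender=p["gender"], lines=p["lines"])

print(G.nodes(data=True))
```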
And then our edges:
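One way to build the edges is to compare every pair of characters and count the scenes they share. Pairs with at least one shared scene get an edge whose weight is that count. This is a sketch over toy records (line counts from this post, scene IDs invented). Checking every pair is quadratic in the number of characters, which is fine for a cast the size of Othello's.

```python
from itertools import combinations

import networkx as nx

# Toy records: line counts are from this post, scene IDs are invented.
personae = [
    {"name": "Othello", "gender": "male", "lines": 848,
     "scenes": ["1.2", "1.3", "2.1", "3.3"]},
    {"name": "Iago", "gender": "male", "lines": 895,
     "scenes": ["1.1", "1.2", "1.3", "2.1", "3.3"]},
    {"name": "Desdemona", "gender": "female", "lines": 365,
     "scenes": ["1.3", "2.1", "3.3", "4.2"]},
]

G = nx.Graph()
for p in personae:
    G.add_node(p["name"], gender=p["gender"], lines=p["lines"])

# Connect each pair that shares at least one scene; weight = shared scene count.
for a, b in combinations(personae, 2):
    shared = len(set(a["scenes"]) & set(b["scenes"]))
    if shared:
        G.add_edge(a["name"], b["name"], weight=shared)

print(list(G.edges(data="weight")))
```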
Now that we have our graph, we can compute interesting things about it. Let’s take a look at the 7 most “central” characters in Othello. The metric we’ll use is degree centrality: the number of edges connected to a node, divided by the number of other nodes in the graph (so a character who shares a scene with everyone scores 1). Edge weights don’t factor in. In this context, degree centrality measures how many other characters a given character appears in a scene with, which gives us a sense of how socially central each character is in the play.
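NetworkX's degree_centrality function computes this normalization (degree divided by n - 1). Sorting its output gives the top 7. The toy graph below is an assumption for illustration: its edges are invented, not taken from the play. With fewer than 7 nodes, the slice simply returns all of them.

```python
from itertools import combinations

import networkx as nx

# Toy graph with invented edges: the three leads share scenes with everyone,
# and two minor characters each share scenes with all three leads.
core = ["Othello", "Iago", "Desdemona"]
G = nx.Graph()
G.add_edges_from(combinations(core, 2))
for minor in ["Emilia", "Roderigo"]:
    G.add_edges_from((minor, lead) for lead in core)

# Degree centrality = neighbours / (n - 1); edge weights are ignored.
centrality = nx.degree_centrality(G)

# The seven highest-scoring characters (or all of them, if there are fewer).
top7 = sorted(centrality.items(), key=lambda kv: kv[1], reverse=True)[:7]
for name, score in top7:
    print(f"{name:<10} {score:.2f}")
```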
By this model, Desdemona, Iago, and Othello have the same degree centrality. That’s an interesting fact considering the disparity in the number of lines between Desdemona and the two men (Desdemona has 365 lines; Iago and Othello have 895 and 848, respectively). Desdemona is as socially embedded in the play as Iago or Othello, but she isn’t afforded a voice commensurate with her network centrality.
Plotting our Graph
Computing attributes about graphs is cool… if you’re a nerd.
But let’s be real; the reason we’re all here is to learn how to make pretty social network graphs for
Let’s just throw our graph into NetworkX’s draw_networkx function to see what it spits out:
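A minimal sketch of that first attempt, on a toy graph whose edges and weights are invented. When no positions are passed, draw_networkx computes a spring layout itself, and spring_layout reads the "weight" edge attribute by default. In a notebook the figure displays automatically; in a plain script you'd add plt.show().

```python
import matplotlib.pyplot as plt
import networkx as nx

# Toy weighted graph (edges and weights invented for illustration).
G = nx.Graph()
G.add_edge("Othello", "Iago", weight=4)
G.add_edge("Othello", "Desdemona", weight=3)
G.add_edge("Iago", "Desdemona", weight=3)
G.add_edge("Iago", "Roderigo", weight=1)

fig, ax = plt.subplots(figsize=(6, 6))
# No pos argument: draw_networkx falls back to a spring layout,
# with default colours, node sizes, and the axes frame left on.
nx.draw_networkx(G, ax=ax)
```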
Wow, that looks horrible. The edges are too dark, the nodes are scrunched too close together, and the border around the plot makes no sense. However, we do see that edges with greater weights tend to be shorter than edges with smaller weights, because the default spring layout pulls more strongly along heavier edges. This provides a nice visual cue about the strength of the social connections between characters.
Let’s clean this up a little:
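Here's one way to address those complaints, as a sketch. The specific colors, the spread parameter k=1.5, and the seed are illustrative choices, not necessarily what the original notebook used. In spring_layout, k is the optimal distance between nodes (default 1/sqrt(n)), so raising it spreads the nodes out. Drawing edges, nodes, and labels separately lets each be styled on its own, and ax.axis("off") drops the frame.

```python
import matplotlib.pyplot as plt
import networkx as nx

# Toy weighted graph (edges and weights invented for illustration).
G = nx.Graph()
G.add_edge("Othello", "Iago", weight=4)
G.add_edge("Othello", "Desdemona", weight=3)
G.add_edge("Iago", "Desdemona", weight=3)
G.add_edge("Iago", "Roderigo", weight=1)

fig, ax = plt.subplots(figsize=(8, 8))
# Larger k pushes nodes apart; a fixed seed makes the layout reproducible.
pos = nx.spring_layout(G, k=1.5, seed=42)

# Lighter edges, softer node colour, labels drawn on top.
nx.draw_networkx_edges(G, pos, ax=ax, edge_color="lightgray", width=2)
nx.draw_networkx_nodes(G, pos, ax=ax, node_color="lightsteelblue")
nx.draw_networkx_labels(G, pos, ax=ax, font_size=10)

ax.axis("off")  # remove the meaningless frame around the plot
```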
Now this is starting to look a whole lot better. For our finishing touches, let’s scale the size of each node by number of lines, and let’s assign color by gender.
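A sketch of those finishing touches, assuming each node carries the lines and gender attributes set when the nodes were created. The toy graph uses this post's line counts with invented edges. The size multiplier of 2 (marker area in points squared) and the two-color GENDER_COLORS palette are hypothetical choices; tune them to taste. Passing nodelist keeps the size and color lists aligned with the node order.

```python
import matplotlib.pyplot as plt
import networkx as nx

# Toy graph: line counts from this post; edges and weights invented.
G = nx.Graph()
G.add_node("Othello", gender="male", lines=848)
G.add_node("Iago", gender="male", lines=895)
G.add_node("Desdemona", gender="female", lines=365)
G.add_edge("Othello", "Iago", weight=4)
G.add_edge("Othello", "Desdemona", weight=3)
G.add_edge("Iago", "Desdemona", weight=3)

# Hypothetical palette; unknown genders fall back to gray.
GENDER_COLORS = {"male": "lightsteelblue", "female": "lightcoral"}

nodes = list(G.nodes)
# Marker area grows linearly with each character's number of lines.
node_sizes = [G.nodes[n]["lines"] * 2 for n in nodes]
node_colors = [GENDER_COLORS.get(G.nodes[n]["gender"], "lightgray") for n in nodes]

fig, ax = plt.subplots(figsize=(8, 8))
pos = nx.spring_layout(G, k=1.5, seed=42)
nx.draw_networkx_edges(G, pos, ax=ax, edge_color="lightgray", width=2)
nx.draw_networkx_nodes(G, pos, ax=ax, nodelist=nodes,
                       node_size=node_sizes, node_color=node_colors)
nx.draw_networkx_labels(G, pos, ax=ax, font_size=10)
ax.axis("off")
```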
There we go! Hang this one up on the fridge: a nice social network graph for Othello.
You can check out the code I used for this post in this IPython Notebook.