In my PyCon talk, I sort of jokingly presented a social network graph of the characters in Othello. In this post I’m going to show you how to make that graph with Python:

social network graph of Othello

In order to go about this, we’ll be using a few things:

  1. PlayShakespeare.com’s XML-marked-up Othello
  2. NetworkX, a Python library for creating and drawing graphs
  3. An IPython Notebook. If you’re doing data analysis/graphing in Python, use one of these!

A Word on Graphs

Graphs consist of nodes and edges: nodes are points in the graph, and edges are the connections between those points. So in a social network graph, the nodes are the people in the social network, while the edges are the connections between those people.

While each major social media platform generally agrees on what constitutes a node—a person, namely—each has a different definition of what constitutes an edge. On Facebook, users are linked by being “friends” with each other. On Twitter, users follow one another. Edges on Facebook are symmetric: I’m friends with you if and only if you’re friends with me. That’s not the case with Twitter; I can follow you without you following me, and vice versa.

Our Shakespeare social network will more closely resemble Facebook’s than Twitter’s, since our edges will be similarly directionless. We’ll say two characters are connected if they appear in the same scene together. We’re also going to add a further dimension, specifying the weight of an edge. The weight will be determined by how many scenes two characters appear in together. This way we can distinguish the relationships between characters that appear rarely together from those between characters that interact more often.

So here’s some data we’ll want to associate with our nodes and edges:

  • Node: character name, gender, number of lines
  • Edge: whether, and how many times, two characters appear together in the same scene
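Before touching the real data, here's a minimal sketch of what that node and edge data could look like using nothing but plain Python dictionaries. The character names and numbers are made up for illustration, and the `weight` helper is hypothetical; NetworkX will handle all of this for us later.

```python
# Hypothetical toy data: names and numbers are invented for illustration.
nodes = {
    "A": {"gender": "male", "number_of_lines": 120},
    "B": {"gender": "female", "number_of_lines": 45},
    "C": {"gender": "male", "number_of_lines": 10},
}

# An undirected edge has no "from" or "to", so a frozenset of the two
# names works as a key: frozenset({"A", "B"}) == frozenset({"B", "A"}).
edges = {
    frozenset({"A", "B"}): {"weight": 3},  # A and B share three scenes
    frozenset({"B", "C"}): {"weight": 1},  # B and C share one scene
}


def weight(n1, n2):
    """Return the number of shared scenes, or 0 if there is no edge."""
    return edges.get(frozenset({n1, n2}), {"weight": 0})["weight"]
```

Because the key is a set rather than an ordered pair, `weight("A", "B")` and `weight("B", "A")` give the same answer, which is the Facebook-style symmetry described above. A Twitter-style graph would need ordered pairs instead.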

Now let’s take a look at our raw data.

PlayShakespeare.com’s XML

All the information we could want about a character (name, gender, number of lines, scenes) can be found in the <persona> XML element:

<persona gender="male">
  <persname short="DUKE." numberOfLines="65" numberOfVerseLines="64" numberOfProseLines="1" numberOfLyricsLines="0">
    Duke of Venice
  </persname>
  <persscenes numberOfScenes="1">
    <persscene>1.3</persscene>
  </persscenes>
</persona>

With a few lines of Python (using lxml), we can pretty straightforwardly write a function to extract the data we need to construct our nodes and edges:

def extract_metadata(persona):
    """
    `persona` is an XML element from a document that's been parsed with
    lxml.
    """
    name = persona.find('persname').text
    name = "\n".join(name.split())
    gender = persona.attrib['gender']
    number_of_lines = int(persona.find('persname').attrib['numberOfLines'])
    scenes = set(scene.text for scene in persona.iterdescendants('persscene'))

    return name, gender, number_of_lines, scenes
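To see what this function returns, here's a small sketch that runs the same logic on the Duke of Venice snippet from above. To keep it self-contained, it swaps lxml for the standard library's `xml.etree.ElementTree`, so `iter` stands in for lxml's `iterdescendants`. The names `PERSONA_XML` and `extract_metadata_stdlib` are mine, not part of the original code.

```python
import xml.etree.ElementTree as ET

# The <persona> element shown earlier in the post.
PERSONA_XML = """
<persona gender="male">
  <persname short="DUKE." numberOfLines="65" numberOfVerseLines="64" numberOfProseLines="1" numberOfLyricsLines="0">
    Duke of Venice
  </persname>
  <persscenes numberOfScenes="1">
    <persscene>1.3</persscene>
  </persscenes>
</persona>
"""


def extract_metadata_stdlib(persona):
    """Same idea as extract_metadata, but for an ElementTree element."""
    persname = persona.find('persname')
    # Collapse surrounding whitespace and put each word on its own line,
    # which keeps long names compact when used as node labels.
    name = "\n".join(persname.text.split())
    gender = persona.attrib['gender']
    number_of_lines = int(persname.attrib['numberOfLines'])
    # ElementTree's iter() plays the role of lxml's iterdescendants() here.
    scenes = set(scene.text for scene in persona.iter('persscene'))
    return name, gender, number_of_lines, scenes


duke = extract_metadata_stdlib(ET.fromstring(PERSONA_XML))
print(duke)
```

The name comes back as 'Duke\nof\nVenice', the line count as the integer 65, and the scenes as a one-element set containing '1.3'.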

Creating our Graph

Now that we know what metadata we’re looking for and how to get it, we can construct our nodes:

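import networkx as nx
from lxml import etree

# Read the XML file. `othello_file` is the path to your local copy of
# the play's XML. Reading bytes lets lxml handle the file's encoding.
with open(othello_file, 'rb') as f:
    othello_xml = etree.fromstring(f.read())

# Initialize our graph.
G = nx.Graph()

# Iterate over the personae XML elements.
for persona in othello_xml.iterdescendants('persona'):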
    name, gender, number_of_lines, scenes = extract_metadata(persona)

    # To avoid noise, let's only consider characters who
    # have more than 5 lines.
    if number_of_lines > 5:
        G.add_node(
            name,
            gender=gender,
            number_of_lines=number_of_lines,
            scenes=scenes
        )

And then our edges:

import itertools
for (n1, data1), (n2, data2) in itertools.combinations(G.nodes(data=True), 2):
    # Since each node's 'scenes' value is a set, we can easily count
    # how many scenes two characters have in common by counting
    # how many elements are in the sets' intersection.
    scenes_together = len(data1['scenes'] & data2['scenes'])
    if scenes_together:
        G.add_edge(n1, n2, weight=scenes_together)
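If the set arithmetic feels abstract, here's a minimal sketch of the same pairing logic on made-up data, with no NetworkX required. The characters and scene numbers in `scenes_by_character` are hypothetical.

```python
import itertools

# Hypothetical scene sets for three made-up characters.
scenes_by_character = {
    "A": {"1.1", "1.3", "2.1"},
    "B": {"1.3", "2.1", "3.3"},
    "C": {"4.2"},
}

edge_weights = {}
# combinations(..., 2) yields each unordered pair exactly once.
for (n1, s1), (n2, s2) in itertools.combinations(scenes_by_character.items(), 2):
    shared = s1 & s2  # `&` is set intersection: scenes both characters are in
    if shared:
        edge_weights[(n1, n2)] = len(shared)

print(edge_weights)
```

Only A and B end up connected, with a weight of 2, because they share scenes 1.3 and 2.1. C never shares a scene with anyone, so it gets no edges at all.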

Now that we have our graph, we can compute interesting things about it. Let’s take a look at the 7 most “central” characters in Othello. The metric we’ll use is degree centrality, a normalized count of the number of edges connecting to a single node. So, in this context, degree centrality will measure how many other characters a given character appears in a scene with, which will give us a sense of how socially central each character is in the play.

sorted(nx.degree_centrality(G).items(), key=lambda x: -x[1])[:7]
>>> [(u'Desdemona', 1.0),
    (u'Iago', 1.0),
    (u'Othello', 1.0),
    (u'Roderigo', 0.9375),
    (u'Cassio', 0.8125),
    (u'Emilia', 0.75),
    (u'Montano', 0.625)]
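For the curious, degree centrality is simple enough to compute by hand: NetworkX divides each node's degree by n - 1, the largest degree a node could possibly have in a graph of n nodes. Here's a sketch on a hypothetical four-node graph; the `neighbors` mapping and this standalone `degree_centrality` function are mine, written only to illustrate the formula.

```python
# Hypothetical adjacency mapping: each node maps to the set of nodes
# it shares an edge with.
neighbors = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}


def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree, n - 1."""
    n = len(adjacency)
    return {node: len(adj) / (n - 1) for node, adj in adjacency.items()}


centrality = degree_centrality(neighbors)
# Sort from most to least central, as in the NetworkX call above.
ranked = sorted(centrality.items(), key=lambda item: -item[1])
print(ranked)
```

A score of 1.0 means a node is connected to every other node. Read that way, Roderigo's 0.9375 is consistent with 15 connections out of a possible 16. Note that this metric only counts edges and ignores their weights.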

By this model, Desdemona, Iago, and Othello have the same degree centrality, which is interesting given the disparity in the number of lines between Desdemona and the two men (Desdemona has 365 lines; Iago and Othello have 895 and 848, respectively). Desdemona is as socially embedded in the play as Iago or Othello is, but she isn’t afforded a voice commensurate with her network centrality.

Plotting our Graph

Computing attributes about graphs is cool… if you’re a nerd. But let’s be real; the reason we’re all here is to learn how to make pretty social network graphs for Shakespeare.

Let’s just throw our graph into NetworkX’s draw_networkx function to see what it spits out:

nx.draw_networkx(G)

poorly executed first try at generating a social network graph for Othello

Wow, that looks horrible. The colors of the edges are too dark, the nodes are scrunched too close together, and the axis border around the plot makes no sense. However, we do see that edges with greater weights tend to be shorter than edges with smaller weights. This provides a nice visual cue about the strength of the social connections between characters.

Let’s clean this up a little:

import matplotlib.pyplot as plt

plt.figure(figsize=(13,8))  # make the figure size a little larger
plt.axis('off')  # remove the axis, which isn't meaningful in this case
plt.title("Othello's Social Network", fontsize=20)

# The 'k' argument determines how spaced out the nodes will be from
# one another on the graph.
pos = nx.spring_layout(G, k=0.5)

nx.draw_networkx(
    G,
    pos=pos,
    edge_color='gray',  # change edge color
    alpha=0.3,  # make nodes more transparent to make labels clearer
    font_size=14,
)

social network graph with some improvements, Othello

Now this is starting to look a whole lot better. For our finishing touches, let’s scale the size of each node by number of lines, and let’s assign color by gender.

# First, let's create a list of node sizes:
node_size = [data['number_of_lines'] for __, data in G.nodes(data=True)]
# Then, let's create a list of what the node colors should be:
node_color = [
    'blue' if data['gender'] == 'male' else 'red' for __, data in G.nodes(data=True)
]
# Finally, we pass this into the "draw_networkx" function:
nx.draw_networkx(
    G,
    pos=pos,
    node_size=node_size,
    node_color=node_color,
    edge_color='gray',
    alpha=0.3,
    font_size=14,
)

social network graph of Othello

There we go! Hang this one up on the fridge: a nice social network graph for Othello.

You can check out the code I used for this post in this IPython Notebook.