What’s even more fun than listening to love songs? Analyzing love songs!
I’ve spent my spare time over the last few days writing a web scraper to collect a large number of lyrics from LyricWikia: 16,699 lyrics from 1980-2014, to be precise. Looking through the data, I was struck by the variety of terms we use in songs to refer to women. Here are the frequencies of the most common terms for women across the entire data set.
In contrast, only two terms for men appear among the 500 most common words, and both are neutral in tone:
The pair that really caught my eye was bitch and angel – both extreme characterizations of real human beings occurring at roughly the same frequencies. How do these dichotomous terms compare over time?
Through the 80s, the terms were used at roughly the same levels. Starting around 1992, however, “bitch” begins to take off. Often both terms rise together (for example, the humps in 1995-1999 and 2005-2007), but in recent years “bitch” is leaving “angel” far behind.
The two terms rising and falling together was not really interesting to me, since it could just be a result of more male artists putting out albums in that year, for example. Subtracting the frequency of “angel” from the frequency of “bitch” gives me what I’m looking for: a single metric for the “bitch”-iness of the year.
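For the curious, here is a rough sketch of how a per-year difference like this could be computed. It is not the code behind the plots in this post. It assumes the lyrics are available as (year, text) pairs, and it assumes "frequency" means occurrences divided by the total number of words that year; the function names and the toy data are made up for illustration.

```python
import re
from collections import Counter, defaultdict


def yearly_term_frequency(songs, terms):
    """Return {year: {term: occurrences / total words that year}}.

    `songs` is an iterable of (year, lyrics_text) pairs (an assumed layout).
    """
    counts = defaultdict(Counter)  # year -> Counter of the tracked terms
    totals = Counter()             # year -> total number of words seen
    for year, lyrics in songs:
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", lyrics.lower())
        totals[year] += len(tokens)
        # Exact matches only, so plurals such as "angels" are not counted.
        counts[year].update(t for t in tokens if t in terms)
    return {
        year: {term: counts[year][term] / totals[year] for term in terms}
        for year in totals
        if totals[year] > 0
    }


def bitchiness(songs):
    """Frequency of "bitch" minus frequency of "angel", per year."""
    freqs = yearly_term_frequency(songs, ("bitch", "angel"))
    return {year: f["bitch"] - f["angel"] for year, f in sorted(freqs.items())}


# Invented toy data, only to show the shape of the input and output.
toy_songs = [
    (1985, "angel angel bitch night"),
    (2010, "bitch bitch bitch angel"),
]
print(bitchiness(toy_songs))  # {1985: -0.25, 2010: 0.5}
```

Dividing by the yearly word count keeps a year with more songs from looking bitchier just because it has more lyrics in it.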
There are a lot of other metrics you could use to determine the significance of a word in a subset of a corpus (TF-IDF comes to mind), but I like my bitchiness metric because it is very easy to communicate.
So, without further ado, here’s this post’s money plot, the first quantification of the bitchiness of music over time: