Author Archives: anthonygarvan

About anthonygarvan

Web Developer and Data Scientist in Chicago.

Goodbye WordPress

Hey there! I’ve decided to deprecate this blog, so I won’t post here anymore. I’ve pointed the (hotly in-demand) domain anthonygarvan.com at my GitHub Pages site, which now runs a simple Jekyll template.

The original vision for this blog was to be a place for “all my projects,” but I have found that I like to do a lot of projects that provide rich web-native experiences (like Word Galaxy and, more recently, Rich Dot, Poor Dot). By using a Jekyll template, I can link directly to all of my work as an internet person, from those native pages to write-ups on Medium. I could continue to write here and link to it, but Medium is just a much better writing, reading, and sharing experience.

So long WordPress!

 


YoCongress!

This Labor Day weekend I wrote a Twitter bot that forwards your tweet to the U.S. House representative for your location. It’s never been easier to complain to the government! So, for example, if you write this tweet in Chicago:

[Image: example original tweet sent from Chicago]

The bot will tweet this:

[Image: the tweet the bot forwards to the representative]

I need testers! Please pick your favorite issue and give it a whirl. Remember to enable location on your tweet by clicking or tapping the pin icon. Note that it takes up to a minute to forward your tweet (due to Twitter API rate limits), and that the bot only forwards original tweets or replies, not retweets.
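The routing rule above (skip retweets, require a location, find the matching district) can be sketched in a few lines. This is a minimal Python sketch, not the bot's actual code: the district table, the handles, and the message format are hypothetical, and it assumes Twitter API v1.1-style tweet objects, where a retweet carries a `retweeted_status` field and `coordinates` is a GeoJSON point in longitude, latitude order. The district lookup is a standard ray-casting point-in-polygon test.

```python
from typing import Dict, List, Optional, Tuple

Polygon = List[Tuple[float, float]]

# HYPOTHETICAL district table: district ID -> representative's handle and a
# boundary polygon of (longitude, latitude) vertices. Real House district
# boundaries would come from official shapefiles; these are toy squares.
DISTRICTS: Dict[str, dict] = {
    "XX-01": {"handle": "@RepExampleOne", "polygon": [(0, 0), (10, 0), (10, 10), (0, 10)]},
    "XX-02": {"handle": "@RepExampleTwo", "polygon": [(10, 0), (20, 0), (20, 10), (10, 10)]},
}


def point_in_polygon(lon: float, lat: float, polygon: Polygon) -> bool:
    """Ray casting: cast a ray toward +longitude and count edge crossings.

    An odd number of crossings means the point is inside the polygon.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # (this also guarantees y2 != y1 in the division below).
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def forward_text(tweet: dict) -> Optional[str]:
    """Return the text to tweet at the matching representative, or None."""
    # Retweets carry a "retweeted_status" field in v1.1 tweet objects; skip them.
    if "retweeted_status" in tweet:
        return None
    # "coordinates" is a GeoJSON point, or null when location is turned off.
    coords = tweet.get("coordinates")
    if not coords:
        return None
    lon, lat = coords["coordinates"]  # GeoJSON order: [longitude, latitude]
    for district in DISTRICTS.values():
        if point_in_polygon(lon, lat, district["polygon"]):
            user = tweet["user"]["screen_name"]
            # HYPOTHETICAL message format; the real bot's wording may differ.
            return f'{district["handle"]} @{user} says: {tweet["text"]}'
    return None  # location falls outside every known district
```

A real bot would also queue the outgoing tweets and send them slowly enough to stay under Twitter's rate limits, which is the source of the up-to-a-minute delay mentioned above.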

Also, I have had some trouble with Twitter flagging the account as spam in some brutal bot-vs-bot action, but I’ve made changes to my bot that should avoid that in the future. Hopefully.

If you are interested in the code (<200 lines!), check it out on GitHub here.

MEMBERS OF CONGRESS CAN OPT OUT BY TWEETING TO @YOCONGRESS WITH HASHTAG #OPTOUT.

Republic: A Democratic Microservice Governance Platform

Note: This is a blog post about a GitHub project I am working on. To skip all the words and check out the code, go here.


Ever since Charles Babbage’s Difference Engine, programmers have been drawing an analogy between computers and brains. Fundamental terms like “memory” and “compute” are borrowed from the animal kingdom, and hot trends like deep neural networks continue the tradition today. Perhaps we draw the analogy as part of a fantasy of making a machine in our own image, or perhaps it is part of a hope that, by creating a thinking machine, we can better understand ourselves. Whatever the motivation, one thing is clear: the computer-brain analogy is not a useful way to think about how computers work.

Computers are rigid, impractical, and if there is even the tiniest thing wrong they completely grind to a halt. They are not creative, they are not struggling to survive, they don’t have compassion, and they won’t tell you what’s really going on with them unless you wrench it out of them. In other words,

Computers aren’t like brains, they’re like bureaucracies.

All this time, we were drawing the wrong analogy! The structure of code much more closely resembles social patterns than thought patterns. This idea is captured by a principle in computer science called Conway’s Law. It states that the architecture of a piece of software (i.e., the pattern of function calls) will mirror the communication pattern of the organization that produces it. Or, in my paraphrasing,

Our software works like how we work with one another.

So, if your organization is a little monarchy, with people at the top making fundamental technology choices that persist throughout your applications, you will have a monolithic architecture. Dictatorships are great for getting things done quickly, but they have a hard time evolving because decisions made long ago are difficult to undo. For example, you may have decided in the 1980s that your banking system should run on COBOL because it was the hottest new thing. A hundred million lines of code later, you have an entire industry written in a dead language. Monarchies are so bad at adapting, in fact, that they often just get replaced completely by new regimes: more nimble competitors who aren’t burdened by legacy code.

One way to avoid that collapse is to delegate more power to small groups and ask that they all work together to make the organization function. Each group gets a tiny plot of code to manage, but it gets free rein over that tiny slice of the business: managing all technology choices, hiring, and deployment of its code as it sees fit. This is service-oriented architecture (or microservices, if the services are really small), and it’s the architecture that runs Amazon, Netflix, and Google. These architectures are a constellation of fiefdoms: many lords ruling over little plots of the business, collaborating only through well-defined interfaces. The problem with fiefdoms is that, while they can evolve, they can only evolve incrementally. It is difficult to pull off big infrastructure projects in fiefdoms because you have to get all the little lords to agree to make it work. This is why medieval England didn’t have a great interstate system!

What I am proposing today is a middle ground, inspired by one of the Enlightenment’s hottest trends. If software architecture is like social structure, we should choose an architecture that has proven fantastically scalable and that strikes a balance between rigid monolithic monarchies and hopelessly quibbling fiefdoms.

The software architecture that strikes the best balance between sustainability and competitiveness at a large scale will be a democracy: independent microservices governed by an elected body.

To make the idea more concrete, I have implemented a democratic microservice government in JavaScript; geeks can check it out on GitHub here. Sorry, no live demo for now, just a well-tested API. Here’s how it would work:

  1. A microservice applies for citizenship with the government web service. The government service manages a library of API interface definitions and checks the applicant service’s URL to verify that it passes some simple tests. These could be, for example, “is alive” checks, logging protocols, etc. If the URL passes the tests, the service becomes a citizen.
  2. Developers who work on a citizen codebase can run for office and/or vote in an upcoming election. An “office” is a role in maintaining the government codebase on GitHub, i.e., a collaborator who manages pull requests for a specific portion of the code.
  3. At some point, election day comes and new representatives are elected. These reps are automatically made collaborators on the GitHub project, and anyone who didn’t get elected (except for the owner) gets booted. These elected reps are the ones who set the vision for the government, define the interfaces, etc.
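The three steps above can be sketched as a small in-memory model. This is a hypothetical Python sketch, not the Republic codebase (which is JavaScript): the class and method names, the one-ballot-per-developer plurality rule, and the injected check functions are all assumptions. A real implementation would run HTTP checks against the service URL and call the GitHub API to add and remove collaborators, rather than updating a Python set.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# A citizenship check takes a service URL and returns pass/fail. In a real
# system this might be an HTTP "is alive" probe; here checks are injected
# plain functions (HYPOTHETICAL) so the sketch runs offline.
Check = Callable[[str], bool]


@dataclass
class Government:
    owner: str
    checks: List[Check]
    citizens: Dict[str, List[str]] = field(default_factory=dict)  # service URL -> its developers
    collaborators: Set[str] = field(default_factory=set)  # stand-in for GitHub collaborators

    def apply_for_citizenship(self, url: str, developers: List[str]) -> bool:
        """Step 1: admit the service only if it passes every check."""
        if all(check(url) for check in self.checks):
            self.citizens[url] = developers
            return True
        return False

    def electorate(self) -> Set[str]:
        """Step 2: anyone who works on a citizen codebase can vote or run."""
        return {dev for devs in self.citizens.values() for dev in devs}

    def hold_election(self, ballots: Dict[str, str], seats: int) -> List[str]:
        """Step 3: plurality vote, one ballot per voter (HYPOTHETICAL rule).

        Winners replace all previous collaborators; the owner always stays.
        """
        eligible = self.electorate()
        tally = Counter(
            candidate
            for voter, candidate in ballots.items()
            if voter in eligible and candidate in eligible
        )
        winners = [candidate for candidate, _ in tally.most_common(seats)]
        self.collaborators = set(winners) | {self.owner}
        return winners
```

Replacing the whole collaborator set on each election is what "anyone who didn’t get elected gets booted" amounts to in this model; keeping the owner in the set mirrors the one exception in step 3.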

This way, everyone is free to make virtually any technology choices they want for their slice of the system, but they still get regulated, and in return they have basic guarantees on what other citizens are up to and how they can interact. To prevent stagnation in leadership and resentment of that regulation, they have a say in who is at the top.

Yay democracy!

Budget Fadeaway: Every line item in the White House’s 2016 Budget Proposal, Interactively Visualized

Budget Fadeaway is my latest large-scale data visualization project; you can explore it yourself here.


Quick instructions:

  1. Hover over a square to see the item’s subcategory, account name, and expenditure amount.
  2. Drag and drop to pan.
  3. Use the scroll wheel or pinch to zoom.
  4. Search for a word or phrase contained in a line item using the search box in the upper right.

Also, it works much better on desktop – mobile is a little funky.


 

For the first time ever, the White House open-sourced the raw data files for its 2016 budget, allowing citizens to explore the proposal in more depth than ever before. You can get high-level summaries of the budget in many places, including the White House website. But executive summaries are a form of lossy compression: I prefer gritty, raw, piercingly high resolution.

The goal of Budget Fadeaway is to let you experience the scale and explore the role of the federal government. So, have fun – or panic if that’s your style – and feel free to comment below or tweet me @anthonygarvan.

Unlike most of my other projects, this one does not involve any machine learning: it’s just a plot of all the data where the area of each square is proportional to the dollar amount of its line item. The position of the items is not meaningful: I just used a bin packing algorithm with fixed spacing to give it a layout that I thought was attractive and clear. To handle such a large number of items, the front end is built on a high-performance 2D game engine, Pixi.js. All of the code is available on GitHub.
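The sizing rule is simple enough to sketch. The Python below is a minimal illustration, not the project's code: it scales each square so that its area (not its side) is proportional to the dollar amount, then lays the squares out with a basic shelf (row-by-row) packer and a fixed gap. The shelf packer is a stand-in, since the post does not say which bin packing algorithm Budget Fadeaway uses, and `total_area`, `width`, and `gap` are hypothetical parameters. It also assumes every amount is positive.

```python
import math
from typing import List, Optional, Tuple


def square_sides(amounts: List[float], total_area: float) -> List[float]:
    """Side of each square, chosen so that AREA (not side) is proportional to dollars.

    Assumes all amounts are positive.
    """
    total = sum(amounts)
    return [math.sqrt(total_area * amount / total) for amount in amounts]


def shelf_pack(sides: List[float], width: float, gap: float) -> List[Tuple[float, float, float]]:
    """Lay squares out in rows ("shelves"), largest first, with a fixed gap.

    Returns (x, y, side) for each square, in the same order as `sides`.
    A square wider than `width` simply overflows its own row.
    """
    order = sorted(range(len(sides)), key=lambda i: sides[i], reverse=True)
    placed: List[Optional[Tuple[float, float, float]]] = [None] * len(sides)
    x = y = shelf_height = 0.0
    for i in order:
        side = sides[i]
        if x > 0 and x + side > width:  # no room left in this row: start a new shelf
            y += shelf_height + gap
            x = shelf_height = 0.0
        placed[i] = (x, y, side)
        x += side + gap
        shelf_height = max(shelf_height, side)
    return placed  # type: ignore[return-value]
```

Sorting largest-first means each shelf's height is set by its first square, so rows waste less vertical space than they would with items in arbitrary order.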

Word Galaxy: An Interactive Map of English

Today, I introduce Word Galaxy, an interactive map of the 20,000 most common words in English. To explore the map for yourself, click here.

Word Galaxy plots words in such a way that words with similar meanings appear closer together. To capture the meaning of a word, the algorithm relies on Firth’s law: “You shall know a word by the company it keeps.” It’s based on the latest and greatest research in natural language processing, specifically Google’s Word2Vec neural network trained on 200-300 million words of Wikipedia, and Laurens van der Maaten’s t-SNE dimensionality reduction. The data is rendered with a great 2D game engine, Pixi.js. For more about the algorithms and technologies behind Word Galaxy, you can view the source code and references on the Word Galaxy GitHub page.
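Firth’s idea can be illustrated without a neural network at all. The sketch below is a deliberately simple count-based stand-in, not Word2Vec and not the Word Galaxy code: it represents each word by counts of the words appearing within a small window around it, then compares words by cosine similarity. The toy sentences in the usage example and the window size of 2 are assumptions for illustration. Word2Vec learns dense vectors that capture the same intuition far better, and t-SNE then projects those vectors down to 2D for the map.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable


def cooccurrence_vectors(sentences: Iterable[str], window: int = 2) -> Dict[str, Counter]:
    """Represent each word by counts of the words within `window` positions of it."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # Counter returns 0 for missing keys
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vecs = cooccurrence_vectors([
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "stocks fell on the market",
    "bonds rose on the market",
])
```

In this toy corpus “cat” and “dog” share neighbors like “the” and “drinks,” so their similarity comes out well above that of “cat” and “stocks,” which share none.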

Technology aside, the point of Word Galaxy is to let you explore meaning and culture spatially. Please explore away, and feel free to leave your comments and insights below or on the Reddit post. If you are more into the technical side, I also posted it to /r/MachineLearning here.

I noticed a few things about the structure of the data that I will share with you because hey, you’ve read this far! Generic words tend to sit near the center, while domain-specific words tend to be towards the periphery. Science and technology words (e.g., “logarithm”, “variable”) are in the east, while historical and social words (e.g., “crusades”, “Ptolemy”) are in the west. If science is in the east and the humanities are in the west, what’s in between? Along the north, it goes mathematics -> music -> sports -> famous people’s names -> place names -> history. Along the south, it goes mathematics -> physics -> optics -> electrical engineering -> software -> finance -> law -> religion -> history. Kewl. Another fun fact: “spreadsheet” is almost exactly opposite “vegas”.

For example, here is a plot which links calculus to Jesus:

[Image: plot linking “calculus” to “Jesus”]

If any of you can find any patterns in the generic core of the map, let me know! I struggled a bit to find large-scale patterns there, but they are probably there.

Angel or Bitch? How We Sing About Women

What’s even more fun than listening to love songs? Analyzing love songs!

I’ve spent my spare time in the last few days writing a web scraper to collect a large number of lyrics from LyricWikia. I’ve collected lyrics for 16,699 songs from 1980-2014, to be precise. When looking through the data, I was struck by the variety of terms we use in songs to refer to women. Here are the frequencies of the most common terms for women across the entire data set.

[Chart: frequencies of the most common terms for women in the lyrics]

In contrast, only two terms for men appear among the 500 most common words, and both are neutral in tone:

[Chart: frequencies of the two terms for men]

The pair that really caught my eye was bitch and angel – both extreme characterizations of real human beings occurring at roughly the same frequencies. How do these dichotomous terms compare over time?

[Chart: frequency of “angel” and “bitch” by year]

 

In the 1980s, the two terms were used at roughly the same levels. Starting around 1992, however, “bitch” begins to take off. Often both terms rise together (for example, the humps from 1995-1999 and 2005-2007), but in recent years “bitch” is leaving “angel” far behind.

The effect of both terms rising and falling together was not really interesting to me, since it could just be a result of more male artists putting out albums in a given year, for example. Subtracting “angel” from “bitch” gives me what I’m looking for: a single metric for the “bitch”-iness of each year.
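The metric itself is one subtraction per year. Here is a minimal Python sketch, assuming the scraped data is available as (year, lyrics) pairs. The tokenization (lowercase, whole words only, so “angels” and “bitches” are not counted) and the use of raw counts rather than per-song rates are assumptions, since the post does not say exactly how the counts were taken or normalized.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def term_counts_by_year(songs: Iterable[Tuple[int, str]], terms: Iterable[str]) -> Dict[int, Counter]:
    """Count whole-word occurrences of `terms` per year.

    Lowercases lyrics and matches exact words only, so plurals such as
    "angels" are NOT counted (an assumption; stemming would change that).
    """
    targets = set(terms)
    counts: Dict[int, Counter] = defaultdict(Counter)
    for year, lyrics in songs:
        year_counts = counts[year]  # touch the year so it appears even with zero hits
        for word in re.findall(r"[a-z']+", lyrics.lower()):
            if word in targets:
                year_counts[word] += 1
    return counts


def bitchiness(counts: Dict[int, Counter]) -> Dict[int, int]:
    """Occurrences of "bitch" minus occurrences of "angel", per year."""
    return {year: c["bitch"] - c["angel"] for year, c in sorted(counts.items())}
```

Dividing each year's difference by that year's number of songs would be an easy variant if the number of scraped songs varies a lot from year to year.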

There are a lot of other metrics you could use to determine the significance of a word in a subset of a corpus (TF-IDF comes to mind), but I like my bitchiness metric because it is very easy to communicate.
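For comparison, here is what a basic TF-IDF score per year might look like, treating each year's lyrics as one document. This is a textbook sketch, not something the analysis in this post used: tf is the word's share of that year's tokens and idf is log(N / df), where N is the number of years and df is the number of years containing the word. The function name and the year-as-document framing are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_by_year(docs: Dict[int, List[str]]) -> Dict[int, Dict[str, float]]:
    """Score each word in each year, treating one year's tokens as one document.

    tf  = word count / total tokens that year
    idf = log(number of years / number of years containing the word)
    """
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each word at most once per year
    scores: Dict[int, Dict[str, float]] = {}
    for year, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens)
        scores[year] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in tf.items()
        }
    return scores
```

One consequence worth noticing: a word that appears in every year gets an idf of zero, so TF-IDF scores say how distinctive a word is for a year rather than how often it is used, which makes them harder to read at a glance than a plain difference of two counts.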

So, without further ado, here’s this post’s money plot, the first quantification of the bitchiness of music over time:

[Chart: “bitch” minus “angel” by year]