The Lego Wall

This has nothing to do with data visualization, but I'm counting it as a hack project.

My wife and I have had an unused fireplace in our downstairs den. We were looking for a colorful and fun way to cover it so our pets wouldn't climb in. This is our solution. It is brilliant, which is how you can tell Sarah (wife) came up with it!

Check the rest of the album out here...

Lego Wall

Happy Holidays!

2016 Election Issues, by the Numbers

You might have read or heard somewhere that in 2016 the US will once again hold a presidential election. For my international readers, a US election is a lot like a game of poker being played by circus animals on live TV: there may be a "winner", but typically we all lose.

There was much media chatter after the Democratic candidate debates (e.g. on NPR) saying that they focused more on "the issues" than the Republican debates did. This might be empirically true, given how much smaller a field the Democrats are currently competing in. It got me thinking: what are these "issues", and what does each candidate actually say about them?

I'm of the opinion that a good metric to judge a presidential candidate by is their website. They have total control of the content, layout, and aesthetic, and most are well funded (or, well enough to hire a startup web developer at least). You should be able to see what the candidate really thinks of their own image and platform, and their stance on "the issues". It's amazing how terrible some of these websites are...

Words Words Words

Still, websites offer a ready source of information on the distilled message each candidate has prepared on "the issues". So to gather this data, I went to each candidate's website, found whatever page (or usually pages) they had listed as "issues" or similar, and wholesale copied all the text. This totaled over 128k words, 74k from the Democrats and 54k from the Republicans. Here are two word clouds showing all of this text for the Democrats and Republicans:

These simple word frequency clouds show fascinating contrasts and similarities. The top words for the Dems include candidate names, while for the GOP it's "President" and "tax". Overrepresentation of social issues is clear for the Dems (e.g. "health", "care", "women") while the GOP has a strong emphasis on "Security". The word "government" is much more prominent for the GOP than the Dems.

When I dig down into the small/medium words, I'm struck by how similar they really are. The main bipartisan "issues" of the day appear to include "security", "jobs", and "health", but that probably doesn't surprise anyone.
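
For the curious, the counting behind a word cloud like these is simple. Here's a minimal sketch in Python, not the code I actually used: the `STOP_WORDS` list, the `tokenize` and `word_frequencies` helpers, and the sample sentence are all made up for illustration, and a real word cloud tool has its own (much longer) stop-word list.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real tools use a much longer one.
STOP_WORDS = {"the", "and", "a", "of", "to", "in", "for", "we", "our", "is"}

def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears in the text."""
    return Counter(word for word in tokenize(text) if word not in stop_words)

# Made-up sample text, standing in for a candidate's "Issues" page.
sample = "We will protect health care. Health care is a right, and we will defend it."

print("total words:", len(tokenize(sample)))
print("top words:", word_frequencies(sample).most_common(3))
```

Run the same two functions over each party's combined text and you get both the total word counts and the frequencies that set the size of each word in the cloud.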

Comparing Numbers

While Martin O'Malley may not have had a ton to say in the Democratic debates, he certainly has a lot to say online. We get it, O'Malley, you probably were a NaNoWriMo winner three years running too.

On the other end of the spectrum (both in politics and verbiage) is Lindsey Graham, whose website squeaked out a dismal ~2200 words... about the length of two op-ed pieces in the NYT or WSJ.

I don't know what the optimal number of words is, but I find it fascinating how few some of the GOP candidates have put up, and how much more the Democrats did. Speculating, the paucity of words from GOP candidates could be due to a believed lack of interest among potential voters in long-form content. Typically the GOP websites had fewer topic headings in their "Issues" pages (a notable exception is Rand Paul's page).

One of the best and worst types of analysis you can do with written language is to study its complexity. A very simple version of this, which I've featured repeatedly on this site in the past, is the Flesch-Kincaid Reading Level. This metric attempts to compute how complex language is by measuring the number of words per sentence and the number of syllables per word.
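
If you want to try the metric yourself, the standard Flesch-Kincaid grade level formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. Below is a rough Python sketch of it. The formula is the standard one, but the `count_syllables` heuristic (counting runs of vowels) and the sentence splitting are my own crude assumptions here, so scores will differ a bit from whatever a dedicated readability tool reports.

```python
import re

def count_syllables(word):
    """Crude heuristic (an assumption): count vowel runs, ignoring a silent final 'e'."""
    word = word.lower()
    if len(word) > 2 and word.endswith("e") and not word.endswith("le"):
        word = word[:-1]
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade: 0.39 * words/sentence + 11.8 * syllables/word - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

# Made-up sample sentences: short and plain versus long and polysyllabic.
simple = "The cat sat. The dog ran."
dense = "Comprehensive immigration legislation necessitates bipartisan collaboration."

print(round(flesch_kincaid_grade(simple), 2))
print(round(flesch_kincaid_grade(dense), 2))
```

Long sentences push the first term up and long words push the second term up, which is why one sprawling sentence can move a short page's score noticeably.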

There's good reason to think this is not a fair measure of intellectual complexity, but still there have been curious trends found with it in the past. You typically see a slew of articles about the Readability or Reading Level of State of the Union speeches each year. Also, the government requires some documents to pass certain Reading Level tests to promote more accessibility to the laws of the land.

With appropriate caveats in mind, here is the readability metric for each candidate's "Issues" page(s):

Trump's "Issues" page ranks lowest in terms of reading grade level. Bush isn't far behind him. Interestingly, they had the highest total word counts among the GOP sources.

Rubio tops it off by a wide margin. Some say the ideal sentence should be 15-20 words. This randomly selected sentence has 42 words, and certainly contributes to the complexity of his text:
"The horrific mass shootings should prompt us to ask what causes people to commit these acts — like what can be done to improve the way we treat serious mental illness — rather than seize on the weapons they used." [source]
The Dems, however, are all over the map compared to the GOP, but really are quite close in actual score. I suppose I'm not surprised O'Malley is the champ here too. Given there are only three candidates, and no strong outliers, it's not clear what we can say overall about the Dems' readability scores.

Pick your Favorite

For fun, here's the word cloud for each of the candidate's "Issues" pages. (pro tip: make a drinking game with these, maybe for the next debate?) I'll leave these without comment, except to say I'm sure you can interpret them any way you please.


There's a lot you can learn from a bag of words, and a lot of neat games you can play. I've made the data and a snippet of code freely available on GitHub. A neat project would be, say, to create a probabilistic sentence generator based on text from each candidate... It would also be awesome to see what we might learn from archived webpages of successful candidates from elections past. Does text-heavy win? Or are people excited about the short-and-sweet message?
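
To give a flavor of the sentence generator idea, here's one possible starting point: a first-order Markov chain, where each next word is picked at random from the words that followed the current word in the source text. This is only a sketch of one approach, it is not in the GitHub repo, and the `build_chain` and `generate` names and the sample text are made up for illustration.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, max_words=20, rng=random):
    """Walk the chain from `start`, choosing each next word at random."""
    word, output = start, [start]
    # Stop at the length limit or when the current word has no known follower.
    while len(output) < max_words and chain.get(word):
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)

# Made-up sample text, standing in for a candidate's "Issues" page.
sample = "we will create jobs and we will cut taxes and we will protect jobs"
chain = build_chain(sample)
print(generate(chain, "we", max_words=12))
```

Because repeated followers appear multiple times in each list, common word pairs get picked more often, so the output leans toward each candidate's favorite phrases.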

I can't say I'm excited about this coming election season, but at least it will give material for Saturday Night Live to work with... and that's something we can/should all appreciate.