Gender of White House Visitors

Last summer, as part of my internship working with these awesome people at MSR, I spent a lot of time playing with public data sources. One fascinating dataset that I chose as a benchmark (for what is currently known as Tempe at MSR) is the White House Visitor records, which (as of last July) had over 3 million records of visitors to the White House during the Obama administration.

This dataset has been in the news before, and is (in my opinion) a great example of public disclosure that we should be pushing for in government. A whole other conversation of course is how/when such records should be released, and by whom. The White House Visitor dataset is also known to be incomplete, censoring records for national or personal security reasons, and maybe other reasons too.

Here is just one question I came up with: Do more men or women visit the White House? My guess was that a majority of visitors would be men.



To make this slightly more interesting, I also posted a very simple survey last summer that asked people to guess if more men or women visited the White House. The survey itself was hastily done (read: poorly done) but nearly 300 people kindly responded with their guesses. The distribution of answers looked like this:
Survey results, from 297 participants.
Basically, the most common guess agreed with mine: around 60% men.


First challenge: How does one assign gender, when all you're given is a name?
I've described how I do this before, but this was actually the project where I first tried it! I downloaded the US Social Security Administration's full Baby Name dataset, which has a huge list of name-gender info for more than a century. I've limited myself to names only since 1920 here.

For every name in the SSA dataset I count the number of male and female instances, and assume a flat probability of gender. In other words, I assign fractional people to each gender (e.g. 0.74 of a woman and 0.26 of a man for a given name), with no thresholding on those fractions. This is not the best way to assign genders, but it is the most straightforward.
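A minimal sketch of this kind of fractional counting, for anyone who wants to try it. This is not the code used for this post: the function names, the (name, sex, count) record layout, and the toy counts are all assumptions made up for illustration.

```python
from collections import defaultdict


def build_gender_table(records):
    """Map each lowercased name to the fraction of its births recorded as female.

    `records` is an iterable of (name, sex, count) tuples with sex "F" or "M".
    That layout is an assumption about how the SSA files are organized.
    """
    counts = defaultdict(lambda: {"F": 0, "M": 0})
    for name, sex, count in records:
        counts[name.lower()][sex] += count
    return {
        name: c["F"] / (c["F"] + c["M"])
        for name, c in counts.items()
        if c["F"] + c["M"] > 0
    }


def fractional_gender_totals(first_names, table):
    """Sum fractional women and men over visitors, skipping unmatched names."""
    women = men = 0.0
    matched = 0
    for name in first_names:
        p_female = table.get(name.lower())
        if p_female is None:
            continue  # name not in the table, so it contributes nothing
        women += p_female
        men += 1.0 - p_female
        matched += 1
    return women, men, matched


# Toy counts, invented for illustration only (not real SSA numbers).
toy_records = [
    ("Mary", "F", 990), ("Mary", "M", 10),
    ("James", "M", 995), ("James", "F", 5),
    ("Casey", "F", 740), ("Casey", "M", 260),
]
table = build_gender_table(toy_records)
women, men, matched = fractional_gender_totals(
    ["Mary", "JAMES", "Casey", "Xyzzy"], table
)
print(f"matched={matched} women={women:.3f} men={men:.3f}")
```

Each matched visitor adds exactly one person in total, split between the two genders, so the fractional totals always sum to the number of matched names.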

aside: I would love to test the robustness of this method using a large corpus of names with known genders (e.g. some personnel records or similar)

The White House Visitor dataset included 3,246,486 entries. Of these, 3,105,695 (about 96%) had a name match to my SSA dataset. Of these names, only 4.7% had an SSA-based assumed gender that was lower than 75%. In other words, over 95% of the White House Visitors had a first name that was a single gender more than 75% of the time in the SSA dataset. This means we can actually answer the initial question...
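The two percentages above (how many names match, and how many matched names are ambiguous) could be computed along these lines. Again, this is a hedged sketch rather than the original analysis: `match_and_ambiguity`, the toy table, and the toy names are hypothetical, and I am assuming "assumed gender lower than 75%" means the more likely gender has a probability below 0.75.

```python
def match_and_ambiguity(first_names, p_female_by_name, threshold=0.75):
    """Return (match_rate, ambiguous_share) for a list of first names.

    match_rate: fraction of names found in the lookup table.
    ambiguous_share: among matched names, the fraction whose more likely
    gender has probability below `threshold`.
    """
    matched = [
        p_female_by_name[name.lower()]
        for name in first_names
        if name.lower() in p_female_by_name
    ]
    if not first_names or not matched:
        return 0.0, 0.0
    match_rate = len(matched) / len(first_names)
    ambiguous = sum(1 for p in matched if max(p, 1.0 - p) < threshold)
    return match_rate, ambiguous / len(matched)


# Toy probabilities of being female, invented for illustration only.
toy_table = {"mary": 0.99, "james": 0.005, "casey": 0.74, "riley": 0.5}
toy_names = ["Mary", "James", "Casey", "Riley", "Xyzzy"]

match_rate, ambiguous_share = match_and_ambiguity(toy_names, toy_table)
print(f"match rate: {match_rate:.1%}, ambiguous: {ambiguous_share:.1%}")

# The match rate quoted in the post, recomputed from its two counts.
post_match_rate = 3_105_695 / 3_246_486
print(f"post match rate: {post_match_rate:.1%}")
```

Using `max(p, 1 - p)` treats strongly male and strongly female names the same way, which is what a "single gender more than 75% of the time" cut needs.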

Gender Ratio of White House Visitors:

3,105,695 visitors

That's not bad! Consider: the gender ratio of the entire US (all ages), according to the 2010 Census, is 49.1% male and 50.9% female. Note also that there are many repeat visitors to the White House, which may induce a gender bias. Undoubtedly there are also some data entry problems, but we can assume those are gender-neutral.

But that's not all... the dataset also included a column describing who the visitor was scheduled to visit! Here are the gender ratios for a few selected descriptions:

1) Tourists

2,070,385 tourists
This is for all records with any variant of the words tour/tourist/tours/etc. I find this a little surprising, and it should be looked into further!
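For the curious, selecting a subset like this and computing its gender ratio could look roughly like the sketch below. The regular expression, the (first name, description) row layout, and the helper name are my assumptions for illustration, not the exact matching rule used for the numbers in this post.

```python
import re

# Assumed rule: any word starting with "tour" (tour, tours, tourist, ...).
TOUR_PATTERN = re.compile(r"\btour", re.IGNORECASE)


def subset_gender_ratio(rows, pattern, p_female_by_name):
    """Return (female_share, male_share) for rows whose description matches.

    `rows` is an iterable of (first_name, description) pairs; the description
    may be None. Returns None if no matching row has a known first name.
    """
    women = men = 0.0
    for first_name, description in rows:
        if not pattern.search(description or ""):
            continue
        p_female = p_female_by_name.get(first_name.lower())
        if p_female is None:
            continue  # unmatched names are left out of the ratio
        women += p_female
        men += 1.0 - p_female
    total = women + men
    if total == 0:
        return None
    return women / total, men / total


# Toy data, invented for illustration only.
toy_table = {"mary": 0.99, "james": 0.005, "casey": 0.74}
toy_rows = [
    ("Mary", "GROUP TOUR"),
    ("James", "Tourists"),
    ("Casey", "WEST WING TOURS"),
    ("James", "POTUS"),
    ("Mary", None),
]

tour_ratio = subset_gender_ratio(toy_rows, TOUR_PATTERN, toy_table)
print(tour_ratio)
```

The same helper works for an exact term by swapping the pattern, e.g. `re.compile(r"\bPOTUS\b")`, where the word boundaries keep it from matching inside longer words.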

2) POTUS

172,794 POTUS visitors
Here I've included only records that included the exact term "POTUS" (stands for President of the United States). The "President's Men" are by and large just that: men. This makes sense to me, given the high fraction of men among CEOs, military leaders, and politicians. I'm even a bit surprised it's this close, honestly. I think this is actually a sign of very good progress.

3) FLOTUS

27,989 FLOTUS visitors
The First Lady has about 6x fewer visitors listed than POTUS, and her visitors are predominantly female. This implies a very different source of visitors to the First Lady. If watching all 7 seasons of West Wing taught me anything, it's that the First Lady is expected to deal with women's issues. There's a fascinating discussion to be had on the role of the First Lady, and what constituency she should be expected/allowed to deal with, particularly for one as well educated and brilliant as Michelle Obama.

Conclusions

This study is in no way conclusive, and each of the subsets of visitors I have selected may have large overlaps. However, it does provide a hint that the White House is not simply a "Boys Club", but instead some gender equality does appear to be reaching the highest office in the land. How will this change in a different administration, or with a different political party in office? (I'd love to see the records from the previous administrations!) What if a woman is elected POTUS? These are questions that only more data and time can answer. As I said in my survey of gender in astronomy talks, if we can bias the answer simply by studying the problem I'd be thrilled.
