Race in US Colleges


The Chronicle posted a nice spreadsheet containing race, ethnicity, and gender data from ~4,300 institutions of higher education across America. (Note: the article and data file are now behind a paywall, which was not in effect when I downloaded the data set.)

It's a really intriguing data set, and I thought it was worth a few minutes of my time to play with it. My results are amusing, but I don't think I've fully captured the rich potential this data has to offer serious researchers.

My first question was simple: what does the most basic racial composition of US colleges look like?

I then remembered one of the first questions I ever asked when studying diversity/race in universities: how does the racial composition of a school compare to that of its state as a whole?

To answer this, I needed to match the Chronicle table against the 2010 US Census data by state. As an example, I have plotted the % of black students at every school versus the % of black citizens in that school's state. The color of each pixel represents the density of points. The orange-ish "1 to 1" line represents perfect agreement between a school and its state. 43% of schools have a larger share of black students than their state does, and 57% have a smaller share. I actually think this is remarkably good, considering how bad the data probably looked even 30 years ago. Still, one might argue there is room for improvement. Further, a more illuminating graphic might be % minority graduating students versus % minority state citizens.
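The comparison against the "1 to 1" line can be sketched in a few lines of Python. This is only a minimal illustration, not the code behind the plot: the school rows, the state percentages, and the function name `split_by_state_share` are all made up for the example, and ties are counted as "not above" by assumption.

```python
# Hypothetical rows: (school name, state, % black students at the school).
# These are placeholder values, not rows from the Chronicle table.
schools = [
    ("School A", "GA", 45.0),
    ("School B", "GA", 12.0),
    ("School C", "WA", 5.0),
    ("School D", "WA", 2.0),
    ("School E", "WA", 3.0),
]

# Hypothetical state-level % black population (placeholders, not Census figures).
state_pct = {"GA": 30.0, "WA": 4.0}


def split_by_state_share(schools, state_pct):
    """Split schools into those above and those not above their state's share."""
    above, below = [], []
    for name, state, pct in schools:
        if state not in state_pct:
            continue  # skip schools whose state has no matching entry
        if pct > state_pct[state]:
            above.append(name)
        else:
            below.append(name)  # assumption: a tie counts as "not above"
    return above, below


above, below = split_by_state_share(schools, state_pct)
frac_above = len(above) / (len(above) + len(below))
print(f"{frac_above:.0%} of schools are above their state's share")
```

Run on the real table, the same split would give the two percentages quoted above, provided each school's state code is present in the Census lookup.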

I then wanted to look at the spatial distribution of these schools, which is easiest to do (in my experience at least) by plotting the lat/long of ZIP codes, the easiest way to "geocode" addresses across the nation. Pity, the Chronicle table doesn't have ZIP codes... So I cleaned up the Institution Name column it did provide and matched it against the US Dept of Education Accredited Postsecondary Institution database (a truly fascinating data set on its own). About 2/3 of the schools in the Chronicle list matched up easily, using strict matching, to the Accredited database. Rather than chasing down more string-matching ghosts, I called that good enough. From this I could match the accredited schools' ZIP codes to the Census!
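For the curious, here is roughly what a "clean up the names, then match strictly" step could look like. The exact cleaning rules used for this post are not recorded here, so the normalization below (lowercasing, turning "&" into "and", dropping punctuation) is an assumption, and the school names and ZIP codes are placeholders.

```python
import re


def normalize_name(name):
    """Lowercase, turn '&' into 'and', drop punctuation, collapse whitespace."""
    name = name.lower().replace("&", " and ")
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def strict_match(chronicle_names, accredited):
    """Match on exact equality of cleaned names; no fuzzy matching.

    accredited maps institution name -> ZIP code.
    Returns ({chronicle name: ZIP}, [unmatched chronicle names]).
    """
    lookup = {normalize_name(n): z for n, z in accredited.items()}
    matched, unmatched = {}, []
    for name in chronicle_names:
        key = normalize_name(name)
        if key in lookup:
            matched[name] = lookup[key]
        else:
            unmatched.append(name)
    return matched, unmatched


# Placeholder data, not real institutions or ZIP codes.
chronicle_names = [
    "Example State University",
    "Sample College of Arts & Sciences",
    "Placeholder Tech.",
]
accredited = {
    "EXAMPLE STATE UNIVERSITY": "00001",
    "Sample College of Arts and Sciences": "00002",
    "Placeholder Technical Institute": "00003",
}

matched, unmatched = strict_match(chronicle_names, accredited)
print(len(matched), "matched;", len(unmatched), "left over")
```

Abbreviations like "Tech." are exactly the kind of string-matching ghost that a strict match leaves behind.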

Here is the US map of accredited institutions...

And here is the map broken down into schools with a higher % of black students than their state (Blue) versus schools with a lower % (Red). Note: this is somewhat misleading, especially in New England, because I plotted Red after Blue, so Red points cover Blue ones. That makes Red seem to dominate in some areas where the two are actually quite comparable.

To help break this visual degeneracy, here is an animation of each map...
Just by eye you can see more rural schools are Red...

And for kicks, here is the cumulative distribution of each sub-map as a function of latitude. South of about 40° latitude the curves diverge a bit, though admittedly not in the fashion I naively expected. The Blue line rises more quickly at southern latitudes, indicating that a higher fraction of the Blue schools, those with a larger share of black students than their state, sit in the South. The Red schools, those with a smaller share of black students than their state, are slightly more concentrated in the North.
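A cumulative distribution in latitude is simple to compute by hand: sort the latitudes, then for each latitude on a grid count the fraction of schools at or south of it. The sketch below assumes non-empty lists and uses made-up latitudes; `cumulative_fraction` is a name invented for this example.

```python
import bisect


def cumulative_fraction(latitudes, grid):
    """For each grid latitude, the fraction of schools at or south of it.

    Assumes latitudes is non-empty.
    """
    lats = sorted(latitudes)
    n = len(lats)
    # bisect_right counts how many sorted values are <= the grid latitude.
    return [bisect.bisect_right(lats, g) / n for g in grid]


# Made-up latitudes (degrees north) for the two groups of schools.
blue_lats = [30.0, 33.0, 35.0, 42.0]
red_lats = [34.0, 41.0, 44.0, 47.0]
grid = [30, 35, 40, 45, 50]

blue_cdf = cumulative_fraction(blue_lats, grid)
red_cdf = cumulative_fraction(red_lats, grid)

for g, b, r in zip(grid, blue_cdf, red_cdf):
    print(f"{g} deg N: Blue {b:.2f}  Red {r:.2f}")
```

With these toy numbers the Blue curve climbs faster at low latitudes, which is the same qualitative behavior described above. Swapping latitude for longitude gives the east-west version of the same curve.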

However, going back to the Red versus Blue maps, there are certainly other geographic trends with these schools. My "by-eye" analysis (read: shooting a bit from the hip here) is that Blue schools tend to be more clustered around big cities, and Red schools in more rural areas, especially in Texas and the Midwest.

This has been a simple demonstration of just some of the fascinating stories this data has to tell. I certainly hope many more big surveys like this are published, and would be fascinated to see some with more details about the student body. The ultimate goal is of course to find where/when we are failing to serve people in higher education, and how we can improve!


  1. You should try a map similar to the cumulative latitude map, but in longitude. You might find that naive rise you were looking for. I wonder also how the graph might look if you control for HBCUs which are by nature artificially inflated in black people.

  2. Also, I'm wondering what the map would look like for hispanic students. I'm willing to bet it's a lot worse just because they haven't had the same degree of civil rights push that blacks have had which have resulted in a lot of institutionalized pushes for blacks to get into college.

    Then maybe a split on gender if that data's available...

