Racial Perceptions - A First Glance

Be sure to subscribe for updates on this and all my other data analysis projects! 

Last night I had a few hours to kill at a cafe. The cafe had decent coffee and slow wifi, so I thought it would be a good chance to mess around with the preliminary results from my race perceptions survey! I intend to dress this up quite a bit once I get more respondents (the survey will remain open for now).

For now, here is a first "quick and dirty" look at how our perceptions of racial composition track with reality.

Map of Respondents

Here is the updated map, with ~380 people responding from across the country. Respondents cluster heavily around a few major cities because I sent the survey to my Facebook friends (largely in WA) and to two region-specific subreddits (r/seattle and r/sanfrancisco). The sample of respondents is ~80% white and 55% male, and 90% fall within the 18-25 or 25-35 age groups.

In red: density of ZIP codes across the US. Survey respondents are marked in blue.

Census Maps of Race in the US

These are some simple (UGLY!) maps of the racial composition of the US by ZIP code. Nothing surprising here: large percentages of Asians are typically found only in major cities, Black Americans make up a larger share of the population in the southern and eastern US, and the north is very white.

Perception versus Reality

And now the reason you're probably reading this! Again, these figures do not carry my highest mark of quality, but they are interesting at least. First, here is the raw "guesses versus actual Census data" figure for the four primary races. I have renormalized people's estimates so that they add up to 100%. In general, people have the right idea. The finer details (below) tell an interesting story, though...

Perceptions versus reality: broad agreement.
Note: the colors do not match those in the figures below.
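The renormalization step can be sketched in Python as follows. This is a minimal illustration, not the survey's actual code: it assumes each respondent's raw guesses are stored as a dict of percentages, and the category labels and example response are hypothetical.

```python
def renormalize(guesses):
    """Rescale one respondent's raw percentage guesses so they sum to 100.

    `guesses` maps a (hypothetical) category label to a raw percentage.
    """
    total = sum(guesses.values())
    if total == 0:
        raise ValueError("cannot renormalize an all-zero response")
    return {race: 100.0 * value / total for race, value in guesses.items()}


# Hypothetical respondent whose raw guesses add up to 120% instead of 100%.
raw = {"White": 60, "Black": 20, "Asian": 30, "Other": 10}
print(renormalize(raw))
```

Dividing by the respondent's own total keeps the relative sizes of their guesses intact while forcing them onto the same 100% scale as the Census shares.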

If you calculate the "geometric distance" of each respondent's guesses from the actual numbers (basically adding the differences in quadrature, i.e. taking the square root of the sum of the squared differences), you get a distribution of how close to reality people are. The distribution is strongly peaked around ~10 percentage points (the median is 12), meaning that, across all four races together, people get the "racial landscape" right to within about 10 points. Not bad!
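A minimal sketch of this "distance in quadrature" metric, assuming each respondent's renormalized guesses and the Census shares for their ZIP code are dicts keyed by the same (hypothetical) category labels. For simplicity, both made-up respondents below are compared against the same made-up ZIP code.

```python
import math
import statistics


def distance_from_census(guess, actual):
    """Euclidean ("in quadrature") distance between guessed and actual shares.

    Both arguments map category -> percentage, so the result is in
    percentage points.
    """
    return math.sqrt(sum((guess[k] - actual[k]) ** 2 for k in actual))


# Hypothetical Census shares for one ZIP code and two renormalized guesses.
actual = {"White": 70.0, "Black": 10.0, "Asian": 15.0, "Other": 5.0}
guesses = [
    {"White": 60.0, "Black": 15.0, "Asian": 15.0, "Other": 10.0},
    {"White": 75.0, "Black": 5.0, "Asian": 15.0, "Other": 5.0},
]
distances = [distance_from_census(g, actual) for g in guesses]
print(distances, statistics.median(distances))
```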

My colleagues will be pleased to see that the (very Poisson-like) distributions are the same for men and women: the women's median distance is only 1% larger. I haven't run a Kolmogorov-Smirnov (K-S) test on these distributions or anything fancy like that, but I'd say this difference is within the sampling error.
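For the curious, the two-sample K-S statistic is simple to compute by hand: it is the largest vertical gap between the two empirical CDFs. The sketch below uses made-up distance values and the standard large-sample critical value for α = 0.05 (the constant 1.358); it is an illustration, not an analysis of the real survey data. In practice, SciPy's `scipy.stats.ks_2samp` computes the same statistic along with a p-value.

```python
import bisect
import math


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        # Empirical CDFs at x: fraction of each sample that is <= x.
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_critical_value(n, m, c_alpha=1.358):
    """Large-sample critical D for sample sizes n and m (default alpha = 0.05)."""
    return c_alpha * math.sqrt((n + m) / (n * m))


# Hypothetical distance samples (percentage points) for men and women.
men = [8.1, 9.5, 10.2, 11.7, 12.0, 13.4, 15.9, 21.3]
women = [8.8, 10.1, 11.0, 12.6, 13.1, 14.2, 16.5, 22.0]
d = ks_two_sample(men, women)
print(d, d > ks_critical_value(len(men), len(women)))
```

If D stays below the critical value, the two samples are consistent with coming from the same distribution at that significance level.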

Finally, I've taken the difference between each person's guess and the real population share, and plotted it as a function of the actual population share. If people were spot-on, their answers would lie on the thick grey zero line. If they over-predict the share of a given race, the answer lies above the line, and vice versa for under-predictions.
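The quantity on the vertical axis is just "guess minus actual" for each respondent. A minimal sketch, using hypothetical (guess %, actual %) pairs for a single race:

```python
def residuals(records):
    """Return (actual %, guess - actual) pairs for one race.

    Positive residuals are over-predictions, negative ones are
    under-predictions, and a perfect guess lands on the zero line.
    """
    return [(actual, guess - actual) for guess, actual in records]


# Hypothetical (guess %, actual %) pairs for one race across respondents.
records = [(12.0, 3.0), (10.0, 8.0), (9.0, 20.0), (15.0, 35.0)]
for actual, diff in residuals(records):
    label = "over" if diff > 0 else "under" if diff < 0 else "exact"
    print(f"actual={actual:5.1f}%  guess-actual={diff:+5.1f}  ({label})")
```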

For Asians we see an interesting trend: the difference between guess and reality decreases as the actual population share increases (with one very large outlier in the top right). This shape appears in the guesses for every race except white.

The predictions for white people behave slightly differently. The same decreasing trend is seen, but large over-predictions dominate where the actual share of white people is low.
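One way to put a number on these decreasing trends is to fit a straight line to the residuals (guess minus actual) against the actual percentage. The slope measures how quickly over-prediction turns into under-prediction, and the point where the fitted line crosses zero is the actual share at which guesses are right on average. This is a hedged sketch with made-up, exactly linear data; a straight line is an assumption, and the white panel in particular may not be well described by one.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical residuals (guess - actual) against actual % for one race.
actual = [2.5, 5.0, 10.0, 20.0, 30.0]
resid = [8.0, 6.0, 2.0, -6.0, -14.0]
slope, intercept = fit_line(actual, resid)
crossing = -intercept / slope  # actual % where the fitted line hits zero
print(f"slope={slope:.2f}, zero crossing at {crossing:.1f}% actual")
```

A negative slope corresponds to the downward trend in the figures; comparing the zero crossing for each race with that race's national share is one simple follow-up check.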


These last three figures could be interpreted as telling this story:

We seem to think there are lots of minorities in communities that are actually very white. This shows up as the over-prediction in the Black and Asian figures at low actual percentages. The picture quickly flips: in more diverse communities, we fairly consistently under-predict the share of minorities (and over-predict the share of whites).

I'm not convinced this is the whole story told by the data, but it's an interesting first look. There are many more variables at work here that I can study. For example, I haven't yet broken the results down by any of the supplemental data collected on the respondents themselves! Questions include:
  • Which race predicts the racial landscape more "correctly"?
  • Which gender does?
  • Which age group?
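Each of these questions boils down to grouping respondents by an attribute and comparing a summary of their distance-from-Census scores, such as the median. A sketch with hypothetical records; the field names are illustrative, not the survey's actual columns.

```python
import statistics
from collections import defaultdict


def median_distance_by(respondents, key):
    """Median distance-from-Census for each value of a respondent attribute."""
    groups = defaultdict(list)
    for r in respondents:
        groups[r[key]].append(r["distance"])
    return {group: statistics.median(ds) for group, ds in groups.items()}


# Hypothetical respondent records (distance in percentage points).
respondents = [
    {"gender": "female", "age": "18-25", "distance": 13.0},
    {"gender": "male", "age": "18-25", "distance": 11.0},
    {"gender": "female", "age": "25-35", "distance": 12.0},
    {"gender": "male", "age": "25-35", "distance": 10.0},
    {"gender": "male", "age": "35-45", "distance": 14.0},
]
print(median_distance_by(respondents, "gender"))
print(median_distance_by(respondents, "age"))
```

With the real data, small groups (for example, the age brackets outside 18-35) would need their sample sizes reported alongside the medians.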

We can imagine lots of fun ways to cut the data with all these variables at work!


I have gotten lots of feedback from people noting shortcomings in the survey. I want to acknowledge a few excellent points and questions that have been raised:
  1. Yes, I am aware that polling my friends/colleagues/Facebook/Reddit will not give a robust or unbiased sample (however you choose to define such a thing).
  2. I still think the results have meaning, despite these selection effects, provided one does not over-interpret the data.
  3. Hispanic is not a race, but an ethnicity.
  4. I agree that the lack of multiracial or multi-ethnic options greatly limits the survey, but my bigger priority was to get as large a sample as possible. Hard to do by myself, it turns out! I would love to get 1,000 people from the same city together and give them a 4-page survey that includes many racial/ethnic options! (This is actually a possible follow-up to this project.)


Incredibly, 40% of respondents have given me their email addresses. I solemnly promised not to spam you, and since this is not the final analysis of the survey, I have opted not to email anyone yet. If you think I should send this initial post around instead, let me know!

OK! That's all I've got at the moment. Please keep sharing/posting/tweeting/blogging the survey; I'd love to get the sample size to four digits!

Lastly: a HUGE thanks to all ~380 of you who've contributed so far!


  1. Very interesting. It looks to me like people consider local racial demographics on larger regional scales than their ZIP code areas, so their perceptions tend to lean toward a safe guess near the national average. If you were to fit lines to these downward trends in (predicted - actual) % vs. actual %, would they cross the zero (perfect-prediction) line near the national average percentages? That could be interesting to look at.

    Looking forward to more data!

    1. I live in Memphis, and the city itself is 70% black. However, white flight has created a situation where, during the day, about 60% of the people in the city are white. Maybe most major cities are in the same situation: the places where people sleep and work are far enough apart that the demographics aren't entirely accurate.

  2. I came upon a similar thought last night: some of the "noise" in the data, which comes from people perceiving their region on a larger scale than their ZIP code, can be hammered down by averaging over the few surrounding ZIPs.

    One possible way would be to average the Census data surrounding each respondent within, say, a half mile.

    Curious idea about finding the national average. I'll poke around more in the next iteration!
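The neighborhood averaging suggested in this reply could be sketched as a population-weighted average over ZIP codes whose centroids lie within a fixed radius of the respondent, using great-circle (haversine) distance. This assumes ZIP centroid coordinates, populations, and per-race percentages are available (for example from the Census Bureau's ZIP Code Tabulation Areas); the record layout, function names, and example ZIPs are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def local_share(respondent, zips, radius_miles=0.5):
    """Population-weighted share of one group over ZIPs within radius_miles.

    Each hypothetical ZIP record has a centroid ("lat", "lon"), a total
    population "pop", and the group's percentage "share". Returns None if
    no ZIP centroid falls inside the radius.
    """
    lat, lon = respondent
    total_pop = 0.0
    weighted = 0.0
    for z in zips:
        if haversine_miles(lat, lon, z["lat"], z["lon"]) <= radius_miles:
            total_pop += z["pop"]
            weighted += z["pop"] * z["share"]
    return weighted / total_pop if total_pop else None


# Hypothetical ZIP records near a respondent in Seattle.
zips = [
    {"lat": 47.600, "lon": -122.300, "pop": 1000, "share": 20.0},
    {"lat": 47.605, "lon": -122.300, "pop": 3000, "share": 40.0},
    {"lat": 47.700, "lon": -122.300, "pop": 5000, "share": 90.0},
]
print(local_share((47.600, -122.300), zips))
```

Since many ZIP codes are larger than a half mile across, a radius that small may contain only the respondent's own ZIP centroid (or none), so the radius is worth treating as a tunable parameter.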

  3. How do you define race? You seemed to take the definitions as a given.

    Is someone from Saudi Arabia considered Asian? Or someone from Papua New Guinea? Are Mexicans white? Where did you get "Hawaiian" from as a race, and by that token, what happened to Native Americans?

    I'm curious what definitions were used for your source statistics, and whether you actually provided any in your survey questions. I've heard the clarifying term "non-Hispanic whites" since Latin Americans have a combination of New World and Old World roots. On your part, you seem to have excluded the concept of Hispanics entirely, though they are judged to be the most quickly growing "racial" minority in the US.

    Just my two cents. Race is a construct, so you have to define your terms before you can jump to the math.

