If We Assume: January 2014

Galactic Map Projections

Topics: Astronomy, maps, visualization

One of my absolute favorite scenes from The West Wing (one of my favorite shows) wasn't about politics, or war, or even the President. Instead, it was a quirky plea to change the default map projection used when teaching children about the world. For your amusement, here's the clip...

(Sorry, the clip has been removed from YouTube. A quick search for the West Wing scene about map projections should turn it up.)

For the curious, some more details about the Gall-Peters projection!

Map projection isn't just a concern for cartographers, city planners, or ocean travelers. Astronomers deal with maps all the time as well! (Sometimes we astronomers forget this.) For example: one of my absolute favorite graphs from the last decade in astronomy comes from the SDSS, and is titled the "Field of Streams"...

Credit: Vasily Belokurov, SDSS-II Collaboration

This is a map of very faint sub-structure seen in our Galaxy using the SDSS. To give you a sense of scale, here is a comparable region (angularly speaking) of Earth, drawn on the same evenly spaced lat/lon grid as the figure above. Distortion of the far northern latitudes is (or should be) very apparent!
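To put a number on that distortion, here is a minimal sketch (not from the original post; the function names are just illustrative) of how much true sky or ground area a lat/lon cell covers. An evenly spaced grid draws every cell the same size, but the real solid angle of a cell shrinks toward the poles.

```python
import math

def cell_solid_angle(lat_lo_deg, lat_hi_deg, dlon_deg):
    """True solid angle (steradians) of a lat/lon cell on the unit sphere."""
    lat_lo = math.radians(lat_lo_deg)
    lat_hi = math.radians(lat_hi_deg)
    return math.radians(dlon_deg) * (math.sin(lat_hi) - math.sin(lat_lo))

def horizontal_stretch(lat_deg):
    """East-west stretch factor of an evenly spaced lat/lon map at a given latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))

# Compare a 10x10 degree cell at the equator with one touching the pole.
equator_cell = cell_solid_angle(0, 10, 10)
polar_cell = cell_solid_angle(80, 90, 10)
print(f"equator/polar area ratio: {equator_cell / polar_cell:.1f}")
print(f"stretch at 60 deg latitude: {horizontal_stretch(60):.2f}x")
```

Both cells occupy the same number of pixels on the flat grid, which is why high-latitude features look so bloated.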

So let's playfully extend the logic from the West Wing. If using traditional Earthly map projections can lead to generations of international strife, then our concern for heavenly projections should be far greater! Consider the interstellar politics at play, and the need to avoid a false sense of Orion-Imperialism. I hereby create the Organization of Astronomers for Celestial Equality (OACE - now accepting membership applications).

Let us consider, what does your Galaxy really look like?

For this example, I've grabbed 2.5 million random stars from 2MASS, a famous infrared all-sky survey. Here is a density map of the Milky Way (from our vantage point) in the most basic map projection: uniform steps in latitude and longitude.
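For anyone who wants to try this at home, the "uniform steps in latitude and longitude" map boils down to counting stars in a grid of equal-sized lat/lon bins. The sketch below is an assumption about how one might do it, not the code behind the figure; `density_grid` and its arguments are hypothetical names, and the input is taken to be plain lists of coordinates in degrees.

```python
def density_grid(lons_deg, lats_deg, step_deg=10):
    """Count points in uniform lat/lon bins.

    Returns a list of rows (south to north), each a list of counts
    (longitude 0 to 360). step_deg is assumed to divide 180 evenly.
    """
    n_lon = 360 // step_deg
    n_lat = 180 // step_deg
    grid = [[0] * n_lon for _ in range(n_lat)]
    for lon, lat in zip(lons_deg, lats_deg):
        # Wrap longitude into [0, 360) and clamp the edge cases
        # (lat = +90, or rounding at lon = 360) into the last bin.
        col = min(int((lon % 360.0) / step_deg), n_lon - 1)
        row = min(int((lat + 90.0) / step_deg), n_lat - 1)
        grid[row][col] += 1
    return grid

# Three made-up positions, purely to show the call.
grid = density_grid([0.0, 359.9, 10.0], [0.0, 89.9, -90.0], step_deg=10)
print(sum(sum(row) for row in grid))
```

Plotting the grid as an image gives the basic projection; every other projection just moves those same counts around on the page.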

Here is the same data in a handful of other projections...

and my personal favorite of the set...

Goode Homolosine

Look at how far the LMC and SMC appear to move between projections! What do you think, which is the best projection? Which is the most accurate for astronomy?
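To see why objects appear to move, it helps to push a single coordinate through two projections. The sketch below compares the plain lat/lon mapping with the Hammer projection, an equal-area projection common in all-sky astronomy plots. Hammer is my choice for illustration here and is an assumption; the post does not say which projections its figures use beyond Goode Homolosine.

```python
import math

def plate_carree_xy(lon_deg, lat_deg):
    """Uniform lat/lon mapping: x and y are just the angles in radians."""
    lon = ((lon_deg + 180.0) % 360.0) - 180.0  # wrap to [-180, 180)
    return math.radians(lon), math.radians(lat_deg)

def hammer_xy(lon_deg, lat_deg):
    """Forward Hammer (equal-area) projection of a point given in degrees."""
    lon = math.radians(((lon_deg + 180.0) % 360.0) - 180.0)
    lat = math.radians(lat_deg)
    denom = math.sqrt(1.0 + math.cos(lat) * math.cos(lon / 2.0))
    x = 2.0 * math.sqrt(2.0) * math.cos(lat) * math.sin(lon / 2.0) / denom
    y = math.sqrt(2.0) * math.sin(lat) / denom
    return x, y

# A made-up high-latitude point, far from the map center.
print(plate_carree_xy(120.0, 60.0))
print(hammer_xy(120.0, 60.0))
```

The two maps have different overall extents, so compare positions relative to the map edges rather than the raw numbers.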

Tweets & Readability

Topics: Microsoft, Reading Level, technology

As I've mentioned previously on this site, this past summer I had the great fortune to intern with MSR in Redmond, WA. Much of the summer was spent discussing, imagining, and thinking about data and science. Additionally, I spent the last few weeks there writing a short paper on one of the data explorations we undertook with Twitter! (I also made a poster about pataphysics, video games, and pandas... but that's another story.)

I found writing a paper in another field delightfully challenging. It's like going back to kindergarten... you have to learn the language, the structure, the pacing and voice. Most importantly, you have to stumble through their literature, trying to appear competent enough to contribute to the scientific process! (Mostly you try to quack like the right breed of duck without looking like a total fraud!) My mentors at MSR helped in this last step as much as they could.

The publication process in CS is quite different from Astronomy. For example: publishing in conferences instead of journals, two-way anonymous refereeing, low acceptance rates. I enjoy the sheer number of places you can submit your work. In astronomy we have a fairly small number of respected journals to cite literature from, while CS seems to have endless numbers of specialized conferences on every sub-discipline. Pros/Cons to both models abound.

I submitted my paper to a well-regarded conference, but in the end it was not selected (though the reviews were quite positive!). I'll probably make another round of changes and submit it again. In the meantime I wanted to give it a stable online home, so I did what any astronomer would do: submit it to the arXiv.

"The Readability of Tweets and their Geographic Correlation with Education"

Check it out!

A major part of this project was based on the US Census, which (as ever) was a fascinating data source. Here's a figure not included in the paper, but made from Census data. It shows the relation between median household income and the fraction of college degree holders within a given ZIP code.

Remember kids: correlation != causation.... but stay in school.

The paper outlines how we gathered a large sample of Tweets and measured their Readability (reading ease). Here's a cute figure for tweets with geo-data (lat, lon), grouped into ZIP code areas, showing the average readability in each (higher reading ease numbers = simpler sentences). No large-scale coherent trend is present, but there does appear to be sub-structure. This is something I'd love to follow up on, using some actual statistical/spatial analysis.
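For a feel of what "reading ease" means in practice, here is a rough sketch. It assumes the standard Flesch Reading Ease formula (the post itself doesn't name the metric), uses a deliberately crude vowel-group syllable counter, and the function names and the `(zip_code, text)` input shape are hypothetical. It is an illustration, not the paper's pipeline.

```python
import re
from collections import defaultdict

def count_syllables(word):
    """Crude heuristic: one syllable per run of vowels, minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease; higher scores mean simpler text. None if empty."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return None
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

def mean_score_by_zip(tweets):
    """Average score per ZIP code from an iterable of (zip_code, text) pairs."""
    scores = defaultdict(list)
    for zip_code, text in tweets:
        score = flesch_reading_ease(text)
        if score is not None:  # skip tweets with no scorable text
            scores[zip_code].append(score)
    return {z: sum(v) / len(v) for z, v in scores.items()}

print(flesch_reading_ease("The cat sat."))
print(flesch_reading_ease("Astronomical visualization necessitates deliberation."))
```

Short, punchy tweets are a hard case for any sentence-based metric, so real tweets would need more careful tokenizing (hashtags, URLs, @-mentions) than this.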

Finally, this is the "money graph" for the paper. Here we've shown the average reading ease score in each ZIP code (actually a ZCTA) compared with the % of college graduates. There is a significant anti-correlation present, which I think is very interesting! More intriguing, we didn't find a strong correlation with either median income or the high school graduation rate.

Average Readability score as a function of college graduate rate. Lower scores indicate more complex text.
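If you want to check a trend like this on your own data, one simple option is the Pearson correlation coefficient between the per-ZIP average score and the college graduate fraction. This is a generic sketch, and which statistic the paper actually reports is not stated here, so treat the choice as an assumption. The toy numbers are invented only to show the call.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Toy, made-up values: grad fraction goes up, reading ease goes down.
grad_fraction = [0.1, 0.2, 0.3]
reading_ease = [80.0, 75.0, 70.0]
print(pearson_r(grad_fraction, reading_ease))
```

A negative value means the two quantities move in opposite directions, which is the sense of "anti-correlation" used above.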

A few things could be the underlying cause of this apparent relation:

There are significant demographic differences between ZIP codes with very high #'s of college grads and those without. These higher educated people may use more complex language in their tweets, but this seems too speculative to be convincing to me.
The content type of tweets may be different in these higher education ZIP codes. For example, promotion of news/events versus personal status updates. Content-tagging a massive number of tweets would be needed to understand how linguistic complexity depends on content.

To my knowledge only a handful of (very interesting) studies have investigated linguistic complexity within Twitter, and none that I'm aware of have looked at its geographic or regional dependence. The neat thing about Twitter is that it is a (massive!) living data set, and you can repeat these experiments every day.

Just for fun, here are a few neat projects/studies being done with data gathered from or derived from Twitter:

Hedonometer (global happiness measured from Twitter)
Geography of Happiness
Anagramatron

What's Trending in Astronomy - #AAS223

Topics: Astronomy, word cloud

Once again it's time to play Astronomy, the Gathering, as the 223rd meeting of the American Astronomical Society is in full swing.

Here is a word cloud I created, using the entire science abstract book. I've shown the 250 most-used words (excluding numbers). I always enjoy making this diagram, silly though it may be. Many of the most-used words you see are "mechanics": affiliations, days of presentation, etc. You can zoom in and get a sense for what the hot topics are currently in Astronomy.
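The counting behind a word cloud like this is simple. Here is a minimal sketch, assuming the abstract book is already available as one plain-text string; `top_words` is a hypothetical helper, not the tool used for the image. It keeps the most frequent words and drops tokens that are purely digits.

```python
import re
from collections import Counter

def top_words(text, n=250):
    """Return the n most common words as (word, count) pairs, skipping pure numbers."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    words = [t for t in tokens if not t.isdigit()]
    return Counter(words).most_common(n)

sample = "Kepler stars 2014 Kepler galaxies 223 stars Kepler"
print(top_words(sample, n=2))
```

Note that lowercasing everything merges "CA" with "ca", so skip that step if you want state abbreviations and names to stand out. The resulting counts can be fed to whatever word cloud renderer you prefer.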

Some other fun things jump out. A few names are evident (e.g. David and John). You also can quickly see the two powerhouse states for doing astronomy (or at least for presenting at AAS): CA and MD.

Click image for full resolution version

Have fun, dear astronomers! Enjoy talking about Kepler, galaxies, and stars... and if you're so inclined, help me gather data for my project on gender in talks!!

Welcome to 2014


I'm not a fan of "year in review" articles, at least not until the year is properly over! But now it's 2014, and everything looks so fresh and new. Flying cars are expected any day. Big Data is going to be elected president. Web 4.0 will come this year, you mark my words.

(In case you're wondering, Web 4.0 is the internet of babies and great-grandparents.)

2013 was a great year for If We Assume. I started the year wondering if I'd be able to recapture the excitement that was generated by the United States of Starbucks. I'm grateful that two other posts generated lots of interesting conversations/interactions: Airports of the World, and the (De-)evolution of My Laptop Battery.

I spent my summer as a stranger in a strange (but wonderful) land, as an intern at MSR. Looking back on it now after a few months, this internship was one of the most interesting experiences of my life.

I also launched a new project, Mock Twain. This ridiculous bot is already into Chapter 2 of tweeting the entirety of The Adventures of Tom Sawyer. It's a combination of art and technology, and I'd love to chat with you about it.

2014 promises to be action-packed, and likely more Astronomy-focused. I'm about 1.5 years from finishing the PhD, have about a half dozen conferences/trips planned, and hopefully will start seriously shopping my "science brand" around.

On the blog/data/project side, my New Year's Resolution is to write more. Not just more blog posts (though that would be good) but more of everything. I'm excited by this new website Medium, and I have a few cool blog post ideas that I've been sitting on for several months.

So here's to productive and exciting new adventures, hoping the third year of If We Assume is as fun as the last two, and that you all have a grand 2014.