If We Assume: 2012

Talk With Your Hands

No comments: Topics: Everyday Data, statistics

In the last few years TED conferences have come to represent the top rung for "intellectual rock stars." What began as a small annual talk series in the early 1990s has ballooned into an international sensation since TED began providing videos online in 2006. It's a remarkable powerhouse that has made data experts like Hans Rosling household names. TED now comes with legions of fans, people willing to pay several thousand dollars for the privilege of sitting in the audience, a fellowship program (a shining example), and even a "play along in your hometown" version called TEDx.

TED is not without critics, as with all successful endeavors. The founder of TED is no longer involved, and is trying to "reinvent conferences again". Here's an interesting review of the growing "intellectual populism" that TED is contributing to; not necessarily a bad thing, but it's interesting.

One thing that I have taken away from watching a fair number of TED talks since 2006, as well as watching the late Steve Jobs speak, is the value in being a seriously good public speaker. There's a lot involved in making a great talk, from slide design and composition (a recent post I wrote on the importance of contrast in slides for Visually), to speaking tone, pace, inflection, and body language.

Recently I noticed something about the thumbnails shown for TED videos: the frame chosen from the talk seemed to usually feature people gesturing with both hands.

Gesturing with your hands is very normal when you talk, especially when you're giving a lecture or speech. Funny or unusual hand motions when talking can become iconic (I've always loved Dana Carvey's George Bush Sr. impression for this). Good hand motions can even help you convey your authority or confidence when talking, which is absolutely an asset.

I wondered, looking at a few dozen of these thumbnails, how many hands do people typically gesture with in TED talk thumbnails?

So, consummate scientist that I am, I gathered some data!

[ Continue Reading ]

Plots that Changed the World - III

No comments: Topics: history, maps, visualization

Today I'm very happy to publish the next installment of my continuing series of posts on visualizations that changed the world. (Part I, Part II) The title is, of course, a double entendre as I'm both reviewing the plots (graphs/maps/diagrams) that are themselves historic or represent key points in history, as well as the "plots" of the story behind them.

Magellan - Traveling the World

As a child, you probably came across some variant of this map once or twice. If your childhood was anything like mine, it was sandwiched between boring spelling lessons and unimaginative math quizzes. But, dear reader, ponder for a moment the absolute magnitude of what these people accomplished. They sailed around the %&@$ planet!!!

The voyage of Ferdinand Magellan, from Wikipedia

[ Continue Reading ]

The Chart that Wasn't There

No comments: Topics: color, contrast, visualization

Hop on over to the Visually blog today to see my guest post:

The Chart that Wasn't There: Avoiding Disappearing Plots in Presentations

You can see my Visually page here, where I post versions of some of the figures that have been featured here on If We Assume.

Thanks to Drew Skau and the great folks at Visually for inviting me to write a piece!

Kepler Binary Clock

No comments: Topics: art, Astronomy, visualization

Here's a fun animation I created, artistically visualizing over 1,300 eclipsing binary stars from Kepler.

Binary Clock from James Davenport (or see it on Visually)

Each "hand" represents a binary pair, and each concentric ring groups the binaries by their orbital period. In the center are hundreds of binary systems with orbits shorter than a day (the shortest is just 6 hours!). At the speed of the video, these orbits are a blur. The outermost hand tracks a binary star system with an orbital period of about 120 days. This is considered quite a long-period binary, even though it's 3 times shorter than an Earth year!
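For the curious, here is a minimal sketch of how a hand's position could follow from its orbital period. This is not the code behind the animation; the function name `hand_angle` and the sample periods are made up for illustration, and it assumes every hand starts at angle zero.

```python
import math

def hand_angle(t_days, period_days):
    """Angle in radians (0 to 2*pi) of a hand after t_days, for a given orbital period."""
    phase = (t_days / period_days) % 1.0   # fraction of the current orbit completed
    return 2.0 * math.pi * phase

# Hypothetical periods in days: 0.25 d is 6 hours, 120 d is like the outermost hand
periods = [0.25, 1.0, 10.0, 120.0]
t = 30.0  # days elapsed in the animation (assumed)

for p in periods:
    orbits = t / p
    angle = math.degrees(hand_angle(t, p))
    print(f"P = {p:6.2f} d: {orbits:7.2f} orbits so far, hand at {angle:6.1f} deg")
```

After 30 days the 6-hour binary has gone around 120 times while the 120-day binary has only finished a quarter of one orbit, which is why the inner rings blur.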

Here are some extra details:

Period data from v1.96 of the Kepler Binary Catalog.
Animation is 1500 postscript frames,
converted to png with ImageMagick,
animation made with ffmpeg.
Music: As Colorful As Ever by Broke For Free

The Best Data Visualization Sources

2 comments: Topics: visualization

If you're like me, you spend way too much time surfing for cool visualizations. Here's the list of blogs/websites/etc I read to keep up to date on the growing field of big data and data visualization.

Reddit (where better to sink your time?)

News

Chart of the Day - The Atlantic Wire
Chart of the Day - Business Insider
Graphic Detail - The Economist
Data Store - The Guardian

Blogs

chartsnthings - blog from the masters at the New York Times graphics department
Visually - a veritable warehouse of visualizations, though a bit infographic heavy for my taste
Data Pointed - Stephen Von Worley's blog
dubly - infrequent, but very thoughtful posts
Visualizing Data - by Andy Kirk
eagereyes - by Robert Kosara
Flowing Data - Nathan Yau, see also his list of data/visualization blogs
Information is Beautiful
Edward Tufte - website of the master

Misc

astroplotlib a collection of astronomy plots. I think this has lots of promise, but I'd like to see it more fully realized. Mostly only useful for astronomers...

Where else do you get your viz from? Put some links in the comments!

(shameless self promotion welcomed... within reason)


Washing Pants, by the Numbers

1 comment: Topics: Everyday Data, humor, statistics

Be sure to subscribe for updates to If We Assume!

Last week I asked a simple 3 question survey about washing frequency of pants... specifically: in your opinion how long can pants be worn before needing washing? This is a question I thought up a while back when reading about a kid who wore his jeans for 15 months (crazy!). So finally I did the responsible thing and gathered data! You know, for science...

I wanted to keep the survey short and sweet, and submitted it to about a half dozen subreddits, my facebook, and my twitter feeds. I'm sure there's all manner of other variables that would have been interesting to study this against (e.g. jeans versus slacks, as many pointed out), but I felt keeping it short would provide an easier survey and yield a higher response rate. Here is a screen shot of the survey:

[ Continue Reading ]

Colors in Visualizations, a Rainbow of References

2 comments: Topics: color, visualization

“...I wondered if it was blasphemous to tell God that rainbows are kitsch.”
--Steve Toltz, A Fraction of the Whole

Color is one of the most fundamental, and sometimes most challenging, aspects of data visualization. Many times you may not know why a given color scheme looks bad (or good), but your eye can quickly pick it out. There are many schools of thought about color families, color meanings, complementary colors, and which you should use in figures/plots. The rainbow color table, a default in many programs/languages, frequently produces horrible results. You can do better! Your research deserves better. If people have to squint and struggle to decrypt your colors, then your result isn't being communicated.

Below is a list of links/articles/references I've found useful when thinking about colors in visualization, with some rough organization. Favorites of mine in each section are in bold. The list was compiled with help from my friend Ryan, and I hope it will be of use to you!

[ Continue Reading ]

Viz: Rain in Seattle? Really?

No comments: Topics: Seattle, visualization, weather

Today I'm happy to feature a cross-post from my good friend and fellow graduate student, Nicholas Hunt-Walker. He's written a fun post about data and weather, and has some great looking visualizations! The post is reproduced below, but check out his awesome blog: The Roda, The Stars, The Lessons, The Life

Take it away, Nick:

This is my face most days...

Before I moved out here to Seattle for graduate school I was repeatedly warned about how rainy it tends to be out here. One person even described it as "London-esque". However, when I first visited back in March 2010 (this becomes relevant later), not only was it barely raining, but it was absolutely gorgeous for each of the 4 days I was here. I left here thinking, "where's all this rain people were talking about? It's beautiful here!"

Two years and three months later, I can confirm the well-known Seattle drear. It feels like it's cloudy more often than not, and it's nearly always wet. ALWAYS! Am I just suffering from confirmation bias, or is Seattle's weather always just generally crappy? Well, being the scientist that I am, I sought to answer these interesting and personally-relevant questions.

[ Continue Reading ]

The Reddit Effect

1 comment: Topics: Everyday Data, technology

Today I'm going to discuss my thoughts on a silly topic that has gone by many names over the years. These days I see it dubbed the "Reddit Effect", usually meaning the impact on your website from being posted on Reddit. This has also been called the "digg effect", the "twitter effect", the "stumbleupon effect", the "slashdot effect"... Basically an explosive increase in web traffic to your blog/site that can cripple servers, and has the same (though accidental) result as a DDoS attack.

Of course, bloggers/writers crave this kind of attention. Your home-grown webpage can suddenly get exposed to 100K viewers overnight, if you can navigate the social media/news network waters. Nobody can sign up for your email list, however, if your discount webserver gets fried under peak traffic load. Thus, it's a sensible and practical phenomenon to study.

[ Continue Reading ]

Password Strength

5 comments: Topics: technology

Here's a fun and simple experiment I recently did...

Some time ago I started using phrases to remember passwords. This can work a couple different ways:

1) use the first letter from every word in a sentence.

Example: The quick brown fox jumped over the lazy dog = Tqbfjotld

2) use a whole sentence as a password.

Example: The quick brown fox jumped over the lazy dog
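As a rough sketch of why the two schemes differ so much, here is the size of the naive brute-force search space for each example. This is only an upper bound under assumed alphabets (52 letters for the acronym, 52 letters plus a space for the sentence); a guesser armed with a list of well-known sentences would do far better, and this is not necessarily the experiment described in this post. The helper name `brute_force_bits` is made up.

```python
import math

def brute_force_bits(password, alphabet_size):
    """Bits needed to enumerate every string of this length over the given alphabet."""
    return len(password) * math.log2(alphabet_size)

sentence = "The quick brown fox jumped over the lazy dog"
# Scheme 1: first letter of every word
acronym = "".join(word[0] for word in sentence.split())

# Assumed alphabets: upper + lowercase letters, plus the space for the full sentence
print(acronym, f"{brute_force_bits(acronym, 52):.1f} bits")
print(f"{len(sentence)} characters", f"{brute_force_bits(sentence, 53):.1f} bits")
```

Length dominates here: every extra character multiplies the search space by the alphabet size, so the whole sentence comes out far ahead of its acronym.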

When I attended SDSU, they required the most obtuse password rules. It was something along the lines of "must have uppercase, lowercase, numbers, and symbols. Cannot use more than 2 of each consecutively. Must be more than 8 characters." Oy...

[ Continue Reading ]

Race in US Colleges

2 comments: Topics: academia, demographics, maps, race, statistics

Be sure to subscribe for updates on this and all my other data analysis projects!

The Chronicle posted a nice spreadsheet containing race, ethnicity, and gender data from ~4300 institutions of higher education across America. (Note: the article and data file are now behind a paywall, which was not in effect when I downloaded the data set)

It's a really intriguing data set, and I thought it was worth a few minutes of my time to play with it. My results are amusing, but I don't think I've fully captured the rich potential this data has to offer serious researchers.

My first question was simple: what does the most basic racial composition of US Colleges look like?

[ Continue Reading ]

The Graph that Wasn't There

1 comment: Topics: humor, soapbox, technology, visualization

Yesterday I wrote a post on this blog that, upon reflection, was far more cynical than I ever intended. I have removed it, not to hide or to acquiesce, but because I have always wanted this blog to be about celebrating science and data and visualization.

Here is the surviving piece of that post, which I will leave as a means to start a conversation (and a future blog post):

Visualization Design: It's not about obtuse color theory, or infographics, or artistic style, or minimalist Tufte chart theory, or fancy-ass 3D plotting with the latest/hottest software. It's about effectively communicating your story to other people.

Best Selling Book Covers

5 comments: Topics: art, books, Everyday Data, visualization

After I finished my master's degree in San Diego, a good friend of mine gifted me a book he thought I'd enjoy. Probably unbeknownst to this friend, my parents and family had long since given up on trying to get me to read books for pleasure. While I'd pore over pages on the internet, and have always loved cinema, I stopped reading (outside of school) when I was about 16.

I was 25, the book was classic science fiction, and it literally changed my life. I read it every day while walking to and from my office on campus. Strolling slowly to school I would get about an hour of reading in per day, and it still took over a month to finish! Not reading fiction for a decade makes your mind out of shape. Now I love books, and have been trying to consume classics that I'd been recommended so many years ago.

This metamorphosis has made me passionate about books again, concerned for libraries, and an active reader. One thing I noticed right away, especially when buying used paperback science fiction, is how bizarre book cover art can be. They range from basic solid hues, to gaudy airbrushed scenes of romance. This was a culture, an entire art scene, that I knew nothing about!

I do know a bit about movie posters, particularly from my youth working in a movie theater. Movie poster styles rely a lot on templates; basic layouts if you will. These are often very similar within the same genre (e.g. the heroes in a V formation).

Color choice in movie posters is also fascinating (e.g. orange/blue contrast use in serious/action movies). There was an AWESOME blog post a few months ago by Vijay Pandurangan on the distribution of movie poster colors over time. Seriously, if you like my blog go read that post here! He found that blue has become much more prevalent in the last ~20 years. Neato!

I started to wonder: are there trends to be found among book covers?

Popular colors? Common layouts? Once again late night musings necessitate data!

Gathering the Data

So I gathered the book covers for Top 10 Best Selling books from USA Today. I wrote a script to grab the Top-10 covers every week (actually 4 weeks per month, 48 weeks a year) from 2000 to 2012. USA Today does a great job of aggregating book sales information from tons of sources, and their Top-10 list is easier to use than most other similar digests if you want a broad census for what people are reading.

How does one visualize ~6000 book covers (about 1300 individual books)? ALL AT ONCE!

Here is what 12 years of Top-10 book covers looks like. It is organized in 1-year "bricks", with 2000-2011 top to bottom. In each brick are 48 columns (weeks), Jan 1 on the left, New Years on the right. Rank 1...10 are top to bottom within each brick.
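The brick layout described above can be sketched as a simple index calculation. This is an illustration of the arrangement, not the script used to build the figure; `tile_position` and its parameters are hypothetical names, and weeks are assumed to be numbered 0 to 47.

```python
def tile_position(year, week, rank, first_year=2000, ranks=10):
    """Return (row, column) of one cover in the mosaic, counting from the top-left.

    week runs 0..47 (left to right), rank runs 1..10 (top to bottom in a brick).
    """
    brick = year - first_year          # one brick per year, stacked top to bottom
    row = brick * ranks + (rank - 1)   # rank picks the row inside the brick
    col = week                         # week picks the column
    return row, col

rows = 12 * 10   # 12 yearly bricks, 10 ranks each
cols = 48        # 48 weeks per year
print(rows, "x", cols, "=", rows * cols, "cover slots")
print(tile_position(2011, 47, 10))   # bottom-right corner of the mosaic
```

That works out to 12 x 48 x 10 = 5,760 slots, in line with the ~6000 covers quoted above.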

[ Continue Reading ]

VisWeek Ignite Talk

1 comment: Topics: visualization

Just got home from a super fun event! In conjunction with the big VisWeek conference that's being hosted in Seattle this week, Noah Iliinsky and the fine folks at Tableau put on an awesome party at the Hard Rock Cafe downtown. Party attendance was around 400 people!

Besides a couple gratis beers, and great finger food, there were Ignite talks, including one by yours truly about my recent Starbucks article! Here's the video... (with great thanks to my lovely wife for the thoughtful iPhone capture!!)

Yikes, I'm fidgety! But the room had tons of energy!... and I couldn't pull the mic off the stand.

The United States of Starbucks

27 comments: Topics: Coffee, maps, visualization

Check out the 5-minute Ignite Seattle talk I gave on this project:

As you might gather from my blog's archives, I really enjoy maps. They are a widely understood and broadly engaging way to convey information, especially when overlaid with other data.

One of my favorite data-maps is a well known piece by Stephen Von Worley: The Contiguous United States Visualized by Distance to the Nearest McDonalds. The furthest you can get within the USA is ~107 miles, incidentally. It's a brilliant post; fun, personal, and on a subtle level is discussing a deep part of Americana.

Another chain restaurant by which we might measure our lives, particularly in the PNW, is Starbucks. This Seattle-based-behemoth has played an integral role in the coffee culture around the world, and along with the Nerd Triumvirate (Microsoft, Boeing, and Amazon) has secured our city's place on the global stage.

Starbucks has been used in the past to gauge economic health, and I've found it used as a standard metric in geography classes. The closer you live to a Starbucks the higher your rent is likely to be, and in NYC the density of locations can reach as high as 150 within a radius of 5 miles!

I wanted to look at not only how Starbucks were distributed across the USA (like in Von Worley's McMasterpiece) but how we are distributed around Starbucks.

Here is the USA as mapped by Starbucks-owned locations (with thanks to my friend David B), connected using a Delaunay triangulation. As with Mc D's, these latte-slingers are clustered around major cities and highways. As an aside, I wonder if anyone has tried to calculate the optimal path for visiting every location...
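The "distance to the nearest location" idea behind maps like these can be sketched with a great-circle (haversine) distance. This is a minimal illustration and not the code used for the figures; the store coordinates are rough made-up values, and the Earth radius of 3958.8 miles is the usual mean-radius approximation.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (approximation)

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def nearest_distance(point, locations):
    """Distance in miles from point to the closest entry in locations."""
    return min(haversine_miles(point[0], point[1], lat, lon) for lat, lon in locations)

# Hypothetical store coordinates (lat, lon), not real location data
stores = [(47.61, -122.33), (45.52, -122.68), (40.71, -74.01)]
print(f"{nearest_distance((47.0, -120.5), stores):.1f} miles to the nearest store")
```

Evaluating `nearest_distance` on a grid of points across the country would give a map in the spirit of Von Worley's; for thousands of locations a spatial index would be a better choice than this brute-force loop.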

[ Continue Reading ]

US Population By Longitude & Latitude

4 comments: Topics: maps, visualization

My friend Eddie reminded me of this nice set of figures that made the rounds a few months ago on Visual.ly (and also Reddit).

I thought it would be cool to zoom in on the USA in high detail using data from the 2010 US Census... so that's what I did. As a bonus, the pixels represent the density of ZIP codes in small bins, which essentially track population density. Enjoy!
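Counting ZIP codes in small bins can be sketched like this. The coordinates below are made up, the half-degree bin size is an arbitrary assumption, and `bin_counts` is a hypothetical helper rather than the code behind the figure.

```python
from collections import Counter

def bin_counts(points, bin_size_deg=0.5):
    """Count points per (lat, lon) cell, where each cell spans bin_size_deg degrees."""
    counts = Counter()
    for lat, lon in points:
        cell = (int(lat // bin_size_deg), int(lon // bin_size_deg))
        counts[cell] += 1
    return counts

# Hypothetical ZIP code centroids (lat, lon), not Census data
zips = [(47.61, -122.33), (47.62, -122.35), (47.66, -122.31), (34.05, -118.24)]
density = bin_counts(zips)

for cell, n in density.most_common():
    print(cell, n)
```

Each cell's count then becomes one pixel's brightness. With real data, something like NumPy's `histogram2d` would do the same job much faster.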

[ Leave a comment ]

NFL Replacement Referee Bias

1 comment: Topics: sports, statistics

Today I'm happy to feature the first guest post on If We Assume, written by fellow astronomer Peter Yoachim! He's discussing the now-famous debacle by the replacement referees (see also here) that occurred in last night's Seattle Seahawks game (Some are calling it the "worst call in NFL history"). Take it away Peter...

Getty Images

After watching the refs botch Monday Night Football (Go Seahawks!?), I was wondering if there's a way to quantify just how bad the NFL replacement referees are.

One thing that stood out in the game was how many calls went the Seahawks' way on the final drive--which reminded me of the discussion of home-field advantage in Scorecasting. They concluded that referee bias is the primary driver in home-field advantage across sports. They even note that in the NFL from 1985-1998, the home team won 58.5% of the time, but after instant replay was introduced, the home team only won 56% of the games (1999-2008).

If the replacement refs are much worse than the regulars, we might expect the home-field advantage to grow. My logic: if the refs are botching more calls, and those botched calls tend to favor the home team, then the home team gains an advantage and should win more.

How have home teams fared so far? After 48 games this NFL season, the home teams have a record of 31 wins and 17 losses, for a whopping 64.6% win rate! But is that significantly more than 56%? 31 wins is actually only 4 more wins than we would have expected with the regular refs. As always happens when I try to calculate the statistical significance of something, I got bogged down in an arcane wikipedia page, when it told me to look up some value from a table. Whenever a statistician tells me to look something up in a table, I reply, "Fuck that, I can Monte Carlo this in 5 lines of Python." So I did:

import numpy as np

# Play 10,000 seasons of football with 48 games each
hg = np.random.rand(10000, 48)
# Home team wins 56% of the time: a draw below 0.56 counts as a win
wins = hg < 0.56
# Total up the wins per season
ack = wins.sum(axis=1)
# Percentage of simulated seasons with 31 or more home wins
print('probability of home team winning 31 or more games with 1999-2008 refs = %.2f%%' % (np.mean(ack >= 31) * 100))

If you run that, you find out that we would expect 31 (or more) home team wins only about 15% of the time. To turn it around: 85% of the time the home teams have fewer wins at this point in the season. We normally say something is significant when we reach the 5% level, so we're not there yet. If the home teams keep winning at a 65% rate (or higher) for 3-4 more weeks we should make it to significance! That's about the only reason I've found to root for the replacement refs sticking around--damned scabs!
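As a cross-check on the Monte Carlo, the same probability can be computed exactly from the binomial distribution, no lookup table required. This is a sketch added for comparison, assuming independent games with a fixed 56% home win rate; `binomial_tail` is a made-up helper name.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 48 games, 31 or more home wins, 56% home win rate with the regular refs
p_value = binomial_tail(48, 31, 0.56)
print(f"P(31 or more home wins in 48 games) = {p_value:.3f}")
```

The exact sum should land close to the simulated value, with the small difference coming from Monte Carlo noise in the 10,000 simulated seasons.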

NCAA Football Coach Salary vs. Wins

3 comments: Topics: academia, costs, sports, statistics

A question was posed to me the other day: "Is Steve Sarkisian (head coach for UW's football team) worth the money we're paying him?". For the record, his salary is currently around $2.25 million, though he's not paid by taxpayer money.

The question of an employee's worth intrigues me. No doubt people have strong opinions/feelings on the matter. How do we quantify this to answer it objectively?

In the case of a factory worker, we might say that the number of gizmos he/she produces per hour without error determines their value. I don't think this kind of metric works for things like teachers...

Still, football coaches are often judged based on their team's performance. So I decided that the best way to answer the question was to compare the salaries and win/loss records for NCAA FBS (aka Division I-A) coaches.

Detailed data on coach salaries wasn't super easy to find. I would have liked a neat & tidy table with salary broken down by year for each coach, alas. I did find this nice compilation by USA Today. I grabbed win/loss stats here. Note: for my analysis I have not followed up on any of these stats/teams individually, so no doubt there have been hires/fires and raises/cuts which will affect the specific details.

The correlation between higher pay and better winning percentage is promising. The median salary is $1.46 million. Texas is doing well, but boy they're paying for it! I then subtracted a linear fit (dashed line) from the winning percentages to determine the typical scatter.

The standard deviation in winning percentage at a given salary is +/- 12%. All the coaches that fall within this "region of acceptable performance" are highlighted in purple. I believe these coaches are "worth it". Twice the standard deviation is gold/yellow. Coaches in this region should either be asking for a raise, or watching for the hammer to fall.
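The fit-then-scatter procedure can be sketched as follows. The salary and win numbers here are invented for illustration (they are not the USA Today data), and `linear_fit` and `band` are hypothetical helper names.

```python
from statistics import mean, pstdev

def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    xm, ym = mean(x), mean(y)
    slope = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sum((a - xm) ** 2 for a in x)
    return slope, ym - slope * xm

def band(residual, sigma):
    """Label a residual by how many standard deviations it sits from the fit."""
    if abs(residual) <= sigma:
        return "within 1 sigma"
    if abs(residual) <= 2 * sigma:
        return "within 2 sigma"
    return "outlier"

# Hypothetical (salary in $M, winning percentage) pairs, not the post's data
salary = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
win_pct = [40.0, 52.0, 48.0, 61.0, 66.0, 74.0]

slope, intercept = linear_fit(salary, win_pct)
# Subtract the fit to get each coach's over- or under-performance
residuals = [w - (slope * s + intercept) for s, w in zip(salary, win_pct)]
sigma = pstdev(residuals)

for s, r in zip(salary, residuals):
    print(f"${s:.1f}M: residual {r:+.1f} points ({band(r, sigma)})")
```

A positive residual means a coach wins more than the fit predicts for that salary; a negative one means less.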

There are a few major outliers that bear mention. Boise State is getting a whopper of a deal (as noted in Fig 1), as well as Ohio State. On the unfortunate side, Duke is very far below par; the sole outcast in the negative 3rd standard deviation. This doesn't bode well for an athletics program under scrutiny to dial back costs.

So this has all been in good fun, and I certainly hope no one is actually fired on my account! Looking at Washington in particular, Sark seems to be just below the standard deviation, but in fairness he's only been coach since 2009. After our victory over Portland State this weekend, I'm hopeful he'll make up some lost ground this year!

The full table of data is below the fold...
Update: due to demand from the wise folks on Reddit, I have updated the table to be sorted by School name ~~and added helmet thumbnails.~~ (I took them down, it seemed to be causing havoc with his website)

[ Continue Reading ]

Astronomy Programming

1 comment: Topics: Astronomy, statistics, visualization

Today I'm attending a workshop on "astroinformatics" at Microsoft Research, and one question that has come up all morning is: how much computer science do we need to teach astronomy students?

Here is my summary slide on the problem

Clearly there is need for people at all levels of expertise, but how much do you need to know to actually do research? What do you think?

The Pace of NSF Funded Research

8 comments: Topics: academia, Astronomy, costs, statistics

Recently on Facebook I came across a note by Chris Erdmann that some handy folks at Harvard put together statistics on (nearly) every astronomy paper from 1995 to present that was funded through an NSF AST grant. This seemed like a really interesting dataset, especially for a young (read: financially uncertain) researcher such as myself.

So parsing through all 29,042 papers listed, here are two interesting things I've learned...

1. A typical AST grant produces < 10 papers

This is a simple histogram of the number of papers each unique Grant Number produced. Many people have only produced 1 paper with a grant, but the average is about 8.75 (and the median is 3).

Distribution of number of papers per grant, with a
mean of 8.75 papers per grant (blue dashed line)

2. A grant has its peak output of papers at 3.1 years

This is a more intriguing figure to me. I've plotted (Year Published - Year Grant Awarded) as a function of Year Grant Awarded for all 29K papers, and then binned it up with pixels. You can clearly see the peak productivity between 2-4 years. I've marked the mean (solid orange) and stddev (dashed orange) lines for each year.

"nsfastgrantbib" data from the Astronomy Dataverse
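Both statistics above boil down to a couple of group-by operations. Here is a minimal sketch on a handful of invented records; the grant numbers and years are hypothetical and the real dataset's column layout may differ.

```python
from collections import Counter, defaultdict
from statistics import mean, median, pstdev

# Hypothetical records: (grant number, year awarded, year published)
papers = [
    ("AST-001", 2000, 2002), ("AST-001", 2000, 2003), ("AST-001", 2000, 2004),
    ("AST-001", 2000, 2003), ("AST-001", 2000, 2005), ("AST-002", 2001, 2004),
    ("AST-003", 2001, 2003),
]

# 1. Papers per grant: a few prolific grants pull the mean above the median
per_grant = Counter(grant for grant, _, _ in papers)
counts = list(per_grant.values())
print("papers per grant: mean", round(mean(counts), 2), "median", median(counts))

# 2. Publication lag (year published - year awarded), grouped by award year
lags = defaultdict(list)
for _, awarded, published in papers:
    lags[awarded].append(published - awarded)

for year in sorted(lags):
    print(year, "mean lag", mean(lags[year]), "stddev", round(pstdev(lags[year]), 2))
```

The gap between mean and median in the toy data mirrors the real histogram, where many grants list a single paper while a few list dozens.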

Conclusion

Following this second figure, we expect that the grants from 2009 onward have only produced maybe half of their useful (publishable) results, despite many with dwindling funding.

Likewise, it's very encouraging that the grants awarded today will still be producing usable science well into the next decade!

Finally, we are reassured that Astronomy is an exciting and fast-paced field, and that continued strong funding is required to preserve this fact.

Data Reference
Christopher Erdmann; Louise Rubin, "Compiled List of NSF Grants to ADS Records from 1995 to August 2012", hdl:10904/10152 V2 [Version]

Voyager Data

No comments: Topics: Astronomy

The Astronomy Picture of the Day (aka APOD) today features a super cool plot of the dramatic up-turn in cosmic ray detection rate over the last year that Voyager 1 has measured, which may indicate that the spacecraft is entering true interstellar space.

Here for comparison is the full data (publicly available, thanks NASA!!) over the lifetime of both the twin Voyager probes for the cosmic ray instrument.

The upturn (highlighted by the red bar) is very noticeable from 2011 onwards. Sinusoidal modulations, measured by both spacecraft, seem to occur on roughly an 11 year cycle. I think this figure is even more remarkable than the 1-year subset shown on APOD, as it gives you a flavor for both the history of the Voyager mission, and the magnitude of the discovery.

I frequently think how the seemingly menial data we collect today might be used in the future, and this figure is a grand example. If we want to discover the subtle things, things that take a long time to find, or only change noticeably on timescales longer than a human lifespan, then we need to take the best data possible now. This is our investment in the future of astronomy, and why you mind your p's and q's when reporting and storing your observations.

Grad Student Pay

2 comments: Topics: academia, Astronomy, costs, Everyday Data, humor, quantified self, soapbox

Fees are increasing at the Grand ol' UW, same as every institution it seems, and along with it the ire of every graduate student in America.

This year my institution will require approximately one entire week of pay per quarter in fees. This must be paid in full by about the 3rd week of the quarter. Despite my being paid from a grant, which covers my salary and tuition costs, the fees must come from my own pocket. This totals almost a month of pay per year given to the University, simply for the pleasure of being employed here and signing up for 10 credits of "independent research" each quarter.

Unlike some of my colleagues, I'm not terribly bothered about the low wages we're paid. I generally make a livable wage, and the unofficial benefits of my job are enormous. Indeed, many of my friends who make 4x more at great companies would be envious of the daily freedom, travel abilities, and satisfaction that comes with getting a PhD. To be fair though, we regularly hear from incoming grad students that UW has the lowest offer of pay from any institution they were accepted to. Ouch.

But there's something absurd about having to give such a huge portion of my living stipend back for "fees". I don't have data on hand about the history of fees at UW, or the cost of living in Seattle (though I can say that rent is at least double what it was when I first moved here). I'd love if someone could point me to such data, though!

To facilitate some discussion, here is the history of my salary. Note these are not adjusted wages, just raw numbers. Despite having a small stack of college degrees, the earning profile in academia doesn't exactly resemble what I'd expect to make in industry - and that's OK! This figure is incredibly personal in some sense. Many people judge their worth on their income, and discussing your income (or asking others about it) is very taboo, for reasons that escape me. Maybe I'm more comfortable talking about it because I don't have any money.

So there you go, that's approximately how much money you can expect to make as a grad student in astronomy at UW these days. Be prepared to give a large chunk of it back though... Here's a couple relevant PhD Comics on the subject: Unemployment vs Grad Stipends, and Academic Salaries

I seriously invite some discussion on the subject. Got any data on typical earning profiles as a function of age in industry? Can you point me to some cost of living data, or history of fees/tuition at UW (or other schools)?

Update:
A friend of mine in the Atmospheric Sciences PhD program sent me their version of my plot for comparison:

Update 2:
Here is the per quarter cost of U PASS since I first became affiliated with the UW

"Quantified Self" Art Show

No comments: Topics: art, quantified self, visualization

The banner for Quantified Self at the Gallery Project.

Fun & self serving news, everyone! A piece of mine (that's fancy talk for some art stuff I made) will be shown at The Gallery Project in Ann Arbor, MI this month!!! I'm super excited, and have never really participated in something like this before. I also found out that the show includes work by the data visualization master, at whose feet we all study, Edward Tufte! As part of my excitement about this event, the banner image for If We Assume will be a version of my piece, for the next month at least.

The piece is called "Capacity", is a medium/large sized print, and is showing in an exhibit called "The Quantified Self". The whole show is an examination of self at the crossroads of data, reflection, and art. Here is a preview of my piece, Capacity.

It was derived from the post I wrote, Laptop Battery Lifetime, on the capacity of my battery and reflection on my computer usage.

Unfortunately due to grad-school budget I cannot attend the opening (Aug 30th), but if anyone you know is in Ann Arbor, tell them to get down to Gallery Project and check my piece out! Quantified Self runs from Aug 30 to Oct 7.

Bonus:

Someone told me they thought Capacity looked like a skyline, so here it is with some simple shading added in GIMP.

Increasing Blog Traffic

1 comment: Topics: Everyday Data, statistics

This year I decided to launch my personal blog, and it's been more fun than I ever imagined! Blogging is attractive in similar ways as publishing: people reading and discussing your work. Bonus: Writing about data and science provides me with a great subject matter, and lots of fun graphics. I have lots of cool ideas in the pipe, mostly waiting for free time (in short supply during grad school) to write them.

There is a glut of information online about how to build your blog traffic and increase readership. Some of this I've actively worked on, some seems like obsessive nonsense. A lot of it is spam. Here are seven of the lessons I've learned about building a blog. This is largely an empirical study of blogs, including my own, and meant in good fun...

1. Spend time worrying about the design of your blog/website. The default (e.g.) Blogger themes are nice, but they need to be fine tuned. There are lots of little tweaks that are easy to make, and make a huge difference (I think) in the aesthetic.

2. Stop worrying about the design of your blog and write. If your website looks like a Dieter Rams masters class, but only has 3 posts, then you'd better keep your day job. For me, the joy in blogging is writing and discussing ideas with people, not worrying about button placement.

3. Let other people do your work for you. Lots of people re-post material, some advocate for outsourcing or splitting your writing workload. I like to get post ideas from my friends/family. I also use Blogger (currently) to host/drive my blog, as it takes all the work out of the backend and lets me focus on the parts I enjoy! I don't want to be a web developer.

4. Post often. Duh. Otherwise traffic dies off... fast.

5. Post before Mondays. As this figure shows, the analytics on my blog indicate that Mondays have the highest traffic. This may differ based on where you get readers from, and of course your type of content. I've seen it written that Thursday is the best day to post, in time for the weekend e-traffic. I think this matches my results.

6. Prey on people's insecurities. Again and again I find that one of the best ways to gain blog traffic is to write about gaining blog traffic. If you've ever watched television you know that diet pills and enhancement drugs are big business. It seems sensible, therefore, that if you want to easily get large traffic, write about how people can solve their problems without significant effort.

7. Participate/advertise in social media. I like Reddit and Twitter, but haven't had much luck with stumbleupon yet. YMMV of course. These kinds of outlets provide impulsive blog traffic, and converting this to moderately increased long-term readership is the goal. The figure below shows the impulsive response my site has experienced from (mostly) Reddit, with an increasing baseline of traffic.
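
Lesson 5 above boils down to grouping daily page views by weekday and comparing the averages. Here is a minimal sketch of that bookkeeping in Python. The `daily_views` numbers are made-up placeholders, not my actual analytics, and the function name is just something I picked for illustration.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def mean_views_by_weekday(daily_views):
    """Average page views per weekday, given a {date: views} mapping."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for day, views in daily_views.items():
        name = WEEKDAYS[day.weekday()]  # date.weekday(): Monday is 0
        totals[name] += views
        counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}

# Placeholder traffic (hypothetical values, NOT real blog analytics).
daily_views = {
    date(2012, 8, 6): 120,   # a Monday
    date(2012, 8, 7): 80,    # a Tuesday
    date(2012, 8, 13): 100,  # a Monday
    date(2012, 8, 14): 60,   # a Tuesday
}

averages = mean_views_by_weekday(daily_views)
busiest = max(averages, key=averages.get)
print(busiest, averages[busiest])
```

With real data you would want many weeks of traffic before trusting the ranking, since a single Reddit spike can make any weekday look like the winner.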

Voting: Do Small Counties Matter?

2 comments: Topics: demographics, Seattle, voting

With the Aug 7th primary voting already fading to a distant memory, it occurred to me that I never bothered to look at the results! Between the Summer Olympics, fabulous recent weather in Seattle, King Felix's coronation, and signs from God, I can hardly be blamed for this minor oversight.

Getting "out" the vote has been a major movement for most of my life it seems. Who can forget "Vote or Die", which I still consider to be the most benevolent of any voter sign-up organization. The stench of partisan agenda is never far from such campaigns....

I grew up on the Eastern side of Washington State, and have spent most of the last decade on the Western side. Party lines are drawn through our state as if by Creation itself. Even "Seattle conservatives" believe the East siders live in an intellectual rain shadow.

Of course, the primary results yielded almost nothing surprising. Since they provided nothing witty or urbane to be overheard discussing, I intended to promptly forget about them. That is... until I realized there was interesting data on the results webpage: voter turnout statistics. Jackpot.

The Lazy Metropolis

It has been reported that voter turnout flirted with record low levels this year. Let's take a look...

Washington State historic voter turnout percentages for
gubernatorial races, with 2012 shown in blue.

This history of diminishing turnout is fascinating to me. What could it mean? A population that is steadily becoming lazier? Or perhaps more disenfranchised and disconnected from the political system? It's difficult to say - indeed I can scarcely think of a rigorous way to test such sentiments. Figures such as this certainly tell a strong story though, and give cause to wonder: will voter turnout continue to linearly decline? Will we reach a point of negligible participation in our own governance? (Insomuch as voting qualifies as participation. Whether the entire machine has been taken over by the bipartisan corporate-backed military industrial complex... or aliens... is an exercise for the reader.)
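
If you wanted to take the "linear decline" question literally, one rough approach is to fit a straight line to turnout versus year and see where it crosses zero. The sketch below does an ordinary least-squares fit by hand. The `years` and `turnout` values are placeholders I invented to show the mechanics, not the actual Washington State numbers, and extrapolating a line this far is naive at best.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Placeholder data (hypothetical, NOT the real turnout history).
years = [2000, 2004, 2008, 2012]
turnout = [46.0, 44.0, 42.0, 40.0]  # percent

slope, intercept = fit_line(years, turnout)
year_at_zero = -intercept / slope  # where the fitted line hits 0% turnout
print(f"slope: {slope:.2f} points per year, zero turnout around {year_at_zero:.0f}")
```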

Looking in detail at the voter turnout for just this year's primary may give another clue to the nature of such low numbers.

Washington State Primary 2012 voter turnout for all 39 counties,
as a function of the number of registered voters per county.

Another intriguing (somewhat weak) trend is found when looking at the 2012 Primary turnout for each county: more populous counties tend to have lower turnout! This is especially fascinating, indicating to me that the more metropolitan people can't be bothered as much to vote. In other words, Urbanization may negatively affect voter turnout.
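
One way to put a number on a "somewhat weak" trend like this is the Pearson correlation between turnout and the logarithm of registered voters. This is only a sketch of how that could be computed, not how the figure above was made. The county tuples are invented placeholders, and using log10 of the voter count is my assumption, since county sizes span several orders of magnitude.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Placeholder counties: (name, registered voters, turnout percent).
# Hypothetical values, NOT the real 2012 primary results.
counties = [
    ("County A", 5_000, 55.0),
    ("County B", 50_000, 45.0),
    ("County C", 500_000, 41.0),
    ("County D", 1_000_000, 38.0),
]

log_voters = [math.log10(voters) for _, voters, _ in counties]
turnout = [pct for _, _, pct in counties]

r = pearson_r(log_voters, turnout)
print(f"r = {r:.2f}")  # negative r means bigger counties turn out less
```

A value of r near -1 would be a strong trend, while something closer to 0 would match the "somewhat weak" description.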

The story might be that over time people move into cities (Seattle, Spokane, Tacoma, etc), and as they do they lose incentive to vote. Again, whether this is due to increased laziness towards politics seems difficult to prove with testable predictions. Considerably more data exists for WA, and it would seem beneficial to conduct such a study in order to reverse this trend.

And with an increasing percentage of people living in cities, we do need to reverse it. If such a model for decreasing voter participation is true, we must learn why.

Little Fish

Along the lines of urbanization discussed above, it is worth considering the impact that the more dutiful counties are having. This question results in my stupid title: Do Small Counties Matter?

Here we have the cumulative distribution of registered voters in WA. The way to read this is that we have sorted each county by the number of voters (left to right), and then added them up one by one. The incredible result is that the four largest counties (Spokane, Snohomish, Pierce, and King) contain more than 50% of the registered voters! (actually 58.9%) Urbanization in Action!
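
The sort-then-add-up recipe described above is short enough to sketch in Python. The `registered` list holds placeholder counts I made up for illustration (not the real county numbers), and the function names are my own.

```python
from itertools import accumulate

def cumulative_share(counts):
    """Sort counts from smallest to largest and return the running fraction of the total."""
    ordered = sorted(counts)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]

def top_share(counts, k):
    """Fraction of the total held by the k largest entries."""
    ordered = sorted(counts, reverse=True)
    return sum(ordered[:k]) / sum(ordered)

# Placeholder registered-voter counts per county (hypothetical values).
registered = [10, 20, 30, 40, 100, 200, 300, 300]

print(cumulative_share(registered))
print(top_share(registered, 4))  # share held by the four largest counties
```

Running the same two functions on the votes actually cast, instead of registered voters, gives the second curve needed to compare turnout against registration.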

This also means, naturally, they have the majority of the votes. But, didn't we just see that the rural counties had higher turnout? Well, yes. Unfortunately it's almost insignificant...

As you can see, the cumulative distribution of actual votes in the 2012 Primary is nearly the same as the distribution of the voters. Nearly - but not quite the same. The proud sense of civic duty earned the 35 other counties 1.1% more of the total votes.

So, it would seem something worth talking about did come from looking at the election results! Vote or Die: participation is not the worst of evils.

Data's Head

No comments: Topics: humor, Star Trek, visualization

Through seven seasons of Star Trek The Next Generation, the android named Data provides a vehicle to (not so subtly) discuss the human condition. Viewers also delight in the futuristic way Data is constructed, or at least as it was conceived in the late '80s with a limited budget. Time and again we see him disassembled, panels opened, limbs removed, fingernails lifted...

So if you're curious, here are all the manufacturer-recommended places you can open an android's head:

Curiosity has Landed

No comments: Topics: Astronomy, statistics

NASA - Curiosity's Heat Shield in View

Doug, the really nice barista at my favorite coffee shop, was chatting me up today about the Mars Science Laboratory (MSL), better known as Curiosity. We agreed that the entire landing affair was stunning, and the photos being sent back are nothing short of inspirational. I particularly liked the photo from Mars Reconnaissance Orbiter of the parachute deployed on Curiosity.

The interaction got me thinking about the social impact of big news events in science. One curious aspect of the MSL landing (to me at least) has been the online and social networking presence, which I've not really seen in a NASA mission before to this extent. Sure, more traditional media played a large part in the popularity of this landing, but Twitter seemed to be the place du jour to spectate. In fact, Curiosity has its own official Twitter feed, and a less official but perhaps more insightful one as well.

Engineers, NASA interns, space buffs, news outlets... everyone seemed to be Tweeting about this plucky robot. Now, people like me are discussing the discussion (so meta), hoping that it signifies a new era of public outreach/interaction, and driving more ~~suckers~~ inspired youths to study science.

So I thought it would be cool to look back at the trend of Tweets mentioning the landing, using the simple analytics tool from Topsy Labs. Ignoring the bizarre horizontal axis tick spacing, it's fascinating to see the "shape" the event takes: a strong ramp up for a couple days before the landing (Aug 5, if that's not blazingly clear), a sharp declining interest the day after, and a long tail, the latter no doubt filled with pithy humor.

Chart from Topsy Labs.

Mars is so cool... bydhttmwfi

Plots that Changed the World - II

2 comments: Topics: Astronomy, history, visualization

Today I'll continue my series of posts Plots that Changed the World. This time I'm focusing on four of the most important results in astronomy's history. These are plots that changed astronomy, and in turn our understanding of the universe.

Astronomy has a long history of changing the world. Ancient peoples relied on astronomy to plan crop seasons, foretell the outcome of wars, and in recent history stars provided the most reliable/accurate means to navigate and tell time.

If you ask your neighborhood astronomer what the most famous or influential figure in the field is, they'd give you a wide range of answers (come to think of it, that's actually a really interesting survey to conduct...). The four graphs I have chosen here are undoubtedly famous, and behind each are wonderful stories of human intrigue and hard-fought discovery.

Other important physical figures, such as the beautiful clock tower in Prague, the Antikythera device, and (many) complex astronomical alignments of buildings, easily come to mind as noteworthy in the history of astronomy. As always, I've picked graphs that resonate with me, often because of the stories surrounding them. I invite you to share your thoughts on these plots and other famous astronomy results in the comments!

Let's proceed in chronological order...

How to Choose a Mac

3 comments: Topics: Apple, costs, humor

As I've mentioned in previous posts about laptop battery life, my 15" MacBook Pro (MBP) is over 3 years old. I'm now starting to seriously look at buying my next computer. How do you choose which to buy?! A few things conspire to make this a stressful decision for me:

1) I'm very particular about buying things like computers.

2) I rely on my portable computer as my only machine (by choice/design) and use it day and night for both astronomy and my side projects (like my blog!).

3) I have been traveling a good amount, around 3 or 4 conferences and a few weddings per year. This is going to continue (or maybe increase).

All of my past Apple machines have been excellent, and I've selected them each with care and forethought. Usually I wait to buy until the machine that makes sense gets an upgrade, and there is a whole niche industry in predicting/discussing such timescales. For example, I purchased this MBP shortly after an update, and right before I was starting my PhD program. I knew the best value for a high performance portable computer would be the 15" MBP model.

This time I'm very torn. I had been hoping that Apple would release a 15" MacBook Air, essentially unifying their laptop line. Largely this is what they've done, with the new slimmed down and beefed up MacBook Pro Retina (MBPr).

The MBA has also grown, for its part. What used to be a big netbook has now blossomed into a respectable workhorse. For things like writing, traveling, and sheer beauty, the MBA is unmatched in my eyes. With a few affordable upgrades, the 13" MBA is on my radar as a real contender.

Both the Retina and the Air are capable of handling my day-to-day workload, both have been recently updated, both argue for portability. How to choose? MacBook Pro Retina or MacBook Air? Go for the top-of-the-line expensive heavier animal, or the urban and sophisticated marvel? I've asked people for their opinions, but the feedback was limited.

I decided to see if looking at numbers could help the matter! Is there an objectively better choice? Is one Apple laptop truly more desirable?

Resolution and RAM to Weight ratios as a function
of price for various Apple portable computers.

Here I've shown the screen resolution to weight ratio, and RAM to weight ratio, both as a function of price. These seemed to me the most important characteristics of a new laptop. As expected, the MBPr completely overpowers the resolution argument, while the MBA with upgraded RAM and CPU dominates the memory to weight ratio (with the MBPr in a respectable 2nd place).

These were interesting, as were other comparisons of the stats, but a clear winner didn't emerge. I needed a way to combine all of these characteristics, resulting in an "Awesomeness Parameter" that I could use. Then, simply pick the machine with the highest Awesomeness: easy!

I toiled for ~~hours~~ minutes, weighting the 4 parameters in various ways, and finally arrived at this:

The "Awesomeness Parameter", which includes Price, Weight,
Pixel Count, and RAM, shown as a function of the price.

The parameter "Q2" includes all of these factors: price, weight, resolution, and RAM. Each quantity is combined using my proprietary algorithm. The parameter is normalized to my current MBP (red), for which Q2 = 1. The red dashed line shows how Q2 changes if you simply decreased the price of my current machine.
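
For anyone who wants to play along at home, here is one hypothetical way to build a score with the same ingredients. To be clear, this is NOT the proprietary Q2 algorithm, just a sketch that rewards pixels and RAM, penalizes price and weight, and equals 1 for a chosen reference machine. The `Laptop` class, the `score` function, and every spec below are placeholders invented for illustration.

```python
from dataclasses import dataclass, replace

@dataclass
class Laptop:
    name: str
    price: float    # USD
    weight: float   # pounds
    pixels: int     # total screen pixels
    ram_gb: float

def score(machine, reference):
    """Hypothetical composite score, normalized so the reference machine scores 1.0."""
    benefit = (machine.pixels / reference.pixels) * (machine.ram_gb / reference.ram_gb)
    cost = (machine.price / reference.price) * (machine.weight / reference.weight)
    return benefit / cost

# Placeholder specs (hypothetical, not real Apple models or prices).
reference = Laptop("reference", price=2000.0, weight=5.0, pixels=1_000_000, ram_gb=4.0)
candidate = Laptop("candidate", price=1500.0, weight=3.0, pixels=1_000_000, ram_gb=8.0)

print(score(reference, reference))  # 1.0 by construction
print(score(candidate, reference))

# Analogue of the dashed line: same machine, lower price.
discounted = replace(reference, price=1000.0)
print(score(discounted, reference))
```

In this simple form the score scales as 1/price for a fixed machine, so halving the price doubles the score. Different exponents on each term would change how much each spec matters.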

While this statistic is meant in utter jocularity, it (strangely) accurately describes my feelings about the current laptop line. I've been very impressed with the 11" MBA, for instance, though it lacks the CPU power and HD space I need. Naturally the future Q3 parameter will include these stats as well.

The two top contenders are VERY close, with the MBPr scoring a 6.5 and the upgraded MBA a 6.7. By this metric, I should clearly purchase the upgraded 13" MBA...

Of course, I'm not guaranteeing my proprietary algorithm completely solves the problem. Still, it presents a curious idea: can I accurately quantify/model my desire for a computer, wrapping the decision up into a single number?

What do you think? Which computer is better? Can we believe in a quantity like Q2? Chime in!