If We Assume: Excel vs Python vs IDL

My favorite quote about camera gear is this:

"Your camera doesn't matter" -Ken Rockwell

If you're reading my blog, odds are you would laugh at the notion of "professional grade plots" being generated using Excel. I've been guilty of this sin as well. We're all wrong. Your software doesn't matter.

There's a lot of geekery, pride, and often vitriol when it comes to visualization tools. If your graph looks dated, or is clearly created using tools that have fallen out of vogue, people will be more dismissive of your scientific results (according to my observations at least). I have observed such viz-bias in PhD scientists and undergrads alike, and have caught myself thinking it as well.

Speaking strictly about visualization (though this extends to many aspects of scientific computing at present): as a practitioner in astronomy these days, you're antiquated if you don't use Python (or better yet D3), IDL is considered very unfashionable, and Excel is forbidden.

I say phooey to that.

I'm not dismissing the deep value, or plain superiority in some areas, of Python over IDL. D3 is downright amazing. But when it comes to the bread-and-butter plots, the ones that get science done quickly and cleanly, one tool is no better than another. Because I already have my keywords and settings adjusted, I can zip out a publication-quality plot in a single, easy-to-read command in IDL (many fine examples can be found on this website). If I had been using Python continuously for the last 8 years, I could do the same in that language too. With patience, you can do the same in Excel.

To prove my point, here is a quick attempt to generate the same basic plot in IDL (v7.1), Python/matplotlib, and Excel (2011). The data is about 2.5 days of a lightcurve from Kepler. To try and make things fair I've scaled them all to similar resolutions, and placed ugly red labels on them in Preview.
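For anyone who wants to try the Python/matplotlib version of a plot like this, here is a minimal sketch. It is not the script used for the figure in this post: the data are synthetic (a made-up sinusoid plus noise standing in for roughly 2.5 days of a light curve), and the function names, the 30-minute cadence, the axis labels, and the output filename are all assumptions chosen for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt


def make_synthetic_lightcurve(n_days=2.5, cadence_min=30.0, seed=0):
    """Return (time, flux) for a made-up light curve. NOT real Kepler data."""
    rng = np.random.default_rng(seed)
    # Evenly spaced time stamps in days; the cadence is an assumed value.
    time = np.arange(0.0, n_days, cadence_min / (60.0 * 24.0))
    # A slow sinusoidal variation plus small Gaussian noise.
    signal = 0.01 * np.sin(2.0 * np.pi * time / 0.8)
    noise = rng.normal(0.0, 0.001, time.size)
    return time, 1.0 + signal + noise


def plot_lightcurve(time, flux, outfile="lightcurve.png"):
    """Draw a plain line plot with labeled axes and inward ticks, then save it."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(time, flux, color="black", linewidth=1)
    ax.set_xlabel("Time (days)")
    ax.set_ylabel("Relative flux")
    # Minor ticks on all four sides, pointing inward.
    ax.minorticks_on()
    ax.tick_params(which="both", direction="in", top=True, right=True)
    fig.tight_layout()
    fig.savefig(outfile, dpi=150)
    return fig, ax


time, flux = make_synthetic_lightcurve()
fig, ax = plot_lightcurve(time, flux)
```

Swapping in real data only means replacing `make_synthetic_lightcurve` with whatever reads your time and flux columns. The styling calls are where the "settings memorized" advantage shows up in any tool.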

Can you guess which plot is which?

There are aspects of each figure that I really like, and while comparing them I find I am truly satisfied with none. I'm not an expert in Python or Excel. Maybe the answers were super obvious (let me know!) but if I saw any of these figures in a research paper I wouldn't stop to wonder about the tools being used, nor question the credibility of the researchers who made them.

So in summary, your visualization tool doesn't matter. Similarly, there's just no excuse for ugly and illegible graphs from any tool. As I've tried to say time (and time and time and time) again: Visualization is first and foremost about asking good questions and clearly communicating your message. If it's artistically pleasing, so much the better!

answers: A=Python, B=Excel, C=IDL

Update:
My good friend, Meredith Rawls, graciously re-made the same figure using SuperMongo (SM). Check it out:

Also, as many people have pointed out in the comments and on Twitter, this example is very selective in the graphical skills required. I heartily agree! Your visualization tool doesn't matter, provided it is capable of rendering the visualization you need!

12 comments:

Brandon Rhodes, October 29, 2013 at 10:32 AM
Plot B makes simply appalling choices of labeling and tic-marking that make the graph unreadable if one is actually interested in the values presented — and therefore, interested in where “zero” is on the Y-axis! And beyond that issue, Plot C's tick marks are much clearer to me than Plot A's. So I am happy to have finally found the tiny legend you provided at the end of your article, and learn that IDL beats Python and that both beat the tar out of Excel, since this is the same pecking order that I had heard outlined by people I know who work with data professionally.
James, October 29, 2013 at 10:37 AM
Here's a rough breakdown of time I spent on each figure (again, noting I have all the IDL settings memorized and none of the Python/MatPlotLib ones yet). IDL=2min, Python=10-15min, Excel>1hour.
Kyle Willett, October 29, 2013 at 11:22 AM
Your points are well-received, James, and I think it's really useful to have carried out the experiment. Where I would disagree with you is this: while you've clearly shown that virtually identical output can be made with lots of different software packages, the example here is for a really simple plot. I love when concepts can be boiled down to a simple light curve or scatter plot, but much of the time (some would argue most of the time) that's just not the case.

Take Figures 5, 6, and 7 from Davenport et al. (2013), as an example. I'm pretty certain (correct me if I'm wrong) that those plots simply can't be made in Excel, but can in Python/MPL and IDL. Especially for plots dealing with imaging, multi-colors, transparencies, and precise alignment, I really believe that some programs are better than others. In these cases, I think your visualization tool does matter, and that it's worthwhile to be comfortable with whatever tool will meet the standards that you set for your own plots.
Kyle Willett, October 29, 2013 at 1:10 PM
Yep. And I've certainly been guilty of "visualization snobbery" - your reminder not to dismiss a graph just because it's made in Excel is something everyone should repeat from time to time.

That being said, it's great that we ARE lucky enough to be in a field where it doesn't come up that much. Even for egregious papers, how many come up every day on astro-ph that are so out of vogue that they obscure the science? Maybe 1 or 2?
Anonymous, October 30, 2013 at 9:39 AM
Your post is nicely stated, but it is also unfortunate that this issue of tool "snobbery" keeps re-emerging over the years with each new generation of practitioners. There is often a lot of ego, maybe even a degree of elitism, wrapped up in tool choices. Tsk tsk, you should be using Python BlahBlahKit or R BlahBlahPlot. Small minds tend to focus on the trivial (your graph doesn't have enough tick marks ...) and miss the ultimate importance of the message.

CyndiF, October 30, 2013 at 3:26 PM
You forgot supermongo for old school cred.
Angie, February 13, 2014 at 8:40 PM
Check this out. A script to generate a publication quality figure out of the box! Maybe NCL (NCAR Command Language) will win this battle. http://www.ncl.ucar.edu/Document/Manuals/Getting_Started/Examples/gsun10n.shtml
Unknown, September 13, 2016 at 9:37 PM
But... if you use Excel then a lot of effort must be put in to get an acceptable plot.
