Plots as art: bonus


Sometimes the actual result can be as artistic as the rough draft. Here is the final version of the wavelet figure from the Wavelet Tree in my previous post. The x-axis is still time, showing about a month of data. The y-axis is increasing frequency. You can see there is constant power from a stable star spot, highlighted with the red line. The yellow line shows 1/2 that rotation period, and overtones are visible above that. Small spikes, like blades of neon grass, are due to flares on the star.

Plots as Art

In the daily course of research I will often generate tens to hundreds of plots, depending on the project and how automated the task is. Naturally most of these hit the "cutting room floor", or are simply sanity checks in real time, never meant to be seen by others.
Sometimes these figures are too cool to just discard, and it occurred to me that they might even qualify as "art". So I present to you: three artistic looking figures that were too cool to throw out, and a short explanation of what each was. Each figure is a real byproduct of research, but I have taken some liberties with the color schemes.

ZIP Code Density

Color/darkness represents the number of ZIP codes within each pixel.

I've been playing with some neat data from the 2010 Census, and in honor of Memorial Day I thought this was a super fun plot to show. The actual post that I'm working on related to this will probably take a long time to complete, as it requires I gather a ton of data...

I'm using the "CubeHelix" color scheme (shameless promotion: I wrote the first version for IDL). This scheme is great because it de-saturates from color to black/white while correctly preserving brightness. It can be rainbow colors too, not just monotone like above.

For example, here is the same figure, but with the color turned off (using Apple's Preview app). Great for making figures that look good both online (where color is free) and in black/white print (e.g. ApJ charges $350 per color figure, last I checked).
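For the curious, here is a minimal Python sketch of why the scheme survives a trip to grayscale. The coefficients are the ones commonly quoted for the published cubehelix scheme and should be checked against the original before reuse; the function and parameter names are placeholders chosen for this sketch, not the IDL routine mentioned above.

```python
import math

def cubehelix(lam, start=0.5, rotations=-1.5, hue=1.0, gamma=1.0):
    """Return unclipped (r, g, b) for a brightness fraction lam in [0, 1]."""
    lg = lam ** gamma
    phi = 2.0 * math.pi * (start / 3.0 + rotations * lam)
    amp = hue * lg * (1.0 - lg) / 2.0
    cos_p, sin_p = math.cos(phi), math.sin(phi)
    # The color deviation spirals around the gray diagonal of the RGB cube.
    r = lg + amp * (-0.14861 * cos_p + 1.78277 * sin_p)
    g = lg + amp * (-0.29227 * cos_p - 0.90649 * sin_p)
    b = lg + amp * (1.97294 * cos_p)
    return r, g, b

def perceived_brightness(rgb):
    """Weighted sum approximating how bright a color looks in grayscale."""
    r, g, b = rgb
    return 0.30 * r + 0.59 * g + 0.11 * b

# A 256-step ramp; values should be clipped to [0, 1] before display.
ramp = [cubehelix(i / 255) for i in range(256)]
grays = [perceived_brightness(c) for c in ramp]
```

The color terms are weighted so that they cancel in the brightness sum, which leaves `grays` as a plain ramp from 0 to 1 no matter how the hue winds around.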

Never Take SR-520

Be sure to subscribe for updates on this and all my other data analysis projects!

Fuel prices have always been higher in Seattle than in many other places in the country. I've been paying around $4.30, and looking at this map from AAA that seems about average for the west coast.
AAA Fuel Gauge Report Map

This year the State Route 520 floating bridge in Seattle became a toll road, after several delays. Besides enabling the automatic tolling service 8 months behind schedule, it's been a rough year for 520. The tolls have been occasionally overcharging drivers, the bridge has been closed many weekends for maintenance (see picture below), and a yacht got wedged under the highway. Yikes.

SR520 closed for weekend
With gas prices climbing, and the 520 tolls due to increase by 2.5% starting July 1st, I thought it high time to finish a post I'd started working on months ago! The premise is simple:
Is it cheaper to take 520, or drive around on I-90?

Tolling prices for weekdays (black) and weekends (blue)

To address this question, we must first understand what the tolling prices are like. Above I have reproduced the weekday and weekend prices as a function of time. You can see the variable price scheme, whereby the drivers in morning and evening rush-hours are relatively fleeced. On the weekends it's a single-peaked distribution, cashing in on (for example) Saturday shoppers headed from Seattle to the popular Bellevue Square shopping mall.

This flat cost is the first part of the equation. Next we need to estimate how fuel efficient your car is, so we can calculate the price of driving the different length routes.

Here I'm showing the fuel efficiency of a car as a function of its speed. The black line describes this efficiency for a fairly ideal passenger car, like my 1999 Honda. The blue line has been reduced in efficiency by 20%, and I believe is more representative of a typical "light SUV", such as the extremely popular (in the northwest) Subaru Outback. I'll be assuming the SUV case below. I don't really know how accurate this data is, but I've assembled it from and I'll let you decide if you think this is a reliable or impartial source.

The last piece of the puzzle is a route to consider! For this example I'll be traveling from UW to Bellevue Square mall. Route A is the direct route across 520, route B is the long drive around I-90, via I-5 and I-405. I played with other routes as well, but this is the simplest and most dramatic.

Route A: the direct trip
Route B: the long way around

So the game works like this: At every hour of the day, for every speed between 0 and 80mph, I have calculated the cost of going to Bellevue via both routes. The reason I have gone through the silly process of computing cost for every road speed is that I was curious if the distance would be the deciding factor at certain speeds. As such, I apologize if the resulting figure is complicated.
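The calculation can be sketched in a few lines of Python. Every number here is a stand-in: the route lengths, the fuel economy table, and the toll schedule are made up for illustration and are not the data behind the figure. Only the $4.20/gal price and the 20% SUV penalty come from the post.

```python
# Hypothetical inputs: placeholders, not the values used for the figure.
ROUTE_A_MILES = 7.0    # assumed length of the direct 520 route
ROUTE_B_MILES = 15.0   # assumed length of the I-90 detour
GAS_PRICE = 4.20       # $/gal

# Made-up fuel economy table for an "ideal" car: (speed in mph, mpg).
MPG_TABLE = [(5, 10.0), (15, 20.0), (25, 27.0), (35, 30.0), (45, 31.0),
             (55, 32.0), (65, 29.0), (75, 25.0), (80, 23.0)]

def mpg_at(speed, scale=0.8):
    """Linearly interpolate the table; scale=0.8 mimics the 20% SUV penalty."""
    speed = min(max(speed, MPG_TABLE[0][0]), MPG_TABLE[-1][0])
    for (s0, m0), (s1, m1) in zip(MPG_TABLE, MPG_TABLE[1:]):
        if s0 <= speed <= s1:
            frac = (speed - s0) / (s1 - s0)
            return scale * (m0 + frac * (m1 - m0))
    raise ValueError("speed outside table")

def toll_at(hour):
    """Placeholder weekday toll schedule in dollars (illustrative only)."""
    if 7 <= hour < 9 or 15 <= hour < 18:
        return 3.50
    if 5 <= hour < 23:
        return 2.00
    return 0.0

def price_difference(hour, speed):
    """Cost of route A minus cost of route B; positive means 520 costs more."""
    fuel_per_mile = GAS_PRICE / mpg_at(speed)
    cost_a = ROUTE_A_MILES * fuel_per_mile + toll_at(hour)
    cost_b = ROUTE_B_MILES * fuel_per_mile
    return cost_a - cost_b

# One value per (hour, speed) cell; the grid starts at 5 mph, not 0.
grid = {(h, v): price_difference(h, v)
        for h in range(24) for v in range(5, 85, 5)}
```

Starting the speed axis at 5 mph sidesteps the undefined fuel economy of a car standing still.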

Weekday price difference grid, assuming $4.20/gal fuel. Black lines highlight the price difference over time at 60mph, the nominal speed limit on both roads.
You can almost read off the data from the grid. The color indicates the difference in price between driving around and taking 520. Red indicates that 520 is more expensive. Green to black is I-90 being more expensive, 520 cheaper. The color-bar (legend) at top was capped at $1, but the maximum price difference was actually around $1.80.

The result gives rise to the pointed name I chose for this post: you should (almost) never take SR 520 if you want to save money. The exceptions are:
  • late at night, when the toll is not in effect
  • if you're driving 5mph
  • if your time is worth more than the amount you save driving around. As a graduate student, the choice is laughably clear.
During the off-hours (11pm-5am) you can see the shape of the fuel economy curve quite nicely. Some of that effect can also be seen at the high-speed end. A reminder to you all: going 55mph saves lives and gas, which in turn saves other lives.

It may occur to you (or, it did to me at least) that this result depends on what the price of gas is, and you'd be absolutely correct. So I turned that knob in the model, and found that at ~$7/gallon gas it becomes cheaper to take 520 at all times except rush hour.

The result also depends on your fuel economy in a similar way. I've only explored the case of a small SUV, but consider a semi truck that gets 6-8 miles per gallon! In almost every conceivable scenario it would be cheaper to save a gallon or two of gas and take 520.
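Both knobs, gas price and fuel economy, collapse into a single break-even relation: the toll is worth paying once it costs less than the fuel burned on the extra miles. Here is a rough sketch, assuming a fixed fuel economy and the same toll for every vehicle; neither assumption is from the model above. The 8 extra miles, the $2.00 toll, and the mpg values are placeholders.

```python
def break_even_gas_price(toll, mpg, extra_miles):
    """Gas price ($/gal) at which the toll equals the fuel burned on the detour."""
    if extra_miles <= 0:
        raise ValueError("the detour must be longer than the direct route")
    return toll * mpg / extra_miles

EXTRA_MILES = 8.0  # hypothetical extra distance of the detour

# Above the returned price the toll route is cheaper; below it, drive around.
suv = break_even_gas_price(toll=2.00, mpg=24.0, extra_miles=EXTRA_MILES)
semi = break_even_gas_price(toll=2.00, mpg=7.0, extra_miles=EXTRA_MILES)
```

With these placeholder numbers the SUV breaks even at $6.00/gal and the truck at $1.75/gal, which is why the thirstier vehicle should nearly always take the bridge.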

Maybe I'm an optimist, or a fool, but it seems like a reasonable proposal for the toll prices would have been to make them price-neutral to driving around. Tolling also increases traffic on other roads, decreasing the profitability somewhat. I did enjoy seeing that the tolls don't affect low income families greatly. I read through a few of the initial reports released about the 520 tolls, but I'm very interested in looking at the traffic statistics and financial data to see if they meet predictions.

Overall I'm struck with how different the data analysis seems to be in this field. Figures don't seem precise, but rather very bold and contrasting. Sometimes this is effective and accurately tells a story better than a very precise figure. Frequently I find the plots burdensome, but that's the subject for another post...

Rumor Names - 2012

A few months ago I put up a word cloud that analyzed the most common name for people getting postdoc jobs in astronomy in 2011 (according to the Rumor Mill). The result: if you wanted to get a job in astronomy last year, you had better be named David...

The 2012 job season has all but ended for the Academe. I'm happy to say it has been kind to many of my friends. Despite the foreboding messages from some talking heads, I know of none with PhDs who are requiring food stamps.

So I went to the rumor mill once more and grabbed the names of the employed elite. This year I actually tried to clean the wordle of the frequent garbage that would get included, but a few errant words slipped through.
Frequency of names from the Postdoc Jobs Rumor Mill, as of 2012 May 21
The winner this year was "Yan", which is actually due to both Tze Yan Lam and Francis-Yan Cyr-Racine being offered multiple positions each. Even more interestingly, both Yans work on cosmology theory, and were offered some of the same positions. I don't believe they have collaborated, but it seems fitting that they should.

Here's a breakdown of the contenders this year...

  Yan       11
  Laura     9
  David     8
  Michael   8
  Fumagalli 7
  Michele   7
  Colin     7
  Hamaus    7
  Wenjuan   7
  Voort     7
  Freeke    7
  Fang      7

That was all the results up to 7 hits. There was a big pile-up at around 4-6 hits, which is why the word cloud looks so homogeneous in word size. Odds are still very good that if you're named David you can find work in Astronomy! The winning first name this year, however, was Laura.
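The tally itself is easy to reproduce. This is a minimal sketch, assuming the word cloud simply counts every word in the list of names, so that hyphenated names split into their parts. The sample entries are for illustration only; the real input was the full Rumor Mill listing.

```python
import re
from collections import Counter

def name_counts(entries):
    """Count every alphabetic word across a list of full names."""
    counts = Counter()
    for entry in entries:
        # Splitting on non-letters turns "Francis-Yan" into "Francis" and "Yan".
        counts.update(re.findall(r"[A-Za-z]+", entry))
    return counts

# Illustrative input: one line per rumored offer, repeated names included.
sample = ["Tze Yan Lam", "Francis-Yan Cyr-Racine", "Tze Yan Lam"]
counts = name_counts(sample)
top_name, top_hits = counts.most_common(1)[0]
```

Counting words rather than people is exactly why one middle name and one half of a hyphenated first name can pool into a single winner.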

Fun stuff!

Email History

I came across this neat article in WIRED a couple months ago by Stephen Wolfram (the man who brought you Mathematica, WolframAlpha...). In it, Dr Wolfram discusses some of the impressive and mildly obsessive amounts of data about himself that he has kept for over 10 years. It's a fascinating look at how we truly can be statistical beings.

He naturally makes some dandy plots to showcase this information. The first figure in the article, showing the date vs time of day for every email since 1990, really caught my eye and I've wanted to make it ever since.

Alas, I did not have the resources available to the good Dr Wolfram, and have not been able to keep my entire academic email history. GMail promised such a revolution, to "Never Delete Emails", and I have every message sent through that service since late 2005 still online. I am in the process of downloading them all. Until my GMail finishes downloading, I decided to just make the figure with what I had available on my laptop: my academic email for the last year.

This is my recreation of that super-neat figure, showing the time of day that I send emails. Even with my limited number of emails, this is a really cool figure! Most of the early AM email spurts are from days when I'm observing. I otherwise appear to be a regular creature of habit, at least with my work email.  Let's dive in just a little further...

Here I very simply show the histogram of how many emails I sent each day of the week. I try to take Saturdays off, and you can literally see the burn-out happening Thursday-Friday.

Here we have the normalized cumulative distribution of emails, showing the running total of emails normalized to 1. Besides the artifact on the first day in 2011, when I first set up the new UW email server on my laptop, the rate of email has been surprisingly constant/linear!
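A normalized cumulative distribution like this takes only a sort and a running count. Here is a minimal sketch, assuming the send times have already been parsed into `datetime` objects; the function name and sample dates are made up.

```python
from datetime import datetime

def cumulative_fraction(timestamps):
    """Pair each send time with the fraction of all emails sent by then."""
    ordered = sorted(timestamps)
    total = len(ordered)
    return [(t, (i + 1) / total) for i, t in enumerate(ordered)]

# Hypothetical send times, deliberately out of order.
sent = [
    datetime(2012, 3, 1, 9, 0),
    datetime(2011, 6, 15, 14, 30),
    datetime(2011, 12, 24, 8, 45),
    datetime(2012, 5, 21, 19, 5),
]
curve = cumulative_fraction(sent)
```

A constant email rate shows up as a straight line in `curve`, and a bulk import like the server setup shows up as a vertical jump.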

Lastly, here is a histogram of when I send email throughout the day. This highlights my daily routine: Sleep around 1 AM, up at 7-8, lunch around 12:30, dinner at around 7pm (19:00).
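Both histograms, by day of week and by hour of day, come down to counting one attribute of each timestamp. This is a rough sketch with hypothetical function names and sample data, not the code used for the figures.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_histogram(timestamps):
    """Number of emails sent on each day of the week."""
    counts = Counter(t.weekday() for t in timestamps)  # Monday is 0
    return {name: counts.get(i, 0) for i, name in enumerate(DAY_NAMES)}

def hour_histogram(timestamps):
    """Number of emails sent in each hour of the day, as a 24-element list."""
    counts = Counter(t.hour for t in timestamps)
    return [counts.get(h, 0) for h in range(24)]

# Hypothetical send times: two on a Monday, one on a Thursday.
sent = [
    datetime(2012, 5, 21, 9, 15),
    datetime(2012, 5, 21, 13, 40),
    datetime(2012, 5, 24, 9, 5),
]
by_day = weekday_histogram(sent)
by_hour = hour_histogram(sent)
```

Filling the empty bins with zeros keeps the quiet hours visible, which is where the sleep and meal gaps show up.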

My takeaway message: I have been doing a good job of keeping a regular schedule for the past year, something I struggled with at times in college, and which is often a problem for graduate students. This last year has been the first time since I was 6 years old that I have not taken classes, and I was justifiably concerned with my ability to keep a regular schedule of waking up early and working. These results are encouraging... now if only I could plot how much actual work I've been able to do as a function of time. That might be a less enjoyable result. In the finest academic tradition, I leave that as an exercise for the reader.

Smile Novae


While looking through some research papers and reading about binary stars this morning, I happened across a graphic that blew my mind. I was reading "BoB" to refresh my memory on some terminology, when I saw this smiling supernova staring back at me. Here he is, excerpted from the actual research paper:
from Fig. 1 of Iben & Tutukov (1984).
Copyright belongs to the Astrophysical Journal.

The UW Libraries: Part 1

UW's Suzzallo Library (image taken from here)
In college at UW I had a part time job in the library for about a year. Like many people I worked part time through most of college. I've heard it said that part time employment, ideally on campus, improves academic performance. This was certainly true in my case.

Working at the library was an ideal job for me. The hours were flexible around classes, I could wander around interesting parts of the collection I would have never otherwise seen, and I could steal a nap on the 4th floor when on opening shift.

I've reflected a lot on my college experience recently, as many people naturally do. I met my wife then, I changed from pursuing an engineering career to academia and astronomy, I did a lot of growing up. (Incidentally, I didn't learn how to become a good student until I was enrolled in a Master's program.) None of these changes in my life were expected when I left my rural high school for the big city. I didn't have a good idea of what University would really be about.

I certainly never expected to become an advocate for libraries, nor to become a member of a couple of library advisory committees at UW in grad school. In high school and college I was an avid non-reader. "Books? I never touch the stuff." As I've grown to be part of the academic sphere, the critical role of libraries has become clearer. And I even read books now, so it goes.

[\end boring backstory]

As technology sweeps through our culture, bringing iPods, Kindles, and Netflix, "traditional" media has struggled to survive the changing climate. Libraries have weathered this storm largely by adapting their scope, becoming a palace of wifi rather than of dusty texts. Thoreau, Feynman, and Kant are all still there, but you'll more likely encounter them in a PDF. Someday (soon I hope) publishers, Google, and libraries will get this "digital library" thing set up right...

This blog is nominally about science, data, and musing on the world around me. That means it's time for some library statistics! It just so happens that libraries, which are run by smart people, have been keeping statistics on their usage for a long time (e.g. see the Association of Research Libraries). UW's libraries keep significant data, and I saw a great talk by Steve Hiller a few months ago on the subject. The data driven goal of this first library post will be to look at the usage trends of the UW library system, to gauge the importance of this major university facility.

Fig 1. Average of autumn, winter, spring enrollment. Data from the UW Factbook.
First, consider the enrollment trend at UW for the last two decades (above). The UW campus has grown by ~25% since 1992. If we assume that the fraction of students checking out books stays the same, then the libraries should have received an increase in attendance.

Fig 2. Monthly gate counts of patrons for all UW Seattle libraries as a function of time.
We can see in Figure 2, however, that over the last 10 years the monthly gate counts (warm bodies walking through the door) have been nearly constant. This monthly data has strong annual structure, which I'll highlight at the end. Since enrollment has been increasing, the fraction of people visiting the libraries is either decreasing, or the number of people who can visit the libraries has "saturated" (i.e. the library is at patron capacity). My intuition is the former. The flat trend in library patrons over time presents a clear message: people still go to the libraries.

Fig 3. Total number of items used per academic quarter, which includes checkouts, renewals, reserved, and re-shelved in-library items.
In Figure 3 I'm showing the number of items used per quarter at all UW Seattle library branches. This includes items checked out, reserves, renewals of items already checked out, and books/items used in the library but not checked out. Material usage has decreased over the last 6 years, but only slightly.

Fig 4.  Number of checkouts per academic quarter.

However, Figure 4 tells a somewhat different story. Just tracking the number of checked out items per quarter shows a distinct decrease. This decline accounts for most of the net decrease in item usage seen in Figure 3. If you combine this decline with the rapidly increasing enrollment, one conclusion becomes clear: People aren't checking out books. Not at the rates of years past, at least.

Fig 5. Monthly gate counts of patrons at all UW Seattle libraries from 2002 to 2010, folded over the academic year.

Finally we come back to the monthly gate counts of patrons walking through the library doors in Figure 5. As I mentioned above, the volume of patrons in the library has remained very steady. When we show these numbers as a function of month, placing each year on top of the other, an awesome and repeating pattern emerges (Fig 5). This subtly tells the story of how students use the libraries, how they continue to view them as critical to their academic success.

In Autumn quarter, which typically starts around the last week of September, students flock to the libraries. The patronage that month is around 120% of the average! Year after year this is seen. Eager learners coming to the place they know will fill them with knowledge and inspiration.

By the 2nd month of Autumn quarter they have "figured it out". Thanksgiving holiday sends students away, often for a week, but in general library use is strong. December numbers plummet. Winter holiday takes 2 weeks, and many students finish with classes in early Dec.

The rest of the year reads off like a coarse academic calendar. Spring break in March. Graduation in June. Summer term in July/August.  Library usage peaks when students are hopeful or have deadlines, and drops over holidays.
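The folding behind Figure 5 can be sketched as follows. This minimal version, with a hypothetical function name and made-up counts, normalizes each month by the mean of its own calendar year. Treat that as an assumption, since the figure may have used the academic-year or overall average instead.

```python
from collections import defaultdict

def fold_by_month(monthly_counts):
    """Map {(year, month): count} to {month: [count / that year's mean, ...]}."""
    by_year = defaultdict(dict)
    for (year, month), count in monthly_counts.items():
        by_year[year][month] = count
    folded = defaultdict(list)
    for year in sorted(by_year):
        months = by_year[year]
        mean = sum(months.values()) / len(months)
        for month, count in sorted(months.items()):
            # Dividing by the yearly mean lets different years overlap.
            folded[month].append(count / mean)
    return dict(folded)

# Made-up gate counts for three months in each of two years.
gate_counts = {
    (2002, 10): 120, (2002, 11): 100, (2002, 12): 80,
    (2003, 10): 240, (2003, 11): 200, (2003, 12): 160,
}
folded = fold_by_month(gate_counts)
```

In this toy data the second year has twice the traffic, yet both years fold onto the same curve, with October at 120% of the average.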

That's nearly all I have to say about these data (right now). To get a full sense of what the library is "good for" these days we'd need to see data on electronic usage and journal subscriptions/use, not included here and probably harder to quantify. I believe there is a clear message here: the library continues to be a hugely used resource on our campus, but the use of books decreases steadily. The library must therefore transition to becoming a more general hub for learning. From my meetings/conversations with admin in the library I can tell you that they know this, and are scrambling to redefine their scope. I mourn the loss of books in our everyday lives. Small used book stores used to be found everywhere around UW; now I can only think of 1 within walking distance (Magus, and it is awesome). I sincerely hope, however, that the decline of paper book use does not come hand in hand with a decline in the educational quality of our university... you're welcome to wildly speculate to that end.