Gender in Science Talks


Attention Fellow Astronomers!

If you'll be attending the upcoming AAS 223 in Washington DC, could you help me collect data for a project? This is for my "AAS Hack Day" idea.

The project stems from an anecdotal observation I made at a recent meeting I attended: the gender ratio of speakers did not appear to be the same as the gender ratio of the people who asked them questions. Part of this was due to a good mix/balance of age and gender in the talk lineup, and part of it was because "gray beards" (with a known gender skew) often ask questions from the audience. This begat a slew of other questions: e.g. do women prefer to ask women questions? Do some subfields have better gender parity in speakers vs questioners? Does the questioners' gender ratio match that of AAS attendance as a whole?

Data is clearly needed to study this...

So I'm reaching out to you, dear colleague. I need 2 things:
1. Help me collect data on the genders of speakers versus questioners!
2. Help me spread the word so I can sample many different sessions/subfields!

The data I need is simple:
  • AAS Talk # (e.g. 123.45)
  • Gender of the speaker
  • Gender of the people who ask questions (a simple string of "MFMFMMFF" etc. is fine)
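Once a few records come in, the comparison itself is simple. Here is a minimal sketch of how the tally might work, assuming each record is a (talk number, speaker gender, questioner string) tuple. The records and the `female_fraction` helper are made up for illustration; they are not real AAS data or the actual analysis code.

```python
from collections import Counter

# Hypothetical records: (talk number, speaker gender, questioner string).
# These values are invented for illustration only.
records = [
    ("123.45", "F", "MFMM"),
    ("123.46", "M", "MM"),
    ("201.02", "F", "FFM"),
]

def female_fraction(genders):
    """Return the fraction of 'F' entries in a sequence of 'M'/'F' codes."""
    counts = Counter(g.upper() for g in genders if g.upper() in ("M", "F"))
    total = counts["M"] + counts["F"]
    return counts["F"] / total if total else float("nan")

# Pool all speakers and all questioners across the sampled talks.
speakers = "".join(speaker for _, speaker, _ in records)
questioners = "".join(asked for _, _, asked in records)

print(f"Female fraction of speakers:    {female_fraction(speakers):.2f}")
print(f"Female fraction of questioners: {female_fraction(questioners):.2f}")
```

Grouping the same records by session number (the part before the decimal point) would be one way to look at subfields separately.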

You can get me the data any way you like! Tweet it at me (@jradavenport), Facebook message me, email me (jrad - a t -), scribble it on a cocktail napkin, send carrier pigeons... or, best of all, use this handy webform (many thanks to Morgan Fouesneau for putting this together!).

I'll be posting more about this study of gender equality in scientific talks in the future, and will be shamelessly posting reminders/pleas for help throughout AAS223! If you have any thoughts on the project please let me know!

And please, if you like this idea (even if you're not attending AAS), share it online with your fellow astronomers! Thanks!

TwainBot: Early Lessons


Today TwainBot hit the 100th tweet in Tom Sawyer. It's been about two weeks since the project was launched and I wanted to jot down a few thoughts/lessons learned so far. Some are obvious, some were expected, some I found subtle or surprising....

1. Uptime is difficult, especially for sloppy side projects in new languages. So far TwainBot has gone down twice and missed tweets (I caught it up manually). Once was due to coding errors (my bad). As every sysadmin or dev knows, you are guaranteed to release code with bugs, and they will probably cause your system to crash in ways you didn't expect. TwainBot is very robust because it is very simple, and it saves things to Dropbox so I can check on it in real time. Still, I forgot to ensure that *every* tweet was correctly binned to 140 characters, and in a few cases (about half a percent of the tweets) my logic statements screwed up and put a few too many chars/words in. This causes the Twitter API to crap out and return an error (smart), but I didn't catch them in any kind of error trap. A programming 101 mistake (facepalm) that was trivial to correct.

2. Uptime is difficult, especially for the Pi. Of course, the Pi is not a server. Heat is an issue. A book is not a good case for a computer. The first tweet of day #2 TwainBot failed. I was home sleeping in on that Saturday morning. At first I thought I had screwed up my cron job and it wasn't scheduled properly, but I couldn't remotely access TwainBot for some reason. Instead it turned out I had cooked the Pi (hah). It overheated and shut down (showing a red light on the board only). Cooling is a real issue, even for this mini computer. The book is now open, and so problem solved... for now. From reading about the Pi I expect more performance/uptime issues in the future, including possible HD failure. More redundancy is needed, and I'm updating my scripts to push/pull more from Dropbox, allowing me to remotely intervene and push new code as needed, since I'm traveling a lot next year.

3. A year is a long time. This is a long endeavor, and that's a commitment of time and brain power to run for me. For the audience there's a constant interaction needed to keep up as well. I failed to appreciate this point at first. Some people will follow and read a few tweets due to novelty/amusement. Some will think "Neat! But I'm already behind, screw this." A serious question remains: is this project interesting to anyone half way through? How could I make it more approachable? Is there a way to catch up (you know... besides reading the book) that I can implement? Twitter followers have been joining and leaving, at nearly a constant rate since day one. Time will tell if that means I'm net losing people due to lack of interest, or if interest can really only be fleeting for TwainBot, which brings me to my next point...

4. What is success? TwainBot is an art project, first and foremost. Still, what do I want at the end of the year? 10k followers? 10 followers? The struggle to maintain an audience is, I think, partially related to the lack of a clear end goal on my part. A lot of people have asked "why are you doing this?". The answer is because I wanted to create something. I didn't go into this with a motivation beyond creating something to share with people. My end goal now is to talk to as many people as I can about it, to have conversations about what it means to read a book at a very different pace than normal.

5. Everyone (especially on the internet) is a critic. Some people hate destroying books physically. Some people hate "destroying" the art, by stuffing a classic into a medium it was never designed for. This goes directly back to #4. I want to have these conversations, and every snarky (or positive) comment on the internet is a tiny conversation people are having about the project. Ultimately that's the goal, that people will think more about the intersection between literature and technology.

...and to read books, because they're awesome.

New Project: Mock Twain

Today I'm excited to announce the public release of a new project I've been working on: Mock Twain! 

I started the associated Twitter account about a year ago to just mess around, sending occasional Twain-inspired quotes. A month ago I came up with the idea to tweet an entire book, and immediately using Twain as a source sprang to mind!

It's a simple enough idea, mixing a new-world digital medium with old-world art/content. Implementing it was also remarkably easy... and thus TwainBot was born! You can follow Mock Twain via Twitter (below) or check out the project's blog (new link on this site, to the right of my [ Press ] link)

Here is the executive summary:
  • TwainBot will tweet the entire text of The Adventures of Tom Sawyer
  • The "bot" (running on a RaspberryPi inside a hollowed out Mark Twain book) sends about 10 tweets a day, via python/tweepy
  • It will take about a year to tweet the entire book

Fisheye Gamma Ray Photo


Cross-post from Common Observer

A neat image came up on the fantastic Astronomy Picture of the Day that caught my eye. Here is a picture of the Milky Way as seen from Earth orbit... in gamma ray light! The image from the Fermi Space Telescope is of course in false-color, but is nonetheless striking!

What is so fun about this image is that it contains both the Milky Way AND the Earth! The ring around the edge of the "photo" is actually our planet, glowing in gamma-ray wavelengths. In essence what we're seeing is a kind of wide-angle (or fisheye) effect. Here is perhaps a more familiar (and more extreme) example of this wide-angle effect, in visible light, and with a more worldly setting...

Besides helping us study the fascinating physics that goes into producing gamma rays throughout the Milky Way's disk (and well beyond!), this image is a brilliant visual reminder to me that our home, Earth, truly belongs to the cosmos. Rock on, NASA.

Pre-MAP: building diversity in STEM

Today I'm putting up a guest-posting on behalf of a program being run in the UW Astronomy department. While this isn't the fun-with-data content most people look for on this site, it fits neatly under my other interests (astronomy and academia).

In brief, the department has focused on recruiting underrepresented students in their first year at UW. Of course being scientists they're also interested in how effective this program has been, and have written a short paper outlining some summary data-driven lessons.

Since its creation by astronomy graduate students in 2005, the Pre-Major in Astronomy Program (Pre-MAP) at the University of Washington Department of Astronomy has made a concentrated effort to recruit and retain underrepresented and low-income undergraduates interested in fields pertaining to science, technology, engineering and mathematics (STEM). About 90 students have participated; many have gone on to major in physics, astronomy, or other STEM fields.

The program begins in the fall (nominally in the students' first quarter at UW) with a keystone seminar where they learn astronomy research techniques (computer programming, paper reading, etc.) and then apply their skills to research projects conducted in small groups. During this time, students work closely with research mentors (professors, post-docs, graduate students) as they learn what it really means to be a scientist. At the end of the quarter, each group presents their work to the astronomy department. Many students continue working on their research projects after the seminar ends.

Beyond the seminar, Pre-MAP provides many other resources for our students such as a collaborative “cohort” atmosphere, one-on-one academic mentoring, guided tours of research labs across campus, and a yearly field trip to an astronomical observatory. The idea is that by giving students early exposure to research within a collaborative and supportive community we will not only give them the skills necessary for success in STEM fields, but also allow them to gain confidence and enthusiasm for science.

But does the program actually accomplish these lofty goals? To look at this, we use data from the last 8 years of Pre-MAP students to evaluate the program and compare our students to the general UW population. We succeed in attracting students with a range of ethnicities and math backgrounds. Our students perform similarly to the general UW population, both overall and in the sciences. We find that STEM retention depends strongly on math placement and performance. However, even when controlling for these variables our students are significantly more likely to pursue STEM degrees than their peers. The entire paper can be found on the arXiv.

Want to know more about Pre-MAP, including obtaining resources to help start a similar program at your institution?
Visit our website
Or send us an email!  mjt29 [at], sterrs [at], schmidt [at]

The Dimensions of Art

Some good soul on reddit posted a link to a very neat dataset: Metadata from the Tate Collection. The files contain lots of interesting bits of information, but one particularly stood out to me: the dimensions of every piece of art that the Tate owns.

A major caveat: a lot of the art is 3D and has a 3rd dimension I'm not considering (e.g. sculpture). For your thoughtful viewing pleasure, here is the distribution of the aspect ratios for 65k pieces of artwork held by the Tate as a function of their width.

Art dimensions, a technical view

Pixel color (light to dark) indicates density of pieces. There are some interesting clumps in this space, here are some thoughts:

1. On the whole, people prefer to make 4x3 artwork. 

This may largely be driven by stock canvas sizes available from art suppliers.

2. There are more tall pieces than wide pieces.

I find this fascinating, and speculate it may be due to portraits and paintings.

3. People are using the Golden Ratio.

Despite the lack of any obvious basis for its use, there are clumps for both wide and tall pieces at the so-called "Golden Ratio", approximately 1:1.618 (as a tribute, that's the ratio I rendered the above figure at).
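As a quick illustration of the Golden Ratio check, here is a small sketch that computes the ratio itself and flags pieces whose long-side/short-side ratio falls near it. The `pieces` list, the helper names, and the 0.02 tolerance are all assumptions for illustration. They are not the Tate data or the code behind the figure, and folding wide and tall pieces together like this loses the wide-vs-tall distinction the figure keeps.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the Golden Ratio, about 1.618

def aspect_ratio(width, height):
    """Return long side divided by short side, so the result is always >= 1."""
    return max(width, height) / min(width, height)

def near_golden(width, height, tol=0.02):
    """True if the piece's aspect ratio is within `tol` of the Golden Ratio."""
    return abs(aspect_ratio(width, height) - PHI) <= tol

# Hypothetical (width, height) pairs in mm, invented for illustration.
pieces = [(1618, 1000), (400, 300), (1000, 1618), (500, 500)]

for width, height in pieces:
    ratio = aspect_ratio(width, height)
    print(f"{width} x {height}: ratio {ratio:.3f}, near golden: {near_golden(width, height)}")
```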

Art becomes data becomes art

What I learned very quickly after producing the first figure is that nobody understands it. Even though it's very information rich and accurate, I'm violating a basic rule of data visualization: make it understandable! People gave me lots of feedback saying they couldn't wrap their heads around the figure, and I did almost nothing to break it down...

Because this is art, I felt compelled to re-visualize this into something more... visceral. Here is the same data (for art up to 3m x 3m), with each piece represented as a thin wire box.

Play along at home

If you'd like to play with this data and make your own version of these figures, I have replicated (nearly) the figures from this blog post in an IPython notebook, which is up on GitHub! (link to notebook).

Talk - Beauty in Data

A few months ago I gave this talk at the Seattle Nerd Nite. It was a great event, and the small crowd of ~75 people who came out to the bar to see me and the other speaker were friendly and chatted me up with questions for probably another 30min after!


Excel vs Python vs IDL

My favorite quote about camera gear is this:
"Your camera doesn't matter" -Ken Rockwell
If you're reading my blog, odds are you would laugh at the notion of "professional grade plots" being generated using Excel. I've been guilty of this sin as well. We're all wrong. Your software doesn't matter.

There's a lot of geekery, pride, and often vitriol when it comes to visualization tools. If your graph looks dated, or is clearly created using tools that have fallen out of vogue, people will be more dismissive of your scientific results (according to my observations at least). I have observed such viz-bias in PhD scientists and undergrads alike, and have caught myself thinking it as well.

Speaking strictly for visualization (though you can extend this to many aspects of scientific computing presently) as a practitioner in Astronomy these days you're antiquated if you don't use Python (or better yet D3), IDL is considered very unfashionable, and Excel is forbidden.

I say phooey to that.

I'm not dismissing the deep value, or plain superiority in some areas, of Python over IDL. D3 is downright amazing. But, when it comes to the bread and butter plots, the ones that get science done quickly and cleanly, one tool is no better than another. Because I have keywords/settings adjusted already, I can zip out a publication quality plot in a single easy to read command in IDL (many fine examples can be found within this website). If I had been using Python continuously for the last 8 years I could do the same in that language too. With patience you can do the same in Excel.

To prove my point, here is a quick attempt to generate the same basic plot in IDL (v7.1), Python/matplotlib, and Excel (2011). The data is about 2.5 days of a lightcurve from Kepler. To try and make things fair I've scaled them all to similar resolutions, and placed ugly red labels on them in Preview.

Can you guess which plot is which?

There are aspects of each figure that I really like, and while comparing them I find I am truly satisfied with none. I'm not an expert in Python or Excel. Maybe the answers were super obvious (let me know!) but if I saw any of these figures in a research paper I wouldn't stop to wonder about the tools being used, nor question the credibility of the researchers who made them.

So in summary, your visualization tool doesn't matter. Similarly, there's just no excuse for ugly and illegible graphs from any tool. As I've tried to say time (and time and time and time) again: Visualization is first and foremost about asking good questions and clearly communicating your message. If it's artistically pleasing, so much the better!

answers: A=Python, B=Excel, C=IDL

My good friend, Meredith Rawls, graciously re-made the same figure using SuperMongo (SM). Check it out:
Also - many people have pointed out in the comments and on Twitter, this example is very selective in the graphical skills required. I heartily agree! Your visualization tool doesn't matter, provided it is capable of rendering the visualization you need!

A Summer at Microsoft Research


It's autumn now, a time of harvest and reflection, and the beginning of the academic year. The blog has been dormant for about a month because I've been working very hard in Astro-land.

I spent the last few months only 10 miles away from UW, just across the lake in Redmond. Since lots of people have been interested in how the experience was, and since corporate internships seem to be fairly uncommon in Astronomy, I thought it would be worthwhile recapping my summer at Microsoft Research. (Apologies for a super lengthy post. tl;dr MSR was fun, challenging, would recommend)

Laptop Battery, the Aftermath

The summer is coming to a close, and so is my internship at MSR. I haven't had much time for blogging these past 3 months, but the few posts I managed to write fared quite well! This included Airports of the World (21k views) and The De-Evolution of my Laptop Battery (114k views).

The Laptop Battery post quickly became my most viewed article (at least in terms of views on this website). At one point it was the top post on Digg, the front of Slashdot, and near the top of Hacker News. This induced a ridiculous traffic spike...

The little code snippet I posted on GitHub got a number of "forks", and the comments on my blog (and other sources) were awesome, insightful, and usually quite friendly (as far as internet comments go).

The (De-) evolution of My Laptop Battery

Update: the GitHub repo for this data/script is now available.

Today my MacBook Air is one year old. That's not exactly an officially recognized holiday, but it does mean one thing very cool:
I have one year of data on my laptop battery, recorded every 1 minute of computer usage!

A little backstory:
I started occasionally keeping track of my laptop's battery several computers ago. Near the end of my previous computer's life I realized I could automate this collection of data. By keeping a record of the battery charge every minute my computer is being used, I am able to track the health of my notebook, as well as study my own computer usage in remarkable detail.

In a previous blog post I noted that it would only take a negligible amount of hard drive space to keep such a record for the entire life of your computer at 1-min sampling (though I underpredicted the amount of space by about 3x). The previous battery study also provided me with subject matter that was included in a quantified-self art exhibition in Ann Arbor last year. While I obsessed about what to buy for this latest computer, battery life never factored into my equation. Every model seemed to boast more than enough capacity.

Without further ado, here is what 1 year (152,411 samples) of battery capacity data looks like for my 2012 MacBook Air:
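To put that sample count in perspective, here is a back-of-the-envelope conversion into hours of use. It assumes every sample corresponds to exactly one minute of usage and that the year had 365 days, so treat the result as approximate.

```python
samples = 152_411          # number of battery readings quoted above
minutes_per_sample = 1     # assumption: one reading per minute of usage
days_in_year = 365         # assumption: a plain 365-day year

hours_of_use = samples * minutes_per_sample / 60
hours_per_day = hours_of_use / days_in_year

print(f"Total usage: {hours_of_use:.0f} hours")         # roughly 2540 hours
print(f"Average usage: {hours_per_day:.1f} hours/day")  # roughly 7 hours a day
```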

Cubehelix: now for Tableau!

Good news, everyone! Everyone's favorite color scheme, CubeHelix, has now been used in Tableau!

Background: CubeHelix is a color scheme that properly de-saturates from color to black and white. It's not always the most artistic color palette, but it has the advantage of being easier to view for people with color blindness. By changing various settings, many variations on the "CubeHelix" scheme are possible.
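For reference, here is a minimal sketch of how a CubeHelix color can be computed. The coefficients and default settings are the ones I recall from Dave Green's published description of the scheme, so please treat them as an assumption and check them against the original before relying on them. The function and parameter names are my own.

```python
import math

def cubehelix(lam, start=0.5, rotations=-1.5, hue=1.0, gamma=1.0):
    """Return an (r, g, b) tuple for lam in [0, 1] (0 = black, 1 = white)."""
    level = lam ** gamma
    # Angle around the gray diagonal of the RGB cube at this brightness level.
    angle = 2 * math.pi * (start / 3.0 + 1.0 + rotations * lam)
    # Amplitude of the color deviation; zero at both ends of the scale.
    amp = hue * level * (1 - level) / 2.0
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    r = level + amp * (-0.14861 * cos_a + 1.78277 * sin_a)
    g = level + amp * (-0.29227 * cos_a - 0.90649 * sin_a)
    b = level + amp * (1.97294 * cos_a)
    return r, g, b

def perceived_brightness(rgb):
    """Weighted sum of the channels used as a simple brightness estimate."""
    r, g, b = rgb
    return 0.30 * r + 0.59 * g + 0.11 * b

for step in range(0, 11, 2):
    lam = step / 10
    print(lam, [round(channel, 3) for channel in cubehelix(lam)])
```

The `start`, `rotations`, `hue`, and `gamma` arguments are the "various settings" mentioned above. Channel values can stray slightly outside 0 to 1 for some settings, so clip them before handing them to a plotting tool.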

After my previous post about the CubeHelix color scheme, I was encouraged by Jewel Loree to actually produce the color scheme to use with Tableau. It turns out, it's pretty easy!

Data or Art?


A brief update: I wanted to note the passing of Ellsworth Kelly, whose 7 decade career produced some of the most interesting and subtle works of contemporary American art. Several of his pieces inspired this blog post, and I have updated it 2 years later with one additional example.

I recently gave a talk at NerdNite Seattle on the beauty of data (deck will be posted to slideshare soon). A main point of my talk was that good/effective data analysis (not simply limited to data visualization) draws from both art and science. From science comes a sense of objectivity, rationality, and gritty truth. From art we take wonder, introspection, and aesthetic. This interplay between science and art is what sparks my interest in the "quantified self", and I believe the growing popularity of data science in everyday life.
Good data analysis, like good art, should cause you to quietly reflect on what the subject means to you, how it includes your life.
In data visualization we also have an obvious parallel with art, and many of the best visualizations are constructed using principles (and often by practitioners of) graphic design. Many examples can be found on Reddit's awesome /r/DataIsBeautiful. However, often it seems that people are more worried about how a graphic looks, and less about what question it addresses. A celebration of data visualization as a sort of pop art has sprung up. This isn't a bad thing, but I find it interesting.

In my NerdNite talk I decided to play a game, which I will reproduce here for fun: Art or Data? I have chosen a few select examples of data and art that look similar (I've also cropped/scaled them to be extra misleading). Your job is to guess which panel is which. Answers follow below the fold...





Now the answers!

Quick Poll: Gender of White House Visitors

Here's an informal survey I'm taking...
What do you think the gender distribution of visitors to the White House is?

I've been playing with the 3 million row visitor record database that the Obama administration released. A full post with the answer, and the details of how I came to it, is forthcoming. I thought it would be neat to ask what result people expect!

Direct link to survey

Airports of the World

I've been busy in astronomy-land recently, trying to make some headway on my thesis before beginning a summer internship (more on both hopefully in the coming months!) For this site, recently I've been playing with some interesting data, trying to find the "story" - which is really a way of saying I haven't found the truly clever question yet. Lots of pretty/interesting visualizations have been made, however.

In the meantime, I thought the following map was pretty incredible. I present:
The World, Traced by Airport Runways.

This was generated using 45,132 runways (awesome data from here). Think about that number for a moment: there are at least 45,000 places to land an airplane! These range from small dirt fields to LAX, and the data seems to be more complete in the USA. Still, runways on every continent, seemingly every country.


Update: You can now purchase a poster version of this map. Neat!

What's Trending in Astronomy - #aas222

This week the American Astronomical Society (AAS) is having their biannual meeting. You can follow along with tons of great tweets from many astronomers at the meeting by following #aas222.
Even when I'm not able to go, I still enjoy looking through the meeting's abstract book to get a sense for what's being discussed. If you don't want to parse ~100 pages of abstracts, check out this word cloud I made using every talk and poster abstract:

This visualization is a little silly, but it does give you a sense for what people are talking about this week. Lots of big name telescopes can be found in the 'cloud (like Kepler and Hubble), as can abbreviations for the states that are home to many astronomers (AZ, CA, MA). I'm happy to see "stars" as one of the biggest science words, and it's cool to see WIYN getting lots of love (which makes sense given the meeting's location, and ODI being online).

I made one of these word clouds for AAS 221 as well,  though I might have pre-whitened it by removing (e.g.) United, States, and University. I can't recall. Comparing these two word clouds, however, has given me an idea I'd like to pursue: using the word frequency of paper (or meeting) abstracts to track the popularity of astronomical topics/sub-disciplines over time.
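The word-frequency idea is easy to prototype. Here is a minimal sketch that counts words in a block of abstract text after dropping a small stop list (the "pre-whitening" step). The stop list, the sample text, and the `word_frequencies` helper are all made up for illustration. This is not the code that produced the word clouds.

```python
import re
from collections import Counter

# A tiny, hypothetical stop list; a real one would be much longer.
STOP_WORDS = {"the", "and", "with", "for", "that", "this",
              "united", "states", "university"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lowercase words of 3+ letters that are not in the stop list."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if len(w) > 2 and w not in stop_words)

# Invented sample text standing in for a meeting's abstract book.
abstracts = ("We present Kepler photometry of stars. Stars observed with "
             "Kepler at the University of Washington show flares.")

for word, count in word_frequencies(abstracts).most_common(3):
    print(word, count)
```

Running the same function over abstracts from different years and comparing the counts for a given word would be one way to start tracking a topic's popularity over time.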

I realized as soon as I hit publish that the IAU Symposium #299 is going on at the same time as AAS 222, which is one reason there are so few uses of "planet" in the original word cloud. Here for comparison is the word cloud generated from the IAUS299 meeting program. Here's where all those planet folks are hiding this week!

How we Perceive Racial Demographics

Last year I conducted a short online survey to (attempt to) answer a simple question:
How accurately do people know the racial demographics of their neighborhood?

This was prompted by overhearing a great many generalizations about the racial composition of Seattle, and the UW in particular. The survey was straightforward: simply provide your guesses for the % of each race in your neighborhood, as well as a few details about yourself (age, gender, race, and most importantly ZIP code in the USA). The ZIP code was used to compare the user-estimated %'s to data from the US 2010 census.
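The comparison step boils down to subtracting the census percentage from the guessed percentage for each group. Here is a small sketch with placeholder group names and invented numbers (not survey or census values). The `estimation_errors` helper is hypothetical.

```python
def estimation_errors(guessed, census):
    """Return guessed minus census percentage for each group present in both."""
    return {group: guessed[group] - census[group]
            for group in census if group in guessed}

# Placeholder groups and invented percentages, for illustration only.
guessed = {"group_a": 60.0, "group_b": 25.0, "group_c": 15.0}
census = {"group_a": 70.0, "group_b": 20.0, "group_c": 10.0}

errors = estimation_errors(guessed, census)
mean_abs_error = sum(abs(err) for err in errors.values()) / len(errors)

print(errors)
print(f"Mean absolute error: {mean_abs_error:.1f} percentage points")
```

A positive error means the respondent overestimated that group's share of their ZIP code, and a negative error means they underestimated it.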

I'd like to share a bit of what I learned...

1. Respondents, or, The Kindness of Strangers

Cubehelix, or How I Learned to Love Black & White Printers


Anyone who's chatted with me about figure design (at least in Astronomy) in the last two years has probably heard my rantings and ravings about some strange color palette called cubehelix (not to be confused with timecube). I flat out love this color scheme, and I think it could work for you! Here's why:

1. It Works Better for YOU

The most notable feature about CUBEHELIX (or "cubehelix" if you prefer) is that it prints equally well in color and black & white! This is a great time saver when making figures for publications where color is only available online (if, like in Astronomy, you still use journals with print copies). CUBEHELIX accomplishes this by cycling through the RGB cube while steadily increasing the brightness (black to white). I like this figure, which explains how CUBEHELIX works and its name:

Keck: A 10-m Paintbrush

My friend Sebastian shared this awesome image with me today, and was kind enough to let me post it here:
"Focusing Keck" by Sebastian Pineda (Caltech)

This is a screenshot taken from one of the guider cameras on Keck, one of the largest optical telescopes in the world. Each hexagonal cluster of points is actually a single star being focused by each of the mirror segments! The dots are small since each segment is essentially in focus, and Keck has tremendously good "seeing".

Normally each segment of the primary mirror would focus to the exact same location. However the mirrors are slightly (and intentionally) misaligned, allowing the operators to see the focus for each of the 36 primary mirror segments. For comparison, here's one of the Keck primary mirrors in its full glory:

(Copyright W. M. Keck Observatory)

Of course Keck routinely produces stunning images of the cosmos (check some out here!) but I thought this simple black/white image above was amazing. In one image it captures the simplicity and beauty of observational astronomy, while reminding us of the engineering marvel that allowed its creation. It reminds me of the million little things that must go right each night to make astronomy happen.

Science in action! Clear skies, Sebastian!

The Cost of Astrophysics


One of my favorite posts so far on If We Assume was "The Pace of NSF Funded Research", in which I showed that NSF-funded astronomy grants produce papers for up to 15 years! I made that figure while on an airplane with my friend Eric (who does cool stuff like this!) so that's fun too.

The data for that project came from the brilliant people at Harvard's CFA Library, who gathered every Astro paper published since 1995 that referenced a NSF AST grant. When they updated this database to include the budget amount for each grant, and were kind enough to notify me, I knew it was time to do a follow-up post!

The question that immediately jumped to my mind: 
How much does a typical Astronomy paper cost taxpayers?

Caveat Lector
I want to acknowledge this kind of analysis could be seen as inflammatory, insulting, or misleading. Please consider it in the lighthearted spirit it was intended.

1. A Typical AST Grant Costs $249k

Here I'm just showing a simple histogram (with log $ bins). Almost all grants are a few hundred-thousand dollars. The typical (median) is $249k, which for reference would pay for about 4 years of support for a graduate student at the UW, including overhead, tuition, salary, publication/page charges, 2 new computers, 4 domestic conferences, and a couple international conference trips.

2. Typical Grant size has started to drop recently

The orange line traces the median grant size each year. Our new tradition in America is to evidently not pass federal budgets. I'm not going to claim this is the cause of the drop in median grant allocation, but it's interesting that the last time a budget seems to have been passed in this country is 2009... My belief is that the NSF has tried to keep scientists from leaving the field, so giving out smaller grants means more people can still pay their rent.

3. A typical paper costs about $20k

According to some very simple (read: bad) math, take the budget of the grant divided by the # of papers produced and you get some kind of "cost per paper". This assumes that papers are the only real product of research, which is not entirely true. Conspicuously, this is about on par with a year's stipend for a graduate student (not including overhead and tuition, which about doubles that cost). I don't know if people will think this is too high or low (what is the going market price for a paper?) but the more I consider it the better a deal it seems!

Here is an obtuse way of looking at this. Orange lines track the cost per paper versus grant size for fixed numbers of papers. Kind of silly.

4. Paper costs are remarkably stable since 1995

There is a slight steady increase, but generally this is quite flat. The steep rise in the past 5 years is due to grants not yet reaching their full measure (see the first post about grant productivity).

5. Small grants are more "efficient"

Maybe this goes without saying, and maybe this is the stupidest result of this entire analysis, but the best "bang for your buck" is in small grants... especially if they're reasonably productive! Naturally this kind of metric rewards people who cite every grant they've ever worked on in every paper, but is that a bad thing?

Below I show the "papers per dollar", literally inverting the metric from before (# of papers produced / grant amount). Once again we assume that papers are all that matters. In red I've highlighted the "most efficient grant", the one that produced the largest number of papers for the fewest dollars. (Note this may be supplanted as newer grants continue to rack up papers.)

By the power vested in me by the internet, I pronounce Detailed Modeling of Radiation Transport in Supernovae (1998) the most efficient AST grant since 1995, with 56 papers citing the grant and a meager $50267 awarded. Congratulations to Dr Peter Hauschildt.

6. Bigger $ grants don't necessarily yield more citations

If the number of papers is related to the "productivity" of a grant, the number of citations probes the "impact" of a grant. Interestingly, there does not appear to be much correlation between expensive grants and more "impactful" science. Take from that what you will.

I am also pleased to announce the winners for highest "impact per dollar" (literally # of citations for the grant / cost of grant). Below in blue I have marked the winner, Submillimeter Studies of the Cosmological Star Formation and AGN Histories (2000) with 3157 citations and only $37159! Well done, Dr Lennox Cowie. A slim $11.77 per citation! Notable runner up in this category is again Detailed Modeling of Radiation Transport in Supernovae (1998) in red, with 3571 citations.
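The per-paper and per-citation figures are plain division, and can be reproduced from the numbers quoted above. Here is a short sketch using only those quoted values. The variable and function names are my own.

```python
def cost_per_unit(award_dollars, count):
    """Dollars awarded divided by the number of papers (or citations)."""
    return award_dollars / count

# Values quoted in this post.
supernovae_award = 50267      # Detailed Modeling of Radiation Transport in Supernovae (1998)
supernovae_papers = 56
supernovae_citations = 3571

submm_award = 37159           # Submillimeter Studies of the Cosmological Star Formation and AGN Histories (2000)
submm_citations = 3157

print(f"Supernovae grant, cost per paper:    ${cost_per_unit(supernovae_award, supernovae_papers):.2f}")
print(f"Supernovae grant, cost per citation: ${cost_per_unit(supernovae_award, supernovae_citations):.2f}")
print(f"Submm grant, cost per citation:      ${cost_per_unit(submm_award, submm_citations):.2f}")
```

The "papers per dollar" and "impact per dollar" metrics are just the reciprocals of these numbers.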

Lastly: Citations versus Papers

I also realized that this database provided an interesting testbed to consider how papers gather citations. Generally this is a topic of great debate and interest, especially for young researchers. Below I've plotted the # of total citations a grant receives versus the total # of papers it produced. Of course this should show some correlation.

Also shown for reference are the "1:1 line" representing 1 citation for every paper (a baseline for impact?), the "20:1 line" indicating 20 citations for every paper (reasonably good I'd say!), and something I've dubbed the "Line of Self-Citation". This curious line was calculated by assuming that every paper you publish cites every previous paper you've published. I guess this would be better called the "Line of Cumulative Self-Citation".
Obviously citation behavior never literally follows this Line of Self-Citation; imagine how horribly boring a paper with 100 different self-citations would be. Also - I'm not sure if this database has intentionally removed self-citations (sometimes done). What I find curious is that this Line of Self-Citation does a reasonable job of at least going through the data.
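
A minimal sketch of the three reference lines, assuming the Line of Self-Citation means that paper number k cites the k-1 papers before it, so N papers add up to N(N-1)/2 citations. The function names are made up for this illustration, and the exact counting convention behind the plotted line may differ slightly.

```python
def ratio_line(n_papers: int, citations_per_paper: int) -> int:
    """Total citations on a fixed-ratio line, e.g. 1:1 or 20:1."""
    return n_papers * citations_per_paper


def self_citation_line(n_papers: int) -> int:
    """Total citations if every new paper cites all of its predecessors.

    Paper k contributes k-1 citations, so the total is 0 + 1 + ... + (N-1),
    which is N(N-1)/2.
    """
    return n_papers * (n_papers - 1) // 2


def self_citation_by_counting(n_papers: int) -> int:
    """Same quantity, summed paper by paper as a sanity check."""
    return sum(k - 1 for k in range(1, n_papers + 1))


for n in (1, 3, 10, 41, 100):
    print(n, ratio_line(n, 1), ratio_line(n, 20), self_citation_line(n))
```

Under this assumption the curve grows quadratically: it meets the 1:1 line at 3 papers and the 20:1 line at 41 papers, and climbs above both after that.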

Finally: I'm not sure what to really make of this last figure, but I don't think I've ever seen anything quite like it. Have you? I'd love to hear your thoughts/feedback!

The Reddit Effect - II

Today I'll share a couple observations about web traffic. Take from it what you will.

Below are two charts/tables, directly taken from my "dashboard". The first lists my Top 10 Articles, ranked by total number of pageviews. This shows a smooth exponential-ish distribution, not too heavy on any single article.

(apropos: I really like the built-in stats tools with Blogger!)

The second chart lists the Top 10 Traffic Sources for this site. The traffic from Reddit is more than double that of all other sources combined. WOW!

This isn't to say that Reddit is the best place to advertise your work. It can go largely unnoticed if you don't participate in the Reddit community, and getting traction within any social news aggregator is often a subtle game. However, your potential exposure can be much higher than places like Facebook.

These stats also don't account for external exposure. For example, I'd wager more than 865 people read Huffington Post's coverage of my Starbucks post.

My intuition is that I need to diversify my readership sources some, that Reddit doesn't necessarily create a stable base (for a host of reasons). But I'm making this whole blog thing up as I'm going, just trying to do my best, so who's to say what's "best"?

April Fools - UFOs and the Humorous New Frontiers of Science

"And now for something completely different..."
Today I have posted my first April Fools arXiv paper: Detection Rates of Unidentified Moving Objects in Next Generation Time Domain Surveys. It semi-seriously explores the possibility for LSST to place real limits on the visitation rate of UFOs to our world. This is an idea I'd been kicking around for a few years - it's silly, but not altogether absurd. I'd love to know what you think!

Like many astronomers, I read the astronomy section of the arXiv (astro-ph) daily over coffee. It is a repository where researchers post manuscripts for rapid (and free) dissemination and archival.

Link of interest: When to post to arXiv? (via AstroBetter)

I became aware of April Fools papers on the arXiv a few years ago. They range from silly inside-jokes between friends to the more subtle. My favorite is when you only realize the paper is a joke after you've started reading it! These are in short supply, but every year one or two come along.

More seriously, I love that the arXiv provides a reasonably legitimate forum to publish things that are more complex than a blog post, but perhaps less rigorous than a paper. Especially given how expensive it is to publish (page charges are routinely more than $125/page for authors), the arXiv gives scientists a valuable alternative.

There is value in the absurd. 
Especially in astronomy, we must entertain the totally bizarre and fringe (at least to a point). In this age where astrophysics is becoming truly hard, less funded, and driven by massive collaborations, I have heard it said that astronomers risk becoming less creative. If you only do science that's a "sure thing", if you're not willing to speculate a little, if you haven't the guts to try something new or engage in a bit of academic creativity, then our majestic enterprise will surely fail.

So perhaps April Fools can also be a day where we shamelessly trot out some fun ideas, some semi-serious or even speculative notions. We could create a Journal of Speculative Astrophysics specifically for ideas whose time may not have come just yet, one edition per annum (Fritz Zwicky could be the editor in perpetuity).

Or maybe I'll be unemployed on Tuesday! Either way, I want to believe...

A short list of other past April Fools papers...
If you know of any other real gems, drop them in the comments below or shoot me a line!!

The Greasiest Spoon in Town

Last month KIRO 7 reported on the "10 Dirtiest Restaurants" in Seattle (link to story now dead). Several establishments near UW were featured, and in the past month it's been fascinating to watch how they (and the student patrons) reacted.

One UW-area Thai place on said list was quite upset, as they apparently have the same name as another Thai restaurant, and felt it was a case of mistaken identity. They posted a sign decrying the bad press, which didn't last long. My by-eye gauging is that their business maybe took a couple day dip, but has remained strong. Also - the place is by no means "clean".
Just down the 'Ave is a teriyaki restaurant, which absolutely deserves to be on this filthy list. I've had some frightful meals here over the past decade...

This got me thinking about restaurant inspections and food safety across the city, and I went in search of the data that could answer the question: where are the best & worst places to eat in Seattle, according to food inspectors?  Mixing in some GIS shapefiles of Seattle neighborhoods, here's what I came up with:

The data contained about 192k inspections, spanning 2002 - 2009. More recent data was also available, but to get the general map this was all I needed.

There's lots of other fun things one can do with such a large database, including looking at when the food inspectors are most likely to visit a given location, and even when most restaurant inspections occur:
This map, and the initial KIRO report, are examples of the everyday insights you can gather from the public data that society is collecting all the time. Personal food/restaurant reviews are another great source of insight. Combining these, it could be very interesting if (e.g.) Yelp included the most recent food inspection reports for restaurants. Wouldn't you want to know? Fun stuff.

FYI: King County's Public Health reporting system can be found here.

Planck vs WMAP: CMB OMG

There has been a lot of press and interest today for the first results from the Planck mission. The headlines read "Planck reveals an almost perfect Universe", which I'm sure will finally win the telescope some loving approval from its high-expectations namesake...

This space telescope is busy measuring the Cosmic Microwave Background radiation, the fleeting thermal glow from the early universe. You can observe a piece of the CMB too - if you have an old TV, just turn it to static. Actually only a few % of that static noise is due to the CMB, but it's fun nonetheless!

When I was an undergrad I thought the CMB map produced by WMAP (the previous big name in this game) was incredible. Here's an animated gif I whipped up this morning comparing the Planck to WMAP results...

Two things stand out to me:

  1. Planck has remarkably better spatial resolution
  2. Planck chose a very different color scheme (smells like Python)

Planck was smart to ditch the gaudy rainbow color scheme. However, neither map does terribly well for colorblind people. Here I've run both the WMAP and Planck CMB maps through the handy online Vischeck tool...

Both these figures also desaturate very poorly to black/white, though the WMAP does a bit better. This is nitpicky, of course, but if you're going to have your results plastered across the world, choose a good color scheme. This is all in loving jest. Congrats to the Planck team on their great work!
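
To illustrate why rainbow-style schemes tend to desaturate poorly, here is a rough Python sketch that converts a few colors to grayscale using the common Rec. 601 luma weights (0.299, 0.587, 0.114). The five-color ramp below is a generic rainbow picked for illustration, not the actual WMAP or Planck colormap, and real desaturation tools may use different weights.

```python
def luma(rgb):
    """Approximate grayscale brightness of an (R, G, B) color, Rec. 601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def is_monotonic(values):
    """True if the sequence never changes direction (all rising or all falling)."""
    pairs = list(zip(values, values[1:]))
    rising = all(a <= b for a, b in pairs)
    falling = all(a >= b for a, b in pairs)
    return rising or falling


# Generic rainbow ramp from "cold" to "hot" (an assumption, not a real CMB colormap).
RAINBOW = [
    ("blue", (0, 0, 255)),
    ("cyan", (0, 255, 255)),
    ("green", (0, 255, 0)),
    ("yellow", (255, 255, 0)),
    ("red", (255, 0, 0)),
]

grays = [luma(rgb) for _, rgb in RAINBOW]
for (name, _), gray in zip(RAINBOW, grays):
    print(f"{name:>6}: {gray:6.1f}")
print("Brightness ordered along the ramp?", is_monotonic(grays))
```

In this toy ramp the grayscale brightness goes up and down as the data value increases, so in black/white a "hot" red pixel ends up darker than a mid-range green one. A scheme whose brightness changes steadily in one direction avoids that problem.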

Of course, no report of CMB results is complete without this seminal figure...