December 4, 2008

89.3 The Current and the Mysterious Non-Expanding Playlist

Let me say this much up front: I still love The Current. I still have the dial on my car tuned there permanently, I still listen to the podcasts when I get a chance, I'm still a member. I'm saying this because the rest of this post is going to sound like Current-bashing. I still think it's a wonderful station - I just don't like the direction it feels we're heading.

A short primer: A few years back, a magical public radio station was born. It billed itself as "the antiformat" station, gave the DJs a massive amount of freedom, and played music that was always fresh, varied, and exciting (and usually quite good besides). Then, somewhere along the line, someone decreed that certain songs needed to get certain amounts of airtime. DJs started being told what their playlists should contain. One DJ quit over the issue. And people like me started wondering why the same song was playing every day on our 20-minute commutes. Not that it's a bad song, just... I don't need to hear it every single day. I don't need to hear any song every day.

But, rather than complain anecdotally, I decided to use the power of numbers. The Current makes a massive history of their playlist publicly available on their website, dating back to 2005. So I wrote a screen-scraper in Python to pull all the songs off the site and store them in a sqlite database, which I could then run queries on and make pretty spreadsheets and graphs.

On methods: I tried to normalize all the data before storage by stripping non-alphanumeric characters and converting to lowercase, which increases the number of correct matches. I also ran queries against songs grouped by (artist, title) to avoid false matches on title alone. I don't think I screwed anything up, but I have no formal training in statistics, so no promises. All the code used to collect and analyze the data, along with the spreadsheets and graphs of the results, is available for download under the GPL here.
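For the curious, the normalization step described above might look something like this in Python. This is a minimal sketch and not the actual scraper code: the function names `normalize` and `song_key` and the example strings are made up for illustration, and it assumes that spaces count as non-alphanumeric characters and get stripped too.

```python
import re

def normalize(text):
    """Lowercase the text and drop every character that is not a-z or 0-9."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def song_key(artist, title):
    """Build the (artist, title) pair used to decide whether two rows are the same song."""
    return (normalize(artist), normalize(title))

# Two differently punctuated listings of the same (hypothetical) song collapse to one key.
print(song_key("Example Band", "A Song (Live!)"))
print(song_key("example band ", "A song - live") == song_key("Example Band", "A Song (Live!)"))
```

One caveat with this simple pattern is that accented letters are dropped along with the punctuation, so a real scraper might want something gentler.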

The question I wanted to answer was "is The Current's playlist shrinking, and how badly?" Generally speaking, a "good" playlist should play many different songs, and not play any particular songs too frequently. The challenge is to coax a subjective measurement like "good"-ness out of a massive pile of song listings.

The first measure I have is the "unique song ratio" - that is, the number of distinct songs played in a period of time divided by the total number of songs played in that time. It should be a fairly good measure of how much variety a playlist is offering. Higher is better - it means that the total playcount is spread across a larger selection of songs.
Unique song ratio
The numbers themselves are somewhat arbitrary, but there's a pretty clear and shocking trend visible here. Somewhere near the end of '07, things take a massive dive. The ratio over a week, which was hovering around 0.9, drops to nearly 0.6. It makes sense that the ratio over a month is lower all along - over the course of a month, it becomes much more likely that the song you're playing has already been aired. But when the giant dip in the graph levels out, the ratio over a week has leveled out right around where the ratio over a month used to be. That can't be good.
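To make the unique song ratio concrete, here is a small Python sketch of the calculation. The function name and the toy playlist are invented for illustration; the real numbers came from queries against the sqlite database.

```python
def unique_song_ratio(plays):
    """plays holds one (artist, title) key per airing within the period."""
    if not plays:
        return 0.0
    return len(set(plays)) / len(plays)

# Five airings, but only three distinct songs.
week = [
    ("banda", "songone"),
    ("bandb", "songtwo"),
    ("banda", "songone"),
    ("bandc", "songthree"),
    ("banda", "songone"),
]
print(unique_song_ratio(week))  # 3 distinct / 5 airings = 0.6
```

A ratio of 1.0 would mean no song repeated at all during the period.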

Similarly, we have average song plays, or the number of times a typical song will be played over a period of time.
Average song plays
That same programming shift is visible here, peaking at an average of 2.5 plays per song per month and leveling out at just over 2.
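A sketch of how average song plays could be computed, assuming it means total airings divided by distinct songs (the name `average_plays` and the sample data are hypothetical):

```python
from collections import Counter

def average_plays(plays):
    """Total airings divided by the number of distinct songs in the period."""
    counts = Counter(plays)
    if not counts:
        return 0.0
    return sum(counts.values()) / len(counts)

month = ["a", "b", "a", "c", "a", "b"]
print(average_plays(month))  # 6 airings / 3 songs = 2.0
```

Under that assumption the average is simply the reciprocal of the unique song ratio, so an average of 2.5 plays per song corresponds to a ratio of 0.4.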

Of course, if The Current played every song exactly twice a month, I wouldn't have much room to complain (I might wonder if the director of programming had some neurotic tendencies, but that's a separate issue). My concern is more whether certain songs are being overplayed. To address that, let's measure the maximum playcount - the highest number of times any one song is played in a period of time.
Highest playcount for a single song
Again, the same trend is plainly visible. And this time, the numbers themselves are troubling. The recent end of the graph is somewhere between 60 and 70. That's enough to play the most popular song for a given month more than twice a day, every single day. The weekly count is up near 20, which is almost three times a day for that week.
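Here is roughly what a maximum-playcount query could look like against a sqlite database, followed by the per-day arithmetic. Everything about the schema is an assumption made for this sketch: the table name `plays`, its columns, and the toy rows are not taken from the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per airing, with already-normalized artist and title.
conn.execute("CREATE TABLE plays (artist TEXT, title TEXT, played_at TEXT)")
conn.executemany(
    "INSERT INTO plays VALUES (?, ?, ?)",
    [
        ("banda", "songone", "2008-11-01 08:05"),
        ("banda", "songone", "2008-11-01 17:40"),
        ("banda", "songone", "2008-11-02 08:10"),
        ("bandb", "songtwo", "2008-11-01 09:00"),
        ("bandc", "songone", "2008-11-03 12:30"),  # same title, different artist
    ],
)

def max_playcount(conn, start, end):
    """Highest number of airings of any one (artist, title) with start <= played_at < end."""
    row = conn.execute(
        """
        SELECT MAX(n) FROM (
            SELECT COUNT(*) AS n
            FROM plays
            WHERE played_at >= ? AND played_at < ?
            GROUP BY artist, title
        )
        """,
        (start, end),
    ).fetchone()
    return row[0] or 0

print(max_playcount(conn, "2008-11-01", "2008-12-01"))

# Turning the figures read off the graph into plays per day.
for monthly in (60, 70):
    print(monthly, "plays in a 30-day month is", round(monthly / 30, 2), "per day")
print("20 plays in a week is", round(20 / 7, 2), "per day")
```

Grouping by both artist and title is what keeps two different songs that happen to share a title from being counted together.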

So... ouch. This isn't just a minor tweak to programming. To me, this looks like a shift in the very identity of the station. And I don't think I like the new Current as much as the old one.

I don't want to get too hyperbolic. I'm sure these numbers would still look very good put up against a Clear Channel subsidiary, or really just about any commercial station. I would have loved to compile some numbers from one of those stations to have a good laugh, but sadly I couldn't find any that made old playlists available. If you know of one, I'd be interested to hear.

All the complaints flying around are not because we haven't counted our blessings - it's because we know just how lucky we are, and we're afraid we're slowly losing our treasured station to the mainstream. So no, it's not the end of the world, and I'm not convinced 89.3 has sold out to The Man just yet. But I used to describe The Current to my friends as "single-handedly saving radio." And I'm starting to wonder if I can still count on them for that. Maybe it's time to lay the responsibility in Triple J's hands.

Postscript

I want to close with one more analysis. Curious whether drive time or other factors would affect the playlist at all, I ran a set of queries for the same uniqueness ratio as above, but now broken up into two-hour time slots throughout the week (and yes, I included the weekend, whether that's good or bad).
Unique song ratio by time block
The orange line along the bottom is the monthly value, included just for reference. As you can see, most of the time slots follow the general trend towards less variety very closely. There are three slots, however, that don't: those from 4AM through 10AM. The Morning Show runs from 5-9AM. Strangely, the 6-8AM slot actually takes an upturn as everything else heads down. Did they ramp up their eclectic selection in reaction to the station's overall homogenization? I don't know. At any rate, woo yay Morning Show! Too bad it's ending forever in a week.
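The time-slot breakdown can be sketched in Python like so. This assumes slots are aligned to even hours (4-6, 6-8, 8-10, and so on), which matches the slots mentioned above; the function names and sample airings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def slot_label(played_at):
    """Map a timestamp to its two-hour block, so 07:15 lands in '06-08'."""
    start = (played_at.hour // 2) * 2
    return f"{start:02d}-{start + 2:02d}"

def ratio_by_slot(plays):
    """plays is an iterable of (datetime, song_key); returns {slot: unique song ratio}."""
    by_slot = defaultdict(list)
    for played_at, key in plays:
        by_slot[slot_label(played_at)].append(key)
    return {slot: len(set(keys)) / len(keys) for slot, keys in by_slot.items()}

airings = [
    (datetime(2008, 11, 3, 6, 5), "a"),
    (datetime(2008, 11, 3, 7, 50), "a"),
    (datetime(2008, 11, 3, 9, 0), "b"),
]
print(ratio_by_slot(airings))
```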







So Sad...

November 5, 2008

Visualizing sorting algorithms

I think sorting algorithms are cool. What? You're leaving already? But you only just got here...

It's true, I think sorting algorithms are cool, and not only because I'm a huge, massive nerd who sometimes spends weekend evenings coding for fun. I think they're cool because they're one of the places where the theoretical side of computer science can almost be concretely realized.

Visualizations of sorting algorithms not only make the process easier to grok, they sometimes look really cool. I like things like the Mandelbrot set because it's beauty from a totally theoretical source. By providing a simple set of rules for how the output should display and letting the computation run its course, one can create art.

So, when Grinnell's CS department started looking for a new logo, and John Stone brought up the idea of sorting a list of colors visually, I immediately liked the idea. Two Grinnell students, David D'Angelo and Soren Berg, had spent the summer implementing a Scheme console in Inkscape, and had recently given a very impressive presentation on their work. So I decided to give the idea a go with Inkscape and Scheme.

The resulting code can be found here. I tried to stick close to the functional paradigm, so you end up passing in a bunch of functions: most importantly, a function which takes a list and performs one "round" of sorting on it. In the examples here, I've tried to use "rounds" that take roughly O(n) time. So with the simpler algorithms, it's one pass through the list. With quicksort, it's picking one pivot and moving everything else to one side or the other. And so on.
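The real framework is written in Scheme, but the "pass in a function that does one round" idea translates readily. Below is a rough Python analogue, not a port of the actual code: `run_rounds`, `bubble_pass`, and `is_sorted` are names invented for this sketch, and each snapshot stands in for one row of the picture.

```python
def run_rounds(state, one_round, is_done, max_rounds=1000):
    """Apply one_round repeatedly, keeping a snapshot of every stage."""
    snapshots = [state]
    while not is_done(state) and len(snapshots) <= max_rounds:
        state = one_round(state)
        snapshots.append(state)
    return snapshots

def bubble_pass(items):
    """One left-to-right pass of bubble sort, returning a new list."""
    items = list(items)
    for i in range(len(items) - 1):
        if items[i] > items[i + 1]:
            items[i], items[i + 1] = items[i + 1], items[i]
    return items

def is_sorted(items):
    return all(a <= b for a, b in zip(items, items[1:]))

for row in run_rounds([3, 1, 2, 0], bubble_pass, is_sorted):
    print(row)
```

Because the driver only knows about "a state" and "a round", swapping in a different algorithm means swapping in a different pair of functions.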

A visualization of mergesort that I made with this has been accepted as the new Grinnell CS logo, and will presumably be making an appearance on the website sooner or later.

Enough exposition! Let's move on to the results.


Insertion Sort

Well, we had to start somewhere...

Insertion sort (with borders)
Pretty straightforward, no? We start with a randomized list of colors on a gradient between black, Grinnell Red, and white. Each pass, we pull an item off the unsorted group and run through the sorted list to find the right spot for it. It works, and it's simple, but it's kinda dull and pretty slow.
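One round of insertion sort, as described above, could be sketched like this in Python. The state here is assumed to be a pair of lists (sorted part, unsorted part); that representation and the name `insertion_round` are choices made for this sketch, and plain integers stand in for colors.

```python
def insertion_round(state):
    """state is (sorted_part, unsorted_part); move one item into the sorted part."""
    sorted_part, unsorted_part = state
    if not unsorted_part:
        return state
    item, rest = unsorted_part[0], unsorted_part[1:]
    # Run through the sorted part to find the first element larger than item.
    pos = 0
    while pos < len(sorted_part) and sorted_part[pos] <= item:
        pos += 1
    return (sorted_part[:pos] + [item] + sorted_part[pos:], rest)

state = ([], [5, 2, 9, 1])
while state[1]:
    state = insertion_round(state)
    print(state[0] + state[1])  # one row of the picture per round
```

With n items this takes n rounds, and each round may walk most of the sorted part, which is where the "pretty slow" comes from.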

Here's the same sort without the black borders, for your aesthetic enjoyment:
Insertion sort (no borders)


Merge Sort

Now we're talking. O(n log n), wooooo!

Merge sort (with borders)
Each black border represents a sorted list (in the beginning, every list of one is sorted, because it only has one element). On every pass we merge these lists by twos, until we only have one list left.
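A Python sketch of one merge sort round, where the state is a list of already-sorted runs (the things the black borders enclose). The names `merge` and `merge_round` are invented here, and integers stand in for colors.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]

def merge_round(runs):
    """Merge neighbouring runs by twos; a leftover odd run is carried along unchanged."""
    merged = []
    for k in range(0, len(runs) - 1, 2):
        merged.append(merge(runs[k], runs[k + 1]))
    if len(runs) % 2:
        merged.append(runs[-1])
    return merged

runs = [[x] for x in [4, 1, 3, 0, 2]]  # every one-item list starts out sorted
while len(runs) > 1:
    runs = merge_round(runs)
    print(runs)
```

The number of runs roughly halves each round, so about log n rounds of O(n) work each gives the O(n log n) total.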

Here's an un-bordered merge sort:
Merge sort (no borders)

Quicksort

Everyone's favorite fast algorithm that's still O(n^2) in the worst case.

Quicksort (with borders)
On each pass, a list is split into three lists: an arbitrary pivot, and all items less than and greater than that pivot. You can see divide and conquer at work here: on the first pass there is just one pivot created. By the second, there are three: the original, and one pivot picked out of each of the sublists. In contrast to merge sort, here it is when we have a plethora of one-item lists that the sort is done.
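Here is a Python sketch of a quicksort round along those lines. It assumes the state is a list of sublists, that the "arbitrary" pivot is simply the first item, and that items equal to the pivot go to the right; `quicksort_round` is a made-up name and integers stand in for colors.

```python
def quicksort_round(parts):
    """Split every sublist longer than one item into less / pivot / greater-or-equal."""
    out = []
    for part in parts:
        if len(part) <= 1:
            out.append(part)  # already settled
            continue
        pivot, rest = part[0], part[1:]
        less = [x for x in rest if x < pivot]
        greater = [x for x in rest if x >= pivot]
        out.extend(p for p in (less, [pivot], greater) if p)
    return out

parts = [[3, 5, 1, 4, 0, 2]]
while any(len(p) > 1 for p in parts):
    parts = quicksort_round(parts)
    print(parts)
```

In this toy run the first round picks one pivot (3) and the second picks two more (1 and 5), matching the one-then-three pattern described above. An already-sorted input would peel off only one item per round, which is the O(n^2) case.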

One without borders:
Quicksort (no borders)

Bubble Sort

That's right! A very special treat for you all!

Bubble Sort

Remember kids, just because it's kinda pretty, doesn't mean it's a good sorting algorithm.

Other Stuff

The cool thing is that now that I have my framework written, it's relatively easy to plug in new ideas. Following are a couple examples that I wrote up just recently.

Quicksort on a value-only gradient.
Quicksort (value only)


Quicksort on a list across the entire range of hues. The previous examples sorted by a simple sum of the RGB values of each color. For this one, I wrote a new comparator that sorts by the Hue part of HSL and used that for the sorting instead.
Hues quicksort
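The two orderings mentioned above (sum of RGB values versus hue) can be sketched in Python with the standard `colorsys` module. The original comparator was written in Scheme; this version uses Python key functions instead, and assumes colors are (r, g, b) tuples with channels from 0 to 255.

```python
import colorsys

def rgb_sum(color):
    """Sort key used for the gradient examples: a simple sum of the channels."""
    r, g, b = color
    return r + g + b

def hue(color):
    """Sort key for the hue examples: the H component of HLS, from 0.0 up to 1.0."""
    r, g, b = color
    h, _lightness, _saturation = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    return h

colors = [(0, 0, 255), (255, 0, 0), (0, 255, 0)]  # blue, red, green
print(sorted(colors, key=hue))                    # red, green, blue
print([rgb_sum(c) for c in colors])               # all equal, so rgb_sum cannot tell them apart
```

That last line shows why a hue-based ordering is needed for a rainbow: fully saturated colors all have the same channel sum.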


Mergesort on the same list of hues. Yes, I realize these are obnoxiously bright.
Hues merge sort


Okay! That's all for now. Hope you have enjoyed this, and maybe it's even inspired you to think differently about sorting algorithms for a moment. I may get inspired to mess around with these more in the future, who knows. I feel like this is only brushing the tip of the iceberg as far as the potential of scripting in Inkscape goes. Another promising route is to use David and Soren's library of transformation functions to do cool things to the items in the lists after they've been created, or in relation to their stage in the sorting cycle. And this list-based approach could probably be applied to things besides sorting. I'm off to research entropy...

October 28, 2008

How Bulk Political Mailing Should Be Done

A few days ago, I came across a real gem of a political mailing from the Republican Party of Minnesota. To a casual observer, it might just look like a sad smear attempt by a party hijacked by reactionaries and lacking any real substance. But I saw beyond the partisan hackery, and realized that this mailing had far more potential. It could be something truly great. And behold, with some scissors and glue, I made it so.

Before
After

For best results, read in your best Don LaFontaine voice (and watch a preview or two in his memory).

August 27, 2008

Why I sign with PGP

If you've received email from me recently, there's a good chance it's arrived with a funny-looking header and footer. At the top, it will say
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
And at the bottom is something like this:
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Promote trust on the internet - Use PGP!
Comment: http://enigmail.mozdev.org

iD8DBQFIqfcGDTFvtHdOkUcRAm4JAJ4vJrcQcAM7gtzoHbI8ul3bA7EUagCcC5aO
RLpYAOHP5YS40I0xSB89pDA=
=VHP3
-----END PGP SIGNATURE-----
This all looks like nonsense. Have rage and bitterness finally won the battle for Ian's soul, leaving him banging the keyboard randomly while shouting obscenities at the Internet? No! Well, not yet anyways. This stuff around the message is a PGP signature.

If I were to send you a letter or write you a check (hypothetically of course, I hate you all and you certainly aren't getting any of my money), at the bottom there would be a little scribble vaguely resembling my name, as penned by a somewhat slow seven-year-old learning cursive for the first time. This signature is the conventional way of saying "hey, it's really me, your old pal Ian, and I did write this."

A PGP signature serves the exact same purpose for electronic communication. Of course, a string of letters proves nothing. But when I open a signed message in Thunderbird with the Enigmail extension installed, it looks something like this instead:


That's nice, innit? That green bar means that I can have confidence that these somewhat unsettling threats are, in fact, from CM Lubinski, and he has electronically signed his name to them.

Ok, so you're probably thinking that this is mildly interesting so far, kind of like a poorly-drafted version of Wikipedia, and it sure beats calculus or mopping the kitchen floor or whatever you ought to be doing, but, well, big deal. Dorks like Ian can get all excited about this PGP thing, but you're going to go trawl YouTube for some clips of a baby rabbit eating its own poo. You don't need all this signature stuff, right? Wait! That furry redigester will be there in ten minutes. First, read about...

Why You Need PGP

You need PGP. You're complacent. Things are going smoothly on the internet. Your biggest problem most of the time is the occasional piece of spam that slips through the filters and annoys you for the ten seconds it takes to read "Fr33 V1agr@" and click Delete. But the convenience of technology hides an ugly truth: email is horribly, horribly insecure.

Right now, right this instant, I could send you a message purporting to be from absolutely anyone. It doesn't even take that diploma sitting on my bookshelf to do it. The Grinnell mail server and a dirty trick (which I am not going to share) are sufficient. Oh look, good old Rupert sent me something just now:


I (or someone with considerably worse intentions) can pretend to be anyone in email. To illustrate my point further, here's an email coming from a domain name that doesn't even exist (I checked):


It doesn't have to be imaginary email addresses either. I could send a message with a bunch of inappropriate jokes to your boss that looks like it's from you. I promise I'm not going to, but I, or anyone else, could. That's scary stuff. We've seen the tip of the iceberg on this with phishing emails that look like they come from accounts@ebay.com or whatever. People click those fake links by the boatload and compromise all sorts of financial information. Even smart, internet-savvy people do. Why? Because we're complacent, and no one ever taught us to doubt that the person in the From: field actually sent that message.

Encryption

Scared yet? Here's some more food for thought: ever send private information through email? Like, say, financial information, or your company's business deals, or those emails you get when you register an account somewhere that sometimes have your new password in them. Or even just personal correspondence that you don't want to share with anyone except the recipient.

Guess what - everything you send in email winds its way across the internet in "plain text" - meaning, anyone who looks can read it. If any link in the chain of servers and data lines between you and your recipient is compromised - like someone eavesdropping at your wireless hotspot, or a mail server that's been broken into by hackers, or someone tapping an ethernet line somewhere, or a spying government aided by crony telecoms - all your email is sitting there waiting to be poked through. Additionally, there's very little oversight of how mail servers (of which any given message may cross through quite a few) are administered, so it's quite possible that your messages will end up sitting on the server or on backup tapes for a long time - quite possibly years.

My point is this: we have no reason to be certain that everyone who gets a look at our email is trustworthy, and yet we send everything totally unprotected from prying eyes. It's like sending all of your bank deposits and love letters on postcards when some of the postmen have no credentials and didn't even pass a background check to get the job.

Luckily, PGP also provides optional encryption. It's like the electronic version of a security envelope. An encrypted message looks like garbage, just a string of nonsensical letters. It's only when your intended recipient decrypts the message that it becomes readable again.

How PGP Works (the short version)

I want to give a brief overview of how PGP works. This isn't going to be the technical version (I'm not even qualified to give the technical version). It's also not going to be a guide to setting up your computer to use PGP. For that I simply direct you to the two plugins I use and like: Enigmail and FireGPG, and especially the quick start guide for Enigmail, which is really stellar and walks you through the steps of setting it up and using PGP for the first time. In this article, I just want to explain the underlying concepts so you can see how PGP works, and why it's such a great idea.

To start using PGP, you create a "key pair," which consists of two parts, a public key and a private key. Your public key is something you can give to everyone - you can email it as a file, put it somewhere online, upload it to a keyserver (try searching for my name or email address), whatever. Your private key, as the name suggests, you keep to yourself - it's usually password protected as an additional layer of security. These two keys are tied mathematically. I don't pretend to understand all the details, but it's something to do with how hard it is to factor the product of two very large primes, and the important point is that it's very quick to go one direction, but incredibly difficult to go the other. So while someone could, in theory, work out your private key using only your public key, it would take the world's fastest hardware thousands of years (yes, human years) to do so. Basically, these keys are pretty secure.

Now, when you write an email and sign it with PGP, the program uses your private key to create a string of letters that is algorithmically tied to the contents of your message. When someone receives your message and wants to verify that it came from you, they take your public key and reverse the process, checking the signature against the message. Verifying a PGP signature assures you that the message came from the owner of the key because only the person with access to the private key could have created that signature. When you want to encrypt something, you take your recipient's public key and use that to turn the message into gobbledygook. That way, only the person with access to the private half of that key (i.e. your intended recipient) will be able to decrypt and read the message.
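To see the moving parts, here is a toy version of the scheme in Python using textbook RSA with absurdly small primes. This is only an illustration of the concept and is in no way secure or how PGP is actually implemented: real keys use enormous primes, real software adds padding and other safeguards, and all the names below are invented for the sketch. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib

# Toy key pair built from two tiny primes.
p, q = 61, 53
n = p * q                  # 3233, shared by both halves of the key pair
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent: (e, n) is the public key
d = pow(e, -1, phi)        # private exponent: (d, n) is the private key

def digest(message):
    """Boil the message down to a number below n so the toy key can handle it."""
    return int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % n

def sign(message):
    """Only the holder of the private exponent d can produce this number."""
    return pow(digest(message), d, n)

def verify(message, signature):
    """Anyone with the public key can reverse the process and compare."""
    return pow(signature, e, n) == digest(message)

def encrypt(number):
    """Scramble a number below n with the recipient's public key."""
    return pow(number, e, n)

def decrypt(cipher):
    """Only the private key turns the gobbledygook back into the original."""
    return pow(cipher, d, n)

signature = sign("hey, it's really me")
print(verify("hey, it's really me", signature))
print(decrypt(encrypt(42)))
```

Changing even one character of the message should make `verify` fail, although with a modulus this tiny an accidental match is possible, which is one more reason real keys are so large.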

A Brief Interlude on Trust

There's one more feature of PGP I want to touch on briefly, because I think it's pretty cool: the concept of trust.

I've been going on and on about how secure PGP is, but there's a hole in all this: how do you get other people's keys in the first place? After all, just because someone puts a key up on a public keyserver saying they're James T. Madison, you have no proof that that's actually who made that key. If you downloaded the key from that person's personal website or imported it the first time they sent you a signed message, you might trust that it's who you think it is. If they gave you the key in person, say, printed on a business card, you might trust it a whole lot more. But of course, it's not feasible to get all your keys in person - email is supposed to be convenient.

Keeping that in mind, let's do a quick thought experiment. In real life, you trust Bill because you've been friends with him for ten years and he's always been reliable and honest. Bill has a friend, Jack, who you have never met. But Bill vouches for Jack, and since you trust Bill, you trust Jack (to a certain extent).

PGP has functionality that emulates these sorts of relationships - the phrase "webs of trust" gets used a lot. When you import someone else's key, you can specify how much you trust that key. And, if you choose, you can sign other people's public keys, which is like vouching that they are who they claim to be. So suppose I have complete trust that John Stone's key is legit, because I got it from him in person. I sign Stone's public key. Now maybe CM just pulled Stone's key off a public server. He doesn't know if he should trust it or not. But say CM already trusts my key - since he trusts me and I have signed (vouched for) Stone's key, CM's PGP program knows that Stone's key is reasonably trustworthy.
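The vouching idea can be sketched as a tiny graph search in Python. This is a deliberate simplification: real PGP software weighs different levels of trust rather than treating every signature equally, and the function name, the `max_hops` limit, and the lowercase names standing in for keys are all assumptions made for illustration.

```python
def is_trusted(key, trusted, signatures, max_hops=2):
    """True if key is one I trust directly, or is vouched for within max_hops signatures."""
    seen = set(trusted)
    frontier = set(trusted)
    for _ in range(max_hops):
        if key in seen:
            return True
        # Follow signatures outward from the keys reached so far.
        frontier = {
            signed
            for signer in frontier
            for signed in signatures.get(signer, ())
            if signed not in seen
        }
        seen |= frontier
    return key in seen

# Ian has signed Stone's key; CM trusts Ian's key.
signatures = {"ian": {"stone"}}
cm_trusts = {"ian"}
print(is_trusted("stone", cm_trusts, signatures))
print(is_trusted("stranger", cm_trusts, signatures))
```

Limiting the number of hops reflects the "to a certain extent" part of the thought experiment: a friend of a friend of a friend deserves less confidence than Bill does.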

The Future of Trust

Stop and think about webs of trust for a second. Isn't it a cool idea? This is the power of social bonds, realized in electronic form. Picture a world where everyone uses PGP. Imagine how hard it would become for frauds to work their way into a position to do any real damage when no one will vouch for them. Imagine the freedom to trust, really trust, people on the Internet. This is where I think PGP could take us.

One More Time, Why?

Okay, so I think PGP is important. But why am I signing all my emails with it, when next to none of my recipients are currently equipped to handle it? I have several reasons, most of which are inspired by John Stone's opinions on this topic:
  • Someone's gotta do it. If we all hang around waiting for other people to use PGP first, it will never happen. By signing my messages with PGP, the benefits are immediately available to anyone who sets it up and imports my key.
  • Advertising the functionality. Sending signed messages advertises my public key. If you want to send me an encrypted message, you know I am equipped to handle it, and you can pull my public key from the signed message to use for encrypting.
  • Proselytizing. This is probably my biggest reason for signing at the moment, and is also my reason for writing this post. I hope that some small percentage of people who receive my signed messages will, rather than being confused or just ignoring the extra stuff, be curious and look into PGP, and maybe realize what a great thing it is. I plan to link to this post in the comment section of the signature, in hopes of furthering this goal.

Final Thoughts

Go! Go install Enigmail or FireGPG! Do it! It's fifteen minutes of time now, but after that, they run quietly and unobtrusively in the background. You can do like I do and sign everything you send out, or you can just use it to verify any signatures you get and sign outgoing messages selectively (I guarantee if you send me a signed message, it will brighten my day). You're making yourself safer, and you're furthering a very worthy cause. The Internet is a cool place, people. But it belongs to us and it's our job to keep it respectable. Use PGP.

August 1, 2008

Hacking Grinnell Laundry

Disclaimer: I am giving you the knowledge, I am not dictating how it should be used. That's up to you to decide.

Easy steps to being a Grinnell outlaw:
  1. Put your laundry into a dryer. Pay with your P-card.
  2. Hit "Delicates" or the other option that isn't "Colors/Whites". Your dryer should fire up with something like 70 minutes on it.
  3. Open the dryer door. Close it again. The dryer will start blinking Select Cycle.
  4. Hit "Colors/Whites". The dryer display indicates that it is using the Colors setting (i.e. higher heat) but with all 70 minutes or whatever you got out of step 2.

For the record, I haven't actually checked to see if the dryer is doing what it claims to be doing. It could be so dumb that it is displaying one setting while giving you lower heat. I don't know. I do know that Grinnell laundry machines are crap.

The Laundry Day Boxer Problem

As a certified lazy person, I've been known to put off chores like, say, laundry for quite some time. And since I can keep wearing the same pair of pants until I drop food on them or something, it's easy to let things slide a little too long. To moderate my laziness in this regard, I use a tried and true lazy person technique: laundry day boxers.

This is the pair of boxers that stays in the drawer until all the other boxers are gone. This is the pair that is so garish that even in your sleep-addled 8am state, you can't help but notice what you're putting on, and realize that this means the grace period is up. You can't put it off any longer. Your laundry day boxers are a constant reminder that you need to do laundry now.

I know some of you own laundry day boxers. Don't be shy. Mine are bright orange, with little bats and Frankenstein monsters on them. My old roommate had a pair that were silk, with a screenprint of Elvis on them - brilliant.

Now here's where I encounter a problem (see, that title was relevant after all!): I need something to wear on laundry day. That, naturally, is my laundry day boxers. But I can't wash them because I'm wearing them. So come next laundry day, what will alert me that it is, in fact, laundry day? My laundry day boxers are now dirty and in the laundry, where they are useless to me. Useless!

Having ruminated long and hard on this problem, the only solution I can come up with is to obtain a second pair of laundry day boxers. At that point it's just a bootstrapping problem to arrange things so that one pair remains clean the first time I do laundry post-acquisition. Then, come that first laundry day, the dirty pair gets cleaned while I wear the clean pair, and I alternate from then on.

It doesn't seem quite right though. I really feel like each person should only own one pair of laundry day boxers. Call me old-fashioned, but when it comes to laundry day boxers, I believe in monogamy. I mean, when you're cavorting around in your purple sequined laundry day boxers, how do you think the ones with the Taco Bell chihuahua feel, down at the bottom of the hamper? Maybe I'm just off on this particular topic. Anyone out there own more than one pair of laundry day boxers? Maybe if I got two identical pairs, that might be a reasonable compromise - it's like I'm wearing the same pair, even if they're actually twins. And with that, I've strayed into borderline creepy analogy territory. Moving on.

Is this two-boxer solution the only way? Am I overlooking something painfully obvious? I feel like one of you people who likes game theory or something should be able to model this mathematically. This is serious business, folks.

April 22, 2008

Hello, World

This is a good blog. It is probably not the best blog.

If you're looking for the stories of my travels in the land down under, those are at iangreenleaf.blogspot.com.