March 20, 2009

Rsync and retrying until we get it right

Ok, this isn't all that special, but I scoured the first two or three pages of Google results and didn't come up with anything that solved my problem. So here it is, Internet - may the next person be luckier than me and not have to read any man pages.

Rsync is a cool utility, especially when I'm trying to plonk my 10GB backup onto Dreamhost's flaky backup server. But I wish I could make it retry when things go south. There are various threads on doing this, but it would seem it's not built into rsync itself.

The obvious solution is to check the return value, and if rsync returns anything but success, run it again. Here was my first try:

while [ $? -ne 0 ]; do rsync -avz --progress --partial -e "ssh -i /home/youngian/my_ssh_key" /mnt/storage/duplicity_backups backupuser@backup.dreamhost.com:.; done
The problem with this is that if you want to halt the program, Ctrl-C only stops the current rsync process, and the loop helpfully starts another one immediately. Even worse, my connection kept breaking so hard that rsync would quit with the same "unknown" error code on connection problems as it did on a SIGINT, so I couldn't have my loop differentiate and break when needed. Here is my final script:
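
A minimal sketch of a script along those lines, not a verbatim copy of anything: it assumes bash, and the names retry, MAX_RETRIES, and RETRY_DELAY are made up for illustration (they are not rsync options). When the script runs from a terminal, Ctrl-C is delivered to the shell as well as to rsync, so a trap in the wrapper can note the interrupt and stop the loop, and the exit code rsync happens to return no longer matters.

```shell
#!/bin/bash
# Sketch only: rerun a command until it succeeds, the retry limit is hit,
# or the user interrupts. All names below are illustrative assumptions.

MAX_RETRIES="${MAX_RETRIES:-50}"   # give up after this many attempts
RETRY_DELAY="${RETRY_DELAY:-5}"    # seconds to wait between attempts

retry() {
    local attempt=0 status=1
    _retry_interrupted=0
    # Record Ctrl-C / kill so the loop stops instead of starting another run.
    trap '_retry_interrupted=1' INT TERM

    while [ "$attempt" -lt "$MAX_RETRIES" ] && [ "$_retry_interrupted" -eq 0 ]; do
        attempt=$((attempt + 1))
        if "$@"; then
            status=0
            break
        else
            status=$?
        fi
        # Skip the message and the wait if the failure was our own interrupt.
        [ "$_retry_interrupted" -eq 1 ] && break
        echo "Attempt $attempt failed with status $status; retrying in ${RETRY_DELAY}s" >&2
        sleep "$RETRY_DELAY"
    done

    trap - INT TERM   # restore default signal handling
    if [ "$status" -ne 0 ] && [ "$_retry_interrupted" -eq 1 ]; then
        return 130    # conventional "killed by SIGINT" status
    fi
    return "$status"
}

# Example call, reusing the rsync command from above (left commented out here):
# retry rsync -avz --progress --partial -e "ssh -i /home/youngian/my_ssh_key" \
#     /mnt/storage/duplicity_backups backupuser@backup.dreamhost.com:.
```

The retry cap is there so that a permanently broken connection doesn't loop forever; the delay just avoids hammering the server.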



On a side note, duplicity is pretty neat. I only wish it would support resuming of interrupted backup sessions so that I didn't have to do this in two steps. My current backup workflow is

PASSPHRASE="backup" duplicity --encrypt-key 77XABAX7 /home/youngian --exclude "**/.VirtualBox" --exclude "**/.kde" --exclude /home/youngian/tmp/ --exclude /home/youngian/backup/ file:///mnt/storage/duplicity_backups/ --volsize 100

...and then the above rsync script.

February 26, 2009

Pascal and Global Warming

"Global warming?" I hear you ask. "Why bother? The scientific community has already delivered a verdict." Well, yeah. But big names like George Will are still churning out the occasional piece on how global warming doesn't exist, and while they generally get slammed in certain online circles, the big names aren't losing their jobs. This suggests that a significant portion of the population still likes hearing about how the sea ice is totally not melting at all.

And there is one facet to the global-warming-doesn't-exist camp that I acknowledge as having some merit. This is the argument that we humans don't know shit. Despite all our jetpacks and holograms, there are still a lot of things we don't understand, and how the climate works is by and large one of those things. Meteorology is hard, and you don't have to go farther than the local news to see how poorly we have mastered it. So yeah, the planet's been a bit uncomfortably warm lately, and we're kinda thinking maybe it has something to do with us, but we don't know that. We can't prove it. We can't even prove that cigarettes cause cancer, so of course we can't prove that our dirty habits are causing the North Pole to become the world's biggest Easy-Bake Oven.

Take one step more moderate, and you can claim that we don't know what course global warming will take. This is hard to argue against because, well, we don't. Maybe things will level off again and we'll only lose Florida. Maybe the Great Filter will turn out to be something totally unrelated that wipes the floor with us long before we get too warm. Maybe Xenu just bumped into the thermostat dial on his way to the office and is gonna straighten things out as soon as he gets home.

What I'd like to do is advance an argument that doesn't demand wholesale acceptance of global warming. Leave facts and statistics out of it, since there's not much we truly know on the subject. All I ask is that you acknowledge that global warming, in the human-created worldwide-catastrophe sense, is a possibility. The "we don't know" argument works both ways, so it's certainly conceivable that the pinkos are right and we're on the first SUV to Sweatyville, right?

This is where Pascal comes into play. You've probably come across his triangle at some point or another. But never fear, we're not doing math today. Pascal was a well-rounded dude, so besides being a skilled mathematician, he was a bit of a famous philosopher too. He's best known for a theological argument that has since become known as Pascal's Wager. To make short work of it (the hardcore philosophers out there are already wincing), Pascal reasoned as follows:
We don't really know if God exists. But if he does, the stakes are high (eternal life, Hell, all that jazz). And if he doesn't, the believers aren't any worse off than the non-believers - we all cease to exist in the same way. So it's a good gamble to be pious regardless of any proof of God's existence.

I'm not so sure that Pascal's claims of no cost were true - I would argue that the effort of regular church attendance is a cost, not to mention possible financial losses from behaving like a pious Christian rather than an opportunistic capitalist. And since Pascal probably wasn't happy with an entirely risk-return based belief - "I accept Jesus into my heart because it makes good economic sense" might not cut it with St. Peter - you'll be needing to convince yourself that you truly believe, which sounds like a lot of emotion work to me (bam! sociology blindside!). But the reasoning still stands with a non-zero cost - a little bit of piety during one's lifetime doesn't seem like such a sacrifice when compared to an eternity in Hell.

Scott Adams (yes, the author of Dilbert) has taken the wager and run with it in some interesting directions, including the conclusion that we should become peace-loving Muslims. Of course, I have yet to convert (as does Adams), so maybe it's not as convincing as all that, but it's at least a good read. For me, the interesting next step of Adams' musing is this: Can we apply Pascal's argument to other avenues? I don't see why not.

Pascal's argument is dealing with theoretically infinite values, but I am convinced that humans cannot actually conceive of the infinite (that really is a topic for another day). My position is that our conception of the infinite is the same as our conception of really, really big. I like Adams' framing of the argument in mathematical terms, because I don't think we have to actually hit infinity for the probability to tilt in our favor. Think about the math that considers what happens when a value is infinite, like limits or Big O. It's all couched in terms of "as x approaches infinity" because all we need is for x to get big enough that the numbers behave in a predictable way. We can't actually model infinity in math, but once we see a definite trend, that last step is easy to infer. This isn't math we're doing here (trust me), but I argue that we can approach it the same way. If a reward is large enough, and the costs small enough, we can apply Pascal's wager to it.

Sure, if you're already being a pedant about it, I bet you could trot out some claims about the types of logic involved. But frankly, if you're using phrases like "epistemic probability," I think you're already missing the point. Pascal's argument is just numbers. Adams gets this, and it's what helps him hit some interesting points without sounding like a pretentious windbag. The strength of the argument lies in pointing out some very big numbers and some not-so-big numbers, and from there any bookie could tell you which horse to bet on.

So, the argument:

If global warming exists, preventing it earns us the reward of continued existence. Worst case scenario: if the Earth keeps warming up, it's going to stop supporting human life sooner or later. We're remarkably fragile creatures, and our technological prowess will only get us so far, especially when we will have to cope with not only climate change but all the wars, famines, and tempests that spring up as a result. We'll probably snap and bomb each other into oblivion long before the last corn crop fails, but the end result is the same - we cease to exist, Darwin declares us the weakest link and sends us home. The water bears trundle along happily, and the universe quickly forgets about us.

What are the costs of preventing global warming?
  • We might have to start driving cars that get more than 8 mpg. Yes, I know you love your Expedition.
  • We should probably insulate our houses and use fluorescent bulbs. Quickly recouping your investment in monthly utility savings is a big sacrifice, I'm aware.
  • We'll have to research and improve alternative sources of energy. People, we're going to need to do this eventually, because the oil is going to run out, warming or not. I don't see why getting a head start on this is such a bad idea.
The costs come down to this: we'll have to spend some money now. There are some more factors that mitigate this further, especially for Western society - we'll be creating local jobs, we'll be reducing our dependence on other nations, and so on. I'm not qualified to judge the strengths of all these arguments, nor do I feel the need to. I'm satisfied that the costs of reasonable action on this front are not ridiculous or unachievable. We don't have to sacrifice any sons or clean the Augean stables, so I think this is doable.

So in one corner, we have a significant but manageable financial investment. In the other corner, we have the possibility of extinction of the species. Even if this possibility is somewhat remote by your reckoning, it's there, and it's a near-infinite value. Balanced against the low cost, this is a wise investment by the Pascal/Adams metric.

Even if you don't think global warming exists, you should still buy a hybrid. Or, y'know, bike to work.

January 11, 2009

Movies I Hate: War of the Worlds & Gone in 60 Seconds

Hooray, filler! I only have one "Top Ten List" on Netflix, and it is titled Movies I Hate. Here are two of the members of that list. Enjoy.

War of the Worlds

0.5 out of 5 stars

Never before have I sat in a movie theater and actually wished that a scene would turn out to all just be a dream. But as I sat through the transcendentally bad denouement of this movie, I realized that if Tom Cruise woke up, and it was just a dream, and he was really dying of pneumonia, I would forgive everything this movie had put me through. I would forgive the fact that our so-called protagonist is a lousy father and totally unsympathetic character. I would forgive the fact that tanks and helicopters can't take down a Strider, but apparently Tom Cruise with a couple grenades can. I would forgive the fact that each time Dakota Fanning screamed, I involuntarily squirted a little bit of urine into my pants from the horror.

It was not to be, and unless I blacked out momentarily from the sheer idiocy, the scene was not, in fact, a dream. If I end up an alcoholic five years from now, this movie is at least a little bit to blame.


Gone In 60 Seconds

0.5 out of 5 stars

What is the one reason we tolerate all the horrible plots, awful dialogue, and wooden acting in movies revolving around cars? Why, it's because we get to watch cars run into each other! So if you're going to make one of these movies, which element should you NOT remove? I'm no expert, but I would go with 'cars running into each other.'

To this day I am at a loss for why anyone thought it was a good idea to give this movie a plot stipulating that none of the sexy cars the characters drive can be smashed to smithereens. That was all it had going for it. Instead all we get to see is some police cars hitting scenery, and every one of those is followed immediately with a shot of the bumbling policeman appearing from the wreckage unscathed. Because, y'know, if someone got hurt, Nick Cage might be morally responsible (gasp), and how could we root for a washed-up felon then?

This movie also subjected me to the worst scene involving dirty talk and car parts I have ever witnessed. Really just one of the worst scenes of any type. It's been about five years, and I'm still angry.

December 4, 2008

89.3 The Current and the Mysterious Non-Expanding Playlist

Let me say this much up front: I still love The Current. I still have the dial on my car tuned there permanently, I still listen to the podcasts when I get a chance, I'm still a member. I'm saying this because the rest of this post is going to sound like Current-bashing. I still think it's a wonderful station - I just don't like the direction it feels we're heading.

A short primer: A few years back, a magical public radio station was born. It billed itself as "the antiformat" station, gave the DJs a massive amount of freedom, and played music that was always fresh, varied, and exciting (and usually quite good besides). Then, somewhere along the line, someone decreed that certain songs needed to get certain amounts of airtime. DJs started being told what their playlists should contain. One DJ quit over the issue. And people like me started wondering why the same song was playing every day on my 20-minute commute. Not that it's a bad song, just... I don't need to hear it every single day. I don't need to hear any song every day.

But, rather than complain anecdotally, I decided to use the power of numbers. The Current makes a massive history of their playlist publicly available on their website, dating back to 2005. So I wrote a screen-scraper in Python to pull all the songs off the site and store them in a sqlite database, which I could then run queries on and make pretty spreadsheets and graphs.

On methods: I tried to normalize all the data before storage, such as stripping non-alphanumeric characters and converting to lowercase letters. This helps increase correct matches. I also ran queries against songs grouped by (artist, title) to avoid false matches on title alone. I don't think I screwed anything up, but I have no formal training in statistics, so no promises. All code used to collect and analyze the data, as well as the spreadsheets and graphs of the results, is available for download under the GPL here.

The question I wanted to answer was "is The Current's playlist shrinking, and how badly?" Generally speaking, a "good" playlist should play many different songs, and not play any particular songs too frequently. The challenge is to coax a subjective measurement like "good"-ness out of a massive pile of song listings.

The first measure I have is the "unique song ratio" - that is, the number of distinct songs played in a period of time divided by the total number of songs played in that time. So it should be a fairly good measure of how much variety a playlist is offering. Higher is better - it means more distinct songs make up the total playcount.
Unique song ratio
The numbers themselves are somewhat arbitrary, but there's a pretty clear and shocking trend visible here. Somewhere near the end of '07, things take a massive dive. The ratio over a week, which was hovering around 0.9, drops to nearly 0.6. It makes sense that the ratio over a month is lower all along - over the course of a month, it becomes much more likely that the song you're playing has already been aired. But when the giant dip in the graph levels out, the ratio over a week has leveled out right around where the ratio over a month used to be. That can't be good.

Similarly, we have average song plays, or the number of times a typical song will be played over a period of time.
Average song plays
That same programming shift is visible here, peaking at an average of 2.5 plays per song per month and leveling out at just over 2.

Of course, if The Current played every song exactly twice a month, I wouldn't have much room to complain (I might wonder if the director of programming had some neurotic tendencies, but that's a separate issue). My concern is more whether certain songs are being overplayed. To further address that, let's measure the maximum playcount - the highest number of times any one song is played in a period of time.
Highest playcount for a single song
Again, the same trend is plainly visible. And this time, the numbers themselves are troubling. The recent end of the graph is somewhere between 60 and 70. That's enough to play the most popular song for a given month more than twice a day, every single day. The weekly count is up near 20, which is almost three times a day for that week.
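
For anyone who wants to poke at numbers like these without a database, here is a rough sketch of the three measures in shell and awk. To be clear, this is not the Python/sqlite code described above. It assumes a made-up input format of one play per line, with artist and title separated by a tab, and it assumes "average song plays" means total plays divided by distinct songs. The function name playlist_stats is hypothetical.

```shell
#!/bin/bash
# Sketch only: compute playlist variety measures for one time period.
# Input (assumed format): one line per play, "artist<TAB>title".

playlist_stats() {
    awk -F'\t' '
    {
        # Normalize as described under methods: lowercase, alphanumerics only.
        artist = tolower($1); gsub(/[^a-z0-9]/, "", artist)
        title  = tolower($2); gsub(/[^a-z0-9]/, "", title)
        # Count plays per (artist, title) pair, not per title alone.
        plays[artist SUBSEP title]++
        total++
    }
    END {
        if (total == 0) exit 1
        for (song in plays) {
            unique++
            if (plays[song] > max) max = plays[song]
        }
        printf "unique_ratio=%.2f avg_plays=%.2f max_plays=%d\n",
               unique / total, total / unique, max
    }'
}

# Example with invented data: four plays, one song played twice.
# printf 'Artist A\tSong One\nartist a\tSong One!\nArtist B\tSong Two\nArtist C\tSong Three\n' | playlist_stats
```

Feeding it one week or one month of plays at a time gives one data point per period for each of the three graphs.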

So... ouch. This isn't just a minor tweak to programming. To me, this looks like a shift in the very identity of the station. And I don't think I like the new Current as much as the old one.

I don't want to get too hyperbolic. I'm sure these numbers would still look very good put up against a Clear Channel subsidiary, or really just about any commercial station. I would have loved to compile some numbers from one of those stations to have a good laugh, but sadly I couldn't find any that made old playlists available. If you know of one, I'd be interested to hear.

The complaints flying around aren't because we haven't counted our blessings - they're because we know just how lucky we are, and we're afraid we're slowly losing our treasured station to the mainstream. So no, it's not the end of the world, and I'm not convinced 89.3 has sold out to The Man just yet. But I used to describe The Current to my friends as "single-handedly saving radio." And I'm starting to wonder if I can still count on them for that. Maybe it's time to lay the responsibility in Triple J's hands.

Postscript

I want to close with one more analysis. Curious if drive time or other factors would affect the playlist at all, I ran a set of queries for the same uniqueness ratio as above, but now broken up into two-hour time slots throughout the week (and yes, I included the weekend, whether that's good or bad).
Unique song ratio by time block
The orange line along the bottom is the monthly value, included just for reference. As you can see, most of the time slots follow the general trend towards less variety very closely. There are three slots, however, that don't: those from 4AM through 10AM. The Morning Show runs from 5-9AM. Strangely, the 6-8AM slot actually takes an upturn as everything else heads down. Did they ramp up their eclectic selection in reaction to the station's overall homogenization? I don't know. At any rate, woo yay Morning Show! Too bad it's ending forever in a week.







So Sad...