March 22, 2009

Enacting my own Terms of Service

I currently have a couple of web crawlers running that periodically request content from a couple of websites and store it in databases. It struck me as strange that these websites deigned to stipulate certain "Terms of Service" (ToS) over my use of their content and believed that these terms formed a contractual agreement, even though there had been no negotiation over these terms, and I had never signaled my assent (I haven't clicked any little "I agree" buttons on any of these sites). So I decided to bring the art of negotiation back into the formation of these previously one-sided agreements.

So, when one of my spiders makes a request, it adds a name/value pair to the query string of the URL, like so:
This parameter directs the content provider to my own Terms of Service for the transaction. My terms start out by making clear how a content provider may accept or decline them:

By serving the content I requested, you are agreeing to all the terms and conditions set forth in this document, without reservation. If you do not wish to agree to these terms, do not serve your content in response to this request.

They go on to detail how I may use the content I am requesting. My favorite part is this:
By serving the requested content, you agree to hereby waive any and all restrictions on use of your service that you may stipulate in your own Terms of Service, Terms of Use, or other legal document...
So if the content provider responds to my request, they have agreed to my ToS and waived any terms that they may subsequently try to stipulate on my use of their content.
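
To make the mechanism concrete, here is a minimal sketch of how a crawler might tack such a name/value pair onto each request URL. Everything specific in it is made up for illustration: the parameter name `tos`, the helper names `urlencode` and `with_tos`, and the example.com address of the terms. The post does not show the real ones. The sketch also ignores URLs that carry a `#fragment`.

```shell
#!/bin/sh
# Hypothetical values: the real parameter name and terms URL are not given in the post.
TOS_PARAM="tos"
TOS_URL="http://example.com/my-terms.html"

# Percent-encode a string so it can be used safely as a query-string value.
# The body runs in a subshell, so its variables stay private to the function.
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [a-zA-Z0-9.~_-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s' "$out"
)

# Append the ToS pointer to a URL, using "&" if it already has a query string.
with_tos() (
    url=$1
    sep="?"
    case $url in
        *\?*) sep="&" ;;
    esac
    printf '%s%s%s=%s\n' "$url" "$sep" "$TOS_PARAM" "$(urlencode "$TOS_URL")"
)
```

A spider could then fetch a page with something like `curl "$(with_tos 'http://example.org/page')"`, so that every request carries the pointer to the terms.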

Now, you might think this is stupid or absurd. You might even think that this is totally unenforceable, seeing as how all I have done is provide access to the terms I am stipulating and take continued participation as consent. And I would tend to agree with you.

However, I claim that if my terms are unenforceable, so are those stipulated by the content provider. How is my request any different than providing a tiny link to the Terms of Service way down at the bottom of the page?

[Image: example of a Terms of Service link]

I have as much right to place limitations on the transaction as they do. My limitations just happen to nullify all of their limitations. They're welcome to stop serving me content if they don't want to accept my terms.

Think I'm wrong? Tell me why.

March 20, 2009

Rsync and retrying until we get it right

Ok, this isn't all that special, but I scoured the first two or three pages of Google results and didn't come up with anything that solved my problem. So here it is, Internet - may the next person be luckier than me and not have to read any man pages.

Rsync is a cool utility, especially when I'm trying to plonk my 10GB backup onto Dreamhost's flaky backup server. But I wish I could make it retry when things go south. There are various threads on doing this, but it would seem it's not built into rsync itself.

The obvious solution is to check the return value, and if rsync returns anything but success, run it again. Here was my first try:

while [ $? -ne 0 ]; do rsync -avz --progress --partial -e "ssh -i /home/youngian/my_ssh_key" /mnt/storage/duplicity_backups; done
The problem with this is that if you want to halt the program, Ctrl-C only stops the current rsync process, and the loop helpfully starts another one immediately. Even worse, my connection kept breaking so hard that rsync would quit with the same "unknown" error code on connection problems as it did on a SIGINT, so I couldn't have my loop differentiate and break when needed. Here is my final script:
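
One way to get both behaviors is sketched below. It is not necessarily the script referred to above. The idea is to let the shell itself catch SIGINT and record it in a flag, so the loop does not have to guess from rsync's exit code whether the user pressed Ctrl-C. The names `retry_until_success`, `retry_interrupted`, `retry_status`, and `RETRY_DELAY` are assumptions made for this sketch, and the destination in the commented-out usage line is a placeholder, since the original command's destination is not shown.

```shell
#!/bin/sh
# Sketch: rerun a command until it succeeds, but stop as soon as Ctrl-C is pressed.
# All names here are illustrative and do not come from the original post.

retry_interrupted=0
retry_status=1

retry_until_success() {
    retry_interrupted=0
    retry_status=1
    # Remember that Ctrl-C was pressed instead of inferring it from an exit code.
    trap 'retry_interrupted=1' INT
    while [ "$retry_interrupted" -eq 0 ]; do
        if "$@"; then
            retry_status=0
            break
        else
            retry_status=$?
        fi
        # The failure was caused (or accompanied) by Ctrl-C: give up.
        [ "$retry_interrupted" -eq 1 ] && break
        echo "Command exited with status $retry_status, retrying..." >&2
        # Short pause so a dead connection is not hammered; Ctrl-C here also ends the loop.
        sleep "${RETRY_DELAY:-5}" || true
    done
    trap - INT
    return "$retry_status"
}

# Usage (the destination is a placeholder, not the author's real one):
# retry_until_success rsync -avz --progress --partial \
#     -e "ssh -i /home/youngian/my_ssh_key" \
#     /mnt/storage/duplicity_backups user@backup.example.com:backups/
```

Because `--partial` keeps partially transferred files, each retry can pick up close to where the previous attempt died rather than starting over.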

On a side note, duplicity is pretty neat. I only wish it supported resuming interrupted backup sessions so that I didn't have to do this in two steps. My current backup workflow is:

PASSPHRASE="backup" duplicity --encrypt-key 77XABAX7 /home/youngian --exclude "**/.VirtualBox" --exclude "**/.kde" --exclude /home/youngian/tmp/ --exclude /home/youngian/backup/ file:///mnt/storage/duplicity_backups/ --volsize 100

...and then the above rsync script.