Archive for April, 2008

This is a technical web development post.

The other night I was up until 1:30AM fixing a bug on the site. At 10:00PM I took a look at the latest changes log and noticed that hundreds of “reverts” had happened and were still happening, all coming from the same IP address. I panicked! I had to shut the site down temporarily while I fixed the problem and repaired the damage.

The problem was caused by a combination of a badly behaved robot (web spider) and a bug in the “undo revision” code.

1. The Bug. Concharto is a geographic wiki, and every wiki needs to make it easy to revert changes. The changes page has an “undo this change” link next to each change. It was originally implemented as a simple link with a query string that looked like this: /edit/undoevent.htm?id=188&toRev=3. The bug is that this GET request changes data. According to the HTTP spec (RFC 2616), GET requests should be “safe”, meaning they only retrieve data and do not change it, and idempotent, meaning that repeated requests (e.g. a user clicking the link over and over) have the same effect as a single request. For instance, a link that adds something to a shopping cart is NOT idempotent, because every click adds another item, so it is best implemented with a POST method (and some javascript). When I coded this feature way back, I did it the lazy way.
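To make the shopping-cart example concrete, here is a minimal Python sketch. The cart functions are hypothetical and only illustrate the definition: repeating an idempotent operation leaves the same state as doing it once, while repeating a non-idempotent one keeps changing the state.

```python
# Hypothetical shopping-cart operations, used only to illustrate idempotency.


def add_to_cart(cart, item):
    """NOT idempotent: each repeated call adds another copy of the item."""
    cart.append(item)


def set_quantity(cart, item, qty):
    """Idempotent: calling this once or many times leaves the same state."""
    cart[item] = qty


if __name__ == "__main__":
    cart_list = []
    for _ in range(3):  # e.g. a user clicking the same link three times
        add_to_cart(cart_list, "book")
    print(cart_list)  # ['book', 'book', 'book']

    cart_dict = {}
    for _ in range(3):
        set_quantity(cart_dict, "book", 1)
    print(cart_dict)  # {'book': 1}
```

Setting a quantity is idempotent but still changes data, so it does not belong behind a GET link either.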

2. The Robot. All robots are supposed to follow the instructions in the robots.txt file at the root of your web site. This file tells robots which parts of the site they may and may not crawl. Our robots.txt file explicitly prohibits following links to the changes page (which contained the bug). Unfortunately, not all robots follow these rules. This particular robot found its way to the changes page and started following every link there, including each “undoevent.htm” link. Ugh!
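For reference, here is how a well-behaved crawler consults robots.txt before following a link, sketched with Python’s standard urllib.robotparser module. The Disallow paths are hypothetical, since the post does not show Concharto’s actual robots.txt; /edit/ is taken from the undo link above. Nothing enforces this check, which is why a robot that skips it can still reach the undo links.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real file's paths are not given in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /changes.htm
Disallow: /edit/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_can_fetch(url):
    """What a well-behaved robot checks before requesting a URL."""
    return parser.can_fetch("*", url)


if __name__ == "__main__":
    print(polite_can_fetch("/edit/undoevent.htm?id=188&toRev=3"))  # False
    print(polite_can_fetch("/index.htm"))  # True
```

A badly behaved robot simply never calls anything like polite_can_fetch(), so robots.txt is advice, not protection.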

I had to do three things to recover:

  • Fix the bug. I switched all of the links to call a javascript submit() method that POSTs the request to the web controller. This has two benefits: robots don’t usually run javascript, and they don’t usually do HTTP POST.
  • Revert the changes that the robot made. Database backups are a good thing.
  • Block the robot’s IP address.
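The server side of the fix can be sketched as a handler that refuses to change anything unless the request is a POST. This is a minimal sketch using Python’s standard WSGI tooling, not the site’s actual controller: UNDONE stands in for the database, the redirect target /changes.htm is hypothetical, and only the id and toRev parameter names come from the original link. On the page, each undo link would submit a small form with method="post" instead of being a plain href.

```python
import io
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

# Hypothetical stand-in for the wiki database: records which reverts ran.
UNDONE = []


def undo_event_app(environ, start_response):
    """Hypothetical /edit/undoevent.htm handler: changes data only on POST."""
    if environ["REQUEST_METHOD"] != "POST":
        # A GET (e.g. a robot following a link) must not change anything.
        start_response("405 Method Not Allowed",
                       [("Allow", "POST"), ("Content-Type", "text/plain")])
        return [b"Use POST to undo a change.\n"]

    # Parameters come from the form body, not the query string.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    form = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
    try:
        event_id = int(form["id"][0])
        to_rev = int(form["toRev"][0])
    except (KeyError, ValueError):
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"Missing or invalid id/toRev.\n"]

    UNDONE.append((event_id, to_rev))  # stand-in for the real revert
    # Redirect after POST so a browser refresh does not resubmit the undo.
    start_response("303 See Other", [("Location", "/changes.htm")])
    return [b""]


def call(app, method, query="", body=b""):
    """Invoke a WSGI app directly (no server) and return its status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "REQUEST_METHOD": method,
        "PATH_INFO": "/edit/undoevent.htm",
        "QUERY_STRING": query,
        "CONTENT_TYPE": "application/x-www-form-urlencoded",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    })
    statuses = []
    b"".join(app(environ, lambda status, headers: statuses.append(status)))
    return statuses[0]


if __name__ == "__main__":
    # A robot following the old-style link: rejected, nothing reverted.
    print(call(undo_event_app, "GET", query="id=188&toRev=3"))  # 405 Method Not Allowed
    # The form submit: performs the revert.
    print(call(undo_event_app, "POST", body=b"id=188&toRev=3"))  # 303 See Other
    print(UNDONE)  # [(188, 3)]
```

Rejecting GET on the server matters even with the javascript in place, because nothing stops a robot from requesting the old URL directly.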

3. The Lesson. Keep your GET methods safe and idempotent. It is sometimes easier (less coding) to trigger an action with a plain link and pass its parameters in the query string, but if that action changes data, it is a bug, and you will probably have to fix it later.
