Archive for December, 2006

Common Spidey Sense

Tuesday, December 5th, 2006

We’ve come to accept over time that spiders visit this site as much as humans do. Judging by the number of spiders that seem to live here, you’d think it was a cave.

We recognise the majority of the spiders, and we appreciate the attention. Google and Yahoo! come here all the time. It makes us proud (hi guys!).

But along with those mechanical spiders, we also get visits from a variety of baby-bots and wanna-bots who rummage through the site heavily for a while and move on, and others who strike repeatedly. They don’t say who they are, or what they are doing.

Many of them are right not to advertise, because they’d get banned right off – those include spam e-mail harvesters from Brazil, and self-appointed cyber-cops like Cyveillance, who come to sniff around to see if we might be offending them.

Those we ban as quickly as we can – we don’t like spammers or bullies (bad bullybot!).

But we also get bots run by regular, decent folk who just want to keep up on what’s going on here.

The problem is that some do it hourly for months on end, and if you are a regular reader of this site, you know one thing for sure – they are wasting their time, and our bandwidth, because this site only gets updated about once every three months.

There are web site managers who jealously guard their web sites, and who go through their stats looking for abusers. There are discussions and exchanges about who certain IPs are, and what they are up to – so bad bots can get a reputation.

A bot might be banned if it sticks its head up in the stats for things like bandwidth consumption, number of hits, frequency of visits, and so on.
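For the curious, here is a rough sketch of what that kind of stats-watching can look like. It assumes the server writes its access log in Common Log Format; the function names and the thresholds are made up for illustration, and any real cut-off would depend on the site.

```python
import re
from collections import defaultdict

# Common Log Format: ip ident user [time] "request" status bytes
# (assumed here; other log formats would need a different pattern)
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-)')


def tally(lines):
    """Return {ip: [hits, bytes_sent]} for every line that parses."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ip, size = match.groups()
        stats[ip][0] += 1
        stats[ip][1] += 0 if size == "-" else int(size)
    return dict(stats)


def suspects(stats, max_hits=1000, max_bytes=50_000_000):
    """List the IPs whose hit count or bandwidth exceeds the
    (hypothetical) thresholds, sorted for stable output."""
    return sorted(
        ip
        for ip, (hits, size) in stats.items()
        if hits > max_hits or size > max_bytes
    )
```

Feeding it the lines of a log file, for example `suspects(tally(open("access.log")))`, gives a list of addresses worth a closer look. It only covers hits and bandwidth, not frequency of visits, and a list like this is a starting point rather than a verdict.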

A bot will especially attract notice if it doesn’t respect robots.txt, doesn’t introduce itself, falsifies information, comes from a bad neighbourhood, has a bad reputation, or drags a site down to a crawl.
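Seen from the bot's side, the first two items on that list are not hard to get right. The sketch below uses Python's standard `urllib.robotparser` to check a site's rules before fetching anything. The bot name, its info URL, and the `allowed` helper are hypothetical, and the rules are passed in as text so the example runs without touching the network.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical bot that says who it is and where to learn more.
USER_AGENT = "ExampleBot/0.1 (+http://example.com/bot.html)"


def allowed(robots_txt, url, agent=USER_AGENT):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Example rules a site might publish (made up for illustration).
RULES = """\
User-agent: *
Disallow: /private/
"""

print(allowed(RULES, "http://example.com/index.html"))
print(allowed(RULES, "http://example.com/private/notes.html"))
```

A polite bot would also send that same `USER_AGENT` string with every request, so that whoever reads the stats can tell who came calling.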

Bot banning could become a bigger issue in the future as more and more bots are unleashed, and the Internet becomes clogged with spiders, pre-fetchers, harvesters, comment spammers, scrapers, and other critters.

It’s possible that there will come a time when bots are automatically banned at first sight, and the sub-uber bot (come on, say it out loud) will need to beg for an invitation.