Wikiasari forum
Thursday, February 8th, 2007
Jimmy Wales of Wikipedia fame wants to start up an open source, for-profit search engine based on Nutch to compete with Google.
There’s a community forum available:
We’ve come to accept over time that spiders visit this site as much as humans. Judging by the number of spiders that seem to live here, you’d think it was a cave.
We recognise the majority of the spiders, and we appreciate the attention. Google and Yahoo! come here all the time. It makes us proud (hi guys!).
But along with those mechanical spiders, we also get visits from a variety of baby-bots and wanna-bots who rummage through the site heavily for a while and move on, and others who strike repeatedly. They don’t say who they are, or what they are doing.
Many of them are right not to advertise, because they’d get banned right off - those include spam e-mail harvesters from Brazil, and self-appointed cyber-cops like Cyveillance, who come to sniff around to see if we might be offending them.
Those we ban as quickly as we can - we don’t like spammers or bullies (bad bullybot!).
But we also get bots run by regular, decent folk who just want to keep up on what’s going on here.
The problem is that some do it hourly for months on end, and if you are a regular reader of this site, you know one thing for sure - they are wasting their time, and our bandwidth, because this site only gets updated about once every three months.
There are web site managers who jealously guard their web sites, and who go through their stats looking for abusers. There are discussions and exchanges about who certain IPs are, and what they are up to - so bad bots can get a reputation.
A bot might be banned if it sticks its head up in the stats for things like bandwidth consumption, number of hits, frequency of visits, and so on.
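For the curious, here is a rough sketch of how a site manager might spot a bot sticking its head up in the stats. It assumes the access log has already been boiled down to (IP, bytes sent) pairs; the function name and the thresholds are made up for illustration, not taken from any real stats package.

```python
from collections import defaultdict

# Hypothetical thresholds; tune them to your own site's traffic.
MAX_HITS = 100
MAX_BYTES = 5_000_000


def flag_heavy_visitors(records, max_hits=MAX_HITS, max_bytes=MAX_BYTES):
    """Return a sorted list of IPs that exceed either threshold.

    records: iterable of (ip, bytes_sent) tuples, one per request,
    assumed to have been extracted from an access log beforehand.
    """
    hits = defaultdict(int)    # number of requests per IP
    volume = defaultdict(int)  # total bytes served per IP
    for ip, bytes_sent in records:
        hits[ip] += 1
        volume[ip] += bytes_sent
    return sorted(
        ip for ip in hits
        if hits[ip] > max_hits or volume[ip] > max_bytes
    )
```

Frequency of visits could be tallied the same way by keeping the timestamps per IP, but that is left out here to keep the sketch short.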
A bot will especially attract notice if it doesn’t respect robots.txt, doesn’t introduce itself, falsifies information, comes from a bad neighbourhood, has a bad reputation, or drags a site down to a crawl.
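For bot builders who would rather not get noticed for the wrong reasons, a minimal sketch of the polite approach follows: check the site's robots.txt rules before fetching anything. It uses Python's standard `urllib.robotparser` module; the helper name `allowed`, the bot name `ExampleBot/1.0`, and the sample rules are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules a site might publish at /robots.txt.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/page.html"):
        print(url, allowed(EXAMPLE_RULES, "ExampleBot/1.0", url))
```

A real bot would fetch the live robots.txt first (the parser's `set_url()` and `read()` methods do that) and would also send a User-Agent string that says who it is.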
Bot banning could become a bigger issue in the future as more and more bots are unleashed, and the Internet becomes clogged with spiders, pre-fetchers, harvesters, comment spammers, scrapers, and other critters.
It’s possible that there will come a time when bots are automatically banned at first sight, and the sub-über bot (come on, say it out loud) will need to beg for an invitation.
Right on brother!
“The next killer app isn’t an app.
It will be a new networking platform that builds on today’s world-wide web and makes possible new generations of more powerful and useful applications.”
What distributed open source search lacks in storage space and speed, it can make up for in processing power. What to do… what to do….
You betcha they are.
I’ve gone on before about Vernor Vinge’s singularity - the point where artificial intelligence takes over, and leaves us in the dust - but we need to hear more.
The Singularity Institute wants to make sure it doesn’t kill us and eat us.