Wikiasari forum
February 8th, 2007
Jimmy Wales of Wikipedia fame wants to start up an open source, for-profit search engine based on Nutch to compete with Google.
There’s a community forum available:
You can't compete as an open source project with Google using the same infrastructural model: p2p/distributed is the only way to go!
Although the largest Internet search engines and indexes like Google and AltaVista are commercial property, there are ambitious efforts to create open-source alternatives to them.
No one yet poses a serious threat to Google, but these efforts rest on a very serious presumption: that the resources of the Internet are public, and that the public should have open access to them.
Indexes can be set up on a small scale, or globally. They can be operated by corporations, individuals, or communities.
This website is especially interested in global, community-level, grassroots, cooperative - distributed - systems. That's our bias - developing an index by the people, for the people, of the people, as it were.
This is a place for people interested in building, designing, or just talking about open source Internet indexes, search engines and spiders and robots.
We’ve come to accept over time that spiders visit this site as much as humans. Judging by the number of spiders that seem to live here, you’d think it was a cave.
We recognise the majority of the spiders, and we appreciate the attention. Google and Yahoo! come here all the time. It makes us proud (hi guys!).
But along with those mechanical spiders, we also get visits from a variety of baby-bots and wanna-bots who rummage through the site heavily for a while and move on, and others who strike repeatedly. They don’t say who they are, or what they are doing.
Many of them are right not to advertise, because they’d get banned right off - those include spam e-mail harvesters from Brazil, and self-appointed cyber-cops like Cyveillance, who come to sniff around to see if we might be offending them.
Those we ban as quickly as we can - we don’t like spammers or bullies (bad bullybot!).
But we also get bots run by regular, decent folk who just want to keep up on what’s going on here.
The problem is that some do it hourly for months on end, and if you are a regular reader of this site, you know one thing for sure - they are wasting their time, and our bandwidth, because this site only gets updated about once every three months.
There are web site managers who jealously guard their web sites, and who go through their stats looking for abusers. There are discussions and exchanges about who certain IPs are, and what they are up to - so bad bots can get a reputation.
A bot might be banned if it sticks its head up in the stats for things like bandwidth consumption, number of hits, frequency of visits, and so on.
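For the curious, here is a rough sketch of how a site manager might pull those numbers out of the stats. It is only an illustration: the `Hit` record, the `flag_heavy_clients` function, and every threshold are made up for this example, and a real log analyser would parse the server's own log format first.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    """One request from the access log (hypothetical, simplified record)."""
    client: str      # IP address or user-agent string
    bytes_sent: int  # size of the response in bytes
    hour: int        # hour index since the start of the log


def flag_heavy_clients(hits, max_hits=1000, max_bytes=50_000_000,
                       max_active_hours=500):
    """Return {client: [reasons]} for clients that exceed any threshold.

    The thresholds are arbitrary placeholders, not recommendations.
    """
    totals = defaultdict(lambda: {"hits": 0, "bytes": 0, "hours": set()})
    for hit in hits:
        entry = totals[hit.client]
        entry["hits"] += 1
        entry["bytes"] += hit.bytes_sent
        entry["hours"].add(hit.hour)

    flagged = {}
    for client, entry in totals.items():
        reasons = []
        if entry["hits"] > max_hits:
            reasons.append("hits")
        if entry["bytes"] > max_bytes:
            reasons.append("bandwidth")
        # Number of distinct hours in which the client showed up.
        if len(entry["hours"]) > max_active_hours:
            reasons.append("frequency")
        if reasons:
            flagged[client] = reasons
    return flagged
```

A bot that drops by every hour for months will trip the frequency check long before it trips the others, which is roughly how the hourly visitors described above get noticed.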
A bot will especially attract notice if it doesn’t respect robots.txt, doesn’t introduce itself, falsifies information, comes from a bad neighbourhood, has a bad reputation, or drags a site down to a crawl.
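For bot authors who would rather stay off the ban list, checking the robots file before fetching is not much work. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots file contents, the `ExampleBot/0.1` name, and the example.org URLs are all invented for illustration, and the `is_allowed` helper is our own wrapper, not part of the library.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots file; a real bot would fetch this from the target site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    bot = "ExampleBot/0.1"  # hypothetical bot name; introduce yourself honestly
    for url in ("http://example.org/index.html",
                "http://example.org/private/stats.html"):
        verdict = "fetch" if is_allowed(ROBOTS_TXT, bot, url) else "skip"
        print(verdict, url)
```

Sending an honest user-agent string along with each request covers the "introduce itself" part as well.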
Bot banning could become a bigger issue in the future as more and more bots are unleashed, and the Internet becomes clogged with spiders, pre-fetchers, harvesters, comment spammers, scrapers, and other critters.
It’s possible that there will come a time when bots are automatically banned at first sight, and the sub-über bot (come on, say it out loud) will need to beg for an invitation.
Right on brother!
“The next killer app isn’t an app.
It will be a new networking platform that builds on today’s world-wide web and makes possible new generations of more powerful and useful applications.”
What distributed open source search lacks in storage space and speed, it can make up for in processing power. What to do… what to do….
You betcha they are.
I’ve gone on before about Vernor Vinge’s singularity - the point where artificial intelligence takes over, and leaves us in the dust - but we need to hear more.
The Singularity Institute wants to make sure it doesn’t kill us and eat us.