Archive for May, 2004

ACM Queue – Searching Vs. Finding – How do you help computers find the information people really want?

Wednesday, May 12th, 2004

From the ACM’s Enterprise Search feature: William A. Woods of Sun Microsystems Laboratories discusses different methods of information retrieval, some of which are computation-intensive. As the author puts it:

“It would be possible, in principle, to apply the same kinds of semantic and morphological expansions to the entire Web, using the specific-passage-retrieval technique, but that has not been my primary target. The Web is so vast that it is difficult to predict what would happen without trying it. There would probably be more issues with word sense ambiguity, and a global conceptual taxonomy would be awe inspiring. It would be an interesting challenge. Certainly the cost would be greater than for current Web search engines and might not fit their business models.

The specific-passage-retrieval algorithm lends itself to applications of large scale, because it allows a collection to be subdivided and the search to be distributed, with the results easily collated (because the penalty scores are independent of collection statistics). In theory, this could be used for a kind of federated Web search in which owners of content could provide their own indexing and search and could update their indexes whenever the content changed. This would address a fundamental problem of Web searching: the never-ending task of repeatedly crawling the Web, trying to keep the indexes current.

It is interesting to contemplate a federation knit together by a spanning network of systems (possibly a peer-to-peer network) that distribute queries and collate the results. Some of the members of the federation could be large content providers who index their own content, whereas others could be crawler-based services like current Web search engines. Of course, this would take a heretofore untold amount of cooperation among many players that are currently fierce competitors, making this scenario perhaps nothing more than theoretical for the time being.”
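To make the collation idea a bit more concrete, here is a minimal sketch of how results from independently indexed collections might be merged. It is not Woods's implementation: the `Hit` type, the `collate` function, and the convention that a lower penalty means a better match are all assumptions made for illustration. The only point it shows is that when scores do not depend on collection statistics, each member of the federation can rank its own hits and the lists can simply be merged.

```python
import heapq
from itertools import islice
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One retrieved passage (hypothetical structure, for illustration)."""
    penalty: float  # assumed convention: lower penalty = better match
    source: str     # which collection or content owner returned the hit
    passage: str


def collate(result_lists: Iterable[List[Hit]], limit: int = 10) -> List[Hit]:
    """Merge per-collection result lists into one ranked list.

    Each input list must already be sorted by ascending penalty. Because
    the scores are assumed to be comparable across collections, no
    rescoring is needed: a plain k-way merge is enough.
    """
    merged = heapq.merge(*result_lists, key=lambda hit: hit.penalty)
    return list(islice(merged, limit))


# Two content owners search their own indexes and return ranked hits.
site_a = [Hit(0.5, "site-a", "passage one"), Hit(2.0, "site-a", "passage two")]
site_b = [Hit(1.0, "site-b", "passage three"), Hit(3.0, "site-b", "passage four")]

top = collate([site_a, site_b], limit=3)
for hit in top:
    print(hit.penalty, hit.source, hit.passage)
```

With scores that depended on collection-wide statistics (as in many term-weighting schemes), the merge step would first have to renormalize each list, which is the complication the quoted passage says this approach avoids.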


ACM Queue – Why Writing Your Own Search Engine is Hard – So you have a grand idea; are you ready for the execution?

Wednesday, May 12th, 2004

A nice and easy four-page article on how to write your own search engine, by a Stanford PhD, in the Association for Computing Machinery’s Queue magazine. It’s part of their April 2004 Enterprise Search feature.


Objects Search – Search Engine

Thursday, May 6th, 2004

Here’s a new open search engine based on Nutch that clusters results into categories to help you narrow down your search.


mozDex: Building a Search Engine. || kuro5hin.org

Sunday, May 2nd, 2004

On Kuro5hin, an announcement of the early release of mozDex, an open source search engine based on Nutch and Lucene, along with discussion and commentaries.


(mozDex is at mozdex.com, but today, May 2, it’s really slow and errors out.)