Links

Links on this page go to external sites. They cover general topics, theory and technology, as well as actual projects: mostly current ones, although some past projects are included for interest and perspective. If you have links you would like to contribute, please send them in.

General How-Tos and Search Engine Optimisation

Indexing and Searching the Web

Societies, Journals and Web Sites on Indexing

Political and Social Issues

Technologies - Search and Indexing

Reviews and Compilations

Open Source - Development, Links, etc.

  • OpenSourceSearch - A resource for open source development of search engines.
  • OSDN - Open Source Development Network.
  • The Open Source Initiative (OSI) is a non-profit corporation dedicated to managing and promoting the Open Source Definition for the good of the community.

Licenses - Copyright and Patents

Projects - Mostly Open Source Spiders, Crawlers and Search Engines - Then and Now

  • Search Engines and Indexing - A Wide Range - Intranet and Internet

    • DataparkSearch Engine is a full-featured open source web-based search engine.
    • DSpace digital repository system captures, stores, indexes, preserves, and distributes digital research material.
    • ASPseek is a multi-site search engine written in C++ using the STL. It consists of an indexing robot, a search daemon, and a search front end (CGI or Apache module).
    • Egothor is an Open Source, high-performance, full-featured text search engine written entirely in Java.
    • ht://Dig is a system for indexing and searching a finite (not necessarily small) set of sites or an intranet.

    • KartOO is a proprietary (but cool!) metasearch engine with visual display interfaces.
    • Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.
    • MKSearch is a research project to develop a metadata search engine.
    • Mobilemaps is a location-based ("nearby") search engine that lets users physically locate information.
    • mozDex is a search engine based on Nutch and Lucene.

    • mnoGoSearch™ (formerly UDMSearch) is web search engine software.
    • Namazu is a full-text search engine intended for easy use. Not only does it work as a small or medium scale Web search engine, but also as a personal search system for email or other files.
    • Nutch is an effort to implement an open-source web search engine. Nutch builds on Lucene Java to provide web search application software.
    • OpenFTS (Open Source Full Text Search engine) is an advanced PostgreSQL-based search engine that provides online indexing of data and relevance ranking for database searching.
    • OpenWebSpider is an Open Source multi-threaded Web Spider (robot, crawler) and search engine.

    • PhpDig is a web spider and search engine written in PHP that uses a MySQL database, with flat-file support.
    • Red-Piranha is an open source search system that can 'learn' what you are looking for.
    • Senas is an open source search engine created from scratch in Perl.
    • Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files. Swish-e is ideally suited for collections of a million documents or smaller.
    • Swoogle is a crawler-based indexing and retrieval system for Semantic Web documents in RDF or OWL.

    • Terrier is software for the rapid development of Web, intranet and desktop search engines.
    • Webglimpse and Glimpse are Unix-based search tools for indexing websites and searching intranets.
    • Xapian is an open source probabilistic information retrieval library that you can add to your own applications.
    • YaCy is a peer-to-peer (P2P) distributed web search engine.
    • Zebra is a high-performance, general-purpose structured text indexing and retrieval engine.

    • Alvis conducts research in the design, use and interoperability of topic-specific search engines with the goal of developing an open source prototype of a distributed, semantic-based search engine.

  • Peer-to-Peer

  • Clustering

    • Carrot2 is a research framework for experimenting with automated querying of various data sources (such as search engines), processing of search results, and visualization. Carrot2 was built primarily for search results clustering, but it can be configured for other tasks.

  • Crawlers and Spiders

    • OpenCola Folders - A distributed tool for spidering the Internet and locating new documents.
    • Grub is a distributed web crawler. It's alive! It's alive!
    • The WIRE project is an effort started by the Center for Web Research to create an information retrieval application designed for use on the Web.
    • SearchTools.com - All About Search Indexing Robots and Spiders.