Archive for August, 2004

American Civil Liberties Union : The Surveillance-Industrial Complex

Tuesday, August 24th, 2004

The full title is “The Surveillance-Industrial Complex: How the American Government Is Conscripting Businesses and Individuals in the Construction of a Surveillance Society,” and, well, the fourth conclusion is that the ACLU is leading the fight.

The expansion and aggregation of data about citizens by corporations, combined with post-9/11 paranoia, is leading to the development of dossiers on everyone, eagerly consumed by governments, law enforcement agencies, and corporations beyond the constraints laid down by law and constitution.

The technology is in place, and the collections are for sale to anyone. The government wants to use them to fight terrorism by treating everyone as suspects. Is this the end for the concept of the free and private citizen? Is this the route to a police state?

For those concerned about data collections being used to spy on Americans (and Canadians, among others), this is an interesting and disturbing 33-page report.

American Civil Liberties Union : The Surveillance-Industrial Complex

The relevance to indexing and search is that search engine companies can track your searches (at least by IP address) and maintain a database of them. Companies like Google, which offer diverse services such as Froogle, Gmail, and Orkut, can track your search terms, correspondence, shopping, and social interactions and combine them with outside data into a huge and comprehensive profile of you and everybody else. The sale of such information is a potentially huge revenue stream for a corporation, and the information in itself is a powerful resource for surveillance, analysis, and control.

How can a public index ensure that privacy is protected?

Doug Cutting Interview

Sunday, August 15th, 2004

Here’s an interview at Google Blogoscoped with Doug Cutting, who is the principal developer of Lucene and Nutch. He talks about managing spam and distributed search, among other things.

Doug Cutting Interview

KnowItAll

Saturday, August 14th, 2004

Let’s see if I’ve got this right – KnowItAll, out of the University of Washington, is a search engine that concentrates on extracting lists out of search results:

KnowItAll

There’s a short New Scientist article on it too:

Search engine tackles tricky lists

Is it open source? I don’t know! Although it does or will incorporate Nutch software and shares at least one contributor, nothing indicates it is intended to be open source.

ACM Queue – Building Nutch: Open Source Search – A case study in writing an open source search engine

Saturday, August 14th, 2004

This is a nice article on how Nutch works from April 2004’s ACM Queue, written by Mike Cafarella and Doug Cutting.

ACM Queue – Building Nutch: Open Source Search – A case study in writing an open source search engine

Search Engine Technology and Digital Libraries: Libraries Need to Discover the Academic Internet

Saturday, August 7th, 2004

Note to Librarians:

OK, you’ve been sitting on the sidelines while a whole bunch of computer nerds take over indexing and cataloguing the largest repository of information ever known. That’s got to hurt. And what about the ‘deep web’? Is anyone indexing the academic collections? Are librarians destined to be antediluvian caretakers of old media, or will they take up the torch and lead the way into a better organised and searchable future? Maybe it’s time to take back the light. Stand up, and shout “Hands off my metadata!”

From the author:

“This paper advocates a concerted initiative of the library community to pick up state-of-the-art search technology and build reliable, high quality search services for the research and teaching community.”

Search Engine Technology and Digital Libraries: Libraries Need to Discover the Academic Internet