Tuesday, December 11, 2007

The dead web - Google + Archive.org?

I have moved this over to this new blog from my old one, where I originally posted it a while back (and that post was itself a re-hash of an idea I wrote up back in 2002, so it has been on my mind for a while!). Dave Winer's recent post prompted me to put it up again. He is spot on!

====

This was something I posted on an old blog some years back, but recent discussions have made me re-post it, just in case there is some new opinion!

Over the last 2 months I have been conducting research almost exclusively on the web.
What has really become obvious to me is the amount of dead material out there.

It ranges from web pages that contain out-of-date information, to whole sites that stopped running years ago with no indication, to projects that seem to have been in flux for years, to businesses that stopped trading years back and left their site up, and even stuff written by people whom I'm almost ready to email, only to find they passed away a couple of years back.

So is a new web needed to get us out of this? Can we see Google work with Archive.org and create a diary of the web? A time-aware, searchable web which allows some kind of time scale on the information out there, without requiring everyone to annotate their documents! Could I say "Only search content added/updated in the last year"?
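To make the "last year" filter concrete, here is a minimal Python sketch. It assumes the search index already knows a last-updated date for each page, which is exactly the part that does not exist today; the `updated_within` function, the record layout, and the example URLs are all made up for illustration.

```python
from datetime import datetime, timedelta

def updated_within(results, days, now):
    """Keep only results whose last_updated date falls within `days` of `now`."""
    cutoff = now - timedelta(days=days)
    return [r for r in results if r["last_updated"] >= cutoff]

# Hypothetical index entries; the URLs and dates are invented for this example.
results = [
    {"url": "http://example.com/old-article", "last_updated": datetime(2002, 4, 1)},
    {"url": "http://example.com/fresh-article", "last_updated": datetime(2007, 9, 15)},
]

# "Only search content added/updated in the last year"
recent = updated_within(results, days=365, now=datetime(2007, 12, 11))
print([r["url"] for r in recent])
```

The filter itself is trivial; the hard part is getting a trustworthy `last_updated` value without asking authors to annotate anything.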

I hope so, because frankly it's getting ridiculous. 10 years ago I did some research on solitons for a Physics paper I wrote. Today some of that material comes back seemingly as relevant as ever, despite things continuing to evolve over the last decade. My File Exists article on 15 Seconds at http://www.15seconds.com/issue/990401.htm is now over 5 years old, but still comes 8th in Google when I type "FileExists".

I don't know how many replies I have had indicating that some academic moved on 3 years ago, or that some research project was finished, or how many links I have hit to other sites that closed their doors, re-organized, or just changed their content so much that the link is completely useless.

Could a hyped up archive.org challenge something like Google? I think so. Could we "Diff The Web" to make the content more relevant, given that getting Dublin Core metadata on everything is highly unlikely?
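As a rough sketch of what "Diff The Web" could mean in practice, the Python below compares successive archived snapshots of one page and reports when its content last actually changed. It uses the standard library's `difflib`; the `change_ratio` and `last_changed` helpers and the snapshot data are hypothetical, and a real system would need to fetch the snapshots from an archive and strip out navigation, ads, and other noise first.

```python
import difflib

def change_ratio(old_text, new_text):
    """Return 0.0 for identical snapshots, rising towards 1.0 as they diverge."""
    matcher = difflib.SequenceMatcher(None, old_text.splitlines(), new_text.splitlines())
    return 1.0 - matcher.ratio()

def last_changed(snapshots):
    """Given (date, text) pairs sorted by date, return the date of the most
    recent snapshot that differs from the one before it. If the page never
    changed, return the date of the first snapshot."""
    changed = snapshots[0][0]
    for (_, prev_text), (date, text) in zip(snapshots, snapshots[1:]):
        if change_ratio(prev_text, text) > 0:
            changed = date
    return changed

# Hypothetical snapshots of a single page, oldest first.
snapshots = [
    ("2002-01-01", "Project status\nIn progress"),
    ("2004-01-01", "Project status\nIn progress"),
    ("2005-01-01", "Project status\nFinished"),
    ("2007-01-01", "Project status\nFinished"),
]

print(last_changed(snapshots))
```

The date this produces is inferred purely from the content, so nobody has to annotate their documents, and a search engine could use it to rank or filter pages by how recently they were really touched.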

Anyone got answers?
