This article was written by Keith Dawson for Boston.com's DigitalMASS Internet column. It is archived here for informational purposes only because it no longer appears on the DigitalMASS site. This material is Copyright 2000 by Boston.com.


The Web's 'Southern' twang

Keith Dawson
2000-01-26

Ever wonder how much of the Web is missing? How prevalent are the annoying "404, page not found" error messages across the Web as a whole?

I'm privileged to run a private mailing list on which I can enjoy the opinions of a lot of very smart people, and share them with you. From time to time a question like this one comes up on the list: how common are 404s? Anton Sherwood posed it, and also suggested the appropriate jargon: a page that has gone missing has "moved to Atlanta" (its area code is 404, you see). Here are the fruits of the ensuing discussion.

First the hard numbers, such as they exist. Last May, All The Web surveyed a small random sample of Web pages, and found that 28% of the pages had at least one bad link, and on average 5.7% of all the links on all the pages were stale.
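The two figures in that survey measure different things: the share of pages carrying at least one bad link, and the share of all links that are stale. A minimal sketch of how both could be computed follows. The function name `stale_link_stats` and the input shape (a page URL mapped to a list of dead-or-alive flags) are assumptions made for illustration, not anything the survey itself described.

```python
from typing import Dict, List, Tuple


def stale_link_stats(pages: Dict[str, List[bool]]) -> Tuple[float, float]:
    """Summarize link rot over a sample of pages.

    `pages` maps a page URL to one flag per outbound link on that page,
    True meaning the link is dead. This input shape is hypothetical.
    Returns (percent of pages with at least one dead link,
             percent of all links that are dead).
    """
    total_pages = len(pages)
    total_links = sum(len(flags) for flags in pages.values())

    # A page counts once, however many dead links it carries.
    pages_with_dead = sum(1 for flags in pages.values() if any(flags))
    # True counts as 1, so summing the flags counts the dead links.
    dead_links = sum(sum(flags) for flags in pages.values())

    pct_pages = 100.0 * pages_with_dead / total_pages if total_pages else 0.0
    pct_links = 100.0 * dead_links / total_links if total_links else 0.0
    return pct_pages, pct_links


# Made-up sample: four pages of four links each, one dead link in all.
sample = {
    "page-a": [True, False, False, False],
    "page-b": [False, False, False, False],
    "page-c": [False, False, False, False],
    "page-d": [False, False, False, False],
}
print(stale_link_stats(sample))
```

The per-page figure will always be at least as large as the per-link figure, since a single dead link is enough to mark a whole page.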

In 1997 I performed a 404 experiment on about 120 issues of a topical Net newsletter, dating back 2-1/2 years before that time. Each newsletter issue had about 30 external links, many of them to stories at news sites, magazines, etc. Of the roughly 3600 links, 3/4 of them had gone 404.

All kinds of tools and solutions have evolved to salve the problem of dead links on the Web. Some companies offer site monitoring services to help keep webmasters up-to-date on the status of their links. Local tools such as Coast Webmaster serve the same end. These tools help webmasters find dead (outbound) links on the pages of their site; other techniques are available to help their servers avoid issuing 404 messages to (inbound) visiting browsers.

Some search engines -- Google was the first -- offer their cached copies of search results alongside the external links, papering over the inevitable 404 gap resulting from their robots' revisit schedule. An early project of Alexa (now owned by Amazon.com) aimed to save snapshots of the entire Web, or as much of it as feasible, and offer archived pages to users who stumble across 404s. (This effort is nearly moribund now; the Web's growth has outpaced it, as it has outpaced all efforts at containment and indexing. The Alexa Internet Archive now encompasses half a million pages, the Web over a billion.)

Are 404s really such a bad thing? According to the 404 Research Lab, 404s are your friends. The site tutors webmasters in techniques for offering their own custom 404 pages. It surveys the natural history of the 404 and provides a gallery of the coolest, strangest, most artistic, and most informative 404 pages on the Web. (Also generously provided is a page of links to other sites celebrating the 404.) Finally, the 404 Research Lab conducts periodic surveys of its visitors. One of them asked what people do upon encountering a 404. The results (from nearly 3500 replies):

  • 37% hit the back button and forget about it.
  • 20% try to get to the home page to locate the missing page.
  • 3% write to the webmaster.
  • 40% weep uncontrollably.

Let me give the last word to some of the smart people on my list. Ted Byfield:
I think the better solution is to acknowledge that webpages are a subspecies of ephemera, and adjust expectations accordingly.

Kragen Sitaker:
There are two distinct media, both written in HTML and served via HTTP.

  • The Text Web: pages like those on photo.net and scripting.com, consisting mostly of literately written prose. The purpose of the Text Web pages is to communicate. Most pages there have a very long lifetime.

  • The Hamsterdance Web: glitzy pages full of animated .GIFs, funded by banner ads, with background music and video games you can play in Shockwave, with questionnaires asking for all kinds of personal information. The purpose of the Hamsterdance Web is to entertain. Pages there tend to be short-lived.

David Weinberger:
Pointless insertion of a palindrome: "404! Page gap! 404!"