The Web's 'Southern' twang
Ever wonder how much of the Web is missing? How prevalent are the
annoying "404, page not found" error messages across the Web as a
whole?
I'm privileged to run a private mailing list on which I can enjoy the
opinions of a lot of very smart people, and share them with you.
From time to time a question like this one comes up on the list:
how common are 404s? Anton Sherwood posed it, and also suggested
the appropriate jargon: a page that has gone missing has "moved to
Atlanta" (its area code is 404, you see). Here are the fruits of
the ensuing discussion.
First the hard numbers, such as they exist. Last May, All The Web
surveyed a small random sample of Web pages, and found that
28% of the pages had at least one bad link, and on average 5.7% of
all the links on all the pages were stale.
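The two figures above measure different things: one counts pages that
have at least one bad link, the other counts bad links out of all
links. A minimal sketch of how both could be computed, assuming a
hypothetical sample in which each page is a list of flags (True means
the link is dead); the function name and the sample data are
illustrative only and are not taken from the survey:

```python
def link_rot_summary(pages):
    """Summarize link rot for a sample of pages.

    pages maps a page name to a list of booleans, one per outbound
    link, where True means the link is dead.
    """
    total_pages = len(pages)
    total_links = sum(len(links) for links in pages.values())
    # A page counts once if any of its links is dead.
    pages_with_dead = sum(1 for links in pages.values() if any(links))
    # Every dead link counts, wherever it lives.
    dead_links = sum(sum(links) for links in pages.values())
    return {
        "pct_pages_with_dead_link":
            100.0 * pages_with_dead / total_pages if total_pages else 0.0,
        "pct_links_dead":
            100.0 * dead_links / total_links if total_links else 0.0,
    }


# Hypothetical sample: four pages, twelve links, three of them dead.
sample = {
    "page-a": [False, False, True, False],
    "page-b": [False, False, False, False],
    "page-c": [False, False],
    "page-d": [True, True],
}
summary = link_rot_summary(sample)
```

The page-level figure will usually be the larger of the two, since a
single stale link is enough to mark a whole page.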
In 1997 I performed a 404 experiment on about 120 issues of a
topical Net newsletter, dating back 2-1/2 years before that time.
Each newsletter issue had about 30 external links, many of them to
stories at news sites, magazines, etc. Of the roughly 3600 links,
3/4 of them had gone 404.
All kinds of tools and solutions have evolved to salve the problem
of dead links on the Web.
Some companies offer site monitoring services to help keep
webmasters up-to-date on the status of their links. Local tools
such as Coast Webmaster serve the same end.
These tools help webmasters find dead (outbound) links on the
pages of their site; other techniques are available to help their
servers avoid issuing 404 messages to (inbound) visiting browsers.
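What such an outbound link check involves can be sketched in a few
lines. This is an illustration only, not how any of the tools named
here actually work. It uses Python's standard urllib; the function
names link_status and find_dead_links are made up for the example, and
it assumes the servers answer HEAD requests, which not all do:

```python
import urllib.error
import urllib.request


def link_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for url, or None if unreachable."""
    # HEAD asks for the headers only, so the page body is not downloaded.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions carrying the code.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and the like.
        return None


def find_dead_links(urls, opener=urllib.request.urlopen):
    """Return the URLs that have "moved to Atlanta" (status 404)."""
    return [url for url in urls if link_status(url, opener=opener) == 404]
```

The opener parameter exists so the check can be exercised without
touching the network. Unreachable servers are deliberately not counted
as 404s, since a site that is down may come back.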
Some search engines -- Google was the first -- offer their cached
copies of search results alongside the external links, papering
over the inevitable 404 gap resulting from their robots' revisit
schedule. An early project of Alexa (now owned by Amazon.com)
aimed to save snapshots of the entire Web, or as much of it as
feasible, and offer archived pages to users who stumble across
404s. (This effort is nearly moribund now; the Web's growth has
outpaced it, as it has outpaced all efforts at containment and
indexing. The Alexa Internet Archive now encompasses half a
million pages, the Web over a billion.)
Are 404s really such a bad thing? According to the 404 Research
Lab, 404s are your friends. The site tutors webmasters in
techniques for offering their own custom 404 pages. It surveys the
natural history of the 404 and provides a gallery of the
coolest, strangest, most artistic, and most informative 404 pages
on the Web. (Also generously provided is a page of links to
other sites celebrating the 404.) Finally, the 404 Research Lab
conducts periodic surveys of its visitors.
This one asked what people do upon encountering a 404. The results
(from nearly 3500 respondents):
- 37% hit the back button and forget about it.
- 20% try to get to the home page to locate the missing page.
- 3% write to the webmaster.
- 40% weep uncontrollably.
Let me give the last word to some of the smart people on my list.
I think the better solution is to acknowledge that webpages are
a subspecies of ephemera, and adjust expectations accordingly.
There are two distinct media, both written in HTML and served
- The Text Web: pages like those on photo.net and scripting.com,
consisting mostly of literately written prose. The purpose of
the Text Web pages is to communicate. Most pages there have a
very long lifetime.
- The Hamsterdance Web: glitzy pages full of animated .GIFs,
funded by banner ads, with background music and video games you
can play in Shockwave, with questionnaires asking for all kinds
of personal information. The purpose of the Hamsterdance Web is
to entertain. Pages there tend to be short-lived.
Pointless insertion of a palindrome:
"404! Page gap! 404!"