This article was written by Keith Dawson for Boston.com's DigitalMASS Internet column. It is archived here for informational purposes only because it no longer appears on the DigitalMASS site. This material is Copyright 2000 by Boston.com.


You can click, but you can't hide

Keith Dawson
2000-03-08

The unknown perpetrators of the recent, much-publicized denial-of-service attacks used tools designed to obliterate their tracks in cyberspace. By contrast, the ordinary user who surfs thinking (or hoping) that "on the Internet, nobody knows you're a dog" leaves tracks deeper and more persistent than he or she may imagine. The sites you visit not only know you're a dog -- they know your breed, the condition of your coat, and the cost of your last shampoo at the grooming shop.

Let's look into how much Internet sites know about you without even trying, and how much they can find out with a little more persistence.

Connection data. Every time your browser requests a file over the Web, it sends a packet of information to the server. Visit this page at Junkbusters for a quick look at some of the information conveyed in every HTTP connection request. The bold text you see was customized just for your visit. (Bet you didn't know about this one: Click on the "check on whether you are being revealed" link in the penultimate paragraph.)
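To make that concrete, here is a minimal Python sketch of what a server can read from a single request. The request text, the host names, and the parse_request helper are all invented for illustration; a real browser typically sends more headers than these.

```python
# A hypothetical raw HTTP request (illustrative values, not captured traffic).
RAW_REQUEST = (
    "GET /index.html HTTP/1.0\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: Mozilla/4.7 [en] (Win98; I)\r\n"
    "Referer: http://www.example.org/links.html\r\n"
    "Accept-Language: en\r\n"
    "\r\n"
)


def parse_request(raw):
    """Split a raw HTTP request into its request line and a dict of headers."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so URLs in values stay intact.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "path": path, "version": version, "headers": headers}


request = parse_request(RAW_REQUEST)
for name, value in request["headers"].items():
    print(f"{name}: {value}")
```

Even this short request tells the server which browser and operating system you use, your preferred language, and the page you came from.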

More data. If the site owners want to do a little more work, they can learn considerably more about you:

  • Whether your browser supports Java and/or JavaScript
  • What browser plug-ins you have loaded
  • How large your monitor is and how many colors it's using at the moment
  • How many Web pages you have viewed in this window
  • The local time according to your computer
  • Who your ISP is, where it's located, and who its upstream provider is

Server logs. All Web servers store basic information about every request they process. For example, I just now visited the DigitalMASS home page. My single click left 5 records in this site's server log and 28 more in the log file of akamai.net (which DigitalMASS uses to serve some of its graphics files). Each log-file record contains my IP address, the time, information about my browser, the page I was viewing when I clicked, and other information. Visit this page on Privacy.net for a good overview of what log-file entries look like and how to read them.
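As a rough illustration, the Python sketch below pulls those fields out of one log record. It assumes the widely used Combined Log Format; this column does not say which format any particular site uses, and the sample line, address, and parse_log_line helper are made up.

```python
import re

# Combined Log Format: client address, two identity fields, timestamp,
# request line, status, size, referring page, browser string.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical record; 192.0.2.17 is a documentation-only address.
SAMPLE_LINE = (
    '192.0.2.17 - - [08/Mar/2000:10:15:32 -0500] '
    '"GET /index.html HTTP/1.0" 200 10452 '
    '"http://www.example.org/links.html" "Mozilla/4.7 [en] (Win98; I)"'
)


def parse_log_line(line):
    """Return the record's fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


record = parse_log_line(SAMPLE_LINE)
for field in ("ip", "time", "request", "referer", "agent"):
    print(f"{field}: {record[field]}")
```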

Many Web sites store their server logs in a database and analyze their visitors using powerful data-mining tools. Others simply dump the log files out onto the Web for all to see.

Cookies. If the last few paragraphs have succeeded in making you uneasy about how much you reveal while surfing the Web in imagined anonymity, consider how much more you disclose when you accept cookies from the sites you visit.

A cookie can act as a unique identifier for your computer. A site can't count on your IP address (which it gets from the server log) to identify you, because you may be assigned a different IP address the next time you dial in to your ISP. Or you may be visiting from behind a firewall at work, and all the site has to go on is the IP address of the firewall. If your browser returns your own computer's unique identifier (i.e., their cookie) each time you visit, it becomes possible to correlate your actions on the site across weeks and months.
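Here is one way a site might hand out such an identifier, sketched in Python with the standard http.cookies module. The cookie name visitor_id, the one-year lifetime, and the identify_visitor helper are assumptions made for the example, not a description of any particular site.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def identify_visitor(cookie_header):
    """Return (visitor_id, set_cookie_value).

    set_cookie_value is None when the browser already sent our cookie,
    otherwise it is the text for a Set-Cookie response header.
    """
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)
    if COOKIE_NAME in jar:
        return jar[COOKIE_NAME].value, None

    # First visit: invent a random identifier and ask the browser to keep it.
    visitor_id = uuid.uuid4().hex
    reply = SimpleCookie()
    reply[COOKIE_NAME] = visitor_id
    reply[COOKIE_NAME]["path"] = "/"
    reply[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # assumed: one year
    return visitor_id, reply[COOKIE_NAME].OutputString()


# First request carries no cookie; the second returns the one we issued.
first_id, set_cookie = identify_visitor(None)
second_id, nothing_to_set = identify_visitor(f"{COOKIE_NAME}={first_id}")
print(first_id == second_id)
```

The identifier survives a change of IP address because it lives on your disk, not in the network.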

Let's say you visit a toy company's site in July. Their server stores a cookie on your computer without your awareness. Every few weeks you go back to see what's new, browsing for toys your kids might enjoy. The site's owners don't know who you are, but they can tie your visits together using their cookie. They can mine the server-log data for insights into how the site's navigation is working (what paths did you take through the pages?), which pages aren't pulling their weight (what's the last page you saw in each visit?), etc. So far so innocent.
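The kind of analysis described here takes very little code. The Python sketch below groups invented records by cookie and reports each visit's path and last page. The records, the page names, and the shortcut of treating one calendar day as one visit are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical records: (visitor cookie, timestamp, page requested).
RECORDS = [
    ("a1f3", "2000-07-04T20:01:00", "/"),
    ("a1f3", "2000-07-04T20:02:10", "/toys/trains"),
    ("9c2e", "2000-07-05T09:30:00", "/"),
    ("a1f3", "2000-07-25T19:45:00", "/"),
    ("a1f3", "2000-07-25T19:46:30", "/toys/puzzles"),
    ("a1f3", "2000-07-25T19:48:00", "/checkout-help"),
]


def paths_by_visitor(records):
    """Map each cookie to {day: [pages in the order they were viewed]}."""
    visits = defaultdict(lambda: defaultdict(list))
    for visitor, timestamp, page in sorted(records, key=lambda r: r[1]):
        day = timestamp[:10]  # simplification: one calendar day = one visit
        visits[visitor][day].append(page)
    return visits


def exit_pages(visits):
    """Map each cookie to the last page seen in each of its visits."""
    return {
        visitor: [pages[-1] for pages in days.values()]
        for visitor, days in visits.items()
    }


visits = paths_by_visitor(RECORDS)
exits = exit_pages(visits)
print(dict(visits["a1f3"]))
print(exits["a1f3"])
```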

Months later, as the holidays approach, you finally buy a toy from the site. As soon as this transaction reveals your identity, the site owners can -- if they wish -- tie together your entire history on the site with other data about you, data purchased perhaps from a credit agency, a direct-mail marketing firm, or a magazine.

Now consider how much more data DoubleClick can collect about you. The Internet ad agency serves ads, with a side of cookies, for more than a thousand popular Web sites. DoubleClick can mine your meanderings across hundreds of sites over a period of months or years. If they are ever able to associate a real-world identity with your unique DoubleClick-assigned cookie, think how much more they could learn.
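A simplified Python sketch shows why an ad server is so well placed. Each ad request can carry both the ad server's own cookie and the address of the page that embedded the ad. The requests, cookie values, and site names below are invented, and this is not a description of DoubleClick's actual systems.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical ad requests as an ad server might record them: its own
# cookie plus the referring page on whichever site displayed the ad.
AD_REQUESTS = [
    {"cookie": "u-1001", "referer": "http://news.example.com/sports/today.html"},
    {"cookie": "u-1001", "referer": "http://toys.example.net/trains.html"},
    {"cookie": "u-2002", "referer": "http://news.example.com/index.html"},
    {"cookie": "u-1001", "referer": "http://health.example.org/conditions.html"},
]


def build_profiles(ad_requests):
    """Map each ad-server cookie to the distinct sites where it was seen."""
    profiles = defaultdict(list)
    for request in ad_requests:
        site = urlparse(request["referer"]).hostname
        if site not in profiles[request["cookie"]]:
            profiles[request["cookie"]].append(site)
    return dict(profiles)


profiles = build_profiles(AD_REQUESTS)
for cookie, sites in profiles.items():
    print(cookie, sites)
```

No single site in this example knows that the same browser read the sports page, shopped for toy trains, and looked up a health condition. The ad server sees all three.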

For hands-on fun, visit Privacy.net's simulated advertising collaborative. It demonstrates, quite graphically, how such cross-site tracking operates.