August 17, 2006

debate on ethics: should web search data be stored?

One more poser vis-à-vis the ethics of the AOL search disclosure:

Is it ethical for companies to keep logs of online user behavior at all?
This question has been raised repeatedly, long before the AOL release made it visible to everysurfer. There has been a great deal of online discussion on this topic, but I will point you to the WSJ Online, which recently hosted a good debate on it, "Should Web Search Data Be Stored?", between Kevin Bankston of the Electronic Frontier Foundation and Markham Erickson of NetCoalition, which lobbies on behalf of Internet firms including Google and Yahoo.

[Bankston] You also again say that the disclosure should not have happened, yet AOL insists that it was not a violation of its privacy policy to do so. If AOL's privacy policy doesn't protect users' search privacy, and the law (as you argue) doesn't protect search privacy, what are we left with? I'll tell you: a bunch of big companies with a dossier of your personal interests and circumstances more detailed than probably any other single record, which they have no duty to keep secret, and which they have massive economic incentives to mine for marketing data or sell. And since all of the privacy policies pretty much say "we can change this policy at any time," they're useless in the long-term even if they are currently providing adequate privacy protections (which they apparently aren't).

The sensitivity of this data cannot be overemphasized, and taking a Wild West approach with no clear rules is a dangerous game. The search industry's customers, and the American public, deserve better.

Go read the whole WSJ thing; it'll make you smarter.

Posted by Gene at August 17, 2006 6:23 PM

Libraries have a long history with this issue, having run systems in which the public performs searches for 3+ decades. Although searches in library catalogs are arguably less sensitive than many Internet searches, most libraries actively scrub their logs to ensure the privacy of individuals. Men with badges and trenchcoats taught us long ago that it's best to expunge any data that can be tied to an individual user ID once the transaction is complete. The Patriot Act only reinforced that.

Library catalog searches are generally not associated with any kind of individual user identifier, and often are not even linked to one another as part of a session. In cases where users must identify themselves, such as for access to commercially licensed resources, the library systematically removes all identification from the logs. Shortly after a checked-out item is returned, libraries permanently destroy the association between the user and the item, both in the live system and in backed-up data.

Of course, libraries are not businesses in the normal sense, and their devotion to privacy is costing them. Library systems lag behind commercial services in socially based features, in part because of a reluctance to mine and expose user behavior, or at least a healthy fear of doing it badly. Even something as simple as "Others who used this item also used:" could be disastrous for a grad student working on an original thesis, since it could reveal the sources behind that research to others. But as libraries "lose market share," they wonder how they can "be more like Amazon" without sacrificing their principles.

The default positions taken by libraries (your behavior is your own business) and Internet search engines (I own your behavior) are very different. It's so easy (I've done it too often) to reveal a little personal information for immediate access to some online goodie without realizing that those few keystrokes might live forever.

Posted by: Robin at August 18, 2006 8:34 AM