The Web Never Forgets. Should It?


By now most people are aware that Google changed its privacy policy. You’ve read articles on all sides of the spectrum: some say this is the worst thing ever; others claim it just doesn’t matter. And there’ll be even more since the EU has declared this new policy “illegal.” This new policy matters quite a bit, but not for the reasons you might suspect. It’s not anything in the new policy itself: most of this data sharing would have been possible under Google’s 70 previous, separate policies. Instead, what matters is that this policy rewrite is a clear signal to consumers about just where all this online tracking is going.

In a nutshell, Google is condensing the information that it collects across its 70-plus services into one detailed profile of each user. And even if you read the new policy, it is so broad that specific details are hard to pin down. In fact, this breadth is what the EU appears to object to most, saying the policy is so broad as to have no specific meaning:

“Google makes it impossible to understand which purposes, personal data, recipients or access rights are relevant to the use of a specific service.”

This broadness is the problem. That’s what’s bothering people most. Google is now telling users that it’s combining data from different sources. Most specifically, data from very different types of activities are being combined. Most people understand that Google is looking at their mail while they’re in Gmail, or looking at their searches while they’re searching—it’s the combination of the tracking of all your activities in one place that causes this negative gut reaction.

Why? The reason is that people operate and behave differently in different contexts. Think of how different your online behavior is when you are essentially daydreaming by following a haphazard path through YouTube videos, compared to how you behave online when solving a problem at work.

This is something that people intuitively understand. It’s part of human nature. But there’s a mismatch between how we actually live our lives and how the massively recorded Web operates. As social animals we’re uniquely adapted to adjusting our interactions with each other in these different contexts. There’s a gulf between what you say and how you behave when you are with friends, versus what you say when testifying before Congress. And that’s not a bad thing: there ought to be. If your behavior weren’t different in those situations, then those people wouldn’t be your friends.

That’s the disconnect: often people think they’re making “small talk” online when it’s actually the equivalent of a prime-time CNN interview. Human memory forgets; the Internet does not. This sort of data collection and storage becomes a permanent record, one where all data points can be equal and all individuals can be identified.

All this data combination also increases the precision with which individuals can be identified. Each additional dataset makes it easier to select individual characteristics to target (which raises the value of the dataset) and easier to re-identify individuals and pull them back out of “anonymous” data. Although you might not be identifiable in real life based on just your hair color, start adding information to that pile (height, parents’ names, favorite movies) and identifying you becomes easier and easier. The same rules apply to the virtual you: you might not be identifiable through one Google search or a few Gchat conversations, but the sum of all your activities across Google’s separate services rapidly narrows it down to you.
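To make the narrowing effect concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the five records are made up, and the names `PEOPLE`, `candidates`, and `narrowing` are hypothetical, not anything Google or any other company is known to use. It simply counts how many people still match as each new clue is added.

```python
from typing import Dict, List, Tuple

# Hypothetical, made-up records for illustration only.
PEOPLE: List[Dict[str, str]] = [
    {"name": "A", "hair": "brown", "height": "tall", "movie": "Alien"},
    {"name": "B", "hair": "brown", "height": "tall", "movie": "Casablanca"},
    {"name": "C", "hair": "brown", "height": "short", "movie": "Alien"},
    {"name": "D", "hair": "blond", "height": "tall", "movie": "Alien"},
    {"name": "E", "hair": "brown", "height": "short", "movie": "Casablanca"},
]


def candidates(records: List[Dict[str, str]], known: Dict[str, str]) -> List[Dict[str, str]]:
    """Return the records that match every attribute known so far."""
    return [r for r in records if all(r.get(k) == v for k, v in known.items())]


def narrowing(records: List[Dict[str, str]], clues: List[Tuple[str, str]]) -> List[int]:
    """Add clues one at a time and record how many candidates remain after each."""
    known: Dict[str, str] = {}
    counts: List[int] = []
    for key, value in clues:
        known[key] = value
        counts.append(len(candidates(records, known)))
    return counts


if __name__ == "__main__":
    clues = [("hair", "brown"), ("height", "tall"), ("movie", "Alien")]
    print(narrowing(PEOPLE, clues))
```

No single clue picks out one person in this toy population, but the three together do. Real datasets are far larger, yet the same logic applies: each extra attribute shrinks the pool of people who could match.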

And it’s not going to stop anytime soon. Google’s changes are a visible result of the enormous pressure to monetize user data that most online consumer and content companies face. The fundamental economic structure of most Web companies will continue to lead to more changes like Google’s as they seek to maximize the value of the user data they already have and are collecting. To consumers, these policy changes should be treated like a canary in a coal mine (even though it’s an 800-pound canary).

So what can be done differently? In an ideal world, consumers could restrict their data to pre-approved uses. This may not be feasible: companies would have to spell out all the uses in advance, they would be limited in their experiments and innovation, and users would probably face an even greater barrage of end-user license agreements in their online interactions.

A more practical solution would be to give data an expiration date, except for minimum datasets to do things like validate transactions. It’s not clear that there’s a need to save all accessible information on every user just because it’s possible. There are credible arguments that humans must forget certain amounts and types of information in order to grow, mature, and live happy lives. Maybe we need to do the same thing to our online society.
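As a rough sketch of what an expiration date on data might look like in practice, the Python below drops records older than a retention window unless they are flagged as part of a minimum dataset. The 180-day window, the `Record` type, the `essential` flag, and the `purge` function are all assumptions made up for this illustration, not a description of any real system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Assumed retention window; the right value is a policy choice, not a technical one.
RETENTION = timedelta(days=180)


@dataclass
class Record:
    kind: str
    created: datetime
    essential: bool = False  # e.g. needed to validate a transaction


def is_expired(record: Record, now: datetime, retention: timedelta = RETENTION) -> bool:
    """A record expires once it is older than the window, unless it is essential."""
    if record.essential:
        return False
    return now - record.created > retention


def purge(records: List[Record], now: datetime, retention: timedelta = RETENTION) -> List[Record]:
    """Return only the records that should still be kept at time `now`."""
    return [r for r in records if not is_expired(r, now, retention)]


if __name__ == "__main__":
    now = datetime(2012, 6, 1)
    records = [
        Record("old_search", datetime(2011, 1, 1)),
        Record("receipt", datetime(2011, 1, 1), essential=True),
        Record("recent_search", datetime(2012, 5, 1)),
    ]
    print([r.kind for r in purge(records, now)])
```

The hard part is not the code but deciding what counts as essential and how long the window should be.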

Andrew Sudbury is founder and chief technology officer of Abine, an online privacy company.
