by Tom Imerito
Originally Published in Pittsburgh Quarterly – Summer 2014
I first became aware that my online privacy wasn’t nearly as confidential as I thought while shopping online with my sister who lives in Florida. Separated by 1,000 miles, phones pressed to our ears, eyes glued to computer screens, my price for a particular web cam was a bargain at $3.37; hers was $4.66, over $1 more than mine. After checking to make sure we were viewing the exact same web page with the exact same Internet address, I couldn’t help but conclude that we had been sized up by a computer algorithm somewhere in cyberspace and offered different prices for the same piece of merchandise—all within a fraction of a second. The experience made me wary. I wanted to know more.
My first thought ran to cookies, the small text files a website stores in my browser to tell it who I am, when I arrive, where I look, what I click on, and when I leave. I wondered if my cookies could disclose enough about me, within fractions of a second, to give me a better price than my sister.
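For readers curious what a cookie actually contains, here is a minimal sketch in Python. The header string, the `visitor_id` name and the `shop.example` domain are all invented for illustration; they are not taken from the site I was shopping on. The point is only that a cookie is a short named value the site asks the browser to keep and hand back on later visits.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value; every name and value here is made up.
raw_header = "visitor_id=abc123; Domain=shop.example; Path=/; Max-Age=31536000"

cookie = SimpleCookie()
cookie.load(raw_header)          # parse the header text into named "morsels"

morsel = cookie["visitor_id"]
print("value stored:", morsel.value)               # the identifier the site assigned
print("sent back to:", morsel["domain"])           # which site receives it on return visits
print("kept for (seconds):", morsel["max-age"])    # how long the browser retains it
```

A value like this says nothing about me by itself. It becomes revealing when the site ties the identifier to the browsing and buying history it keeps on its own servers.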
The next day, I called my sister again to compare the cookies on our hard drives. Not having spent much time or money on this particular website, my sister’s hard drive had eight cookies from it. Mine had 805. It was clear that the website recognized me as a past customer who liked to window shop and hold out for incredible deals. My sister, on the other hand, appeared to be a new customer who might be less patient or less stingy than I—hence her higher price.
Digging deeper, I began to enumerate the information this website was likely to have on me. As a matter of necessity, I had volunteered my home and email addresses, as well as land line, cell phone and credit card numbers, when I placed orders. I had disclosed the names and addresses of friends and relatives to whom I sent gifts. The cookies on my computer told the site my browsing and buying behaviors. I soon came to realize that this website knew more about me than my wife and mother put together. I grew curious about other sites. How much could they know? To find out, I resolved to snoop on myself. I thought it would be difficult and expensive to do. I was wrong. It was easy and cheap.
I began with four popular search engines that promptly displayed links to 100 or so articles I have written. But after that, obscure, decades-old facts about me began to pop up. For instance, one search engine took me to the archive of my hometown newspaper, where I found my first bylined feature story. The article’s dateline was December 22, 1965. Reading it brought back memories of writing on a manual typewriter in an old-fashioned newsroom with big wooden desks covered with green blotters, a worn maple floor that creaked with every footfall, and three teletype machines crashing out stories from AP and UPI 24 hours a day in the next room.
I went back in time even further, where I came across my deceased Aunt Chickie’s 1941 high-school graduation roster, which reminded me that Aunt Chickie’s real name was Margaret. Soon thereafter, my octogenarian mother’s current name and address appeared. To my dismay, so did her maiden name, which I thought was sufficiently obscure to have used as a secret question on several websites. My self-snooping expedition was starting to trigger a case of the willies.
As I dug deeper into the search engines, my name began to pop up in directory sites, where tidbits of personal information about almost everybody are available for free, and whole rafts can be had for a small fee. I looked myself up on four of the free sites and subscribed to one of the pay sites. For the most part the information was true. They had my home addresses for the past 30 or so years and the names of my wife and children. The pay site had a rough estimate of my income, and listed me as a purchaser of health and fitness stuff, which is accurate.
But a few purported facts were miles off base. For instance, one directory lists me as being associated with Gannon University, which, unless having a friend who graduated from there 40 years ago qualifies as an association, is not true. Another has me as an employee of a government agency I sued once. Another lists me as Hispanic, which explains why I occasionally receive Spanish language junk mail and telemarketing calls. I can only guess the error is a matter of confusion about my Italian surname. The most outlandish error was a site that has me living in a $1.1 million residence on West 25th Street in New York City, where I rented, but did not live in, a tiny office in the 1990s.
Since I had rented that office under an unrecorded sublease, I wondered how any person or machine would make the connection between me and that address. Then it occurred to me that the location had been posted on my company’s website, and since I was listed as the owner, it would be a simple matter for a computer algorithm to put a proverbial two-and-two together and conclude, if incorrectly, that I lived there.
But that site was taken down more than 10 years ago; how could that information have survived intact for more than a decade? To find out, I searched my name together with the company’s name, and sure enough, there, on a site called the Wayback Machine, was the company’s 1997 website complete with a picture of a much younger me. I began to get the irksome feeling that not only was my privacy being violated, but that the Internet had permanently recorded many of my life’s activities without my permission; and worse, that some of those purported facts were wrong.
The errors in my profiles made it obvious that at least some of the information about me had been concocted from more than one source. To get an idea how many sources were swapping data on me, I installed a piece of software that tracks the cookies on my machine and the data they send to other sites. I wiped all the cookies from my hard drive and began web browsing afresh. After six weeks, I had accumulated over 300 cookies from sites I visited with some regularity and an astounding 31,533 data exchanges between them and other sites, most of which were unknown to me.
The sites that dispatched the most data were my favorites—all purportedly free: an online newspaper at 5,971 reports, an instant messaging service at 1,835, a social networking site at 849, and a streaming music station at 664. I concluded that I was unwittingly exchanging information about my online behavior for access to free websites. The problem is, I didn’t recall agreeing to exchange my data for accessing the sites. I felt as though I had been duped.
To reconcile my indignation, I visited an Internet privacy law expert, professor George Pike of the University of Pittsburgh Law School, who set me straight.
First, he said, simple facts about me, such as my name, age and address, are public by their nature. Although I am under no obligation to reveal them, by the same token, I have no legal right to claim them as private. His explanation seemed logical enough. I never complained when the phone company listed my number in the phone book; what right did I have to complain now that the phone book was digital?
Second, I am not the legal owner of information that has been gathered about me; whoever gathers it owns it. This revelation was harder to swallow because I had never asked anybody to collect information about me. Nor had I agreed to allow them to collect it. Shouldn’t information about me belong to me until I surrender it? The short, legal answer is no. The detailed answer revolves around the idea of sweat equity. For centuries prospectors have routinely staked claims for minerals on public lands based on the fact that they worked to discover the deposit. Apparently the same idea extends to public data. Whoever digs it up owns it.
Third, once I voluntarily disclose a piece of information about myself in a public forum, such as an online discussion group or a social network, it no longer qualifies as legally private. This bit of counsel was more a revelation of reality than a fine point of law. Regrettably, privacy is a one-way affair—you can go from private to public, but not the other way around.
It occurred to me that although I had indeed checked the “I agree” box on scores of sites, I had never actually read even one of the documents to whose terms I had agreed. In the interests of due diligence, I resolved to remedy that by actually reading the fine print on the nine sites I routinely connect to each day. They ran the gamut from my computer manufacturer and operating system developer to my favorite news outlet and streaming radio station. The task entailed a mind-numbing, word-by-word reading and notation of 78,633 words in 10 hours, one minute and 35 seconds. Yes, I counted the words and timed myself. At the end of the exercise I took a day off to regain my cognitive composure.
During my reading, it became clear that every one of my nine favorite websites’ privacy policies disclaimed any responsibility for privacy matters associated with any of the sites they link to. This easy-to-dismiss detail carries monumental consequences. It meant that to be truly diligent, I would have to read all the privacy and use agreements of all the websites on which I have ever clicked. As ridiculous as such an effort sounds, that is precisely what all nine sites recommended. After making a conservative estimate as to how many sites might be entailed in such an effort, I calculated that reading them all would take about 250 hours—just slightly more than six 40-hour weeks. As a practical matter, it was just not possible.
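The arithmetic behind those figures is easy to check. The sketch below uses only the numbers reported above (78,633 words, 10 hours 1 minute 35 seconds, nine sites, and the 250-hour estimate). The one assumption, flagged in the comments, is that every other site's fine print would be read at the same pace and run about as long as the average of the nine.

```python
# Figures reported in the article
words_read = 78_633
seconds_spent = 10 * 3600 + 1 * 60 + 35      # 10 h 1 min 35 s
sites_read = 9
estimated_hours = 250                         # estimate for reading everything

# Reading pace for legal fine print, with note-taking
words_per_minute = words_read / (seconds_spent / 60)

# The 250-hour estimate expressed in 40-hour work weeks
work_weeks = estimated_hours / 40

# ASSUMPTION: same pace and same average length per site as the nine I read
average_words_per_site = words_read / sites_read
implied_words = words_per_minute * 60 * estimated_hours
implied_sites = implied_words / average_words_per_site

print(f"pace: {words_per_minute:.0f} words per minute")
print(f"work weeks: {work_weeks}")
print(f"implied reading load: about {implied_words:,.0f} words")
print(f"implied number of sites: about {implied_sites:.0f}")
```

At roughly 130 words a minute, 250 hours works out to something on the order of two million words of fine print, which is why the effort was out of the question.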
In 2012, the online data exchange industry grew at a rate of 18 percent and conducted over $30 billion in business, so there can be little doubt that this great mass of data is very valuable. The question remains, how much is one person’s data worth? Is it worth a free newspaper or radio station? Free email? Is it worth the gas it takes to drive to the mall or the library? Or to know in a few seconds Superman’s real name on Krypton? (Kal-El…. I looked it up… for free).
If that were all there is to it, my answer would be unequivocally, yes. But there’s more to it than that. And the rest is a little scary.
Still another Internet privacy expert, professor Alessandro Acquisti of CMU, warns of an emerging statistical practice, called re-identification, in which pieces of information about a person collected on the Web and stripped of identifiable information, for privacy purposes, are reverse-engineered and re-compiled to reveal a person’s true identity. When used in combination with other advanced statistical methods, re-identification pushes the crime of identity theft to a heightened level of concern.
For instance, Acquisti has reverse-engineered the Social Security Administration’s Death Master File to unravel the agency’s account number distribution algorithm. Since death is one of those simple public facts that don’t qualify for privacy protection, the Death Master File is considered public information. As such, Acquisti is able to legally predict a living person’s Social Security number, based on his or her date and place of birth. Fortunately, Acquisti is one of the good guys. But the same technology could make a life of crime much easier for cyber criminals.
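The general idea of re-identification can be shown with a toy example. This is not Acquisti’s Social Security method, which the article does not detail; it is a simple sketch of the commonest form of the technique, matching “anonymous” records against a public list on a few shared facts. Every record, name and field below is invented for illustration.

```python
# Invented data: one list has names stripped, the other is a public directory.
anonymized_purchases = [
    {"zip": "15213", "birth": "1950-03-14", "sex": "M", "purchase": "fitness tracker"},
    {"zip": "15217", "birth": "1982-11-02", "sex": "F", "purchase": "web cam"},
]
public_directory = [
    {"name": "Person A", "zip": "15213", "birth": "1950-03-14", "sex": "M"},
    {"name": "Person B", "zip": "15217", "birth": "1982-11-02", "sex": "F"},
    {"name": "Person C", "zip": "15217", "birth": "1990-07-21", "sex": "F"},
]

# Facts that are harmless alone but often unique in combination (an assumption
# of this sketch; real data sets may share different fields).
SHARED_FACTS = ("zip", "birth", "sex")


def facts_key(record):
    """Combine the shared facts into a single lookup key."""
    return tuple(record[field] for field in SHARED_FACTS)


def reidentify(anonymous_rows, named_rows):
    """Attach a name to each anonymous row that matches exactly one person."""
    names_by_key = {}
    for row in named_rows:
        names_by_key.setdefault(facts_key(row), []).append(row["name"])

    matches = []
    for row in anonymous_rows:
        candidates = names_by_key.get(facts_key(row), [])
        if len(candidates) == 1:          # only a unique match reveals identity
            matches.append((candidates[0], row["purchase"]))
    return matches


matches = reidentify(anonymized_purchases, public_directory)
for name, purchase in matches:
    print(name, "bought a", purchase)
```

Neither list gives anything away when read in isolation. Put together, the nameless purchase records get their names back.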
On the law enforcement side, Keith Mularski, supervisor of the Pittsburgh FBI’s Cyber Crimes Squad, says that the proliferation of personal information volunteered by social network participants obviates the need for cyber criminals to use methods as technically sophisticated as Acquisti’s Social Security experiment. But Joe Ferrara, CEO of Wombat Security Technologies in Oakland, has recently noticed a shift in cyber crime from financial identity theft of individuals to corporate invasions committed against businesses. Ferrara says the most frequent modus operandi of cyber criminals today is to construct a fake cyber-person designed to appeal to a targeted victim’s publicly disclosed, and subsequently gathered, interests, affiliations, habits and activities. Typically the victim is a key member of a corporate team whose financial, competitive or intellectual property is the object of the theft. While the victim enjoys conversing with the fake personality, secure in the false comfort of a collegial exchange and erroneously reassured by his or her company’s security measures, the hacker behind the fictitious profile ransacks the organization’s financial, marketing, sales or intellectual property information, and makes off with the proverbial store. It’s one of the reasons, Ferrara says, to never accept an online invitation to join a social network from somebody you have never met.
The FBI’s Mularski sums up Internet privacy succinctly: “When you put information on a social network, think about it as though you’re shouting it from the middle of Market Square, because that’s how much privacy you can expect.”
Although the idealist in me cherishes privacy, the pragmatist in me has concluded that, sadly, I have sold my privacy for the convenience of sitting in my home office while earning a living by looking at a computer screen that puts a world of information and communication at my fingertips and writing about it.
I find it irritating that I didn’t know I was paying for that convenience with my privacy. Had I known, I would have been a little more careful about what I disclosed, but I probably wouldn’t have changed much else.
And I surely would have taken a moment to kiss my cherished privacy goodbye. Maybe that moment is now.
Tom Imerito is president of Science Communications, a Pittsburgh technology communications consultancy.