We already know that if you use an online social network, you give up a serious slice of your privacy thanks to the omnivorous way companies like Google and Facebook gather your personal data. But new academic research offers a glimpse of what these companies may be learning about people who don't use their massive web services. And it's a bit scary.
Because they couldn't get their hands on data from the likes of Facebook or LinkedIn, the researchers studied publicly available data archived from an older social network, Friendster. They found that if Friendster had used certain state-of-the-art prediction algorithms, it could have divined sensitive information about non-members, including their sexual orientation. "At the time, it was possible for Friendster to predict the sexual orientation of people who did not have an account on Friendster," says David Garcia, a postdoctoral researcher with Switzerland's ETH Zurich university, who co-authored the study.
Garcia's findings showed that for people in minority classes (homosexual men or women, for example), his profiling techniques were 60 percent accurate. That's a pretty high accuracy, he says, "since a random, uninformed classification would have a precision of less than 5 percent."
The paper only examines sexual orientation, but Garcia thinks this type of analysis could model things such as age, relationship status, occupation, even political affiliation. "Basically, anything that is already shared by the users inside the social network could be predicted," he says.
It's yet another reason to be wary of Facebook in particular, as the social network's growing size, massive user database, and increasing emphasis on advertising revenue continue to worry users. Last week, a two-month-old Facebook alternative called Ello was generating 50,000 new member requests per hour, not only because it was ad-free but also because it provided a safe haven for members of the lesbian, gay, bisexual, and transgender community unhappy that Facebook forced them to use their real names. But even if they flee Facebook, it seems, the social network may still have ways to betray their privacy.
The problem Garcia identifies lies in something called "shadow profiles," and as a consequence, we all could be intimately profiled by the Facebooks and Googles and LinkedIns of the world—whether we agree to it or not.
Garcia says this kind of statistical analysis—essentially using machine learning to study the known tastes and relationships of one person's contacts, and making a guess about who they are likely to be—could be used to build disturbingly detailed profiles of people who do not even use the social network. Although the Friendster data dates to the last decade, Garcia believes that Facebook could make the same type of predictions with its data—and probably do this better because it has so many more users than Friendster ever did.
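To make the idea concrete, here is a minimal sketch of inferring a non-member's attribute from the members who list that person as a contact. It is not the method from Garcia's study, which relied on more sophisticated prediction algorithms; it is a plain majority vote over toy data, and every name in it (`predict_non_member_attribute`, `contact_lists`, `member_attributes`, the example address) is a hypothetical chosen for illustration.

```python
from collections import Counter


def predict_non_member_attribute(contact_lists, member_attributes,
                                 non_member, min_contacts=2):
    """Guess a non-member's attribute by majority vote among the members
    who list that person as a contact.

    Illustrative assumption only: a simple vote, not the study's algorithm.
    Returns (label, share_of_votes), or (None, 0.0) if too little is known.
    """
    # Members whose imported contact lists mention the non-member.
    neighbours = [member for member, contacts in contact_lists.items()
                  if non_member in contacts]
    # Keep only those neighbours who have disclosed the attribute.
    known = [member_attributes[m] for m in neighbours if m in member_attributes]
    if len(known) < min_contacts:
        return None, 0.0
    label, votes = Counter(known).most_common(1)[0]
    return label, votes / len(known)


# Toy data: three members have imported the same outside address.
contact_lists = {
    "alice": {"dana@example.org", "bob"},
    "bob": {"dana@example.org"},
    "carol": {"dana@example.org", "alice"},
    "erin": {"bob"},
}
member_attributes = {
    "alice": "nurse",
    "bob": "nurse",
    "carol": "teacher",
    "erin": "teacher",
}

label, share = predict_non_member_attribute(
    contact_lists, member_attributes, "dana@example.org")
print(label, round(share, 2))
```

Even this crude vote produces a guess about someone who never signed up, which is the heart of the concern. A network with far more users and richer data would have many more "votes" to draw on for each outsider.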
We learned about shadow profiles last year when security researchers at a company called Packetstorm discovered Facebook was maintaining its own files on users' contacts. For example, if Facebook found two users were connected to a non-member—say, [email protected]—it would pool other information—different phone numbers, for example—into one master dossier.
A Facebook spokesman says the company "doesn't have shadow accounts or profiles – hidden or otherwise – for people who haven't signed up for our service," and a 2011 audit by Ireland's Data Protection Commissioner confirmed this. But the company does store information on non-users when Facebook members import their contact lists.
'A Major Problem'
That doesn't sit well with everyone. "The fact that I have no control over additional email addresses and phone numbers added to their data store on me is frightening," Packetstorm wrote in a blog post last year. The post's author, Packetstorm partner Todd Jarvis, says he believes Facebook still collects this data, despite his company's recommendation that Facebook delete it. "As long as it exists, it is a liability in my opinion," he says.
He thinks that because it's such a tricky technical and ethical issue, the only way to really protect the data of people outside the network is through legislation. "It is not enough to get a statement from Facebook saying we promise not to build those profiles," he says.