Ever since the 2016 US election, some of the most urgent questions in tech have been about the impact of social media — Facebook, in particular. How much does fake news influence people’s opinions? Do filter bubbles lead to more polarized views? Why are people so inclined to share outrageous stories? Researchers have been looking at those questions for years, but actual experimental data has been hard to come by. This summer, however, select researchers will get access to some of the most insightful social media data we have, direct from Facebook.
In an unprecedented move, Facebook has agreed to provide anonymized data about how stories are shared across its platform. The first 12 research groups to get access to Facebook’s sensitive user behavior data were announced last week. Collectively, they’ll be studying critical questions facing politics and technology: how disinformation campaigns play out on social media, the ways partisan communities become organized, and whether Facebook as a whole can swing an election.
“If we can find privacy preserving ways [of sharing data] … society would be so much better off,” says Gary King, a Harvard professor who’s helping to oversee the data-sharing arrangement between Facebook and researchers. “It’s an incredible resource that just goes untouched.”
Multiple projects will be looking directly at Facebook’s impact on elections. One group from the Pontifical Catholic University of Chile will study how Facebook usage affected Chile’s 2017 congressional elections, which brought the conservative “Chile Vamos” coalition to power. Another, led by researchers at the University of Urbino in Italy, will study how populist parties benefit from the sharing of partisan news sources. In Taiwan, researchers will study how news stories on Facebook impact civic engagement.
Many of the studies are broader than that, with an interest in when and why people share fake news and the types of people who do the sharing. That’s partly because of the data being made available: researchers have been able to study similar information in the past by looking at things like public tweets, but Facebook’s data is far richer. Among other things, it’ll include what links were shared, the age range and general location of people who shared them, whether people read those links before sharing them, and any associated fact-check rulings. While Facebook isn’t representative of humanity as a whole, its nearly 2.4 billion users certainly account for a good chunk.
A project led by R. Kelly Garrett, an associate professor at Ohio State University, will look at whether there are predictable patterns that lead to sharing fake and dubious news stories. Facebook’s data, Garrett says, will provide things that traditional methods of data gathering can’t offer. “People can’t reliably tell you, ‘I usually share stuff I haven’t bothered reading in the middle of the night, in spring, on weekends,’” he says. “People don’t know or have incentives not to tell you the truth.” Garrett hopes to identify patterns that track across social media networks, which could help online platforms make changes to discourage the sharing of fake stories.
The data is being generalized to protect individuals’ privacy — using age brackets instead of specific ages, regions instead of specific locations — but it will still have enough detail for researchers to assess these important questions, says Magdalena Saldaña, who’s co-leading a separate Chilean team’s research into how exposed residents were to fake news. It was just a matter of crafting the studies correctly. “It was a challenge for us,” Saldaña says. “It made us modify a bit our research goals and propose a study that can actually find patterns of misinformation spreading on Facebook without having data at the user level.”
Several research groups will also take advantage of the ability to study sharing behaviors on Facebook before and after an algorithm change designed to promote friends over media sources. “What’s tricky is that both social media and making extreme news have evolved in tandem over the last 10 or 20 years,” says Nicholas Beauchamp, an assistant professor at Northeastern University, who’s leading a research group that’s studying how peer sharing affects the polarization of news. His group will look across the algorithm change to see whether peer sharing changes the rates of fake news. “We have this nice little kind of natural experiment,” he says, “where suddenly there’s this unexpected shift towards much more peer sourced information.”
All of this hinges on whether Facebook and researchers can conduct their studies without violating the privacy of any of the company’s billions of users. The project is being overseen by Social Science One, a new organization designed to make partnerships like this one, between researchers and a data-rich institution like Facebook, come together. Social Science One, which was co-founded by King, stands in between Facebook and the research teams to ensure that everyone gets what they want: researchers get the independence to publish whatever they find without Facebook having any input, and Facebook gets a careful set of eyes looking at its data to make sure that nothing shared with researchers can be traced back to the actions of a specific person.
King has, at least tangentially, bumped up against Facebook privacy concerns in the past: Crimson Hexagon, a company he founded and whose board he chaired, was suspended by Facebook last year amid an investigation into whether its data harvesting was being used for government surveillance, after The Wall Street Journal reported it had contracts with the US, Turkey, and a Kremlin-linked organization. King commented at the time that he “never had line authority or day-to-day involvement” with the company. Facebook did not respond to a request for comment on the status of its Crimson Hexagon investigation.
There are several layers of protection in place to make sure the use of Facebook’s data doesn’t go awry. At the most basic level, proposals must be approved by a university’s Institutional Review Board. Then there are the layers of protection put in place by Social Science One and Facebook to make sure no individual’s reading and sharing habits can be identified. URLs will only be included if they’ve been shared enough times and in some kind of public way. The data is also going to be tainted with a small number of random errors using a technique called differential privacy. That means, on a granular level, the data won’t be fully reliable, but the big picture view will still provide reliable numbers.
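To give a sense of how that technique behaves, here is a minimal sketch of the classic Laplace mechanism for differential privacy. Everything here is illustrative — the function names, the epsilon value, and the idea of applying it to share counts are assumptions, not details of Facebook’s actual system:

```python
import math
import random

def laplace_noise(scale):
    # Sample from a Laplace(0, scale) distribution via inverse CDF.
    u = random.random() - 0.5
    return -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))

def noisy_share_count(true_count, epsilon, sensitivity=1):
    # One user changes a URL's share count by at most `sensitivity`,
    # so Laplace noise with scale sensitivity/epsilon provides
    # epsilon-differential privacy for this single query.
    return true_count + laplace_noise(sensitivity / epsilon)
```

The trade-off the paragraph describes falls out directly: noise of a few units is negligible against a URL shared 50,000 times, but it swamps a count of 2 or 3 — so aggregate trends stay reliable while any individual’s contribution is hidden.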
Access to the data will also be restricted in various other ways. Researchers won’t be able to download or store the data; instead, they’ll have to access it through a secure portal to Facebook’s servers. Everything researchers do on those servers will be logged, and they’ll be given a data “budget” that stops them from gathering more information than they need.
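A query “budget” like the one described above might be enforced along these lines — an entirely hypothetical sketch, since Facebook’s portal internals are not public:

```python
class PrivacyBudget:
    """Tracks the cumulative cost of a researcher's queries and
    refuses further queries once the allotted budget is spent."""

    def __init__(self, total):
        self.total = total
        self.spent = 0.0

    def charge(self, cost):
        # Reject the query up front if it would exceed the budget.
        if self.spent + cost > self.total:
            raise PermissionError("query budget exhausted")
        self.spent += cost
        return self.total - self.spent  # remaining budget

budget = PrivacyBudget(total=10.0)
budget.charge(4.0)   # first query allowed
budget.charge(4.0)   # second query allowed
# budget.charge(4.0) would now raise PermissionError
```

The point of such a cap is that every noisy answer leaks a little information; bounding the total spent across all queries bounds what can ever be reconstructed about any individual.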
“The way research has sometimes been described in the past is as a balancing act, right? There might be harm to individuals, but there’s benefit to society, so we do what we can,” King says. “Well, our goal was not to balance.”
At least, that’s the plan. Social Science One will begin introducing researchers to the data through a simplified dataset next month, which will essentially act as a beta test. There are fewer privacy concerns about the data being made available there since demographic data will largely be stripped out. So initial work can be done while the organization monitors to see whether the system works as planned. If it does, researchers will be able to get the full dataset and potentially even more in the future.
“We are absolutely talking to other companies,” King says. “We haven’t made any commitments yet because we want to make sure that this one works.”