Profile data was scraped without user consent or knowledge to "build a three-dimensional picture" on millions of people. The data was subsequently found by Chris Vickery, director of cyber risk research at security firm UpGuard. Vickery showed ZDNet the data first-hand in New York last week.
A little-known data firm was able to build 48 million personal profiles, combining data from sites and social networks like Facebook, LinkedIn, Twitter, and Zillow, among others -- without the users' knowledge or consent.
Localblox, a Bellevue, Wash.-based firm, says it "automatically crawls, discovers, extracts, indexes, maps and augments data in a variety of formats from the web and from exchange networks." Since its founding in 2010, the company has built its profiles mainly from publicly accessible sources, such as the social networks Facebook, Twitter, and LinkedIn and the real estate site Zillow.
But earlier this year, the company left a massive store of profile data in a public but unlisted Amazon S3 storage bucket that required no authentication, allowing anyone to download its contents.
The bucket, labeled "lbdumps," contained a file that unpacked to a single file more than 1.2 terabytes in size. That file held 48 million individual records, built from data scraped from public profiles and then consolidated and stitched together.
Vickery, a well-known ethical data breach hunter, disclosed the leak to Localblox's chief technology officer Ashfaq Rahman in late February. The bucket was secured hours later.
The discovery is the latest twist among recent scandals involving tech companies and their data collection practices.
Just last month, Facebook was embroiled in a privacy row after London-based data firm Cambridge Analytica obtained data on as many as 87 million users (a "conservative estimate," according to the social networking giant) through an academic app that collected data on its users and their friends. The data was used to build profiles on millions of Americans to predict how they would vote, including in the 2016 presidential election.
The scandal sparked public outrage, triggered congressional and parliamentary inquiries and investigations around the world, and forced Facebook to adopt stronger privacy practices.
But Localblox's data collection can be just as invasive, gathering highly sensitive and personally identifiable information about people -- without their consent.
The data was found in a human-readable, newline-delimited JSON file. It includes names, physical addresses, employment information, and job histories, among other details, scraped from Facebook, LinkedIn, and Twitter profiles.
UpGuard's own report, published Wednesday, showed search queries that Localblox used to cycle the email addresses it had collected through Facebook's search engine, retrieving users' photos, current job titles and employers, and additional family information. The company is also believed to supplement the data it collects with information from non-public sources, such as purchased marketing data. The data is then compiled, organized and blended into existing individual profiles.
The report described the collection operation as an effort to "build a three-dimensional picture on every individual affected" for use in advertising or political campaigns.
Vickery said that some records are more complete than others. Some, but not all, include details such as whether a person uses a credit card, their "Do Not Call" preferences, marital status, and net worth. Localblox claims it has more than 650 million records in its device ID database, and 180 million records in its mobile phone database, which includes mobile phone numbers and carriers.
"Concentrating millions of people's details can become by its very nature a weaponized thing, and something that can lead to a lot of harm," said Vickery.