A security researcher has stumbled upon an unsecured MongoDB database server that contained highly detailed CVs for over 202 million Chinese users. The data appears to have originated from a data scraping app that collected resumes from Chinese job portals, ZDNet reports.
Who owned the database is still a mystery, said Bob Diachenko, Director of Cyber Risk Research at Hacken Proof, who found the server's data exposed online.
The MongoDB instance contained 854GB of data, with 202,730,434 records in total, most of which were CVs for Chinese users.
The resumes contained all the sensitive details you might expect to find on a CV, such as full names, home addresses, phone numbers, emails, marital status, number of children, political affiliations, body measurements like height and weight, literacy level, salary expectations, education, past jobs, and more.
The info was a threat intelligence gold mine that was left open for the entire internet to find. Tracking down its owner has been nearly impossible. Diachenko was forced to make a public plea on Twitter for help in identifying the server's administrator.
One of the researcher's followers came to the rescue last year, when he pointed Diachenko to a now-deleted GitHub repository that contained the source code of a web app.
The app, most likely created to scrape CVs from legitimate job-finding portals, contained data structures identical to the ones found in the leaky database, a clear sign that it was the one that scraped and collected the CVs.
Diachenko told ZDNet that one of the primary sources from which the app appears to have scraped CVs is bj.58.com, a popular Chinese job portal, although other portals could have been scraped as well.
When Diachenko contacted bj.58.com's staff, they confirmed his initial assessment that the data came from a data scraper rather than from a leak on the company's own network.
"We have searched all over the database and investigated all the other storage, and concluded that the sample data is not leaked from us," the bj.58.com spokesperson told Diachenko. "It seems that the data is leaked from a third party who scrapes data from many CV's [sic] websites."
bj.58.com did not reply to an additional request for comment from ZDNet.
But Diachenko's plea for help on Twitter also appears to have caught the eye of the database's owner, who secured the server and took down the GitHub repository a few days back.
This is not the first time that Diachenko has found a leaky server containing data from resume site scrapers. Last month, he also found a similar server exposing over 66 million records that appeared to have been scraped from LinkedIn and later leaked via another MongoDB database.