A security researcher has discovered a massive cache of data for millions of Instagram accounts, publicly accessible for everyone to see. The account included sensitive information that would be useful to cyberstalkers, among others.
A security researcher calling themselves anurag sen on Twitter discovered the database hosted on Amazon Web Services. It had over 49 million records when discovered and was still growing before it was deleted.
The Instagram data included user bios, profile pictures, follower numbers and location. This information is viewable online. What’s more puzzling is that it also contained the email address and telephone number used to set up the accounts, according to Techcrunch, which broke the story.
Reporters identified the owner of the database as Mumbai-based social media company Chtrbox. It pays social media influencers to publish sponsored content through their accounts. The database has since disappeared from Amazon.
Response from Chatrbox
Chatrbox took issue with press coverage of the leaked records, sending Naked Security the following statement:
The reports on a leak of private data are inaccurate. A particular database for limited influencers was inadvertently exposed for approximately 72 hours. This database did not include any sensitive personal data and only contained information available from the public domain, or self reported by influencers.
We would also like to affirm that no personal data has been sourced through unethical means by Chtrbox. Our database is for internal research use only, we have never sold individual data or our database, and we have never purchased hacked-data resulting from social media platform breaches. Our use of our database is limited to help our team connect with the right influencers to support influencers to monetize their online presence, and help brands create great content.
How might someone compile a massive database of Instagram information?
The company wouldn’t answer any more questions, so it’s difficult to know for sure. User names, profile shots, and follower numbers are publicly available and could be gathered by screen scraping. Screen scrapers use automated scripts to visit websites and copy the information they find there.
Companies use scraped data for all kinds of purposes, such as price comparisons and sentiment analysis. It’s considered malicious and many publishers try to block it because the scrapers are using their proprietary data and also draining their server resources.
We’ve seen people scraping Instagram before. Redditors attempted to archive every image from the site that they could, for kicks.
But it can get you into trouble. Authorities in Nova Scotia, Canada arrested a 19-year-old for scraping around 7,000 freedom-of-information releases from a public web site there, calling him a hacker. They subsequently dropped the charges.
What isn’t typically public is the phone number and email address used to create the account, and which TechCrunch says was included with some records. Facebook used to make this available via the Instagram API, even for accounts that didn’t publicly list that information. It had to turn off that feature in September 2017 after it found people downloading celebrity contact details.