Issue 129: Facebook and Clubhouse profiles scraped through APIs, Forrester’s “State of Application Security, 2021”


This week, we obviously have to discuss the hundreds of millions of Facebook and Clubhouse user profiles that were scraped using APIs. In other news, Forrester has published their fresh and insightful report “The State of Application Security”, and there’s a new online training course, “Building an Identity Architecture for APIs”.

Data leak: Facebook

The biggest recent data leak news is the publication of a huge database covering 530 million Facebook users. Facebook has made an official statement on the incident, downplaying it because the data had already been “scraped” back in 2019 using Facebook’s APIs, rather than obtained through some sort of database access or another “direct” hack.

The vulnerability that attackers exploited was in the API that Facebook created for discovering friends based on the contacts in your phone. Facebook wanted to make it easy for users to find their friends on the social network, so the Facebook app used an API to upload the contacts from users’ phones to Facebook and fetch the profiles of users matching the uploaded phone numbers.

Sounds nice and user-friendly, but as so often happens, making things easy can also make them less secure. Attackers took advantage of the feature by generating huge “phone books” and submitting them to the API. After all, phone numbers are just sequences of digits following a set syntax: country code, area code, and a seven-digit number. Thus, iterating through them is not difficult.
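
To get a feel for how small that search space is, here is a rough back-of-the-envelope sketch. The batch size and request rate are purely illustrative assumptions, not figures from the incident, and the function names are made up for this example.

```python
# Back-of-the-envelope estimate: how long would it take to submit every
# seven-digit subscriber number within a single area code?
# All constants below are illustrative assumptions, not incident data.

SUBSCRIBER_DIGITS = 7          # the "seven-digit number" part of the syntax
NUMBERS_PER_REQUEST = 1_000    # assumed size of one uploaded "phone book"
REQUESTS_PER_SECOND = 10       # assumed rate of an unthrottled client


def candidates_per_area_code(digits: int = SUBSCRIBER_DIGITS) -> int:
    """Number of possible subscriber numbers within one area code."""
    return 10 ** digits


def seconds_to_enumerate(total: int, batch: int, rate: float) -> float:
    """Time to submit `total` numbers in batches of `batch` at `rate` requests/s."""
    requests = -(-total // batch)  # ceiling division
    return requests / rate


if __name__ == "__main__":
    total = candidates_per_area_code()
    secs = seconds_to_enumerate(total, NUMBERS_PER_REQUEST, REQUESTS_PER_SECOND)
    print(f"{total:,} candidates, about {secs / 60:.0f} minutes per area code")
```

With these assumed values, one area code holds 10,000,000 candidates, which fit into 10,000 requests, or 1,000 seconds of traffic. Without limits on the API, the whole number space is well within reach.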

This allowed scraping the user database and collecting a huge dataset of personal details, definitely something that should not have happened and should probably have been foreseen. After all, attackers are a very resourceful bunch. Now, with the publication of the scraped dataset, the information on names, Facebook IDs, phone numbers, and email addresses of 530 million users (including Mark Zuckerberg himself) has ended up in the public realm.

Naturally, with a high-profile case like this, a lot of researchers have been looking into the details. For example, check out this thread by Ashkan Soltani.

Let’s try to summarize the impact:

  • Facebook’s stance that it is not a hack if an API was used to collect the data makes little sense. APIs are just another attack vector used by malicious actors, and in fact one of the main attack vectors these days. A hack is a hack, regardless of the attack vector employed.
  • Even though some of the data is public (like names), such large datasets of this public information combined with other details can be used for various phishing and social engineering attacks. There have been reports on other datasets that also include users’ page likes. This aggravates the situation considerably: the more details scammers have readily at their disposal, the more convincing the scams.
  • The phone number matching worked regardless of the user’s privacy setting. Even if the phone number was set not to be shared, or even if it was only used for multi-factor authentication, the API still exposed the user profile for the phone number. This is a huge violation of user privacy and trust!
  • Reports indicate that Facebook had been told about the vulnerability for years before finally fixing it in 2019!
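
The privacy-setting point can be illustrated with a minimal sketch of a contact-matching function that only returns users who have opted in to being found by phone number. The `Profile` type, the `discoverable_by_phone` flag, and `match_contacts` are hypothetical names invented for this example; they do not describe Facebook’s actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    user_id: str
    name: str
    phone: str
    discoverable_by_phone: bool  # hypothetical per-user privacy setting


def match_contacts(uploaded_numbers: list[str], profiles: list[Profile]) -> list[Profile]:
    """Return profiles for uploaded numbers, skipping users who opted out."""
    by_phone = {p.phone: p for p in profiles}
    matches = []
    for number in uploaded_numbers:
        profile = by_phone.get(number)
        # A number marked private (for example, one stored only for
        # multi-factor authentication) must never produce a match.
        if profile is not None and profile.discoverable_by_phone:
            matches.append(profile)
    return matches
```

The design choice here is that the privacy check lives in the lookup itself, so no caller of the API can bypass it.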

Whichever way you look at it, Facebook does not appear in a good light here.

Lessons learned from this bad example:

  • APIs are one of your primary attack surfaces. Treat them as such!
  • Even public information can become dangerous once it becomes available as a large dataset. Information is power, after all: more information, more power. It should not be allowed to fall into the wrong hands through negligence.
  • APIs should not give access to more data than you are comfortable sharing (and allowed to share!) through user interfaces.
  • Bulk operations are extremely dangerous. Always limit not just the rates at which APIs can be invoked but also the amount of data that they can return.
  • APIs need to be secured by design. Fixing issues only after they have been exploited can be disastrously late.
  • Monitoring, alerting, and promptly reacting to vulnerability reports are good additional mitigation measures.
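
As one way to picture the advice on bulk operations, here is a minimal in-memory sketch that caps both the size of a single lookup and the total number of records a client can look up within a time window. The class name, the limits, and the window length are assumptions chosen for illustration; a production service would typically keep these counters in shared storage.

```python
import time
from collections import defaultdict, deque

MAX_NUMBERS_PER_REQUEST = 200    # assumed cap on one bulk lookup
MAX_NUMBERS_PER_WINDOW = 1_000   # assumed per-client budget per window
WINDOW_SECONDS = 24 * 60 * 60    # assumed window length: one day


class LookupBudget:
    """Tracks how many items each client has looked up in a sliding window."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._events = defaultdict(deque)  # client_id -> deque of (timestamp, count)

    def allow(self, client_id: str, requested: int) -> bool:
        """Return True and record the usage if the request fits both limits."""
        if requested <= 0 or requested > MAX_NUMBERS_PER_REQUEST:
            return False
        now = self._clock()
        events = self._events[client_id]
        # Drop usage records that have aged out of the window.
        while events and now - events[0][0] >= WINDOW_SECONDS:
            events.popleft()
        used = sum(count for _, count in events)
        if used + requested > MAX_NUMBERS_PER_WINDOW:
            return False
        events.append((now, requested))
        return True
```

Counting returned records rather than requests matters: a limit on request rate alone does little if each request can carry an entire phone book.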

Data leak: Clubhouse

Another popular social network, Clubhouse, had a similar leak. 1.3 million user profiles were publicly posted and, sadly, the company reacted in a very similar way.

As we just discussed, the fact that APIs were used to retrieve the information does not change the impact (and vendor responsibility) in any way.

As with Facebook, there’s a lot of good analysis of the Clubhouse story too. For example, read this thread by Henk Van Ess.

The impact, too, is quite similar to Facebook’s case:

  • As already mentioned, large datasets of user data are problematic. They enable phishing and other social engineering attacks even if the data has already been available on individual user pages.
  • The data contains links to user profiles in other social networks. For some users, this means that their private Twitter and Instagram accounts became de-anonymized.
  • Clubhouse made retrieving data easy because user IDs are sequential integers. Thus, APIs can be used to retrieve information on user #1, #2, #3, and so on. This makes enumerating the whole user base child’s play.

The lessons learned are much the same as with Facebook, with one extra:

  • Do not use sequential IDs, ever. The simple step of using GUIDs instead makes enumeration harder for attackers. And do not allow attackers to iterate through your user or resource records.
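
A minimal sketch of the GUID idea, using Python’s standard `uuid` module. The `UserStore` class is a hypothetical in-memory stand-in for a real user database.

```python
import uuid


class UserStore:
    """In-memory store keyed by random identifiers (illustrative only)."""

    def __init__(self):
        self._users = {}

    def add(self, name: str) -> str:
        # A version 4 UUID is random, so knowing one user's ID
        # gives no hint about anyone else's.
        user_id = str(uuid.uuid4())
        self._users[user_id] = {"name": name}
        return user_id

    def get(self, user_id: str):
        # Guessed or sequential IDs such as "1", "2", "3" match nothing.
        return self._users.get(user_id)
```

Random identifiers only make guessing impractical. They complement authorization checks and rate limits rather than replace them.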

Analyst report: The State Of Application Security, 2021

Sandy Carielli and the team at Forrester have published their report “State Of Application Security, 2021”. If you do not have a Forrester subscription, you can find a pretty extensive summary of the report in Ayala Goldstein’s blog post.

Just a few key facts from the report:

  • Application vulnerabilities (mostly API-based) are the number one attack vector.
  • The researchers recommend addressing the problem by shifting security left. That means including security in the early stages of the software development lifecycle: design, development, testing.
  • DevSecOps practices and CI/CD pipelines can help with the shift: “prerelease testing products offering deep integrations with core development tools like Azure DevOps, GitHub, Jenkins, and Jira.”

Other recommendations that the report makes:

  • Nurture communication between security and development teams, and embrace automated security testing tools throughout development.
  • Create and use tools that developers love. Ensure that security is woven into development workflows.
  • Provide developers with tools that give remediation guidance, automate work processes, and prioritize security issues.
  • Invest in updated application security tools that can be easily integrated into future application development plans and architecture.

Online training: Building an Identity Architecture for APIs

There’s a new free online course by Michał Trojanowski of Curity on Building an Identity Architecture for APIs.

The course covers various API integration patterns for identity systems:

  • Token flows
  • Proof-of-possession tokens
  • Scopes
  • Claims
  • Enforcement
  • Token sharing techniques
  • Entitlements

If this is up your alley, do check it out.

