
API Abuse – Lessons from the Duolingo Data Scraping Attack

It has been reported that 2.6 million user records sourced from the Duolingo app are for sale. The attacker apparently collected them through an open API that the company provides. A more technical explanation is available here.

We talk a lot about the vulnerabilities in the OWASP API Top 10 and the exploits that target them, but this incident is a good reminder that not every vulnerability is a flaw in code. In fact, this API was working as designed. The OWASP API Top 10 covers this kind of attack as API6:2023 Unrestricted Access to Sensitive Business Flows.

If you have a Duolingo account, you can see the API in action from a browser: go to https://www.duolingo.com/2017-06-30/users?email=example@example.com and replace the example email with your own. The response is raw JSON rather than a formatted web page, but it shows exactly what information the API makes publicly available.

Duolingo's API Query JSON Response (Source: Black Owl Intelligence)
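
The lookup is a single GET request keyed by one email address, which is what makes it cheap to repeat across a long list of addresses. The minimal sketch below (Python standard library only) just builds the lookup URL from the example above; the helper name `build_lookup_url` is hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Base path taken from the example URL in the text.
USERS_ENDPOINT = "https://www.duolingo.com/2017-06-30/users"


def build_lookup_url(email: str) -> str:
    """Return the lookup URL for one email address (hypothetical helper).

    urlencode() percent-encodes the value, so characters such as '@'
    and '+' survive the trip through the query string intact.
    """
    return f"{USERS_ENDPOINT}?{urlencode({'email': email})}"


print(build_lookup_url("example@example.com"))
# → https://www.duolingo.com/2017-06-30/users?email=example%40example.com
```

Because the only input is the email address, nothing in the request itself distinguishes one curious user from a script iterating over millions of addresses; that distinction has to come from server-side controls.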

Is Scraped Data Dangerous?

The information exposed by the API may seem relatively benign, but consider how an attacker could combine it with other data. For example, an attacker with a list of email addresses to phish could make the attack far more convincing by knowing details of each target’s Duolingo account. Imagine receiving an email that appears to come from Duolingo and mentions the languages you’re learning, whether you’ve logged in recently, and how many ‘crowns’ or ‘XP’ you have. That accurate detail acts as a kind of soft authentication, lending the message credibility and making you more likely to click a malicious link.

How to Protect Your APIs from Data Scrapers

If we assume there is a valid business reason for this API to be open, the question becomes how Duolingo could detect and prevent abuse while still meeting that requirement. A good place to start is knowing that the endpoint exists and what sensitive data it might expose.
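
One piece of that inventory is checking which fields a response actually returns. The sketch below walks a JSON payload and flags keys whose names look sensitive. The regular expression and the sample payload are hypothetical (they are not Duolingo’s real response schema), and name-based matching is only a first pass; a real discovery tool would also inspect values.

```python
import json
import re

# Hypothetical, deliberately broad patterns; tune to your own data classification.
SENSITIVE_KEY = re.compile(
    r"(e-?mail|name|phone|birth|address|location|token)", re.IGNORECASE
)


def find_sensitive_fields(payload, path=""):
    """Yield dotted JSON paths whose key name matches SENSITIVE_KEY."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}" if path else key
            if SENSITIVE_KEY.search(key):
                yield child
            # Recurse so nested objects and arrays are scanned too.
            yield from find_sensitive_fields(value, child)
    elif isinstance(payload, list):
        for i, item in enumerate(payload):
            yield from find_sensitive_fields(item, f"{path}[{i}]")


# Made-up sample payload for illustration only.
sample = json.loads('{"users": [{"username": "learner1", "name": "Ana", "streak": 12}]}')
print(list(find_sensitive_fields(sample)))
# → ['users[0].username', 'users[0].name']
```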

An API discovery tool should help here. Rate limiting, including limits keyed on user agent, may also help. It’s hard to say from the outside whether that kind of control would work in this specific situation, but it’s a start. Detecting API abuse is also a key capability: pulling 2.6 million records should be difficult without tripping query-volume or other behavioral detections.
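
As a rough illustration of both ideas, the sketch below applies a sliding-window limit per (client IP, User-Agent) pair and also caps how many *distinct* email addresses one client may look up per window, since enumeration rather than raw volume is the signature of scraping. The class name `LookupGuard` and all thresholds are hypothetical. The User-Agent header is trivially spoofable, so treat it as one signal among several, not a control on its own.

```python
import time
from collections import defaultdict, deque


class LookupGuard:
    """Hypothetical per-client guard for an email lookup endpoint.

    A client is identified by (IP, User-Agent). Within a sliding window,
    max_requests caps raw volume and max_distinct caps how many different
    email addresses the client may look up.
    """

    def __init__(self, window_s=60.0, max_requests=30, max_distinct=10):
        self.window_s = window_s
        self.max_requests = max_requests
        self.max_distinct = max_distinct
        # (ip, user_agent) -> deque of (timestamp, email) for allowed requests
        self.history = defaultdict(deque)

    def allow(self, ip, user_agent, email, now=None):
        now = time.monotonic() if now is None else now
        email = email.lower()
        events = self.history[(ip, user_agent)]
        # Drop events that have fallen out of the sliding window.
        while events and now - events[0][0] >= self.window_s:
            events.popleft()
        if len(events) >= self.max_requests:
            return False  # too many requests in the window
        distinct = {e for _, e in events}
        if email not in distinct and len(distinct) >= self.max_distinct:
            return False  # looks like enumeration of many addresses
        events.append((now, email))
        return True


# Usage: one client (documentation IP) looking up five different addresses.
guard = LookupGuard(window_s=60, max_requests=30, max_distinct=3)
results = [
    guard.allow("203.0.113.7", "python-requests/2.31", f"user{i}@example.com", now=float(i))
    for i in range(5)
]
print(results)  # → [True, True, True, False, False]
```

Repeating an already-seen address is still allowed, so a legitimate client re-checking one account is not penalized. A production version would also evict idle keys and share state across servers, which this in-memory sketch does not attempt.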

The Wallarm platform can help with situations like this. API Discovery will enumerate APIs and endpoints, including whether they expose sensitive data. The platform offers rate limiting, including rate limiting by user agent, and our API Abuse Prevention is designed to address automated attacks like content scraping.
