If you have been paying attention to the recent information security news, then you know web application technologies, and WAFs more specifically, have been a culprit in several of the incidents. WAF-based attacks are frightening because WAF tech is supposed to protect. As APIs, web applications, and microservices become more central to our lives, how can we trust WAFs to protect us against Layer-7 cyberattacks? Clearly, we need WAFs. Just like a good incident response follow up, this article aims to explore recent breaches to shed light on the dangers of various WAFs, what comes next, and where our security tools are heading in the coming decade.

Recent WAF-related incidents at CapitalOne, Imperva, and Cloudflare offer essential lessons on how WAF technology is failing us and what we can do to improve our security health.

What is a WAF really?

One of the top security tools used by companies, a web application firewall (WAF) helps protect web applications by filtering and monitoring HTTP traffic between a web application and the Internet. It typically protects web apps from attacks such as cross-site request forgery (CSRF), cross-site scripting (XSS), file inclusion, SQL injection, and many others. A WAF is a layer-7 defense in the OSI model. But it's not designed to defend against all types of cyberattacks.

A strong WAF complements other tools in a security toolbox, but should not be mistaken for a complete security solution. It eliminates many cyberattacks, namely those it is designed to handle. Deployed in front of the application, the WAF acts as a shield placed between the web application and the Internet. A proxy server protects the client machine's identity by using an intermediary. A WAF, by contrast, acts as a type of reverse proxy that protects the server from exposure by having clients pass through the WAF before reaching the server.

Modern cybersecurity attacks and WAFs

Internet-based cyberattacks have become increasingly sophisticated over the last twenty years. Just like any technology, WAFs have evolved.

Original WAFs could only protect us from what we knew was coming. The first generation of web application firewalls used a signature-based approach. These were oriented to protect from things like server-side RCE, path traversal, and SQL injection. They essentially ran as a signature-based and blacklist-based technology.
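To make the signature-based approach concrete, here is a minimal sketch in Python. The signature names and patterns are hypothetical and chosen only for illustration; a real rule set is far larger and more carefully written.

```python
import re

# Hypothetical blacklist: each entry maps an attack class to one pattern.
# These patterns are illustrative assumptions, not rules from any real WAF.
SIGNATURES = {
    "sql_injection": re.compile(r"(?i)\bunion\b.+\bselect\b"),
    "path_traversal": re.compile(r"\.\./"),
    "xss": re.compile(r"(?i)<script\b"),
}


def match_signatures(payload: str) -> list[str]:
    """Return the names of every signature that the payload triggers."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]


if __name__ == "__main__":
    for sample in (
        "id=1 UNION SELECT password FROM users",
        "file=../../etc/passwd",
        "q=hello",
    ):
        hits = match_signatures(sample)
        print("BLOCK" if hits else "ALLOW", sample, hits)
```

The weakness is visible in the sketch itself: anything that is not already on the list passes through.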

WAF protections

The next evolution of WAFs is where many enterprises still are. A richer technology stack, including Ajax, and explosive growth enabled this next generation of WAFs. At the same time, the number of critical web applications has exploded, driving the deployment of what we'll call WAF 2.0.

With WAF 2.0, dynamic profiling and a sort of optimized signature list were introduced. However, the processes have still remained time consuming because you need human administration and oversight. This has meant having the resources dedicated to the WAF or risking low implementation or high misuse.

As development cycles have accelerated and the number of applications has increased exponentially, WAFs that function without heavy manual tuning and administration have become essential to providing robust security. This leads us to what we can think of as WAF 3.0, the next necessary evolution of the WAF.

Hackers have evolved to match security technology. They are increasingly sophisticated, opportunistic, and very good at evasion. Hackers quickly figured out that people were deploying WAFs, so they started to develop techniques to evade signature-based analysis. They also shifted their attention to zero days.
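One common evasion technique is encoding a payload so that it no longer looks like the signature. The sketch below is a hedged illustration in Python: the single blocked substring and both function names are hypothetical, and real evasion and normalization are considerably more involved.

```python
from urllib.parse import unquote

# A single hypothetical signature, kept deliberately simple.
BLOCKED_SUBSTRING = "<script"


def naive_check(raw_query: str) -> bool:
    """Look for the signature in the raw, still-encoded query string."""
    return BLOCKED_SUBSTRING in raw_query.lower()


def normalized_check(raw_query: str, max_rounds: int = 3) -> bool:
    """Percent-decode repeatedly (up to max_rounds) before matching."""
    decoded = raw_query
    for _ in range(max_rounds):
        next_decoded = unquote(decoded)
        if next_decoded == decoded:
            break  # nothing left to decode
        decoded = next_decoded
    return BLOCKED_SUBSTRING in decoded.lower()


if __name__ == "__main__":
    # "<script>" percent-encoded twice: "<" -> "%3C" -> "%253C".
    payload = "q=%253Cscript%253Ealert(1)%253C/script%253E"
    print("naive check flags it:     ", naive_check(payload))
    print("normalized check flags it:", normalized_check(payload))
```

Normalizing input closes this particular gap, but each new encoding trick needs another rule, which is the arms race described above.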

Fully automated security methods have traditionally come up short due to the absence of clean traffic as a source material for machine learning. So, what WAF 3.0 has to focus on is the business logic that is used with mixed traffic. This eliminates the problem of requiring clean traffic for machine learning and also protects against zero days by preventing bypass.

Machine learning plays an important part in the rapidly changing cybersecurity ecosystem. The more you use it, the better it gets because it evolves and learns from its own mistakes at nearly the speed of light. It allows us to mature into a near zero false positive rate. Machine learning and AI are the only ways to really stay afloat in the growing swell of big data.

Cloud-based problems with traditional WAFs

Traditional WAFs have proven to be inadequate, specifically for cloud-native technology. In the common case where applications move to the cloud or become containerized, data security, application security, and API security become even more important than any other area of security.

One of the biggest challenges heard from enterprise CISOs lies in the human factor. Traditional WAFs use up about forty-five hours per week just to process WAF filters. Sometimes this time suck deepens when additional hours are required to write new rules. When application security teams have to write these rules, it is a tedious job. No one wants to do it.

Thankfully, new technology and research are cleaning up these problems.

Learning from WAF-based attacks

The Imperva breach: Who watches the watchers?

Detected on August 20th, the Imperva breach revealed that any Imperva customer using their cloud WAF product back to September 15th of 2017 was affected.

“This was the worst possible nightmare that a security as a service company like Imperva could have experienced.”

Alissa Knight, Sr. Cybersecurity Analyst, Aite Group

Imperva has a cloud WAF product that offers a web application firewall as a service. As a customer of that service, you would upload your SSL certificates to Imperva and forward your web traffic through Imperva's cloud WAF product. For SSL termination to work, you need to upload your actual certificates to them, along with your API keys.

In the breach, attackers hacked Imperva and gained access to the database where customers' extremely sensitive information was being stored. E-mail addresses, hashed and salted passwords, custom API keys, and SSL certificates were all exposed. A cybersecurity company offering security as a service lost customer trust in a single breach.

When you're trusting a cybersecurity company to hold the keys to your kingdom, safeguarding your sensitive data, make sure they're practicing what they preach. Unfortunately, a lot of cybersecurity companies don't do that. Imperva didn't disclose many details around the breach, like how it happened. So, we can only imagine how many customers were affected who were trusting Imperva to provide their web service.

The very technologies we trust can scarily become a culprit or source of a data breach. But, customers are not helpless. They still need data protection.

Fortunately, it's really easy for customers to change providers and even change those keys. It was very difficult to unseat a managed security service provider (MSSP) fifteen years ago. They had their equipment on premises, making it costly to replace. Now, given virtualization and microservices within the cloud, it's very easy to switch MSSPs if you have a WAF as a service.

The most important lesson we can take from the Imperva breach is not to feel hostage to an MSSP. Are they tasting their own food or serving up your data to potential hackers, treating you as a poison tester? Before trusting their security as a service, ask security vendors tough questions. Ask if they are taking the right measures, like doing their own penetration testing. Are they employing both static and dynamic code analysis? Do they have any certifications? Are they doing SOC 2 Type 2 audits? Have a security expert do a thorough interrogation of a potential MSSP. Just because they are a cybersecurity company doesn't necessarily mean that they're doing what they should be doing when it comes to the security of your data.


The danger is that a lot of enterprises want to trust an MSSP and assume it is applying its own security because "security" is right there in the title. They assume a vigilant security expert is on the other side whenever they engage a third party for security tooling.

Cloudflare’s 27 Minutes

Approximately 67% of all internet traffic goes through Cloudflare. On July 2nd of 2019, Cloudflare experienced an outage for about twenty-seven minutes.

If you're using regular expressions for anything, like a filter, an app, or checking a user's emails, you will probably have to contend with a regular expression denial of service attack. It happens when you put a sophisticated regex somewhere and run that regex against a really huge amount of data or a long string.

First, we need to explain how regular expressions work. A regular expression engine parses the expression and applies it to a string. Often the engine cannot decide whether the string matches in a single pass: when one way of matching fails, it has to go back to an earlier position in the string and try another way. This is called backtracking. It means that if you put a very sophisticated regex-based filter somewhere and feed it data, the engine will probably have to run the filter against your string many times, not just once.

Sometimes a single pass is enough, but in many cases it is not, and the number of reruns can be huge. When that happens, the regular expression engine can use all the CPU and potentially a lot of memory. Triggered deliberately, this is a regular expression denial of service (ReDoS) attack. That's exactly the condition Cloudflare ran into.
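The cost of those reruns can be made visible with a toy backtracking matcher that counts its own steps. This is a minimal sketch in Python that supports only literal characters, `.`, and `*`. The class, the helper names, and the pattern `.*.*=.*` are assumptions chosen for illustration; they are not presented as the rule Cloudflare deployed or as how any production engine is built.

```python
class BacktrackingMatcher:
    """Toy matcher for literals, '.', and '*' that counts match attempts."""

    def __init__(self) -> None:
        self.steps = 0

    def match_here(self, pattern: str, text: str) -> bool:
        """Return True if pattern matches a prefix of text."""
        self.steps += 1
        if not pattern:
            return True
        if len(pattern) >= 2 and pattern[1] == "*":
            return self._match_star(pattern[0], pattern[2:], text)
        if text and pattern[0] in (".", text[0]):
            return self.match_here(pattern[1:], text[1:])
        return False

    def _match_star(self, char: str, rest: str, text: str) -> bool:
        """Greedy 'char*': take as much as possible, then give back one by one."""
        longest = 0
        while longest < len(text) and char in (".", text[longest]):
            longest += 1
        for used in range(longest, -1, -1):
            # Each retry here is one of the "reruns" over the same string.
            if self.match_here(rest, text[used:]):
                return True
        return False


def count_steps(pattern: str, text: str) -> int:
    """Run one match and report how many attempts the matcher made."""
    matcher = BacktrackingMatcher()
    matcher.match_here(pattern, text)
    return matcher.steps


if __name__ == "__main__":
    # The input never contains '=', so every split of the two '.*' is tried.
    for length in (10, 20, 40, 80):
        print(length, count_steps(".*.*=.*", "x" * length))
```

In this sketch, doubling the input length roughly quadruples the work for two adjacent wildcards, and each additional wildcard raises the growth rate further. Patterns with nested repetition can grow exponentially.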

Cloudflare blamed a bad software deployment for the outage. They pushed to production a signature based on a regular expression that could match pretty much every string, but only with a lot of reruns over that string. The regular expression was flawed, and the flaw went unnoticed because it was never tested for this kind of issue. As a result, the CPUs on their machines, including the load balancers, were consumed by evaluating that one regular expression. They were unable to even log in via Secure Shell (SSH) to fix the problem, because SSH on the same machine needs the same CPU, which in this case was totally used up by the WAF. That's the technical reason underlying the Cloudflare outage.

When we run any kind of service, as businesses, we need to learn how to manage that service. The Cloudflare outage could have been avoided if CPU resources had been allocated differently.

“I believe that nobody, including myself, can understand what kind of exact character should be blocked because regular expressions are often very sophisticated. To manage them you need to actually spend a lot of time. That's why mistakes occur inside regular expressions, like regex ReDoS.”

Ivan Novikov, CEO, Wallarm

This sort of problem with Regular Expression Denial of Service (ReDoS) is not unique to Cloudflare. For example, the same thing occurs with Apache ModSecurity and other WAFs. Frighteningly, it happens all the time, and this incident is just another instance of the same story.

The takeaway from the Cloudflare incident is that regex, as an engine, is not good protection for an IPS or IDS. It is simply not as good as more modern solutions for large-scale, high-volume production environments.

Unfortunately, these regex engines are too often relied on by SOC analysts. Regex is commonly the only thing used to pattern match an incoming attack, and the resulting rule is then deployed. That is tantamount to creating a single point of failure. So, we ought to ask where we go from here, because clearly regex is not working out.

Erratic hacker: The threat inside

Companies tend to hoard the details of breaches, for obvious reasons. But, when a hacker goes online and brags about their attack, it’s not long before it’s front page news. That’s what happened when a former Amazon employee started her day on July 29th of 2019, boasting she had hacked 106M CapitalOne customers’ data.

Paige Thompson, a hacker who goes by the handle "erratic", was a former Amazon employee who gained access to CapitalOne's database using WAF rules and roles. What is incredible is how easily she was able to elevate privileges and exfiltrate data to her local server. The mass media is attributing the attack to Server Side Request Forgery (SSRF) as one of the reasons why she was able to exploit the system. What we have learned, though, is that "erratic" potentially gained access through poor firewall configuration. In other words, the WAF was the potential culprit that allowed Paige Thompson access to the very customer databases the WAF was supposed to protect. Configuration never went so sideways.
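In an SSRF attack, a server is tricked into making requests on the attacker's behalf, often to internal addresses such as a cloud instance metadata endpoint (169.254.169.254 on AWS). The following is a minimal sketch in Python of one defensive check. The function name is hypothetical, it only handles URLs whose host is a literal IP address or `localhost`, and it is not a description of CapitalOne's configuration.

```python
import ipaddress
from urllib.parse import urlsplit


def is_internal_ip_literal(url: str) -> bool:
    """Return True if the URL's host is an IP literal in an internal range.

    Assumption: hostnames are NOT resolved here. A real guard would also
    resolve DNS names and check every resulting address.
    """
    host = urlsplit(url).hostname
    if host is None:
        return True  # refuse anything we cannot parse
    if host == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; out of scope for this sketch
    return address.is_link_local or address.is_loopback or address.is_private


if __name__ == "__main__":
    for target in (
        "http://169.254.169.254/latest/meta-data/",
        "http://10.0.0.5:8080/admin",
        "http://93.184.216.34/",
    ):
        verdict = "REFUSE" if is_internal_ip_literal(target) else "allow"
        print(verdict, target)
```

A check like this is only one layer. Restricting what the role attached to the WAF is allowed to do matters at least as much.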

Erratic seems to have accessed an obvious credential via a WAF configuration role. She then elevated some privileges, copied some tokens, and essentially was able to gain access to over 703 buckets.

Sometimes we falsely assume that a cloud provider holding all of your data carries so much responsibility that it must be secure. In truth, even a cloud service provider (CSP) or an incredibly large enterprise is susceptible to these sorts of attacks and threats. We call it insider threat. However, even considering the insider threat profile, other things went wrong that were associated with web application firewall rules.

Questions remain. How did the WAF rules allow Paige Thompson to elevate that many privileges, and why was the exploitation so easy?

The lessons learned from breaches

A quick recap on the main takeaways from the preceding three attacks:

  1. WATCH THE WATCHERS. Thoroughly vet and ask hard questions of MSSPs.
  2. BE CAREFUL WITH CONFIGURATION. WAFs are designed to protect you, but mishandled, they can backfire.
  3. REJECT REGEX. Putting regex in a critical security tooling mechanism is really not a good idea.

Lessons learned: machine learning to the rescue

Now, we have technologies that employ machine learning and artificial intelligence to do things like learn from your existing traffic, set baselines, and detect abnormalities. We have more targeted protection, like API-level monitors. We have cloud-native solutions. And, we have automation to help your testing and protection run at the same rate as your DevOps lifecycles.

Machine learning is especially promising. We are working towards a machine learning-based model that could help us get away from regex functionality.

Real protection will rely on applying a behavior-based model. Applying statistical approaches, including machine learning and other tools, will be increasingly necessary for protecting users against attacks.
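As a rough illustration of a statistical, behavior-based check, the sketch below learns a baseline from observed request rates and flags values that deviate strongly from it. It is written in Python. The function names, the sample numbers, and the three-standard-deviation threshold are all assumptions, and a production model would use far richer features than a single rate.

```python
from statistics import mean, stdev


def build_baseline(samples: list[float]) -> tuple[float, float]:
    """Summarize normal traffic as (mean, sample standard deviation)."""
    return mean(samples), stdev(samples)


def is_anomalous(value: float, baseline: tuple[float, float], threshold: float = 3.0) -> bool:
    """Flag a value whose z-score against the baseline exceeds the threshold."""
    center, spread = baseline
    if spread == 0:
        return value != center
    return abs(value - center) / spread > threshold


if __name__ == "__main__":
    # Hypothetical requests-per-minute observed for one client on one endpoint.
    observed = [98, 102, 100, 101, 99, 100, 103, 97]
    baseline = build_baseline(observed)
    for rate in (104, 400):
        print(rate, "anomalous" if is_anomalous(rate, baseline) else "normal")
```

No attack signature appears anywhere in this check. It only knows what normal looks like for this application.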

Machine learning is not a magic formula. It has to learn the business logic of your app to understand things like how many APIs do we have, what kind of data are we dealing with, and what kind of parameters should have a mandatory ID and which are optional. You can use machine learning to determine a lot of these rules and complete tasks.
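Learning which parameters are mandatory and which are optional can be sketched very simply: count how often each parameter appears in observed requests to an endpoint. The Python below is a hedged illustration only. The function names, the 99% cutoff, and the sample requests are assumptions, and it stands in for what a real learning system would do with much more data and context.

```python
from collections import Counter


def learn_parameter_profile(requests: list[dict], mandatory_ratio: float = 0.99):
    """Split observed parameter names into (mandatory, optional) sets."""
    if not requests:
        raise ValueError("need at least one observed request")
    total = len(requests)
    counts = Counter(name for params in requests for name in set(params))
    mandatory = {name for name, seen in counts.items() if seen / total >= mandatory_ratio}
    optional = set(counts) - mandatory
    return mandatory, optional


def find_violations(params: dict, mandatory: set, optional: set):
    """Return (missing mandatory names, names never seen in training)."""
    names = set(params)
    return mandatory - names, names - mandatory - optional


if __name__ == "__main__":
    # Hypothetical traffic observed for a single API endpoint.
    observed = [
        {"id": "1", "sort": "asc"},
        {"id": "2"},
        {"id": "3", "sort": "desc"},
    ]
    mandatory, optional = learn_parameter_profile(observed)
    print("mandatory:", mandatory, "optional:", optional)
    print(find_violations({"sort": "asc", "debug": "1"}, mandatory, optional))
```

Because the profile is learned from the traffic itself, it tolerates mixed traffic to a degree: a rare malformed request does not change which parameters appear in nearly every call.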

New tech tackles building around a user behavior base, to help us to identify normal behavior. We need to identify users and abnormal behaviors in addition to anomalies.

Cybersecurity as a service

Moving into this next decade, we are seeing more and more things as a service. We're seeing managed detection and response, where outside service providers like MSSPs handle incident response and response actions. Organizations are straining to keep up, especially given the global talent shortage in cybersecurity. They are increasingly relying on outside companies to provide their security as a service.

Whether it's Imperva's cloud WAF products or any other service, it is imperative that in relying on external service providers we do not forfeit our own role in security. Even with a cybersecurity company as a service, there is zero guarantee that they're more secure than you. Watch the watchers. Ask your vendors what steps they are taking to secure their data. Ask for copies of penetration test reports, which can be protected under an NDA. You can request their most recent SOC report. These are all things that you need to be asking.

We've had numerous cybersecurity companies breached. Hackers are opportunists, not terribly unlike people who walk through a parking lot checking doors to see if they’re unlocked. Cyber-attackers don't differentiate where the windows and doors are open. They try to get a foothold and elevate permissions and do all this fun stuff that they love to do to cause havoc. Even with strong protections, there could be an insider threat.

When talking about security with a provider, go beyond what kind of compliance documentation they have and dig into the details of things like configuration.

Third parties are everywhere. We just can't depend on these traditional technologies. We need to evolve. We need to let go of the legacy solutions that require manual tuning. We need to embrace newer technology and move to advanced WAFs that use machine learning-based models, which allow them to process and learn from tons of data.


No one can afford to spend 40 to 80 plus hours a month on manual tuning and heavy administration. It’s not even as effective as it is costly. Evolving technology will be about saving time and using human resources more effectively. Artificial intelligence is an assistant that sops up all the tedious stuff.

Advanced firewalls will allow you to:

  • scale horizontally;
  • apply policies that are centrally managed;
  • use a REST API for full data access and integration;
  • use a central dashboard; and
  • protect against cyberattacks, including layer-7 cyberattacks.

With the rise of third-party application integration, this is exactly where we need to go. We need to move towards these advanced solutions, like cloud-native WAFs. The problem is only going to grow as we integrate more APIs and microservices at deeper layers, diversify our technology sets, and watch our business ecosystems evolve to be even richer, more complex, and teeming with huge volumes of vibrantly moving data. Security will continue to be in the hands of outside parties, which means your knowledge base needs to be stronger. Your security needs to be your responsibility, even when it's not in your house. Think of your data like a child you really love who needs to go to science camp. Ask all the right questions and be sure that your data is safe.
