API Data: What is it and how is it saying it?

APIs are the lifeblood of today’s applications, from browser-based web apps and mobile apps to sophisticated distributed enterprise applications that connect dozens of individually packaged, containerized microservices.

The most common objective of APIs is to communicate data. Let’s say it’s a rainy day (expect delays) and someone requests local bus schedules on their phone’s map app. Their mobile phone then sends its GPS location to the map service, while cloud services send weather information to the update boards. Two of the most popular API calls, across all applications, are GET requests (requests to read data) and PUT requests (requests to write data).

So let’s ask a simple question: What is data? When your in-law is ranting for 20 minutes, is that data? It is data in a way, but this type of data is called unstructured data. When APIs communicate data, structured data formats are most commonly used. Structured data means every piece of data (datum) comes with a label or a description. It’s like filling out a DMV form:

First Name: Joe
Last Name: Doe
Email: joe.doe@aol.com

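The labels and data types that describe the data are called metadata, and there are standardized ways to structure data and attach them. The most widely used standard for APIs is JavaScript Object Notation, or JSON. It is a lightweight text format derived from JavaScript object syntax, and it is an alternative to, not a variant of, Extensible Markup Language, or XML.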
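As a rough illustration, the form above could be expressed as JSON along the following lines. The field names (`first_name`, `last_name`, `email`) are assumptions chosen for this sketch, not part of any standard.

```python
import json

# The DMV-style form as labeled (structured) data.
# Field names are hypothetical and chosen only for illustration.
form = {
    "first_name": "Joe",
    "last_name": "Doe",
    "email": "joe.doe@aol.com",
}

# Serialize: the object becomes a plain string that can travel over an API.
encoded = json.dumps(form)

# Deserialize: the receiving application recovers the object from the string.
decoded = json.loads(encoded)

print(encoded)
print(decoded == form)
```

The round trip from object to string and back is the serialization behavior discussed in the next section.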

Is JSON really a safer data encoding format? 
(We explore a JSON vulnerability.)

JSON, by far the most popular data encoding format for APIs, looks simple, readable and secure. But is JSON really more secure than other data encoding formats? On close inspection, JSON is a serialization format that allows (1) users to send objects as strings and (2) applications to recover objects from those strings. That can make the JSON format as dangerous as other serialization formats (XML, Ruby Marshal, PHP unserialize) in terms of application security. At Black Hat USA 2017, a number of issues related to ASP.NET and Java JSON parsers were presented. (See Hewlett Packard Enterprise’s Friday the 13th JSON Attacks.)


Example of structured organization data in JSON format:

{
  "@context": "http://www.schema.org",
  "@type": "Organization",
  "name": "Wallarm",
  "url": "https://www.wallarm.com/",
  "logo": "https://assets.website-files.com/5fe3434623c64c793987363d/5fe35f4dbf861c75ba14e7d9_no-paddings-EN.svg",
  "image": "https://assets.website-files.com/5fe3434623c64c793987363d/60125dc6b09effe9f6100880_scheme-final-2.png",
  "description": "Wallarm Security Platform\n",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "415 Brannan St. 2nd FL",
    "addressLocality": "San Francisco",
    "addressRegion": "CA",
    "postalCode": "94107",
    "addressCountry": "United States"
  },
  "contactPoint": {
    "@type": "ContactPoint",
    "telephone": "(415) 940-7077"
  }
}

In this article, we explore another JSON feature that can be abused: the Unicode string encoding format.

The Unicode string encoding format is a feature designed to encode non-ASCII characters. But the same \uXXXX escape syntax can be used to encode any character, including plain ASCII, by its HEX code. It is commonly used for languages with alphabets other than Latin: Chinese characters and Arabic letters, for example, are often written this way, so the format is not uncommon in the least.

Here is what the actual Unicode string encoding might look like:

{"a": "AAAA"}
is equal to
{"a": "\u0041\u0041\u0041\u0041"}
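The equivalence can be checked with any standards-compliant JSON parser. The sketch below uses Python's built-in `json` module; raw strings are used so that the backslashes reach the parser untouched.

```python
import json

# Two JSON documents: one with literal characters, one with \uXXXX escapes.
# Raw strings (r'...') keep the backslashes so the JSON parser sees them.
plain = json.loads(r'{"a": "AAAA"}')
escaped = json.loads(r'{"a": "\u0041\u0041\u0041\u0041"}')

# After parsing, both documents carry exactly the same value.
print(plain == escaped)
```

The two inputs differ byte for byte on the wire, yet the application receives the same object in both cases.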

Because Unicode escaping converts data from readable plain text to HEX codes, which machines interpret fully but human readers and simple signature-based pattern detectors do not, this feature is a hotbed for possible attacks. The payload hides in plain view and can bypass many WAF protections.

Attackers use this technique to bypass WAFs that do not have a proper JSON parser on board. Wallarm is able to detect it because Wallarm’s approach does not rely on patterns or signatures.

(In case you missed it: relying on patterns or signatures to detect vulnerabilities can leave you open to attack.)

Testing Unicode JSON for vulnerabilities to protect RESTful APIs

To test which attacks can and cannot be detected, we have created a simple script, enc.php, that converts human-readable text into JSON Unicode escape sequences. We have provided this script below. You can use it yourself to encode data this way and test your current API protection solution, such as a WAF:

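<?php
// enc.php: print each byte of the first CLI argument as a JSON \u00XX escape.
// Intended for single-byte (ASCII) input; multi-byte UTF-8 is not handled.
$d = $argv[1];
$x = bin2hex($d);
for ($i = 0; $i + 1 < strlen($x); $i += 2) {
    echo "\\u00";
    echo $x[$i] . $x[$i + 1];
}
echo "\n";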
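For readers who prefer Python, a rough equivalent of such an encoder might look like the sketch below. The function name `to_json_unicode_escapes` is hypothetical. Unlike a byte-wise approach, this version works on code points and emits surrogate pairs for characters outside the Basic Multilingual Plane, which is an added assumption rather than something the original script does.

```python
import json


def to_json_unicode_escapes(text: str) -> str:
    """Encode every character of `text` as a JSON \\uXXXX escape."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Characters above U+FFFF need a UTF-16 surrogate pair in JSON.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            parts.append(f"\\u{high:04x}\\u{low:04x}")
        else:
            parts.append(f"\\u{cp:04x}")
    return "".join(parts)


# Round trip: a JSON parser turns the escapes back into the original text.
sample = "' union select"
assert json.loads('"' + to_json_unicode_escapes(sample) + '"') == sample
```

Wrapping the result in double quotes yields a valid JSON string, so it can be dropped into any field of a request body.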

To illustrate, let’s use an obvious attack which tries to steal users’ passwords from the system:

' union select password from users--a-

If a hacker tried this attack in plain text, even a simple open-source WAF like ModSecurity should be able to detect it.

Using the script provided, let’s now encode a double-quoted variant of this payload.

Try to send it to your protected apps, like this:

$ php enc.php '" union select password from users--a-'
$ curl -d '{"test":"\u0022\u0020\u0075\u006e\u0069\u006f\u006e\u0020\u0073\u0065\u006c\u0065\u0063\u0074\u0020\u0070\u0061\u0073\u0073\u0077\u006f\u0072\u0064\u0020\u0066\u0072\u006f\u006d\u0020\u0075\u0073\u0065\u0072\u0073\u002d\u002d\u0061\u002d"}' -H 'Content-Type: application/json' -X PUT 


If you are using a signature-based WAF, this attack has most likely gone undetected.
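To see why, consider the toy comparison below. It is only a sketch: the single regular expression stands in for a signature rule set, the helper names are hypothetical, and real WAF rules are far more elaborate. It contrasts matching against the raw request body with matching against the strings recovered by a JSON parser.

```python
import json
import re

# Toy "signature" for illustration only; not taken from any real rule set.
SQLI_SIGNATURE = re.compile(r"union\s+select", re.IGNORECASE)


def matches_raw(body: str) -> bool:
    """Signature check on the raw bytes, as a parser-less filter would do."""
    return bool(SQLI_SIGNATURE.search(body))


def iter_strings(value):
    """Yield every string (keys and values) found in a parsed JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield key
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def matches_parsed(body: str) -> bool:
    """Parse the JSON first, then run the same check on the decoded strings."""
    try:
        document = json.loads(body)
    except json.JSONDecodeError:
        return False  # a real filter might treat unparsable input as suspicious
    return any(SQLI_SIGNATURE.search(s) for s in iter_strings(document))


payload = '" union select password from users--a-'
escaped = "".join(f"\\u{ord(c):04x}" for c in payload)
body = '{"test":"' + escaped + '"}'

print(matches_raw(body), matches_parsed(body))
```

The same rule misses the escaped body and catches the decoded one, which is the gap that decoding before inspection is meant to close.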

We were able to discover this kind of attack because Wallarm implements a full JSON parser that can decode the data even when it is written as JSON Unicode escapes, which helps protect any RESTful API.