OWASP A2 – Cross-Site Scripting (XSS) with PHP Part 1

We are finally starting a new OWASP Top 10 security risk today. The next few weeks (or possibly months) will cover XSS prevention techniques in PHP. This is probably a little more glamorous/sexy than authentication and session management. Controls to protect an application against XSS are primarily represented in the OWASP ASVS sections 5 and 6. Today, we start with ASVS section 5.

On to the code…

ASVS 5.1 Requirement:

Verify that the runtime environment is not susceptible to buffer overflows, or that security controls prevent buffer overflows.

ASVS 5.1 Solution:

With PHP, this is in most cases more of a patch management issue than a coding problem. Make sure your version of PHP is up to date. If you must run an older version, research its known vulnerabilities, perform a vulnerability scan, and make sure your code is not using any affected modules. I still suggest moving to a newer, patched version. Also validate that there are no known vulnerabilities in your web server software, any frameworks that have been deployed, etc.
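
As a small illustration of the "is my PHP up to date" check, the sketch below compares the running version against a minimum. The helper name phpIsAtLeast and whatever minimum you pass in are assumptions for illustration only; the right minimum depends on your own patch policy, and this does not replace a real vulnerability scan.

```php
<?php
// Hypothetical helper: returns true when the running PHP version is
// at least $minimum. The minimum is an assumed policy value that you
// supply; it is not a recommendation.
function phpIsAtLeast($minimum) {
  // version_compare understands version strings such as "7.4.33".
  return version_compare(PHP_VERSION, $minimum, '>=');
}
```

A deployment script could call phpIsAtLeast() with the lowest version you are willing to run and refuse to start (or log a warning) when it returns false.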

ASVS 5.2 Requirement:

Verify that a positive validation pattern is defined and applied to all input.

ASVS 5.2 Solution:

This requires a whitelist validation approach where only known good characters are accepted. Limiting the acceptable input to a specific set of characters reduces the risk of XSS, other forms of injection attacks (SQL injection, LDAP injection, etc.), and several other classes of attack. It is also important to decode the input data before performing any validation. Decoding the data and performing canonicalization helps prevent filter bypass attacks that rely on submitting the data in a different encoding. This can be implemented fairly easily with a PHP function like so:

  // Create a function that takes in the type of data
  // and the actual data itself.  The data type will 
  // be used to determine what kind of regex to use.
  function dataValidator($type, $data) {

    // Decode the data into a standard character
    // set before performing any checks
    $decodeData = html_entity_decode($data, ENT_QUOTES, 'UTF-8');

    // Declare an array that defines each data type 
    // name along with the regex.  In this example,
    // we are only defining three types to accept: 
    // number - only numbers in the data, letter - 
    // only alphabet characters in the data, and 
    // alphan - alphanumeric characters in the data.
    // The D modifier keeps $ from matching before a trailing newline.
    $typeArray = array(
      "number" => '/^[0-9]+$/D',
      "letter" => '/^[a-zA-Z]+$/D',
      "alphan" => '/^[a-zA-Z0-9]+$/D'
    );

    // Reject any type that has no regex defined.
    if (!isset($typeArray[$type])) {
      return false;
    }

    // Use the PHP preg_match function to determine 
    // if the data only contains those characters.
    // Select the regex type by passing the "type" 
    // submitted to the function in to the array.
    // If there is a match, then the data is good
    // and the if statement returns true, otherwise
    // return false.
    if (preg_match($typeArray[$type], $decodeData)) {
      return true;
    } else {
      return false;
    }
  }

The above function could be implemented in a file that is included in any page that validates user-supplied data. The function could be called with dataValidator("number", $_POST['userdata']) (as an example) and the return value evaluated to determine whether bad data was submitted.
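
The canonicalization step mentioned earlier deserves a quick illustration, because a single html_entity_decode() call only removes one layer of encoding. The sketch below is one possible approach, not part of the original function: it decodes repeatedly until the value stops changing. The helper name canonicalize and the default cap of five passes are my own assumptions.

```php
<?php
// Hypothetical helper: decode HTML entities repeatedly until the value
// stops changing, so double-encoded input such as "&amp;lt;" is reduced
// to "<" before validation. $maxPasses is an assumed safety cap.
function canonicalize($data, $maxPasses = 5) {
  $current = $data;
  for ($i = 0; $i < $maxPasses; $i++) {
    $decoded = html_entity_decode($current, ENT_QUOTES, 'UTF-8');
    if ($decoded === $current) {
      // Stable: there is nothing left to decode.
      return $current;
    }
    $current = $decoded;
  }
  // No stable value was confirmed within the cap; treat as suspicious.
  return false;
}
```

The result (when it is not false) could then be handed to the regex check in place of the single decode. Heavily nested encoding is rarely legitimate, which is why this sketch gives up instead of decoding forever.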

Whitelist validation is ideal, but sometimes blacklist validation is required or preferable. You can use similar code for a blacklist: modify the regexes to list the unwanted characters and return true only if the regex doesn't match:

  // In this example, the regex's have been modified to
  // include everything that SHOULD NOT be found in the 
  // data type.  So a number value shouldn't contain 
  // letters from the alphabet or special characters.
  function dataValidator($type, $data) {
    $decodeData = html_entity_decode($data, ENT_QUOTES, 'UTF-8');

    // The patterns are not anchored, so a single forbidden
    // character anywhere in the data produces a match.
    $typeArray = array(
      "number" => '/[a-zA-Z!\*,\'\"\|\.\\/\?\@\$\%\:\+\\(\)\[\]\{\}]/',
      "letter" => '/[0-9!\*,\'\"\|\.\\/\?\@\$\%\:\+\\(\)\[\]\{\}]/',
      "alphan" => '/[!\*\'\"\|\\/\?\@\$\%\+\\(\)\[\]\{\}]/'
    );

    // Reject any type that has no regex defined.
    if (!isset($typeArray[$type])) {
      return false;
    }

    // Here we return true if the regex DOESN'T match.
    if (!preg_match($typeArray[$type], $decodeData)) {
      return true;
    } else {
      return false;
    }
  }


ASVS 5.3 Requirement:

Verify that all input validation failures result in input rejection or input sanitization.

ASVS 5.3 Solution:

Implement a generic error message that is displayed if the return value from dataValidator() is false, and reject the input. Alternatively, remove all non-matching characters. You can modify the code above to sanitize user-supplied data like so:

  function dataValidator($type, $data) {
    $decodeData = html_entity_decode($data, ENT_QUOTES, 'UTF-8');

    // The D modifier keeps $ from matching before a trailing newline.
    $typeArray = array(
      "number" => '/^[0-9]+$/D',
      "letter" => '/^[a-zA-Z]+$/D',
      "alphan" => '/^[a-zA-Z0-9]+$/D'
    );

    // Implement a second array for sanitizing.  Anything that
    // doesn't match will be removed later in the code.
    $sanitizeArray = array(
      "number" => '/[^0-9]/',
      "letter" => '/[^a-zA-Z]/',
      "alphan" => '/[^a-zA-Z0-9]/'
    );

    // An unknown type has no whitelist, so nothing is kept.
    if (!isset($typeArray[$type])) {
      return '';
    }

    if (preg_match($typeArray[$type], $decodeData)) {
      // Return the input data because it passed the validation.
      return $decodeData;
    } else {
      // The validation failed, therefore some bad data was 
      // passed that needs to be removed.  Then return the
      // sanitized data.
      $sanitized = preg_replace($sanitizeArray[$type], '', $decodeData);
      return $sanitized;
    }
  }
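
For the rejection path (the first option described for ASVS 5.3), a rough sketch might look like the following. The names isAllowed and acceptOrReject, the array-based return value, and the wording of the error message are all assumptions of mine; isAllowed is just a condensed stand-in for the boolean whitelist check from ASVS 5.2 so that the example runs on its own.

```php
<?php
// Condensed stand-in for the boolean whitelist check (assumed name).
// Unknown types and non-string values are rejected.
function isAllowed($type, $data) {
  $patterns = array(
    "number" => '/^[0-9]+$/D',
    "letter" => '/^[a-zA-Z]+$/D',
    "alphan" => '/^[a-zA-Z0-9]+$/D'
  );
  if (!is_string($data) || !isset($patterns[$type])) {
    return false;
  }
  return preg_match($patterns[$type], $data) === 1;
}

// Hypothetical wrapper: returns the value on success, or a generic
// error message that does not reveal which rule failed.
function acceptOrReject($type, $input, $field) {
  $value = isset($input[$field]) ? $input[$field] : null;
  if (isAllowed($type, $value)) {
    return array('ok' => true, 'value' => $value);
  }
  return array('ok' => false, 'error' => 'Invalid input.');
}
```

A page could call acceptOrReject("number", $_POST, "userdata") and, when 'ok' is false, show the generic message and stop processing the request. Keeping the message generic avoids telling an attacker which characters tripped the filter.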


ASVS 5.4 Requirement:

Verify that a character set, such as UTF-8, is specified for all sources of input.

ASVS 5.4 Solution:

This can be accomplished by forcing the character set. If an attacker can force a specific character set, then they might be able to bypass your data validation filters. The following example forces UTF-8 and silently drops any characters that can't be represented in that character set:

  // Supply the user supplied data to the forceUTF function.
  // This function will use the iconv PHP function to convert
  // the data to UTF-8, and drop any characters that can't
  // be converted into the charset.
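  function forceUTF($data) {
    // iconv returns false on failure, so callers should
    // treat a false return value as a rejection.
    $utfEncoded = iconv("UTF-8", "UTF-8//IGNORE", $data);
    return $utfEncoded;
  }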
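
If you would rather reject malformed input than silently drop bytes, one alternative is to check that the data is well-formed UTF-8 and refuse it otherwise. This is a sketch of that idea rather than a replacement for the function above; the name isValidUtf8 is hypothetical, and the mbstring extension is used only if it happens to be loaded.

```php
<?php
// Hypothetical helper: returns true only when $data is a string
// containing well-formed UTF-8.
function isValidUtf8($data) {
  if (!is_string($data)) {
    return false;
  }
  // Prefer mbstring when the extension is available.
  if (function_exists('mb_check_encoding')) {
    return mb_check_encoding($data, 'UTF-8');
  }
  // Fallback: with the u modifier, preg_match fails on invalid UTF-8.
  return preg_match('//u', $data) === 1;
}
```

Input that fails this check could go down the same rejection path as any other validation failure.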


ASVS 5.5 Requirement:

Verify that all input validation is performed on the server side.

ASVS 5.5 Solution:

Do not rely on input validation performed on the client side: it is trivial to bypass and gives an attacker clues as to how data might be validated on the server side. Every check that matters must run on the server.

I think this is a good place to leave off for the week. Please provide feedback if you have it.
