PHP Function Reference

PHP htmlspecialchars() Function



The PHP htmlspecialchars() function converts some predefined characters to HTML entities. The predefined characters are:

  • & (Ampersand) becomes &amp;
  • " (Double quote) becomes &quot;, unless ENT_NOQUOTES is set
  • ' (Single quote) becomes &#039; (for ENT_HTML401) or &apos; (for ENT_XML1, ENT_XHTML or ENT_HTML5), but only when ENT_QUOTES is set
  • < (Less than) becomes &lt;
  • > (Greater than) becomes &gt;
Note: This function translates only the characters listed above. To translate every character that has a named HTML entity, use the htmlentities() function.
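
As a rough illustration of the list above, the sketch below encodes one sample string under two document types and once with htmlentities(). The sample string and variable names are made up for this example, and the flags and encoding are passed explicitly so the result does not depend on the defaults of a particular PHP version.

```php
<?php
// Sample input (hypothetical); "\u{E9}" is the letter é written as an escape.
$str = "Tom & Jerry's <b>\"caf\u{E9}\"</b>";

// Same quote handling, different document types.
$html401 = htmlspecialchars($str, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$html5   = htmlspecialchars($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// htmlentities() additionally converts é, which htmlspecialchars() leaves alone.
$all = htmlentities($str, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $html401, "\n"; // Tom &amp; Jerry&#039;s &lt;b&gt;&quot;café&quot;&lt;/b&gt;
echo $html5, "\n";   // Tom &amp; Jerry&apos;s &lt;b&gt;&quot;café&quot;&lt;/b&gt;
echo $all, "\n";     // Tom &amp; Jerry&#039;s &lt;b&gt;&quot;caf&eacute;&quot;&lt;/b&gt;
```

Only the single quote differs between the first two results, and only the accented letter differs between the first and the third.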

Syntax

htmlspecialchars(string, flags, encoding, double_encode)

Parameters

string Required. Specify the input string.
flags Optional. A bitmask of one or more flags that specifies how to handle quotes, invalid code unit sequences and the document type. The available flag constants are:
  • ENT_COMPAT: Converts double quotes and leaves single quotes alone.
  • ENT_QUOTES: Converts both double and single quotes.
  • ENT_NOQUOTES: Leaves both double and single quotes unconverted.
  • ENT_IGNORE: Silently discards invalid code unit sequences instead of returning an empty string. Using this flag is discouraged, as it may have security implications.
  • ENT_SUBSTITUTE: Replaces invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.
  • ENT_DISALLOWED: Replaces invalid code points for the given document type with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise).
  • ENT_HTML401: Handle code as HTML 4.01.
  • ENT_XML1: Handle code as XML 1.
  • ENT_XHTML: Handle code as XHTML.
  • ENT_HTML5: Handle code as HTML 5.
The default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401 as of PHP 8.1.0. In earlier versions the default is ENT_COMPAT | ENT_HTML401.
encoding Optional. A string that specifies which character-set to use. The following character sets are supported:
  • ISO-8859-1: (Aliases - ISO8859-1) - Western European, Latin-1.
  • ISO-8859-5: (Aliases - ISO8859-5) - Little used cyrillic charset (Latin/Cyrillic).
  • ISO-8859-15: (Aliases - ISO8859-15) - Western European, Latin-9. Adds the Euro sign, French and Finnish letters missing in Latin-1 (ISO-8859-1).
  • UTF-8: ASCII compatible multi-byte 8-bit Unicode.
  • cp866: (Aliases - ibm866, 866) - DOS-specific Cyrillic charset.
  • cp1251: (Aliases - Windows-1251, win-1251, 1251) - Windows-specific Cyrillic charset.
  • cp1252: (Aliases - Windows-1252, 1252) - Windows specific charset for Western European.
  • KOI8-R: (Aliases - koi8-ru, koi8r) - Russian.
  • BIG5: (Aliases - 950) - Traditional Chinese, mainly used in Taiwan.
  • GB2312: (Aliases - 936) - Simplified Chinese, national standard character set.
  • BIG5-HKSCS: Big5 with Hong Kong extensions, Traditional Chinese.
  • Shift_JIS: (Aliases - SJIS, SJIS-win, cp932, 932) - Japanese
  • EUC-JP: (Aliases - EUCJP, eucJP-win) - Japanese
  • MacRoman: Charset that was used by Mac OS.
  • '': An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale, in this order. It is not recommended.
If omitted, encoding defaults to the value of the default_charset configuration option, which is "UTF-8" unless it has been changed.
double_encode Optional. If set to false, PHP will not encode existing HTML entities. The default is true, which converts everything.
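
To make the double_encode parameter more concrete, here is a minimal sketch using a made-up input string that already contains one entity. The flags and encoding are passed explicitly only so that the fourth argument can be reached.

```php
<?php
// Hypothetical input: one existing entity (&amp;) and one bare ampersand.
$str = 'Fish &amp; chips & vinegar';

// double_encode = true (default): the existing &amp; is encoded a second time.
$twice = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');

// double_encode = false: the existing entity is kept; the bare & is still converted.
$once = htmlspecialchars($str, ENT_QUOTES, 'UTF-8', false);

echo $twice, "\n"; // Fish &amp;amp; chips &amp; vinegar
echo $once, "\n";  // Fish &amp; chips &amp; vinegar
```

Setting double_encode to false can be useful when the input may already be partly encoded, though it also means a literal "&amp;" typed by a user is not escaped.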

Return Value

Returns the encoded string. If the input string contains an invalid code unit sequence within the given encoding, an empty string is returned, unless either the ENT_IGNORE or ENT_SUBSTITUTE flag is set.
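
The empty-string behaviour can be sketched as follows. The input is a made-up string containing a byte that is not valid UTF-8. ENT_SUBSTITUTE is either left out or added explicitly, because it is part of the default flags in newer PHP versions.

```php
<?php
// "\x80" is a lone continuation byte, so this string is not valid UTF-8.
$bad = "a\x80<b>";

// No ENT_IGNORE or ENT_SUBSTITUTE: the whole result is an empty string.
$strict = htmlspecialchars($bad, ENT_QUOTES, 'UTF-8');

// ENT_SUBSTITUTE: the invalid byte becomes U+FFFD and the rest is encoded as usual.
$substitute = htmlspecialchars($bad, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($strict === '');                       // bool(true)
var_dump($substitute === "a\u{FFFD}&lt;b&gt;"); // bool(true)
```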

Example:

The example below shows how to use the htmlspecialchars() function.

<?php
$str = "<a href='test'>Test</a>";

//returns: &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;
echo htmlspecialchars($str, ENT_QUOTES);
?>

The output of the above code will be:

&lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;

Note: In case of an ambiguous flags value, the following rules apply:
  • When none of ENT_COMPAT, ENT_QUOTES, ENT_NOQUOTES is present, the default is ENT_NOQUOTES.
  • When more than one of ENT_COMPAT, ENT_QUOTES, ENT_NOQUOTES is present, ENT_QUOTES takes the highest precedence, followed by ENT_COMPAT.
  • When none of ENT_HTML401, ENT_HTML5, ENT_XHTML, ENT_XML1 is present, the default is ENT_HTML401.
  • When more than one of ENT_HTML401, ENT_HTML5, ENT_XHTML, ENT_XML1 is present, ENT_HTML5 takes the highest precedence, followed by ENT_XHTML, ENT_XML1 and ENT_HTML401.
  • When more than one of ENT_DISALLOWED, ENT_IGNORE, ENT_SUBSTITUTE are present, ENT_IGNORE takes the highest precedence, followed by ENT_SUBSTITUTE.
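
Two of the rules above can be checked with a short sketch. The input string and variable names are illustrative only. The first call passes a document-type flag without any quote flag, which is an easy mistake to make when replacing the default flags.

```php
<?php
// Hypothetical input containing both kinds of quotes.
$str = "\"double\" and 'single'";

// Only a document-type flag is given, so no quote flag is present:
// the function behaves as with ENT_NOQUOTES and leaves both quotes alone.
$none = htmlspecialchars($str, ENT_HTML5, 'UTF-8');

// Both quote flags are given: ENT_QUOTES takes precedence over ENT_COMPAT.
// No document-type flag is given, so ENT_HTML401 applies.
$both = htmlspecialchars($str, ENT_COMPAT | ENT_QUOTES, 'UTF-8');

echo $none, "\n"; // "double" and 'single'
echo $both, "\n"; // &quot;double&quot; and &#039;single&#039;
```

When passing a custom flags value, it is therefore safer to include a quote flag such as ENT_QUOTES explicitly.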
