HTML Entity Encoder & Decoder

Encode reserved HTML characters to entities, or decode entities back. Optionally escape all non-ASCII characters too.


All processing runs in your browser — no files or inputs are uploaded to a server.

How to use

Toggle Encode or Decode at the top and paste your text. Encode replaces the five reserved HTML characters (`&`, `<`, `>`, `"`, `'`) with their entity forms (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&#39;`). The optional "Encode all non-ASCII" checkbox additionally rewrites every code point above U+007F as a decimal numeric entity (`&#NNNN;`), which is useful for ASCII-only templating systems or legacy email pipelines. Decode runs the browser's own HTML parser, so it handles every named entity (`&copy;`, `&hellip;`), decimal numeric (`&#169;`), and hex numeric (`&#xA9;`) form correctly.
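The encoding rules above fit in a few lines. The Python below is an illustration of the described behaviour, not the tool's source (which runs as JavaScript in the browser); the function name `encode_entities` and its `all_non_ascii` flag are hypothetical stand-ins for the Encode mode and the checkbox.

```python
def encode_entities(text: str, all_non_ascii: bool = False) -> str:
    """Escape the five reserved HTML characters; optionally numeric-encode non-ASCII.

    Hypothetical sketch of the Encode mode, not the tool's actual implementation.
    """
    reserved = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;"}
    out = []
    # Iterating a Python str yields whole code points, so astral characters
    # such as emoji become one reference, never two surrogate halves.
    for ch in text:
        if ch in reserved:
            out.append(reserved[ch])
        elif all_non_ascii and ord(ch) > 0x7F:
            out.append(f"&#{ord(ch)};")  # decimal numeric reference
        else:
            out.append(ch)
    return "".join(out)


print(encode_entities('Tom & Jerry <script>alert("xss")</script>'))
# Tom &amp; Jerry &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
print(encode_entities("café <b>", all_non_ascii=True))
# caf&#233; &lt;b&gt;
```

Doing the replacement in one pass over the input matters: escaping `&` in a separate, later pass would turn `&lt;` into `&amp;lt;`.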

Reach for this when escaping or unescaping HTML by hand — pasting user-submitted content into a static page, debugging why a `<` shows literally instead of starting a tag, or inspecting a saved HTML snippet that was already escaped once. Modern frameworks like React, Vue, and Svelte auto-escape interpolated values, so most application code does not need manual encoding.

Examples

Encode the five reserved characters

Input
Tom & Jerry <script>alert("xss")</script>
Output
Tom &amp; Jerry &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;

These five replacements are the minimum needed to put untrusted text inside HTML body text or a quoted attribute value. Once encoded, the browser renders the literal characters instead of parsing them as markup, so the script tag stays visible as text and never executes.

Decode mixed named and numeric entities

Input
&Aacute;rbol &copy; 2024 &mdash; &#x2728; &#x1F600;
Output
Árbol © 2024 — ✨ 😀

Named entities, decimal numeric references (such as `&#9728;`), and hex numeric references (`&#x2728;`) all decode in a single pass. `&#x1F600;` lies above the Basic Multilingual Plane, so JavaScript's UTF-16 strings store it internally as a surrogate pair, but it still appears as a single character after decoding.
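Python's standard `html.unescape` implements the same HTML5 character-reference rules, so it can serve as an offline cross-check for the example above. This is an illustration in a different runtime, not how the tool itself decodes.

```python
import html

decoded = html.unescape("&Aacute;rbol &copy; 2024 &mdash; &#x2728; &#x1F600;")
print(decoded)  # Árbol © 2024 — ✨ 😀

# Decimal references decode the same way: 9728 == 0x2600.
print(html.unescape("&#9728;"))  # ☀

# A Python str counts code points, so the emoji is one character here...
print(len("😀"))  # 1
# ...while UTF-16 (what JavaScript strings use) needs two code units for it.
print(len("😀".encode("utf-16-le")) // 2)  # 2
```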

Encode all non-ASCII for legacy templates

Input
안녕하세요, 世界! €100 💡
Output
&#50504;&#45397;&#54616;&#49464;&#50836;, &#19990;&#30028;! &#8364;100 &#128161;

When the downstream pipeline only supports 7-bit ASCII (some old SMTP relays, certain log shippers, older template engines) and the text will eventually be rendered as HTML, every non-ASCII character becomes a numeric entity. Modern UTF-8 systems do not need this: the result is bulkier and harder to read, and any consumer that does not parse HTML will show the entities literally.
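Python exposes this transformation directly as the `xmlcharrefreplace` codec error handler, which turns every character the target encoding cannot represent into a decimal numeric reference. The sketch below uses it to reproduce the example and measure the size cost. Note that it does not escape the five reserved characters, so real input would still need that step first.

```python
import html

text = "안녕하세요, 世界! €100 💡"

# Every character ASCII cannot represent becomes &#NNNN; (decimal).
encoded = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(encoded)
# &#50504;&#45397;&#54616;&#49464;&#50836;, &#19990;&#30028;! &#8364;100 &#128161;

# The size cost: bytes as UTF-8 versus characters (= bytes) as entity-encoded ASCII.
print(len(text.encode("utf-8")), len(encoded))  # 36 80

# Lossless for anything that later parses the result as HTML.
assert html.unescape(encoded) == text
```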

FAQ

Why does the encoder use `&#39;` for apostrophe instead of `&apos;`?

`&apos;` is part of XML and HTML5 but not HTML 4. Older browsers (and some legacy parsers still embedded in scrapers and feed readers) display `&apos;` literally instead of decoding it. The numeric form `&#39;` works everywhere, so it is the safer default for content that may flow through unknown rendering paths.
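Python's standard library shows the same preference for a numeric form: `html.escape` emits `&#x27;` (hex, where this tool uses decimal `&#39;`), while an HTML5-aware decoder accepts all three spellings. A quick check, using Python only as a convenient HTML5 decoder:

```python
import html

# The stdlib escaper avoids &apos; and uses a numeric reference instead.
print(html.escape("it's"))  # it&#x27;s

# An HTML5 decoder maps all three spellings to the same apostrophe;
# an HTML 4 parser is only guaranteed to handle the two numeric ones.
for ref in ("&#39;", "&#x27;", "&apos;"):
    assert html.unescape(ref) == "'"
```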

Is HTML entity encoding enough to prevent XSS?

Only in HTML body text and quoted attribute values. Every other context (JavaScript string, CSS value, URL) needs its own encoding scheme: escaping `<` does nothing if the value lands inside an `onclick=` handler (the browser decodes the entities before the JavaScript runs) or a `<style>` block. The OWASP XSS Prevention Cheat Sheet gives a rule for each common context. For any input that crosses contexts, prefer a templating system that escapes automatically, ideally a context-aware one such as Go's `html/template`, rather than encoding by hand.
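The `onclick` failure is easy to reproduce. The sketch below HTML-escapes a payload for an event-handler attribute, then simulates the browser's attribute decoding with Python's `html.unescape`. The `greet` handler and the payload are invented for illustration; the point is that the character references are gone by the time the JavaScript engine sees the code.

```python
import html

# Hypothetical attacker input meant to break out of a JS string literal.
payload = "');alert(1);//"

# HTML-escaping looks safe in the markup...
value = f"greet('{html.escape(payload)}')"
attr = f'<button onclick="{value}">Hi</button>'
print(attr)  # <button onclick="greet('&#x27;);alert(1);//')">Hi</button>

# ...but the browser decodes references in the attribute before running it as JS.
js_source = html.unescape(value)
print(js_source)  # greet('');alert(1);//')  <- alert(1) now runs
```

A JavaScript-string encoder (for example `\x27` for the apostrophe) is what this slot needs instead.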

HTML entities vs URL percent-encoding — when do I use which?

Different layers, different jobs. HTML entities (`&amp;`) escape characters that the HTML parser would treat as markup, inside the document body or attribute values. Percent-encoding (`%26`) escapes characters that would break URL syntax, inside `href`, `src`, or form-submitted query strings. A URL inside an `<a href>` attribute can need both at once: in `https://x.com/?q=Tom%20%26%20Jerry&amp;page=2`, the `%26` is a literal `&` inside the query value, while `&amp;` is the parameter separator escaped for the HTML parser, which hands the URL a plain `&`.
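The two layers are applied in order: percent-encode each query value first, then entity-encode the finished URL for the attribute. A sketch with Python's standard library, using the placeholder host `x.com`; `quote_via=quote` makes spaces come out as `%20` rather than the form-style `+`.

```python
import html
from urllib.parse import quote, urlencode

# Layer 1 (URL): percent-encode each value so a literal "&" cannot split the query.
query = urlencode({"q": "Tom & Jerry", "page": 2}, quote_via=quote)
url = f"https://x.com/?{query}"
print(url)  # https://x.com/?q=Tom%20%26%20Jerry&page=2

# Layer 2 (HTML): entity-encode the whole URL before placing it in an attribute.
anchor = f'<a href="{html.escape(url)}">search</a>'
print(anchor)  # <a href="https://x.com/?q=Tom%20%26%20Jerry&amp;page=2">search</a>
```

Reversing the order would be a bug: percent-encoding after entity-encoding would turn `&amp;` into `%26amp%3B`.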

Why does decoding seem to handle bizarre entities I have never seen?

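The decoder uses the browser's own HTML parser, which knows the full HTML5 named character reference table: 2,231 entries (counting legacy spellings without a trailing semicolon) covering Greek letters, mathematical operators, dingbats, and verbose oddities like `&CounterClockwiseContourIntegral;` (∳). Anything the browser would render on a normal page also decodes here. Numeric forms (`&#NNN;` and `&#xHH;`) reach every code point up to U+10FFFF; references to surrogates or to values beyond that range decode to the replacement character U+FFFD.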
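Python ships the same WHATWG named-reference table as `html.entities.html5`, which makes the size of the set and the decoding edge cases easy to inspect outside a browser. This is a cross-check of the HTML5 rules, not the tool's code.

```python
import html
from html.entities import html5

# Keys include both "amp;" and the legacy semicolon-less "amp".
print(len(html5))  # 2231

print(html.unescape("&CounterClockwiseContourIntegral;"))  # ∳ (U+2233)

# Out-of-range and surrogate code points decode to U+FFFD.
print(html.unescape("&#x110000;") == "\ufffd")  # True
print(html.unescape("&#xD800;") == "\ufffd")  # True
```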

Is `&nbsp;` the same as a regular space?

No. `&nbsp;` is U+00A0 (NO-BREAK SPACE) and prevents line breaks at that position. It renders the same width as a normal space in most fonts but is a different code point: `"a b".split(" ").length` is 2, while `"a\u00A0b".split(" ").length` is 1. A stray U+00A0 copied from a web page into a YAML file, a CSV column, or a SQL query is a classic source of "invisible" parse errors.
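The same behaviour shows up in Python, along with a simple cleanup step. The `normalize_nbsp` helper is hypothetical, a sketch of what you might run before writing text into a format that expects plain spaces.

```python
import html

nbsp = html.unescape("&nbsp;")
print(nbsp == "\u00a0", nbsp == " ")  # True False

# Splitting on a literal space does not split on NBSP, as in the JavaScript example.
print(len("a\u00a0b".split(" ")))  # 1


def normalize_nbsp(text: str) -> str:
    """Replace every NO-BREAK SPACE with an ordinary space."""
    return text.replace("\u00a0", " ")


print(normalize_nbsp("price:\u00a0100"))  # price: 100
```

One Python-specific wrinkle: argument-less `str.split()` treats U+00A0 as whitespace, so `"a\u00a0b".split()` does return two pieces. Behaviour differs between APIs, which is exactly why the character is easy to miss.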

Will decoding `&lt;script&gt;` produce a working script tag?

In a text editor or this tool, decoding gives you the literal string `<script>`. Inserting that string into a live page with `innerHTML` would create a script element, but it would not execute: the HTML spec marks scripts inserted through `innerHTML` as already started, so they never run. Running one takes a deliberate step such as building it with `document.createElement("script")`. The protection is narrow, though: an event-handler attribute such as `<img src=x onerror=alert(1)>` does fire when inserted with `innerHTML`, so decoded untrusted markup is still an XSS risk.

Related concepts

HTML entities are textual references that the parser replaces with characters before rendering. Three forms exist. **Named entities** (`&amp;`, `&copy;`, `&hellip;`) are mnemonic shortcuts; HTML5 defines 2,231 of them, including most common Greek letters, mathematical operators, and dingbats. **Decimal numeric entities** (`&#169;`) reference a Unicode code point in base 10. **Hex numeric entities** (`&#xA9;`) do the same in base 16, which matches the U+ notation used in Unicode charts; either numeric form works for any code point, named or not. The numeric forms cover all code points up to U+10FFFF, so any character a browser can render is expressible.
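The three forms can all be generated from a single character. In this sketch, `entity_forms` is a hypothetical helper. It looks up names in Python's `html.entities.codepoint2name`, which covers only the HTML 4.01 names, so many characters that do have an HTML5 name come back with numeric forms only.

```python
import html
from html.entities import codepoint2name


def entity_forms(ch: str) -> dict[str, str]:
    """Return the decimal, hex, and (HTML 4.01) named references for one character."""
    cp = ord(ch)
    forms = {"decimal": f"&#{cp};", "hex": f"&#x{cp:X};"}
    if cp in codepoint2name:  # HTML 4.01 names only; HTML5 defines many more
        forms["named"] = f"&{codepoint2name[cp]};"
    return forms


forms = entity_forms("©")
print(forms)  # {'decimal': '&#169;', 'hex': '&#xA9;', 'named': '&copy;'}

# Every form decodes back to the same character.
assert all(html.unescape(ref) == "©" for ref in forms.values())
```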

Five characters carry special meaning in HTML syntax and need escaping when they appear as literal text: `&` (starts an entity), `<` (starts a tag), `>` (closes a tag), `"` and `'` (delimit attribute values). In element text, `&` and `<` are the ones that actually change parsing; `>` is escaped by convention, and the quote characters are only ambiguous inside attribute values, but escaping all five is a safe default. Pasting `if a<b then x>y` raw into an HTML page shows why: the parser reads `<b then x>` as a `<b>` start tag with attributes named `then` and `x`, so that text disappears and everything after it renders bold.

The broader topic is **contextual output encoding**, a security pattern rather than a syntax detail. Each output context demands its own encoding: HTML body uses entities, HTML attributes use the same entities plus quoting, JavaScript strings use backslash escapes (`\x3C` for `<`), URL components use percent-encoding, CSS values use backslash hex (`\3C`). OWASP's XSS Prevention Cheat Sheet enumerates the rules. The practical guidance: choose a templating system that escapes interpolated values by default (React JSX, Mustache `{{ }}`), and prefer a context-aware one such as Go's `html/template`, which picks the right encoder for each slot so the developer never decides which encoder to call.
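To make the per-context idea concrete, here is a minimal sketch of one encoder per context, loosely modelled on the escape styles listed above. All four function names are hypothetical, and the allow-list (ASCII letters and digits pass through, everything else is escaped) is a deliberately conservative assumption; for production code, a maintained encoding library or a context-aware template engine is the safer choice.

```python
import html
from urllib.parse import quote


def for_html(value: str) -> str:
    # Body text and quoted attribute values. Note: html.escape writes ' as &#x27;.
    return html.escape(value, quote=True)


def for_js_string(value: str) -> str:
    # Inside a quoted JS string literal: escape everything but ASCII alphanumerics.
    out = []
    for ch in value:
        cp = ord(ch)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        elif cp < 0x100:
            out.append(f"\\x{cp:02X}")
        elif cp < 0x10000:
            out.append(f"\\u{cp:04X}")
        else:
            # \uHHHH escapes are UTF-16 code units, so split astral chars into surrogates.
            cp -= 0x10000
            out.append(f"\\u{0xD800 + (cp >> 10):04X}\\u{0xDC00 + (cp & 0x3FF):04X}")
    return "".join(out)


def for_css(value: str) -> str:
    # CSS hex escape; the trailing space terminates the escape sequence.
    return "".join(ch if ch.isascii() and ch.isalnum() else f"\\{ord(ch):X} " for ch in value)


def for_url_component(value: str) -> str:
    # One query value or path segment; safe="" also encodes "/".
    return quote(value, safe="")


for encode in (for_html, for_js_string, for_css, for_url_component):
    print(encode.__name__, encode("<"))
# for_html &lt;
# for_js_string \x3C
# for_css \3C
# for_url_component %3C
```

The same `<` needs four different spellings, which is why picking the encoder is better left to a template engine that knows which slot it is filling.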

Related tools