2026-05-27 · Developer

Percent-encoding: reserved characters and the double-encoding bug

How URL percent-encoding works, why space is %20 in a path but + in a form body, encodeURIComponent vs encodeURI, and how the double-encoding bug produces %2520.

A URL is allowed to contain only a limited set of ASCII characters. Anything outside that set (a space, a non-Latin letter), and any character that URL syntax reserves for structure but that you need to use as plain data, has to be percent-encoded: a % followed by two hexadecimal digits naming a byte. The character is first encoded as UTF-8, then each resulting byte is written as %XX. So a space (byte 0x20) becomes %20, and é (UTF-8 bytes 0xC3 0xA9) becomes %C3%A9. This post covers which characters need encoding, the %20 vs + ambiguity that breaks form handling, the JavaScript functions that get misused, and the double-encoding bug that shows up when encoding happens in more than one layer.
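The byte-level rule can be sketched in a few lines. This is an illustration only: percentEncodeAll is a hypothetical helper name, and unlike a real encoder it escapes every byte, including unreserved ones, so that the UTF-8 step stays visible.

```javascript
// Percent-encode a string by hand: UTF-8 bytes first, then %XX per byte.
// percentEncodeAll is a hypothetical name, not a built-in function.
function percentEncodeAll(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always emits UTF-8
  let out = "";
  for (const byte of bytes) {
    // Two uppercase hex digits per byte, zero-padded.
    out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

console.log(percentEncodeAll(" "));      // %20
console.log(percentEncodeAll("\u00E9")); // %C3%A9  (é is two UTF-8 bytes)
```

In practice encodeURIComponent does the same thing but skips characters that are already safe.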

Why URLs need it

RFC 3986 defines the grammar for URIs, and that grammar permits a small, fixed character repertoire. Characters fall into three groups:

Unreserved — always safe, never need encoding: A-Z a-z 0-9 - . _ ~
Reserved — legal in a URL but carry structural meaning, so they must be encoded when they appear inside a value rather than as delimiters.
Everything else — spaces, control characters, and all non-ASCII — which has no place in the raw URL and must be encoded.

The reason reserved characters are the tricky group is that the same byte can be either a delimiter or data depending on position. A / between path segments is structure; a / inside a single segment's value is data and has to become %2F.
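As a rough sketch, the three groups can be expressed as a small classifier. The name classify is hypothetical, and the reserved set below is the RFC 3986 gen-delims plus sub-delims; it assumes it is given one character at a time.

```javascript
// Classify a single character into the three groups described above.
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
// RFC 3986 reserved set: gen-delims (: / ? # [ ] @) and sub-delims (! $ & ' ( ) * + , ; =).
const RESERVED = new Set(":/?#[]@!$&'()*+,;=");

function classify(ch) {
  if (UNRESERVED.test(ch)) return "unreserved"; // never needs encoding
  if (RESERVED.has(ch)) return "reserved";      // encode when used as data
  return "other";                               // always encode
}

console.log(classify("~")); // unreserved
console.log(classify("/")); // reserved
console.log(classify(" ")); // other
```

Note that the classifier cannot tell you whether a reserved character is acting as a delimiter or as data; only the code that assembles the URL knows that.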

Reserved characters

These are the characters that change a URL's meaning if left raw inside a value. Encode them when they are part of the data:

Char	Encoded	Structural role
space	`%20`	not legal raw; ends the URL in many parsers
`?`	`%3F`	starts the query string
`#`	`%23`	starts the fragment
`&`	`%26`	separates query parameters
`=`	`%3D`	separates key from value
`/`	`%2F`	separates path segments
`+`	`%2B`	means space in form-encoded data
`%`	`%25`	introduces a percent-escape

The + and % rows are the ones that cause silent corruption. A literal + in a query value that you forget to encode will be read back as a space by anything doing form decoding. A literal % that isn't encoded to %25 either errors out or, worse, gets interpreted as the start of an escape that was never intended.
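Both failure modes are easy to reproduce. The sketch below uses the standard URLSearchParams class as a stand-in for "anything doing form decoding"; the query strings are made-up examples.

```javascript
// A raw + in a query value is read back as a space by a form decoder.
const params = new URLSearchParams("q=c++&tel=+1555");
console.log(JSON.stringify(params.get("q")));   // "c  "   (two spaces)
console.log(JSON.stringify(params.get("tel"))); // " 1555"

// Encoded as %2B, the + survives.
const safe = new URLSearchParams("q=c%2B%2B");
console.log(safe.get("q")); // c++

// A raw % that does not start a valid escape makes decodeURIComponent throw...
let threw = false;
try {
  decodeURIComponent("100%");
} catch (err) {
  threw = err instanceof URIError;
}
console.log(threw); // true

// ...and a raw % followed by two hex digits is silently treated as an escape.
const misread = decodeURIComponent("5%41");
console.log(misread); // 5A
```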

The %20 vs + ambiguity

This is the single most common source of encoding bugs, and it comes from two different specifications disagreeing about how to represent a space.

In a generic URI — the path, and the query string under RFC 3986 — a space is %20. Full stop.
In application/x-www-form-urlencoded data — the body of an HTML form POST, and by long convention the query string when it carries form fields — a space is +. A literal + in that context is %2B.

So q=hello world can correctly appear as either:

?q=hello%20world      generic URI rules
?q=hello+world        form-encoded rules

Both are valid; what matters is that the encoder and the decoder agree. The bug appears when one side encodes spaces as + (form rules) and the other decodes with generic URI rules, leaving literal + characters in the data — or when a value containing a real + (a phone number +1 555..., a search for c++) is decoded by something that turns + into a space.

Practical rule: if you control both ends and are building a query string by hand, prefer %20 and encode literal + as %2B. If you are submitting a real form body, the browser will use + and your server framework expects it.
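One way to keep the two conventions apart is to give each its own encode/decode pair and never mix them. The four helper names below are hypothetical; they are thin wrappers over the built-in encodeURIComponent and decodeURIComponent.

```javascript
// Generic URI rules: space <-> %20, + is ordinary data on decode.
const uriEncode = (s) => encodeURIComponent(s);
const uriDecode = (s) => decodeURIComponent(s);

// Form rules: space <-> +, literal + travels as %2B.
const formEncode = (s) => encodeURIComponent(s).replace(/%20/g, "+");
const formDecode = (s) => decodeURIComponent(s.replace(/\+/g, " "));

const value = "c++ tips";
console.log(uriEncode(value));  // c%2B%2B%20tips
console.log(formEncode(value)); // c%2B%2B+tips

// Matched pairs round-trip cleanly.
console.log(uriDecode(uriEncode(value)) === value);   // true
console.log(formDecode(formEncode(value)) === value); // true

// Mismatch: form-encoded text read with generic rules keeps a literal +.
console.log(uriDecode(formEncode(value))); // c+++tips
```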

encodeURIComponent vs encodeURI in JavaScript

JavaScript ships two encoders, and they exist for different jobs.

encodeURI(url) is for encoding a whole URL that is already structurally complete. It leaves the reserved structural characters intact (; / ? : @ & = + $ , #) because those are doing their job as delimiters. Note that [ and ] are not on that list: encodeURI escapes them.
encodeURIComponent(value) is for encoding a single piece of data that will be dropped into a URL — one path segment, one query value. It encodes the reserved structural characters too, because in a value they are data, not structure.

encodeURI("https://x.com/a b?q=c/d&e=f")
// "https://x.com/a%20b?q=c/d&e=f"   (slashes, ?, & left intact)

encodeURIComponent("c/d&e=f")
// "c%2Fd%26e%3Df"                   (everything escaped)

Use encodeURIComponent for every individual query value and path segment. Use encodeURI only when you have a complete URL string and just want to escape stray spaces and non-ASCII without touching its structure — a narrower need than most people assume.
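To see why the choice matters, here is a small sketch of the classic mistake: running encodeURI over a single query value. The host example.test and the input string are placeholders, and the standard URL class plays the role of the server-side parser.

```javascript
const userInput = "rock&roll=loud";

// encodeURI treats & and = as structure, so the value leaks into the query syntax.
const wrong = "https://example.test/search?q=" + encodeURI(userInput);
// encodeURIComponent treats them as data.
const right = "https://example.test/search?q=" + encodeURIComponent(userInput);

console.log(new URL(wrong).searchParams.get("q"));    // rock
console.log([...new URL(wrong).searchParams.keys()]); // [ 'q', 'roll' ]
console.log(new URL(right).searchParams.get("q"));    // rock&roll=loud
```

The wrong version does not fail; it silently invents a second parameter named roll.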

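One gotcha: encodeURIComponent always writes a space as %20 and a literal + as %2B, so its output decodes correctly under both generic URI rules and form rules, but it never produces the form-style + for a space. The functions that mishandle + are the others: encodeURI leaves + raw, so a literal + passed through it becomes a space at a form-decoding endpoint, and decodeURIComponent does not turn + back into a space, so it misreads form-encoded input. When you need form-style output, post-process; when you need to read form-style input, pre-process:

encodeURIComponent("a b+c").replace(/%20/g, "+")    // "a+b%2Bc"  form-style output
decodeURIComponent("a+b%2Bc".replace(/\+/g, " "))   // "a b+c"    form-style input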

Pick one convention deliberately rather than relying on the default.

Encode the path and the query differently

The path and the query string have different reserved sets, so encode them separately rather than running one function over the whole string. In a path segment, / is a delimiter and must be %2F when it is part of a single segment's value; + and = are ordinary data. In the query, & and = are delimiters and must be encoded inside values; / is usually allowed raw.

The safe approach is to build the URL from already-encoded parts: run encodeURIComponent on each path segment and each query value individually, then join them with the raw delimiters you control. Never encode the assembled string a second time — which is exactly where the next bug comes from. If what you actually need is a URL-safe identifier for a path (no % escapes at all), normalize to a slug instead with our URL slug generator.
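A minimal sketch of that approach, assuming a base URL without a trailing slash and a flat object of string query values. buildUrl is a hypothetical helper and example.test is a placeholder host.

```javascript
// Build a URL from raw parts, encoding each part exactly once.
function buildUrl(base, segments, query) {
  // Each path segment is encoded on its own, then joined with raw slashes.
  const path = segments.map((s) => encodeURIComponent(s)).join("/");
  // Each key and value is encoded on its own, then joined with raw = and &.
  const pairs = Object.entries(query).map(
    ([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value)
  );
  const qs = pairs.length > 0 ? "?" + pairs.join("&") : "";
  return base + "/" + path + qs;
}

const url = buildUrl(
  "https://example.test",
  ["files", "a/b.txt"],   // the second segment contains a slash as data
  { q: "c++ & more" }
);
console.log(url);
// https://example.test/files/a%2Fb.txt?q=c%2B%2B%20%26%20more
```

The returned string is already encoded, so it should be passed along as-is rather than fed to another encoder.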

The double-encoding bug

Because % itself encodes to %25, running an encoder over an already-encoded string mangles it. hello world encodes once to hello%20world. Encode that result again and the % in %20 becomes %25, producing hello%2520world. Decode that string once and you get the literal text hello%20world instead of hello world.

The signature to look for is %25 followed by what should have been a single escape:

hello world      original
hello%20world    encoded once  (correct)
hello%2520world  encoded twice (bug — %20 became %2520)
hello%252520...  encoded three times
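The same progression can be reproduced directly by feeding each result back into encodeURIComponent:

```javascript
const original = "hello world";
const once = encodeURIComponent(original); // correct
const twice = encodeURIComponent(once);    // bug: the % in %20 became %25
const thrice = encodeURIComponent(twice);  // each pass adds another "25"

console.log(once);   // hello%20world
console.log(twice);  // hello%2520world
console.log(thrice); // hello%252520world

// A single decode of the double-encoded value does not recover the original.
console.log(decodeURIComponent(twice)); // hello%20world
```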

It happens whenever encoding runs at more than one layer and nobody tracks how many times a string has been touched: a frontend encodes a query value, an API gateway or reverse proxy re-encodes the forwarded URL, and a backend framework encodes it a third time before storing or redirecting. Each layer is individually "correct"; the stack is wrong.

To detect it, decode a suspicious value once and check whether the result still contains %XX escapes. If a single decode leaves visible %20 or %2F in your data, it was almost certainly encoded at least twice; the rare exception is a value whose original text really did contain a sequence like %20. The fix is architectural, not a second replace: encode exactly once, at the boundary where raw data becomes a URL, and treat the value as opaque everywhere after that. Strip a layer rather than adding one. Never "fix" double-encoding by decoding twice, because a value that legitimately contains a literal % sequence will be corrupted by the extra pass.
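The detection check can be sketched as a small diagnostic. looksDoubleEncoded is a hypothetical helper meant for logging and debugging, not for automatic repair: it reports a false positive for any value whose original text genuinely contained something like %20.

```javascript
// Decode once, then look for anything that still resembles a %XX escape.
function looksDoubleEncoded(value) {
  let decodedOnce;
  try {
    decodedOnce = decodeURIComponent(value);
  } catch (err) {
    // Malformed escapes mean the input was not cleanly encoded at all.
    return false;
  }
  return /%[0-9A-Fa-f]{2}/.test(decodedOnce);
}

console.log(looksDoubleEncoded("hello%20world"));   // false (encoded once)
console.log(looksDoubleEncoded("hello%2520world")); // true  (encoded twice)
console.log(looksDoubleEncoded("a%252Fb"));         // true  (%2F survived one decode)
```

When the check fires, the useful next step is to find which layer added the extra pass, not to add a compensating decode.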

The same discipline applies to other transport encodings; the failure mode where a layer mistakes encoded data for plaintext also drives the confusion covered in "Base64 is not encryption".

Summary

Percent-encoding maps unsafe characters to % plus the hex of their UTF-8 bytes. Encode reserved characters whenever they appear inside a value, remember that space is %20 under generic URI rules but + under form rules, reach for encodeURIComponent on individual components and encodeURI only on complete URLs, and encode each value exactly once to avoid the %2520 family of bugs.

To encode or decode a value and see exactly which bytes change — and to catch double-encoding by decoding a layer at a time — use our URL encoder/decoder.