URL encoding: why spaces become %20 (and sometimes +).
A URL is a string with a grammar: ? starts the query, & separates parameters, / divides the path. The moment your data contains one of those characters, it has to be disguised — that's all percent-encoding is. Here's which characters need it, why a space has two spellings, the JavaScript function most people pick wrong, and the double-encoding bug that turns %20 into %2520.
Somewhere right now, a search for C&A is returning results for C, because the & went into the query string raw and the server read it as "end of parameter." URL encoding exists to prevent exactly that — and it's simpler than its reputation suggests, once you see that it's protecting a grammar, not hiding data.
Why URLs need encoding at all.
RFC 3986, the URI standard, splits characters into two camps.
Unreserved characters mean nothing special anywhere in a URL: the letters A–Z and a–z, the digits 0–9, and exactly four symbols, hyphen (-), period (.), underscore (_), and tilde (~). These never need encoding.
Reserved characters are the URL's own punctuation. The general delimiters are : / ? # [ ] @, and the sub-delimiters are ! $ & ' ( ) * + , ; =. Each has a structural job in some part of a URL. When one of these appears as data (an ampersand inside a search term, a slash inside a file name), it must be encoded, or the parser will read it as structure.
Everything else — spaces, quotes, angle brackets, and every character beyond ASCII — simply isn't allowed in a URL raw and must always be encoded.
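The three camps can be checked mechanically. This is a minimal sketch with a hypothetical helper, classify (not a standard function), that hard-codes the RFC 3986 sets listed above and treats anything else, including % and non-ASCII text, as must-encode.

```javascript
// Hypothetical helper: sort one character into the RFC 3986 camps.
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

const RESERVED = new Set([
  // gen-delims
  ":", "/", "?", "#", "[", "]", "@",
  // sub-delims
  "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", "=",
]);

function classify(ch) {
  if (UNRESERVED.test(ch)) return "unreserved"; // never needs encoding
  if (RESERVED.has(ch)) return "reserved";      // encode when used as data
  return "always-encode";                       // space, quotes, %, non-ASCII, ...
}

for (const ch of ["a", "~", "&", "/", " ", "é"]) {
  console.log(JSON.stringify(ch), classify(ch));
}
```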
How percent-encoding works.
The mechanism is one rule: replace the character with its byte value, written as a percent sign and two hexadecimal digits. A space is byte 32, hex 20, so it becomes %20. An ampersand is byte 38, so %26. The percent sign itself, being the escape character, encodes as %25.
Characters outside ASCII take one extra step: the character is converted to its UTF-8 bytes first, and each byte gets its own escape. é is two bytes in UTF-8, so it becomes %C3%A9; a snowman ☃ is three, %E2%98%83. That's why encoded non-English text balloons in length — you're seeing the UTF-8 encoding spelled out one byte at a time.
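Here is the rule as code: a minimal sketch that runs the text through TextEncoder to get UTF-8 bytes, then writes every byte outside the unreserved set as % plus two uppercase hex digits. percentEncode is a hypothetical helper name. It is slightly stricter than the built-in encodeURIComponent, which leaves ! ' ( ) * unescaped.

```javascript
// Hypothetical helper: UTF-8 bytes first, then %HH for every byte
// that is not an unreserved ASCII character.
const UNRESERVED_BYTE = /^[A-Za-z0-9\-._~]$/;

function percentEncode(text) {
  const bytes = new TextEncoder().encode(text); // string -> UTF-8 bytes
  let out = "";
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    if (b < 0x80 && UNRESERVED_BYTE.test(ch)) {
      out += ch; // unreserved ASCII passes through unchanged
    } else {
      // Two uppercase hex digits, zero-padded (byte 10 -> "%0A").
      out += "%" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode(" "));              // %20
console.log(percentEncode("&"));              // %26
console.log(percentEncode("%"));              // %25
console.log(percentEncode("é"));              // %C3%A9
console.log(percentEncode("☃"));              // %E2%98%83
console.log(percentEncode("50% off & more")); // 50%25%20off%20%26%20more
```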
Value to send: 50% off & more
Percent-encoded: 50%25%20off%20%26%20more
In a query string: ?q=50%25%20off%20%26%20more (the real & is hidden, and the URL's grammar characters are untouched)
Decoding is the exact reverse — find each %HH, replace it with byte HH, then interpret the bytes as UTF-8. If you want to see this on a real URL, the URL Parser splits any URL into its parts and shows every query parameter decoded.
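Decoding can be sketched the same way, assuming malformed escapes should be an error (the built-in decodeURIComponent also throws on them). percentDecode is a hypothetical name; it collects raw bytes, then lets TextDecoder interpret them as UTF-8.

```javascript
// Hypothetical helper: %HH -> byte HH, other characters -> their own
// UTF-8 bytes, then decode the whole byte sequence as UTF-8.
function percentDecode(encoded) {
  const encoder = new TextEncoder();
  const bytes = [];
  let i = 0;
  while (i < encoded.length) {
    if (encoded[i] === "%") {
      const hex = encoded.slice(i + 1, i + 3);
      if (!/^[0-9A-Fa-f]{2}$/.test(hex)) {
        throw new Error(`malformed escape at index ${i}`);
      }
      bytes.push(parseInt(hex, 16)); // %HH -> byte HH
      i += 3;
    } else {
      // Literal character: append its UTF-8 bytes unchanged.
      const ch = String.fromCodePoint(encoded.codePointAt(i));
      bytes.push(...encoder.encode(ch));
      i += ch.length;
    }
  }
  // fatal: true makes invalid UTF-8 throw instead of producing U+FFFD.
  return new TextDecoder("utf-8", { fatal: true }).decode(Uint8Array.from(bytes));
}

console.log(percentDecode("50%25%20off%20%26%20more")); // 50% off & more
console.log(percentDecode("caf%C3%A9"));                // café
console.log(percentDecode("hello+world"));              // hello+world (no form rule here)
```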
The space has two spellings.
Here's the part that generates the bug reports. In a URL proper, a space is %20 — full stop. But there is a second, older convention: when an HTML form submits with the default application/x-www-form-urlencoded format, spaces in form fields become +. That rule dates to the early web and applies to form-encoded data, which mostly lives in query strings and POST bodies.
So both of these can legitimately mean "hello world" in a query string:
?q=hello%20world
?q=hello+world (form-encoded convention)
The trouble starts when code applies the form rule in the wrong place. A + in a URL path is just a plus sign; decode it as a space and you've corrupted the data. The reverse bug is more common: a value that genuinely contains a plus — a phone number like +15551234567, a Base64 string — gets sent unencoded, and the server helpfully turns the + into a space. If a plus sign must survive a form-encoded query string, it needs to travel as %2B.
The rule: emit %20 for spaces and encode + as %2B when it's data. Accept + as a space only where form encoding applies — query strings and form bodies, never paths.
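A minimal sketch of the two decoding rules side by side, using hypothetical helper names decodeFormValue and decodePathSegment built on the standard decodeURIComponent. The last two lines use URLSearchParams, which applies the form rule when it parses a query string.

```javascript
// Form rule (application/x-www-form-urlencoded): "+" means space,
// so replace it before percent-decoding.
function decodeFormValue(s) {
  return decodeURIComponent(s.replace(/\+/g, " "));
}

// Path rule: "+" is just a plus sign; only %HH escapes are decoded.
function decodePathSegment(s) {
  return decodeURIComponent(s);
}

console.log(decodeFormValue("hello+world"));   // hello world
console.log(decodePathSegment("hello+world")); // hello+world

// An unencoded leading + is read as a space; %2B survives as a plus.
console.log(JSON.stringify(new URLSearchParams("phone=+15551234567").get("phone")));   // " 15551234567"
console.log(JSON.stringify(new URLSearchParams("phone=%2B15551234567").get("phone"))); // "+15551234567"
```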
encodeURIComponent vs encodeURI.
JavaScript ships two encoding functions, and picking the wrong one is a classic. The names don't help, so here's the actual difference:
encodeURIComponent encodes nearly everything that could be structural, including &, =, ?, /, +, #, : and @. The only characters it leaves alone are the unreserved ones and five sub-delimiters: ! ' ( ) *. It's for encoding one piece of a URL: a single query-string value, a single path segment.
encodeURI leaves almost all reserved characters alone (only [ and ] get escaped) and encodes what can never appear raw: spaces, non-ASCII, and so on, plus the % sign itself. It's for cleaning up a string that is already a complete URL and whose structure you want to preserve.
The classic bug is using encodeURI on a query value. It leaves & untouched, so a value like Tom & Jerry silently splits into a parameter named q with value Tom and a mystery parameter named Jerry. The rule of thumb: if you're gluing values into a URL you're building, use encodeURIComponent for each value, every time. Better still, let URLSearchParams assemble the query string and do the escaping for you; it follows the form convention and writes spaces as +, which is fine inside a query string.
| Input | encodeURIComponent | encodeURI |
|---|---|---|
| Tom & Jerry | Tom%20%26%20Jerry | Tom%20&%20Jerry |
| a/b | a%2Fb | a/b |
| 50% | 50%25 | 50%25 |
| café | caf%C3%A9 | caf%C3%A9 |
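To see the split happen, this sketch glues the same value into a query string with each function and parses the result back with URLSearchParams. The base URL https://example.com/search is a placeholder.

```javascript
const value = "Tom & Jerry";

// Gluing the value in by hand with each function.
const wrong = "https://example.com/search?q=" + encodeURI(value);
const right = "https://example.com/search?q=" + encodeURIComponent(value);

console.log(JSON.stringify([...new URL(wrong).searchParams])); // [["q","Tom "],[" Jerry",""]]
console.log(JSON.stringify([...new URL(right).searchParams])); // [["q","Tom & Jerry"]]

// Letting URLSearchParams build the query does the escaping for you.
// It uses the form convention, so spaces come out as "+".
const url = new URL("https://example.com/search");
url.searchParams.set("q", value);
console.log(url.href); // https://example.com/search?q=Tom+%26+Jerry
```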
The double-encoding trap.
Percent-encoding is not idempotent: encode an already-encoded string and the % signs themselves get escaped. hello world → hello%20world → hello%2520world. Now a single decode yields the literal text hello%20world, percent signs and all, and users see %20 in their search box.
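The non-idempotence is easy to reproduce with the built-in functions alone; this sketch assumes nothing beyond encodeURIComponent and decodeURIComponent.

```javascript
const raw = "hello world";
const once = encodeURIComponent(raw);   // hello%20world
const twice = encodeURIComponent(once); // hello%2520world: the % became %25

console.log(twice);                     // hello%2520world
console.log(decodeURIComponent(twice)); // hello%20world, not hello world
console.log(decodeURIComponent(once));  // hello world
```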
Double encoding happens when two layers of a system each think escaping is their job — a template encodes a value, then a URL-builder encodes the whole string again. The mirror-image bug, double decoding, is worse: it can be a security hole, because a value like %252e%252e%252f passes a path-traversal filter on the first decode and becomes ../ on the second.
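A sketch of how the double-decode hole opens, assuming a hypothetical and deliberately naive filter, isSafePath, that checks for ../ after one decode while a later layer decodes again.

```javascript
// Hypothetical, deliberately naive filter: rejects "../" after one decode.
function isSafePath(decodedOnce) {
  return !decodedOnce.includes("../");
}

const attack = "%252e%252e%252f";
const firstDecode = decodeURIComponent(attack);       // %2e%2e%2f
console.log(isSafePath(firstDecode));                 // true: the filter lets it through
const secondDecode = decodeURIComponent(firstDecode); // ../
console.log(secondDecode);                            // ../
```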
The discipline: encode exactly once, at the last moment before the value enters the URL — and decode exactly once, at the boundary where the value leaves it. Everywhere in between, pass the raw value around, not the encoded one.
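One way to keep that discipline in JavaScript, sketched with hypothetical helpers buildSearchUrl and readQuery and a placeholder host: raw values go in, URLSearchParams performs the single encode, and the single decode happens when a value is read back out.

```javascript
// Hypothetical helpers; https://example.com is a placeholder host.
function buildSearchUrl(rawQuery, rawCategory) {
  const url = new URL("https://example.com/search");
  url.searchParams.set("q", rawQuery);      // the only encode
  url.searchParams.set("cat", rawCategory);
  return url.href;
}

function readQuery(href) {
  const params = new URL(href).searchParams; // the only decode
  return { q: params.get("q"), cat: params.get("cat") };
}

const href = buildSearchUrl("50% off & more", "C&A");
console.log(href);                            // https://example.com/search?q=50%25+off+%26+more&cat=C%26A
console.log(JSON.stringify(readQuery(href))); // {"q":"50% off & more","cat":"C&A"}

// Anti-pattern: passing an already-encoded value encodes it a second time.
console.log(buildSearchUrl(encodeURIComponent("a b"), "x")); // https://example.com/search?q=a%2520b&cat=x
```

The last line shows what the discipline prevents: a value that was escaped upstream comes out as %2520.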
Takeaways.
The thing to remember: percent-encoding protects URL grammar from URL data. Unreserved characters (A–Z a–z 0–9 - . _ ~) never need it; everything else becomes %HH of its UTF-8 bytes. Spaces are %20 (the + spelling belongs to form encoding only), values get encodeURIComponent, and encoding happens exactly once.
URL encoding is a five-minute concept that causes five-hour debugging sessions, almost always because some layer encoded twice, decoded twice, or applied the form-data space rule to a path. Get the one-encode-one-decode discipline right and the whole topic goes quiet.
Encode, decode, and dissect URLs in your browser.
URL/Encode percent-encodes and decodes text (plus HTML entities and case conversions), and the URL Parser breaks a full URL into protocol, host, path, and decoded query parameters. Both run entirely in your browser — nothing you paste is sent anywhere.