Base64 explained: why encoded files are 33% bigger

Open any JWT, any email attachment, any data: URL in a stylesheet, and you'll find the same alphabet soup: TWFuIGlzIGRpc3Rpbmd1aXNoZWQ... That's Base64, and it exists to solve one specific problem: a lot of the internet's plumbing was built to carry text, and binary data doesn't survive the trip.

Email is the classic case. The original mail protocols assumed 7-bit ASCII; arbitrary bytes could be mangled, stripped, or misinterpreted as control characters along the way. The fix — standardized for email in MIME and later specified on its own in RFC 4648 — was to re-spell binary data using only characters that every system agrees on. Base64 is that re-spelling.

What Base64 actually is.

Base64 maps binary data onto an alphabet of 64 characters that survive almost any text channel: the 26 uppercase letters, the 26 lowercase letters, the 10 digits, and two symbols — + and / in the standard flavor. A 65th character, =, is used only as end padding.

Why exactly 64? Because 64 is 2⁶ — the largest power of two you can comfortably cover with characters that are printable, unambiguous, and present in every common character set. Being a power of two is the whole trick: it means each output character carries exactly 6 bits of the original data, so the conversion is nothing more than regrouping bits.

That's worth repeating, because it's the core of the whole scheme: Base64 doesn't transform, compress, scramble, or protect your data. It takes the same bits and writes them down in bigger handwriting.

The encoding, bit by bit.

The algorithm processes input three bytes at a time. Three bytes are 24 bits; 24 bits split evenly into four 6-bit groups; each 6-bit group (a value from 0 to 63) picks one character from the alphabet. The classic worked example is the word Man:

M        a        n
01001101 01100001 01101110     three 8-bit bytes (24 bits)

010011 010110 000101 101110    the same 24 bits, regrouped as four 6-bit values

  19     22      5     46      as numbers 0–63

   T      W      F      u      as Base64 alphabet characters

So Man encodes to TWFu. Nothing was added and nothing was hidden — the bits were regrouped from eights into sixes and each six looked up in a table. Decoding runs the same table in reverse.
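The regrouping above can be sketched in a few lines of Python. This is a minimal illustration for one complete three-byte group only; the names ALPHABET, regroup, encode_triple, and decode_quad are made up for this example and are not part of any library.

```python
import string

# The standard alphabet: A-Z, a-z, 0-9, then + and / (values 0 to 63).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def regroup(triple: bytes) -> list[int]:
    """Split three 8-bit bytes (24 bits) into four 6-bit values."""
    if len(triple) != 3:
        raise ValueError("this sketch handles exactly three bytes")
    n = int.from_bytes(triple, "big")  # the 24 bits as one integer
    return [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]


def encode_triple(triple: bytes) -> str:
    """Look each 6-bit value up in the alphabet table."""
    return "".join(ALPHABET[value] for value in regroup(triple))


def decode_quad(quad: str) -> bytes:
    """Run the table in reverse: four characters back to three bytes."""
    n = 0
    for ch in quad:
        n = (n << 6) | ALPHABET.index(ch)
    return n.to_bytes(3, "big")


print(regroup(b"Man"))        # [19, 22, 5, 46]
print(encode_triple(b"Man"))  # TWFu
print(decode_quad("TWFu"))    # b'Man'
```

Treating the three bytes as a single 24-bit integer keeps the sketch short; a production encoder would more likely shift and mask byte by byte.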

When the input length isn't a multiple of three, the last group comes up short, and that's where = padding appears. Ma (16 bits) becomes TWE=; M alone (8 bits) becomes TQ==. The padding characters carry no data — they just declare "the final group was 2 bytes" or "the final group was 1 byte" so a decoder knows exactly where the real bits end.

Quick intuition: one padding = means the original data was one byte short of a full triple; two mean it was two bytes short. A Base64 string's length is always a multiple of four (in the padded, standard flavor).
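To see where the padding comes from, here is a hedged sketch of a complete encoder that zero-fills a short final group and then overwrites the characters that carry no data with =. The function name encode is an assumption of this example; in real code you would call the standard library's base64.b64encode, which the last line uses as a cross-check.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode(data: bytes) -> str:
    """Illustrative padded Base64 encoder (standard alphabet)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes short of a full triple
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if missing:
            # The last `missing` characters hold only filler bits.
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


for sample in (b"Man", b"Ma", b"M"):
    print(sample, encode(sample))  # TWFu, TWE=, TQ==

# Cross-check against the standard library on a longer input.
print(encode(b"any carnal pleas") == base64.b64encode(b"any carnal pleas").decode())
```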

Why output grows by a third.

Here's the arithmetic that surprises people the first time they inline an image. Each output character carries 6 bits of payload but costs 8 bits to store or send, because it's still an ASCII character. That ratio, 8/6, means Base64 output is 4/3 the size of the input: every 3 bytes in, 4 characters out, a fixed +33%, with padding rounding a short final group up to a full 4 characters. MIME email adds a little more by inserting a line break every 76 characters.
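The size rule is easy to check. The helper below (encoded_length is a name invented for this sketch) computes the padded output length as four characters per started group of three bytes, and compares it with what the standard library actually produces. It ignores MIME line breaks.

```python
import base64


def encoded_length(n_bytes: int) -> int:
    """Padded Base64 length: 4 characters for every started group of 3 bytes."""
    return 4 * ((n_bytes + 2) // 3)


for n in (1, 2, 3, 4, 200_000):
    actual = len(base64.b64encode(bytes(n)))
    print(n, encoded_length(n), actual)
```

For small inputs the rounding dominates (1 byte becomes 4 characters); as the input grows, the ratio settles at 4/3.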

Can compression claw that back? Sometimes. If the underlying data is compressible (text, JSON, XML), gzip or brotli on the encoded stream recovers much of the overhead. But the most common things people Base64 — images, PDFs, fonts — are already compressed, and encoding them smears each byte's bits across character boundaries in a way that general-purpose compressors can't fully undo. Inlining a JPEG into HTML costs you real transfer size even on a gzipped connection.
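A rough way to try this yourself, under some loud assumptions: zlib's DEFLATE stands in for gzip, a made-up list of JSON records stands in for compressible data, and seeded pseudo-random bytes stand in for an already-compressed image. The exact numbers will vary with the data, so treat the output as an illustration rather than a benchmark.

```python
import base64
import json
import random
import zlib


def deflated_size(data: bytes) -> int:
    """Size in bytes after DEFLATE at the highest compression level."""
    return len(zlib.compress(data, 9))


# Hypothetical compressible payload: repetitive JSON records.
records = [{"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(500)]
json_bytes = json.dumps(records).encode("utf-8")

# Stand-in for already-compressed data such as a JPEG: incompressible noise.
rng = random.Random(0)
random_bytes = bytes(rng.getrandbits(8) for _ in range(30_000))

for label, raw in (("json", json_bytes), ("random", random_bytes)):
    encoded = base64.b64encode(raw)
    print(
        f"{label}: raw={len(raw)} base64={len(encoded)} "
        f"deflate(raw)={deflated_size(raw)} deflate(base64)={deflated_size(encoded)}"
    )
```

For the noise case, compare the last column with the raw size: the compressor can shrink the Base64 text, but it should not get below the size of the bytes you started with.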

Base64 is not encryption.

This one earns its own section because the confusion causes real security bugs. Base64 output looks scrambled, so credentials, tokens, and API keys routinely get "protected" with it. They aren't. There is no key, no secret, no hard math — decoding is a table lookup that any browser console does in one line.

Two places this bites in practice:

HTTP Basic authentication sends username:password Base64-encoded in a header. That's an encoding for transport, not protection — which is exactly why Basic auth is only acceptable over HTTPS.

JWTs are three Base64url-encoded segments. The signature at the end proves the token wasn't modified; it does nothing to hide the payload. Anyone holding a JWT can read every claim inside it. If you're curious what's in yours, JWT decoded walks through the three parts field by field.

The rule: if the requirement is "nobody should read this," you need encryption. If the requirement is "this needs to survive a text-only channel," you need Base64. They answer different questions, and one never substitutes for the other.
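To make the Basic auth point concrete, here is a small sketch that builds an Authorization header value and then reads the credentials straight back out of it. The helper names and the credentials are hypothetical; the only real API used is the standard library's base64 module.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def read_basic_auth(header_value: str) -> tuple[str, str]:
    """Recover username and password. No key or secret is needed."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic auth header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_auth_header("alice", "correct horse")  # made-up credentials
print(header)
print(read_basic_auth(header))  # ('alice', 'correct horse')
```

Anyone who can see the header can run the second function, which is the whole argument for requiring HTTPS.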

base64url and friends.

Standard Base64 has an awkward property: two of its characters mean something in other contexts. + is decoded as a space in form-encoded URL query strings, and / is a path separator in URLs and filenames. Put a standard-Base64 token in a URL and it may arrive corrupted, or break routing outright.

RFC 4648 therefore also defines base64url, which swaps the two troublemakers for - (minus) and _ (underscore) and typically drops the = padding, since = is also special in query strings. Same 6-bits-per-character scheme, URL-safe alphabet. It's what JWTs use, which is why pasting a JWT segment into a strict standard-Base64 decoder sometimes fails.
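A short sketch of the difference, using Python's standard base64 module. The wrappers b64url_encode and b64url_decode are names chosen for this example, and the claims object is invented to stand in for a JWT payload; no signature is created or checked here.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Encode with the URL-safe alphabet and drop the trailing = padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the padding that base64url usually omits, then decode."""
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)


# Two bytes chosen so the encoding uses both characters that differ.
raw = b"\xfb\xff"
print(base64.b64encode(raw).decode())  # +/8=
print(b64url_encode(raw))              # -_8

# A made-up claims object, encoded the way a JWT payload segment is.
claims = {"sub": "1234", "admin": False}
segment = b64url_encode(json.dumps(claims).encode("utf-8"))
print(segment)
print(json.loads(b64url_decode(segment)))  # readable by anyone holding it
```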

| Variant | Characters 62 and 63 | Padding | Where you'll meet it |
| --- | --- | --- | --- |
| Standard (RFC 4648 §4) | `+` `/` | `=` required | Data URIs, HTTP Basic auth, most APIs |
| base64url (RFC 4648 §5) | `-` `_` | usually omitted | JWTs, URL tokens, filenames |
| MIME (RFC 2045) | `+` `/` | `=` required | Email attachments; line break every 76 chars |

When a decoder rejects input that "looks like Base64," the culprit is almost always a variant mismatch — url-safe characters fed to a standard decoder, missing padding fed to a strict one, or stray line breaks from an email pipeline.
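One way to cope with those three mismatches is to normalize the input before handing it to a strict decoder. The sketch below is an assumption about how you might do that, not a standard recipe: is_strict_base64 and decode_any are invented names, and silently accepting every variant may not be appropriate where the exact format matters.

```python
import base64
import binascii


def is_strict_base64(text: str) -> bool:
    """True if a strict standard-alphabet decoder accepts the text as-is."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except binascii.Error:
        return False


def decode_any(text: str) -> bytes:
    """Normalize the common variants, then decode strictly."""
    compact = "".join(text.split())                         # drop line breaks and spaces
    standard = compact.translate(str.maketrans("-_", "+/"))  # URL-safe to standard
    standard += "=" * (-len(standard) % 4)                   # restore missing padding
    return base64.b64decode(standard, validate=True)


# URL-safe characters, missing padding, and an email-style line break.
for sample in ("-_8=", "TQ", "TWFu\r\nTWFu"):
    print(repr(sample), is_strict_base64(sample), decode_any(sample))
```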

Data URIs: when inlining helps and hurts.

The most visible use of Base64 in web work is the data URI: data:image/png;base64,iVBORw0KGgo... pasted straight into an src attribute or a CSS url(). The whole asset rides along inside the document — no second HTTP request, no separate file to deploy.

Where it helps: tiny, single-use assets. A small icon, a 1×2-pixel gradient, an SVG cursor, a placeholder image in an email where external requests are blocked or ugly. One request instead of two, and the asset can never 404 independently of the page.

Where it hurts: anything big or shared. The asset gets the +33% size tax, can't be cached separately from the page (every page that inlines it re-downloads it), can't be lazy-loaded, and blocks the parser with a wall of text. A 200 KB photo as a data URI is a 267 KB payload glued into your HTML forever. As a rule of thumb, inline the trivially small, link the rest — and if you're deciding whether an image is small enough, compressing it first with a tool like the Image Smusher changes the answer more than the encoding does.
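If you want to put numbers on that trade-off, a data URI is simple to build and measure. In this sketch, make_data_uri and inlined_size are hypothetical helper names; the only fixed fact is the eight-byte PNG file signature, which is why every inlined PNG starts with the same iVBORw0KGgo characters.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first eight bytes of every PNG file


def make_data_uri(data: bytes, mime_type: str) -> str:
    """Wrap raw bytes as a Base64 data URI."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def inlined_size(n_bytes: int, mime_type: str) -> int:
    """Characters a data URI adds to the document for an asset of n_bytes."""
    prefix = len(f"data:{mime_type};base64,")
    return prefix + 4 * ((n_bytes + 2) // 3)


print(make_data_uri(PNG_SIGNATURE, "image/png"))  # data:image/png;base64,iVBORw0KGgo=
print(inlined_size(200_000, "image/jpeg"))        # a 200,000-byte photo, inlined
```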

Takeaways.

The thing to remember: Base64 re-spells bytes using 64 text-safe characters, 6 bits per character, which is why output is exactly 4/3 the size of input. It's a transport encoding — not compression, not encryption. Use base64url when the string travels in a URL, and think twice before inlining anything larger than an icon.

Base64 is one of those formats that seems mysterious right up until you regroup 24 bits into four sixes once, by hand — and then it's obvious forever. It solves a genuinely annoying problem (binary data in text-shaped pipes) with the least machinery possible, and asks a predictable, honest price for it.

Encode and decode Base64 in your browser.

Base64 Everything encodes and decodes text and files — standard and URL-safe flavors — with a live preview. Everything runs in your browser; nothing is uploaded to a server, which matters when the thing you're decoding is a token or a credential.

Made with love by a very serious person pretending not to be. Tooly McToolface is a workshop of free, client-side web tools. If you enjoy knowing what's actually inside the strings you paste around, JWT decoded opens up the most famous Base64url payload of all, and magic numbers does the same trick for a file's opening bytes.
