Base64 is one of those things every developer copies, pastes, and uses for years before ever stopping to ask what it actually does. It is a way to represent arbitrary binary data using only printable ASCII characters, and understanding it properly saves you from a surprising number of bugs.
What Base64 Actually Is
Base64 is an encoding, not a compression or an encryption scheme. It takes raw bytes and re-expresses them using a 64-character alphabet that is safe to put almost anywhere text is allowed:
`A-Z`, `a-z`, `0-9`, `+`, and `/`, with `=` reserved for padding
The core idea is a clean numeric trick. A byte is 8 bits, and 64 is 2 to the 6th power, so each Base64 character carries exactly 6 bits. The least common multiple of 8 and 6 is 24, which means three input bytes (24 bits) map perfectly onto four output characters (also 24 bits). Base64 always works in these 3-byte to 4-character groups.
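The regrouping described above can be sketched in a few lines of Python. This is only an illustration for one full 3-byte group; `ALPHABET` and `encode_group` are names made up for this example, and the standard library's `base64.b64encode` is used purely as a cross-check.

```python
import base64

# The standard Base64 alphabet: index 0 is 'A', index 63 is '/'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly three bytes into four Base64 characters (hypothetical helper)."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles a full 3-byte group")
    # Pack the three bytes into one 24-bit integer.
    bits = int.from_bytes(three_bytes, "big")
    # Peel off four 6-bit values, most significant first.
    indexes = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indexes)

group = b"Hi!"
print(encode_group(group))
print(base64.b64encode(group).decode("ascii"))  # should print the same four characters
```

Real encoders also handle a trailing group of one or two bytes, which this sketch deliberately leaves out.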
Here is the transformation for the three bytes that spell `Man`:
```text
Text:    M        a        n
ASCII:   77       97       110
Binary:  01001101 01100001 01101110
Regroup: 010011 010110 000101 101110
Decimal: 19     22     5      46
Base64:  T      W      F      u
```
So `Man` becomes `TWFu`.
What the padding is for
Input rarely divides evenly into groups of three. When the last group has one or two leftover bytes, the encoder zero-pads the bits to fill the next 6-bit chunk and appends `=` characters so the output length stays a multiple of four:
- 3 input bytes encode to 4 characters, no padding
- 2 input bytes encode to 3 characters plus one `=`
- 1 input byte encodes to 2 characters plus two, `==`
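The three cases can be checked directly with Python's standard `base64` module. This is a small sketch; `encode_with_padding_count` is a hypothetical helper written for this example.

```python
import base64

def encode_with_padding_count(raw: bytes) -> tuple[str, int]:
    """Return the Base64 text for raw and how many '=' characters it contains."""
    encoded = base64.b64encode(raw).decode("ascii")
    return encoded, encoded.count("=")

for raw in (b"Man", b"Ma", b"M"):
    encoded, pad = encode_with_padding_count(raw)
    print(f"{len(raw)} input byte(s) -> {encoded} ({pad} padding)")

# 3 input byte(s) -> TWFu (0 padding)
# 2 input byte(s) -> TWE= (1 padding)
# 1 input byte(s) -> TQ== (2 padding)
```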
That is why you so often see Base64 strings ending in `=` or `==`.
Why Binary-to-Text Encoding Exists at All
The obvious question is why we would inflate our data just to move it around. The answer is historical and still relevant: large parts of the internet were designed to carry text, not arbitrary bytes.
Email is the classic example. The original SMTP and the message format behind it were built for 7-bit ASCII. A raw byte with the high bit set, a null byte, or a stray carriage return could be silently mangled, stripped, or interpreted as a control signal by some intermediate server. If you want to send a JPEG or a PDF through that pipe intact, you have to first turn it into something that survives a text-only channel. Base64 is that something.
The same problem shows up anywhere a transport is text-shaped:
- Putting binary inside JSON or XML, which have no native byte type
- Embedding a small image directly in HTML or CSS
- Stuffing a cryptographic key or certificate into a config file or environment variable
- Passing binary through a URL or an HTTP header
In each case the rule is the same: when the channel only guarantees safe passage for printable characters, encode your bytes into printable characters first. Base64 is the most common answer because it is simple, reversible, and supported everywhere.
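As one illustration of that rule, here is a minimal sketch of carrying raw bytes inside JSON using Python's standard library. The `blob` field name is an assumption chosen for the example, not a convention.

```python
import base64
import json

raw = bytes([0x00, 0xFF, 0x10, 0x80])  # arbitrary bytes, not valid UTF-8 text

# json.dumps({"blob": raw}) would raise TypeError: bytes are not JSON serializable.
# Encoding to Base64 first gives JSON a plain ASCII string to carry.
document = json.dumps({"blob": base64.b64encode(raw).decode("ascii")})
print(document)

# The receiver reverses the step to recover the exact bytes.
restored = base64.b64decode(json.loads(document)["blob"])
assert restored == raw
```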
The ~33% Size Overhead
Base64 is not free. You are spending 4 output characters to represent every 3 input bytes, which is a 4/3 ratio, or roughly a 33% increase in size before you even count padding and any line breaks the format adds.
```text
3 bytes    -> 4 chars   (+33%)
300 bytes  -> 400 chars (+33%)
3 MB image -> ~4 MB of Base64 text
```
This matters more than people expect. A 3 MB image becomes about 4 MB of text. If that text also gets gzipped in transit you recover some of the loss, because Base64 output is still fairly compressible, but you never fully break even versus shipping the raw bytes. The overhead is the price of compatibility, and you should only pay it when you actually need a text-safe representation.
A frequent mistake is treating Base64 as if it shrinks data. It never does. If your goal is smaller payloads, you want compression; if your goal is safe transport through a text channel, you want Base64. They solve different problems.
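The 4/3 ratio can be turned into a quick size estimate. The sketch below assumes padded output with no line breaks; `base64_length` is a hypothetical helper, and the loop compares it against what the standard library actually produces.

```python
import base64

def base64_length(n_bytes: int) -> int:
    """Length of padded Base64 output (no line breaks) for n_bytes of input."""
    # Every started group of 3 input bytes costs 4 output characters.
    return 4 * ((n_bytes + 2) // 3)

for n in (3, 300, 1000):
    actual = len(base64.b64encode(bytes(n)))
    estimate = base64_length(n)
    print(f"{n} bytes -> {estimate} chars, matches stdlib: {estimate == actual}")
```

Note that the output is never shorter than the input for any non-empty payload, which is the point made above: Base64 only ever grows data.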
Data URLs: Base64 You See Every Day
One of the most visible uses is the data URL, which lets you inline a resource directly instead of linking to a separate file:
```html
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..." />
```
The shape is `data:[mediatype][;base64],<data>`, where the `;base64` marker signals that the data part is Base64-encoded.
Data URLs are genuinely useful for small assets. A tiny icon or an SVG embedded as a data URL saves an HTTP round trip, which can be worth it for above-the-fold content. The tradeoffs:
- They inflate the inlined asset by ~33%, and a data URL cannot be cached independently of the page or stylesheet that contains it.
- They are best for small, rarely-changing assets. Inlining a large hero image bloats your HTML and forces the browser to re-download it on every page load.
As a rough rule, reach for a data URL when the asset is a few kilobytes and saving a request matters; otherwise serve it as a normal cacheable file. When you need to generate or sanity-check one, a quick Base64 encoder and decoder is faster than wiring up a script.
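If you do end up scripting it, building a data URL is a short job. The following is a minimal sketch; `to_data_url` is a hypothetical helper, and the tiny SVG is a placeholder asset invented for the example.

```python
import base64

def to_data_url(data: bytes, media_type: str) -> str:
    """Build a Base64 data URL from raw bytes and a media type string."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{payload}"

# Placeholder asset: an empty 8x8 SVG.
svg = b'<svg xmlns="http://www.w3.org/2000/svg" width="8" height="8"/>'
print(to_data_url(svg, "image/svg+xml"))
```

The resulting string can be dropped into an `src` attribute or a CSS `url(...)` value.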
JWTs and the Base64url Variant
If you have worked with authentication, you have stared at a JSON Web Token: three chunks separated by dots.
```text
eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjM0In0.dQw4w9WgXcQ...
```
The first two segments are JSON objects, the header and the payload, each encoded with a URL-safe flavor of Base64. Because standard Base64 uses `+`, `/`, and `=`, which have special meaning in URLs, Base64url replaces `+` with `-` and `/` with `_`, and JWTs drop the `=` padding.
The single most important thing to understand here: the header and payload of a JWT are encoded, not encrypted. Anyone holding the token can decode those segments and read every claim inside in plain text. The third segment is a signature that proves the token was not tampered with and was issued by someone holding the secret, but it does nothing to hide the contents.
This trips up developers constantly. Never put anything you would not want the client to see inside a standard JWT payload. To inspect what a token actually contains, decode it with a JWT decoder and you will see the claims in the clear, which is exactly the point being made.
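To see how little stands between a token and its claims, here is a minimal sketch that decodes the first two segments of the sample token using only the standard library. `decode_segment` is a hypothetical helper; it reads the claims and does not verify the signature in any way.

```python
import base64
import json

def decode_segment(segment: str) -> dict:
    """Decode one Base64url JWT segment into a dict. Does NOT verify anything."""
    # JWTs omit '=' padding, so restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

header_b64, payload_b64 = "eyJhbGciOiJIUzI1NiJ9", "eyJzdWIiOiIxMjM0In0"
print(decode_segment(header_b64))   # {'alg': 'HS256'}
print(decode_segment(payload_b64))  # {'sub': '1234'}
```

No key or secret appears anywhere in that code, which is exactly why the payload must be treated as public.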
When NOT to Use Base64
The most common abuse of Base64 comes from confusing "unreadable to me" with "secure."
It is not encryption
Base64 has no key and no secret. The transformation is fully public and trivially reversible by anyone, including with a one-line command:
```bash
echo "cGFzc3dvcmQxMjM=" | base64 --decode
# password123
```
Encoding a password, an API key, or a token in Base64 and storing or transmitting it provides exactly zero confidentiality. It only makes the value slightly less obvious at a glance. If you need secrecy, use real encryption with a managed key. If you need to store passwords, use a purpose-built password hash. Base64 belongs in neither workflow.
It is not a checksum or integrity guarantee
Base64 will happily encode corrupted bytes and decode them right back to the same corrupted bytes. It tells you nothing about whether the data is intact. For integrity you want a hash or a signature.
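A small sketch makes the difference concrete. It simulates corruption by flipping one bit, then shows that Base64 round-trips the damaged bytes without complaint while a SHA-256 digest comparison catches the change. The payload and variable names are invented for the example.

```python
import base64
import hashlib

original = b"important payload"
expected_digest = hashlib.sha256(original).hexdigest()

# Simulate corruption before encoding: flip one bit in the first byte.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]

# Base64 encodes and decodes the corrupted bytes faithfully, with no error.
round_tripped = base64.b64decode(base64.b64encode(corrupted))
assert round_tripped == corrupted

# Only the digest comparison reveals that the data changed.
print(hashlib.sha256(round_tripped).hexdigest() == expected_digest)  # False
```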
It is wasteful when the channel is already binary-safe
If you are writing bytes to a file, a binary database column, or a protocol that handles raw bytes cleanly, encoding to Base64 first just adds 33% and a pointless round trip. Use it at the boundary where text-only transport forces your hand, and not a layer earlier.
A Practical Mental Model
Keep three questions in mind:
- Am I moving bytes through a text-only channel? If yes, Base64 is the right tool. If no, you probably do not need it.
- Do I need this to be secret? Base64 never helps here. Reach for encryption.
- Do I care about size? Then remember the 33% tax and consider compressing the raw bytes instead.
Base64 is a small, honest tool that does exactly one job well: making binary survive a journey designed for text. Use it for that, keep it away from anything security-shaped, and it will serve you reliably for the rest of your career.