Hashing, encryption, and encoding: three things people confuse
Encoding, encryption, and hashing solve three different problems. Mixing them up is how passwords leak and tokens get forged. Here is the precise distinction.
Three operations get used interchangeably in conversation and code reviews, and the confusion is not academic — it is the direct cause of plaintext-equivalent password stores and tokens anyone can read. Encoding, encryption, and hashing each transform data, but they answer different questions and have different reversal properties. Get the wrong one and the failure mode is silent: the code runs, the tests pass, and the security property you thought you had does not exist.
The one distinction that matters
Strip away the algorithms and there are two questions:
- Can you reverse it? Encoding and encryption are reversible. Hashing is not.
- Do you need a secret to reverse it? Encryption requires a key. Encoding requires nothing — the transform is public.
That gives you three buckets:
| | Reversible? | Needs a key? | Purpose | Examples |
|---|---|---|---|---|
| Encoding | Yes | No | Representation / transport | Base64, hex, URL percent-encoding, UTF-8 |
| Encryption | Yes | Yes | Confidentiality | AES (symmetric), RSA / ECC (asymmetric) |
| Hashing | No | No | Integrity, fingerprint, lookup | SHA-256, SHA-3, BLAKE3 |
The "needs a key" column is the one people skip, and it is the one that decides whether something is a security control at all. Encoding needs no key, so it provides no confidentiality — full stop. If anyone can run the reverse transform without a secret, the data is in the clear.
Encoding: changing the representation
Encoding maps bytes from one alphabet to another so they survive a
channel that cannot carry the original. Base64 turns arbitrary binary
into 64 printable ASCII characters so it fits in JSON, email, or a URL.
Percent-encoding escapes characters that have meaning in a URL. Hex
renders bytes as 0–f pairs for display. UTF-8 encodes code points
into byte sequences.
None of this involves a secret, and that is the entire point. The
decoder is built into every standard library. atob() in the browser,
base64.b64decode in Python, base64 -d on the command line — all
reverse Base64 instantly, no key required.
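As a minimal sketch of that point, the round trips below use only the Python standard library. The sample values are made up for illustration; nothing here takes a key, which is exactly why none of it hides anything.

```python
import base64
from urllib.parse import quote, unquote

# Hypothetical sample value; any bytes behave the same way.
secret = b"hunter2"

# Base64: binary -> printable ASCII, and straight back with no key.
encoded = base64.b64encode(secret)
decoded = base64.b64decode(encoded)

# Hex: the same bytes rendered as 0-f pairs.
as_hex = secret.hex()
from_hex = bytes.fromhex(as_hex)

# Percent-encoding: escape characters that have meaning in a URL.
escaped = quote("a b&c")
unescaped = unquote(escaped)

print(encoded.decode("ascii"))  # aHVudGVyMg==
print(as_hex)                   # 68756e74657232
print(escaped)                  # a%20b%26c
print(decoded == secret, from_hex == secret, unescaped == "a b&c")
```

Every transform is undone by a public function, so anyone who can read the encoded value can read the original.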
Encoding is the right tool when the problem is shape: binary that needs to travel through a text-only field, a filename that needs to be URL-safe, a hash digest that needs to be printed. It is the wrong tool the moment the problem is secrecy, because it offers none.
Encryption: confidentiality with a key
Encryption transforms plaintext into ciphertext that cannot be reversed without the key. This is the only one of the three that protects confidentiality, and the key is what makes that true.
Two families:
- Symmetric (AES, ChaCha20): the same key encrypts and decrypts. Fast, used for bulk data: disk encryption, TLS record payloads, encrypting a column in a database.
- Asymmetric (RSA, ECC): a public key encrypts and the matching private key decrypts. For signatures the roles are different: the private key signs and the public key verifies. Slower, used to bootstrap trust and exchange symmetric keys without a prior shared secret.
The trade-off encryption forces on you is key management. Encrypted data is exactly as safe as the key, and the key has to live somewhere: a KMS, an HSM, an environment variable, a config file. A leaked key turns all your ciphertext back into plaintext retroactively. "We encrypt it" is an incomplete sentence until you can answer where the key is and who can reach it.
Hashing: a one-way fingerprint
A hash function maps input of any size to a fixed-size digest, and the defining property is that you cannot run it backward. There is no key and no decode step — given the digest, the only way to find an input that produces it is to try inputs until one matches.
That irreversibility is the feature. Hashing answers "is this the same data?" and "has this changed?" without storing or revealing the data itself. It powers integrity checks (compare digests of a downloaded file), deduplication, content-addressed storage (Git commit IDs), and hash-table lookup.
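A short sketch of the fingerprint idea, using `hashlib` from the Python standard library. The `fingerprint` helper and the sample payloads are hypothetical names chosen for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical payloads: the second differs by a single character.
original = b"release-1.0 contents"
tampered = b"release-1.0 contentz"

# The digest length is fixed regardless of input size.
print(len(fingerprint(b"")), len(fingerprint(original * 1000)))

# Same input, same digest; any change produces a different digest.
print(fingerprint(original) == fingerprint(original))
print(fingerprint(original) == fingerprint(tampered))
```

Comparing two digests tells you whether the inputs match without either side having to store or reveal the input itself.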
One nuance worth stating plainly: MD5 and SHA-1 are broken for collision resistance — an attacker can construct two different inputs with the same digest, which is fatal for signatures and certificates. They are still perfectly fine as non-security checksums, where you only care about detecting accidental corruption and there is no adversary crafting collisions. Use SHA-256 or BLAKE3 for anything an attacker can influence; MD5 is acceptable for an ETag or a cache key.
The confusions, and the bugs they cause
"We Base64 the password before storing it"
This is encoding mistaken for security. Base64 has no key; the stored value is the password, in a costume. Anyone with read access to the table runs one decode and has every credential. This shows up in real breaches more often than it should. Base64 is for transport, never storage of secrets.
"We SHA-256 the passwords"
Closer, but still wrong, and the reason is subtle enough that careful engineers ship it. SHA-256 is irreversible, which feels like exactly what password storage wants. The problem is that it is fast — modern GPUs compute billions of SHA-256 hashes per second. An attacker who steals the hash table runs a dictionary or brute-force attack offline at enormous speed, and weak passwords fall in seconds.
Password storage needs a slow, deliberately expensive key-derivation function: bcrypt, scrypt, or Argon2. These are tunable — you set a cost factor so a single hash takes, say, 100ms, which is unnoticeable on login but makes offline cracking thousands of times more expensive.
And every password needs a per-user salt: a random value stored alongside the hash and mixed into it. Without a salt, two users with the same password get the same digest, and an attacker precomputes a table of common-password hashes (a rainbow table) once and matches it against every leaked database forever. A unique salt per user defeats precomputation entirely — the attacker has to crack each entry separately. Modern KDFs generate and embed the salt for you; bcrypt stores it in the output string.
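Here is one way the salted, slow-hash approach might look, sketched with `hashlib.scrypt` from the Python standard library (it requires a Python build linked against OpenSSL 1.1 or newer). The function names and the cost parameters are assumptions for illustration, not recommended production settings. Unlike bcrypt's output string, raw scrypt does not embed the salt, so this sketch returns it separately for storage next to the digest.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (an assumption); tune them on your own hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is generated per call."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)


# Two users with the same password end up with different stored digests.
salt_a, hash_a = hash_password("correct horse")
salt_b, hash_b = hash_password("correct horse")
print(hash_a != hash_b)
print(verify_password("correct horse", salt_a, hash_a))
print(verify_password("wrong guess", salt_a, hash_a))
```

Because the salts differ, a precomputed table built for one entry is useless against the other.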
A pepper is a related but distinct idea: a secret value added to every password before hashing, stored separately from the database (in app config or a KMS) rather than in the hash. The salt defends against precomputation when the database leaks; the pepper adds a second factor that a database-only leak does not expose. It is a defense-in-depth measure, not a substitute for a proper KDF and salt.
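A minimal sketch of a pepper, assuming it is mixed in with HMAC-SHA-256 before the slow hash. That is one common construction, not the only one, and the function names and scrypt parameters here are hypothetical. In a real deployment the pepper would come from app config or a KMS, not be generated inline.

```python
import hashlib
import hmac
import os


def apply_pepper(password: str, pepper: bytes) -> bytes:
    """Mix the secret pepper into the password using HMAC-SHA-256."""
    return hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()


def hash_with_pepper(password: str, salt: bytes, pepper: bytes) -> bytes:
    """Pepper first, then run the salted slow hash (scrypt, illustrative parameters)."""
    peppered = apply_pepper(password, pepper)
    return hashlib.scrypt(peppered, salt=salt, n=2**14, r=8, p=1, dklen=32)


# Stand-ins for illustration: the salt lives in the database row,
# the pepper lives outside the database.
salt = os.urandom(16)
pepper = os.urandom(32)
other_pepper = os.urandom(32)

stored = hash_with_pepper("correct horse", salt, pepper)
print(stored == hash_with_pepper("correct horse", salt, pepper))
print(stored == hash_with_pepper("correct horse", salt, other_pepper))
```

An attacker holding only the database has the salt and the digest but cannot even begin guessing passwords without the pepper.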
"We encrypt the API token"
Sometimes correct, often the wrong tool. If you need to recover the token's plaintext later — to forward it to an upstream API on the user's behalf — encryption is right, and key management is the cost. But if all you ever do is check an incoming token against a stored one, you do not need to recover anything, and encryption is overkill that adds a key to protect. Store a hash of the token and compare hashes. And when the question is "did this message come from who it claims, unmodified?" the answer is usually an HMAC — a keyed hash built for authenticating webhook payloads and similar messages — not encryption at all.
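The last two cases can be sketched with the Python standard library alone. All function names below are hypothetical, and the webhook key and payload are made-up values. A plain SHA-256 is reasonable for the token only because the token is long and random; the same shortcut would not be safe for human-chosen passwords.

```python
import hashlib
import hmac
import secrets


def issue_token() -> tuple[str, str]:
    """Return (token, digest). Hand the token to the client, store only the digest."""
    token = secrets.token_urlsafe(32)
    return token, hashlib.sha256(token.encode("utf-8")).hexdigest()


def token_matches(presented: str, stored_digest: str) -> bool:
    """Hash the presented token and compare digests in constant time."""
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest.encode("ascii"), stored_digest.encode("utf-8"))


def sign(payload: bytes, key: bytes) -> str:
    """HMAC-SHA-256 tag over the payload, hex-encoded for transport."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, key: bytes, signature: str) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = sign(payload, key)
    return hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8"))


token, stored = issue_token()
print(token_matches(token, stored))

key = secrets.token_bytes(32)          # shared secret with the webhook sender
body = b'{"event": "invoice.paid"}'    # made-up payload
tag = sign(body, key)
print(verify(body, key, tag))
print(verify(body + b" ", key, tag))   # any modification invalidates the tag
```

Neither path ever needs to decrypt anything, so there is no decryption key to manage; the HMAC key still has to be kept secret, but it authenticates messages rather than hiding them.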
Where each belongs in a web app
- Encoding — serializing binary for JSON or URLs, building data URIs, rendering a digest as hex for display. The base64-is-not-encryption distinction is worth internalizing: it is everywhere in tokens and payloads, and it secures nothing.
- Encryption — data at rest you must read back (PII columns, encrypted backups), data in transit (TLS), and tokens you need to recover in plaintext. Always paired with a key-management story.
- Hashing — password storage (via a slow KDF + salt, never a raw fast hash), integrity checks, deduplication, lookup keys, and HMAC for message authentication.
The mental shortcut: if there is no key, it is encoding or hashing, and it keeps nothing confidential. If you need the original back, it is encoding or encryption, never hashing. And if you are storing passwords, none of the general-purpose tools applies; reach for a purpose-built KDF.
When you need to compute or compare digests while debugging, the hash generator produces SHA-256 and other digests in the browser so you can verify a checksum or sanity-check that two values match without piping anything through a shell.