CRC32 vs MD5 vs SHA-256: picking the right check

Download a Linux ISO and you'll find a SHA256SUMS file next to it. Open a ZIP archive and every entry carries a CRC-32. Plenty of internal tooling still stamps artifacts with MD5. All three produce a short hex fingerprint of the data — and all three are answering different questions. Using the wrong one isn't a style mistake; it's a category error that occasionally becomes a security hole.

Two different jobs, one word.

Every algorithm in this article is a function that reads arbitrary input and emits a fixed-size digest, where any change to the input almost certainly changes the digest. The split is in the phrase "almost certainly": against whom?

Checksums (like CRC32) defend against accidents: a flipped bit on a flaky disk, a truncated download, line noise. The corruption is random, so the checksum only has to make random changes detectable.

Cryptographic hashes (like SHA-256) defend against adversaries: someone deliberately crafting a different input that produces the same digest. That's a vastly harder requirement, and it's the entire difference in design, cost, and output size.

The question to ask: "who am I checking against?" If the answer is entropy — cosmic rays, bad RAM, dropped packets — a checksum is enough. If the answer is a person, only a cryptographic hash (and often more than that) will do.
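As a quick illustration of what the two families have in common, the sketch below flips a single bit in a sample payload and recomputes both kinds of digest with Python's standard zlib and hashlib modules. The payload and the helper name flip_bit are made up for the example.

```python
import hashlib
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (hypothetical helper)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


# Stand-in for a downloaded file; any bytes would do.
original = b"example payload, standing in for a downloaded file"
damaged = flip_bit(original, 42)  # simulate one bit of random corruption

crc_before, crc_after = zlib.crc32(original), zlib.crc32(damaged)
sha_before = hashlib.sha256(original).hexdigest()
sha_after = hashlib.sha256(damaged).hexdigest()

print(f"CRC-32 : {crc_before:08x} -> {crc_after:08x}")
print(f"SHA-256: {sha_before[:16]}... -> {sha_after[:16]}...")
```

Both digests change, so both catch the accident. The difference only shows up once someone is choosing the change deliberately.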

CRC32: the corruption detector.

A cyclic redundancy check treats your data as one enormous binary polynomial and computes the remainder of dividing it, using carry-less (XOR) arithmetic, by a fixed 33-bit generator polynomial. The remainder is the 32-bit check value. That sounds arbitrary, but polynomial division has a useful property: it is guaranteed to catch every single-bit error, and CRC-32 detects every burst error of up to 32 consecutive bits (exactly the shape hardware failures tend to produce). Longer bursts slip through only about once in 2³² cases.

This is why CRC-32 is the integrity check inside ZIP and gzip, every PNG chunk, and Ethernet frames. It's tiny, it's cheap enough to run on every packet, and against random damage it's superb. The standard test vector, if you're checking an implementation: the ASCII string 123456789 has a CRC-32 of CBF43926.
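The division described above can be sketched bit by bit. This is a minimal, slow illustration rather than how production libraries do it (they use lookup tables or hardware instructions). It assumes the common reflected form of CRC-32 used by ZIP, gzip, and PNG, where the polynomial appears as the constant 0xEDB88320 and the register is inverted at the start and end. The function name crc32_bitwise is invented for the example.

```python
import zlib

# Reflected representation of the CRC-32 generator polynomial.
CRC32_POLY_REFLECTED = 0xEDB88320


def crc32_bitwise(data: bytes) -> int:
    """Compute CRC-32 one bit at a time (illustrative, not optimized)."""
    crc = 0xFFFFFFFF  # initial register value: all ones
    for byte in data:
        crc ^= byte  # feed the next byte into the low bits
        for _ in range(8):
            if crc & 1:
                # Low bit set: shift and subtract (XOR) the polynomial.
                crc = (crc >> 1) ^ CRC32_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final inversion


# Check against the standard test vector and the library implementation.
check = crc32_bitwise(b"123456789")
print(f"{check:08X}")  # CBF43926
assert check == zlib.crc32(b"123456789")
```

In real code, zlib.crc32 from the standard library is the practical choice; the loop is only there to make the remainder computation visible.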

Against a human, though, CRC32 offers nothing at all. It's a linear function (strictly, affine over XOR) with no secret and no one-way property. Anyone can modify a file and then adjust four bytes to restore the original CRC. And with only 32 bits, even random collisions become likely surprisingly fast: by the birthday problem, a set of roughly 77,000 files already has about a 50% chance that some pair shares a CRC. A matching CRC32 means "probably not corrupted in transit," never "this is the file the author published."
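The linearity is easy to observe directly. For any three messages of equal length, the CRC of their XOR equals the XOR of their CRCs, so the effect of a change on the check value can be predicted without recomputing anything. The sketch below only demonstrates that property with zlib.crc32 and random data; it is not a forgery tool, and xor_bytes is a helper invented for the example.

```python
import os
import zlib


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings together (hypothetical helper)."""
    length = len(chunks[0])
    if any(len(chunk) != length for chunk in chunks):
        raise ValueError("all inputs must have the same length")
    out = bytearray(length)
    for chunk in chunks:
        for i, value in enumerate(chunk):
            out[i] ^= value
    return bytes(out)


# Three unrelated random messages of the same length.
a, b, c = (os.urandom(64) for _ in range(3))

combined = xor_bytes(a, b, c)
predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)

# The CRC of the combination is fully determined by the individual CRCs.
print(predicted == zlib.crc32(combined))  # True
```

A cryptographic hash is designed so that no such algebraic shortcut exists. With CRC32, the shortcut is what lets an attacker solve for the few bytes that steer the check value wherever they want.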

MD5 and SHA-1: how they broke.

MD5 (128-bit) and SHA-1 (160-bit) were designed as cryptographic hashes, and for years they were the standard. Both are now broken — in a specific, important sense.

What broke is collision resistance: the guarantee that nobody can produce two different inputs with the same digest. For MD5, collisions were demonstrated by Xiaoyun Wang's team in 2004, and the attack matured to the point where colliding files are generated in seconds on ordinary hardware; the Flame malware infamously exploited an MD5 chosen-prefix collision to forge a Microsoft code-signing certificate. SHA-1 followed in 2017, when Google and CWI Amsterdam published SHAttered — two different PDFs with the same SHA-1 digest.

Does that make MD5 useless? Not for accident-detection: as a fancy checksum against random corruption, it still works fine, and you'll keep meeting it in legacy protocols and deduplication systems. But the moment an attacker enters the picture — file authenticity, signatures, certificates, anything security-adjacent — a collision-broken hash is disqualified, because "two files, same fingerprint" is precisely the move a forger needs. New systems shouldn't reach for MD5 even for the innocent cases; SHA-256 costs little more and never invites the awkward audit question.

SHA-256: the current default.

SHA-256, from the SHA-2 family standardized by NIST, produces a 256-bit digest and remains collision-resistant in practice: collisions must exist for any fixed-size digest, but no practical way to find one is publicly known. It's what OS vendors publish next to ISOs, what Git is migrating toward (from SHA-1), what package managers pin dependencies with, and what certificate signatures use.
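Verifying a download against a published SHA-256 digest might look like the sketch below. It reads the file in chunks so a multi-gigabyte ISO never has to fit in memory. The function names and the 1 MiB chunk size are choices made for this example, and the published digest is assumed to be a hex string you obtained from a source you trust.

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, published_hex: str) -> bool:
    """Compare a file's digest with a published hex digest."""
    expected = published_hex.strip().lower()
    # compare_digest is a constant-time comparison; for public digests
    # a plain == would also do, this is just a cautious habit.
    return hmac.compare_digest(sha256_file(path), expected)
```

Usage is a single call such as matches_published_digest("image.iso", digest_from_sha256sums), where both arguments are placeholders for your own file path and digest.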

The 256-bit output isn't decoration. Against brute-force collision hunting, resistance scales with half the bit length (the birthday problem again) — so SHA-256 offers on the order of 2¹²⁸ work to find any collision, comfortably beyond feasibility. Its cousins SHA-384/512 and the newer SHA-3 family exist and are fine; for everyday integrity work, SHA-256 is the boring, correct default.
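The birthday arithmetic can be checked with the usual approximation: for n random digests of b bits each, the probability that at least one pair collides is roughly 1 - exp(-n(n-1) / 2^(b+1)). The sketch below applies it to a 32-bit and a 256-bit digest. The function name and the sample sizes are chosen purely for illustration.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance that any two of n random digests collide."""
    exponent = n_items * (n_items - 1) / 2 ** (bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)


# 32-bit digest: about 77,000 items is already a coin flip.
print(f"{collision_probability(77_163, 32):.3f}")

# 256-bit digest: a billion billion items is still negligible.
print(f"{collision_probability(10**18, 256):.3e}")
```

The 50% point sits near the square root of the digest space, which is why a 256-bit hash gives roughly 128 bits of collision resistance and a 32-bit checksum gives roughly 16.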

The comparison table.

Algorithm	Output	Designed against	Status today	Reach for it when
CRC32	32 bits	Random corruption	Fine at its job; zero security	Framing/format checks, quick corruption tests
MD5	128 bits	Adversaries (1992)	Collisions demonstrated 2004, now trivial	Legacy compatibility only
SHA-1	160 bits	Adversaries (1995)	Collisions demonstrated 2017	Legacy compatibility only
SHA-256	256 bits	Adversaries (2001)	No practical attacks known	File verification, signatures, new designs

What none of them do.

Two jobs get mis-assigned to plain hashes often enough to call out:

Passwords. A fast hash is exactly what you don't want for password storage: speed is the attacker's friend when they're guessing billions of candidates offline. Passwords need deliberately slow, salted password-hashing functions: bcrypt, or the memory-hard scrypt and Argon2. "We hash passwords with SHA-256" is a red flag, not a reassurance. (More on what actually makes a password strong in "How strong is your password, really?")
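A minimal sketch of salted, slow password hashing follows. It uses scrypt because Python's standard library exposes it as hashlib.scrypt (available when Python is built against OpenSSL 1.1 or newer), whereas bcrypt and Argon2 need third-party packages. The cost parameters n, r, and p are illustrative values, not a tuning recommendation, and the function names are invented for the example.

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost parameters (assumed, tune for your hardware).
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage alongside the account."""
    salt = os.urandom(16)  # fresh random salt per password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key


def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key from the candidate password and compare."""
    candidate = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, stored_key)
```

Because the salt is random, hashing the same password twice yields different stored keys, which prevents an attacker from spotting reused passwords or using precomputed tables.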

Authenticity. A hash proves a file matches a digest you already trust. If the digest travels over the same channel as the file, an attacker who can swap one can swap both. A download page and a checksum file on the same server give you an integrity check against corruption, not against compromise. Authenticity needs the digest to arrive via a separate trusted path, or a check that is bound to a key: an HMAC computed with a shared secret, or a real signature (GPG, code signing).
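The difference between a plain digest and a keyed one can be sketched with the standard hmac module. In the scenario below, an attacker replaces both the file and its published SHA-256, which passes a naive check, but cannot produce a valid HMAC without the key. The key, the payloads, and the function names are all invented for the example; a real deployment also has to solve how the key is shared, which is what public-key signatures avoid.

```python
import hashlib
import hmac


def make_tag(key: bytes, data: bytes) -> str:
    """Compute an HMAC-SHA256 tag over data with a shared secret key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_authentic(key: bytes, data: bytes, tag_hex: str) -> bool:
    """Check a received tag against one recomputed with the same key."""
    return hmac.compare_digest(make_tag(key, data), tag_hex)


shared_key = b"example-key-known-only-to-publisher-and-verifier"
genuine = b"genuine release contents"
genuine_tag = make_tag(shared_key, genuine)

# The attacker swaps the file and recomputes the unkeyed digest.
tampered = b"tampered release contents"
swapped_digest = hashlib.sha256(tampered).hexdigest()

# The plain hash check passes, because both sides were replaced.
print(hashlib.sha256(tampered).hexdigest() == swapped_digest)  # True

# The keyed check fails: the old tag does not match the new file.
print(is_authentic(shared_key, tampered, genuine_tag))  # False
print(is_authentic(shared_key, genuine, genuine_tag))   # True
```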

Takeaways.

The thing to remember: match the tool to the threat. CRC32 detects accidents and nothing else. MD5 and SHA-1 are broken against forgers — legacy only. SHA-256 is the default for anything an adversary might touch. Passwords get a slow hash (Argon2/bcrypt), and authenticity needs a signature or a separately-trusted digest, not just a matching hash.

The fingerprint metaphor hides a real split: some fingerprints only need to differ by luck, others need to differ against effort. Once you sort the algorithms by the threat they were built for — rather than by output length or age — the "which one do I use" question mostly answers itself.

Compute checksums and hashes in your browser.

The Hash Generator computes MD5, SHA-1, SHA-256, and SHA-512 of any text, and the CRC32 tool covers the checksum side — both entirely in your browser. Nothing you paste is uploaded, which matters when you're fingerprinting something sensitive.

Made with love by a very serious person pretending not to be. Tooly McToolface is a workshop of free, client-side web tools. If you like knowing what a fingerprint actually promises, password entropy applies the same adversarial thinking to passphrases, and magic numbers covers the other way software identifies a file.
