The three approaches, in one paragraph each

Frequency sampling. Count every distinct pixel value, sort by occurrence, return the top N. The simplest possible algorithm; runs in O(pixels) time. Produces a palette dominated by whatever covers the most surface area in the image — usually backgrounds, sky, large blocks of similar tone. Great for "what's the dominant color of this product photo." Terrible for "give me an interesting palette" because the result is almost always one dominant tone plus four near-identical shades of the background.
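
A minimal sketch of the counting loop, assuming RGBA bytes as returned by a canvas getImageData() call; the function name and hex output are illustrative, not from any particular library:

```ts
// Count exact pixel values and return the top n as hex strings.
// `pixels` is RGBA data, e.g. ctx.getImageData(...).data.
function topColorsByFrequency(pixels: Uint8ClampedArray, n: number): string[] {
  const counts = new Map<number, number>();
  for (let i = 0; i < pixels.length; i += 4) {
    // Pack R, G, B into one 24-bit integer to use as a Map key.
    const key = (pixels[i] << 16) | (pixels[i + 1] << 8) | pixels[i + 2];
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequent first
    .slice(0, n)
    .map(([key]) => '#' + key.toString(16).padStart(6, '0'));
}
```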

K-means clustering. Treat each pixel as a point in 3D color space (RGB or, better, Lab). Pick K random starting centroids. Assign each pixel to the nearest centroid; recompute each centroid as the mean of its assigned pixels; repeat until the centroids stop moving. The K final centroids are your palette. Slower (O(K × pixels × iterations), typically 10-30 iterations), but it produces palettes that are both representative and distinct. The standard choice for general-purpose palette extraction. Sensitive to starting positions — running it twice on the same image can give slightly different results unless you seed deterministically.
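
A sketch of the core loop, in RGB for brevity (a Lab version runs the same loop over converted values). Names are illustrative; the evenly-spaced seeding is one simple way to get the deterministic behavior mentioned above:

```ts
type Vec3 = [number, number, number];

// k-means over pixels as 3D points (RGB here; Lab works the same way).
function kmeansPalette(pixels: Vec3[], k: number, iters = 20): Vec3[] {
  // Deterministic seeding: evenly spaced pixels, so reruns match exactly.
  let centroids: Vec3[] = Array.from(
    { length: k },
    (_, i) => pixels[Math.floor((i * pixels.length) / k)]
  );
  for (let iter = 0; iter < iters; iter++) {
    // Per-cluster accumulators: [sumR, sumG, sumB, count].
    const sums = Array.from({ length: k }, () => [0, 0, 0, 0]);
    for (const p of pixels) {
      // Assign the pixel to its nearest centroid (squared distance).
      let best = 0, bestDist = Infinity;
      for (let c = 0; c < k; c++) {
        const d =
          (p[0] - centroids[c][0]) ** 2 +
          (p[1] - centroids[c][1]) ** 2 +
          (p[2] - centroids[c][2]) ** 2;
        if (d < bestDist) { bestDist = d; best = c; }
      }
      sums[best][0] += p[0]; sums[best][1] += p[1];
      sums[best][2] += p[2]; sums[best][3] += 1;
    }
    // Recompute each centroid as the mean of its assigned pixels.
    centroids = centroids.map((c, i) =>
      sums[i][3] === 0
        ? c // empty cluster: keep the old centroid
        : ([sums[i][0], sums[i][1], sums[i][2]].map(s => s / sums[i][3]) as Vec3)
    );
  }
  return centroids;
}
```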

Median cut. Treat the entire pixel set as a 3D bounding box in color space. Find the axis along which the box is longest, i.e., the channel with the widest spread of values. Split the box at the median value along that axis into two boxes. Recurse on each box until you have N boxes; the centroid of each box is one palette color. The classic algorithm for GIF quantization (it's how encoders built the adaptive 256-color palettes early web GIFs shipped with). Faster than k-means, deterministic, and tends to produce palettes that span the full range of the image's color variety rather than clustering around the most common values.
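
A sketch of the vanilla algorithm (function names mine; production implementations add histograms and edge-case guards, but the split logic is this):

```ts
type RGB = [number, number, number];

// Vanilla median cut: repeatedly split the box with the widest channel
// spread at its median, then average each final box into a palette color.
function medianCutPalette(pixels: RGB[], n: number): RGB[] {
  const boxes: RGB[][] = [pixels];
  while (boxes.length < n) {
    // Find the box and channel with the widest spread of values.
    let boxIdx = 0, axis = 0, widest = -1;
    boxes.forEach((box, bi) => {
      for (let a = 0; a < 3; a++) {
        let lo = 255, hi = 0;
        for (const p of box) { lo = Math.min(lo, p[a]); hi = Math.max(hi, p[a]); }
        if (hi - lo > widest) { widest = hi - lo; boxIdx = bi; axis = a; }
      }
    });
    if (widest < 1) break; // every box is already a single color
    // Split that box into two halves at the median along that axis.
    const sorted = boxes[boxIdx].slice().sort((a, b) => a[axis] - b[axis]);
    const mid = sorted.length >> 1;
    boxes.splice(boxIdx, 1, sorted.slice(0, mid), sorted.slice(mid));
  }
  // Each palette color is the mean of its box.
  return boxes.map(box => {
    const sum = box.reduce(
      (acc, p) => [acc[0] + p[0], acc[1] + p[1], acc[2] + p[2]],
      [0, 0, 0]
    );
    return sum.map(s => Math.round(s / box.length)) as RGB;
  });
}
```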

What each algorithm actually emphasizes

The right way to think about the differences: each algorithm answers a slightly different question.

Frequency sampling answers "what colors cover the most area?" That's the right question for product photography on a white seamless backdrop ("the shoe is red"), for brand-color extraction from a logo on a known background, or for compressing an image to a small palette where the goal is to minimize visible difference from the original.

K-means answers "what colors best represent the variety in this image?" That's the right question for design palettes (you want a balanced set of distinct colors you can use as fill/accent/background), for thumbnail generation where you need a small number of colors to convey the image's identity, and for tagging or search indexing where you need a stable, comparable color signature per image.

Median cut answers "what colors span the full range of this image's tonal variety?" That's the right question for color-quantizing an image into a fixed palette (the original GIF use case), for printing where you need a palette that hits the gamut extremes, and for generating "complementary" palettes that emphasize range over fidelity.

Why running them on the same image gives different answers

Take a photograph of a forest at sunset. Lots of green canopy, a smaller area of orange sky, a thin band of warm light on the trunks, and a dark foreground.

Frequency sampling returns: dark green, slightly lighter green, slightly different green, mid-tone green, and brown. The sunset orange — visually the most striking color in the photo — doesn't make the top 5 because it covers maybe 8% of the pixels.

K-means with K=5 returns: a dark green, a mid green, the orange, the warm trunk-glow color, and the dark foreground. Five distinct colors that genuinely represent the photo. Re-run with a different random seed and the exact shades shift slightly, but the structure stays.

Median cut returns: deep shadow black, dark green, mid green, an orange, and a near-white from the brightest highlights. The palette spans from darkest to brightest with stops in between, which is what you'd want if you were re-rendering the image with only 5 colors.

Three correct answers. Three different palettes. Three different uses.

The color space matters too

All three algorithms operate on color values as 3D points. The choice of color space changes the meaning of "distance" between two colors — and therefore changes which colors get clustered together.

RGB. Easiest, fastest. Distance in RGB space is computationally cheap (Euclidean distance on three integers 0-255). But it maps poorly to perceived difference — two color pairs separated by the same RGB distance can look very differently spaced to a human eye. Acceptable for fast frequency sampling. Mediocre for k-means — you'll get clusters that are mathematically tight but visually arbitrary.

HSL / HSV. Better for "give me three distinct hues" because hue is a single dimension. Worse for general palette extraction because saturation and lightness aren't on a perceptually uniform scale, and hue wraps around at 0°/360°, so naive distance math breaks near red. Useful for sorting palettes for display (sort by hue, then lightness) but rarely the right space for the clustering itself.

Lab (CIE L*a*b*). Designed to be perceptually uniform — a unit of distance in Lab space corresponds roughly to one "just noticeable difference" to a human eye. Conversion from sRGB to Lab is a non-trivial math step (sRGB → linear RGB → XYZ → Lab) but worth it for any algorithm that depends on color similarity. K-means in Lab gives noticeably better palettes than k-means in RGB. Median cut benefits less because the bounding-box logic doesn't need perceptual uniformity to find tonal extremes.
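
For reference, the chain looks roughly like this (standard sRGB and D65 formulas; the function name is mine):

```ts
// sRGB (0-255 per channel) → CIE Lab, via linear RGB and XYZ (D65 white).
function srgbToLab(r: number, g: number, b: number): [number, number, number] {
  // 1. Undo the sRGB transfer curve to get linear-light values.
  const lin = [r, g, b].map(v => {
    const c = v / 255;
    return c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
  });
  // 2. Linear RGB → XYZ (standard sRGB matrix).
  const x = 0.4124 * lin[0] + 0.3576 * lin[1] + 0.1805 * lin[2];
  const y = 0.2126 * lin[0] + 0.7152 * lin[1] + 0.0722 * lin[2];
  const z = 0.0193 * lin[0] + 0.1192 * lin[1] + 0.9505 * lin[2];
  // 3. XYZ → Lab, normalized to the D65 reference white.
  const f = (t: number) => (t > 0.008856 ? Math.cbrt(t) : 7.787 * t + 16 / 116);
  const [fx, fy, fz] = [x / 0.95047, y / 1.0, z / 1.08883].map(f);
  return [116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)];
}
```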

If your tool says "uses k-means clustering" without specifying the color space, assume RGB and adjust expectations accordingly. The better implementations convert to Lab first.

Two practical traps

Sampling 100% of the pixels is wasted work. A 12-megapixel photo has 12 million pixels. Running k-means on all of them takes seconds and gives a palette that's indistinguishable from running it on a uniform 5% sample of 600K pixels. For frequency sampling the difference is even smaller. The implementations that feel snappy in your browser are downsampling internally — usually to a 100×100 or 200×200 grid before clustering — and the user never notices.
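
In a browser, that internal downsampling can be as simple as drawing the image onto a small canvas and reading the pixels back. A sketch, assuming an already-loaded image element:

```ts
// Downsample before clustering: the browser averages pixels for us when
// it scales the image down to gridSize × gridSize.
function samplePixels(img: HTMLImageElement, gridSize = 200): Uint8ClampedArray {
  const canvas = document.createElement('canvas');
  canvas.width = gridSize;
  canvas.height = gridSize;
  const ctx = canvas.getContext('2d')!;
  ctx.drawImage(img, 0, 0, gridSize, gridSize);
  return ctx.getImageData(0, 0, gridSize, gridSize).data; // RGBA bytes
}
```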

Quantizing colors before extraction can either help or hurt. If you reduce the input image from 16M possible colors to, say, 4096 (keep the top four bits of each channel: 16 levels each, and 16³ = 4096) before running the algorithm, you speed up frequency sampling dramatically and get a palette that's stable across small image variations (resaving as JPEG won't change the result). But for k-means, pre-quantization throws away exactly the precision the algorithm uses to find good clusters — you'll get a worse palette faster. Match the pre-processing to the algorithm.
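
The 4096-color bucketing is a one-line bit trick. A sketch, with an illustrative name:

```ts
// Keep the top 4 bits of each channel: 16 levels per channel,
// 16^3 = 4096 possible buckets, packed into a 12-bit integer.
function bucket4096(r: number, g: number, b: number): number {
  return ((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4);
}
```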

One more algorithm worth naming

Modified median cut is a variant where the bounding-box split happens at the variance-weighted center instead of the median, and the split axis is chosen to minimize the variance of the resulting two boxes. (It is sometimes lumped together with octree quantization, but the two are distinct algorithms; octree builds a tree over color space and merges its leaves.) It produces noticeably better palettes than vanilla median cut on photographs with skewed color distributions (e.g., night photos that are 90% black with small bright areas) and is what most modern GIF encoders use internally. The trade-off is implementation complexity — vanilla median cut is ~50 lines of code, the modified version is closer to 300.

If you're picking an off-the-shelf library and the page mentions "modified median cut" or "Wu's algorithm" or "octree quantization," that's a good sign. Color Thief (the npm package) uses modified median cut by default; node-vibrant quantizes with a median-cut variant and then scores the resulting colors against target HSL ranges to fill its well-known "Vibrant" / "Muted" slots.

Companion tool

If you have a photograph and want to extract its palette right now: Color Grabber. Drop in any image; the tool runs k-means clustering in Lab space (the better-quality option from above) and returns the top 5, 6, or 8 colors as hex, RGB, HSL, and Tailwind/Figma/SCSS export formats. Everything happens in your browser — the image never uploads. The downsampling is automatic; even a 50-megapixel raw still completes in a second or two.

For the encoding side of color — how to use those palette values in your CSS, what the practical differences between RGB and HSL syntax are, when to use OKLCH — the existing SVG Shrinker documentation covers some of the metadata-stripping aspects, and the why SVGs bloat article covers SVG-specific color use.