Killring Rick Dillon's Weblog

How Broken is SHA-1?

Back in February 2005, SHA-1 was broken. The core of what "broken" means in this context is described very well by Bruce Schneier in his post announcing the attack:

If you hashed 280 random messages, you'd find one pair that hashed to the same value. That's the "brute force" way of finding collisions, and it depends solely on the length of the hash value. "Breaking" the hash function means being able to find collisions faster than that. And that's what the Chinese did. They can find collisions in SHA-1 in 269 calculations, about 2,000 times faster than brute force.

This was a major concern to me. It turns out that it's best to avoid SHA-1 in a variety of contexts, including cryptographic keys. Nevertheless, a bunch of systems that rely on cryptographic security use SHA-1, including checksums for Git objects and OpenPGP key fingerprints.

Even prior to the 2005 attack, Schneier pointed out that "It's time for us all to migrate away from SHA-1." It was on this basis that I almost switched to using SHA224 in my latest software. Almost.

To understand why I stuck with SHA-1, let's take a look at how cryptographic hashes can be attacked.

Collision Attacks

The attack documented in 2005, like most hash attacks, was a collision attack. Schneier didn't use that phrase when describing it, but that's what he describes when he talks about finding a pair of inputs that hash to the same value.

Collision attacks are the most common kind of attack against cryptographic hashes because they don't restrict the set of inputs over which the search for such a pair might be conducted. The reason collision attacks are relatively easy is because any pair of inputs that hash to the same value are acceptable; no restrictions are placed on what those inputs are, or what the value of the hash is.

Aside: The Birthday Problem

A related concept is the birthday attack, which is related to the birthday problem. The birthday problem is simple: how many people can gather at a party before it becomes probable that two share the same birthday?

You can find the number fairly easily by calculating the likelihood that, as the size of the party grows, two people won't share the same birthday. Your first guest can have any birthday at all without fear of sharing that birthday with another guest. But your second guest can have any day except the day of the first guest's birthday, leaving 364/365 possibilities. The third guest only has 363/365 possibilities, and so on. In imperative Python 3 code, it looks something like this:

# The number of guests
guestcount = 1
# The chance of all guests having unique birthdays
uniquechance = 1
# Keep adding guests until the chance of them all being
# unique is less that 50%
while uniquechance > 0.5:
    uniquechance *= (365-guestcount)/365
    guestcount += 1
print(guestcount)

This code returns 23, meaning that once you have 23 guests at your party, it's more than 50% likely that two guests will share a birthday. The number is surprisingly small because we haven't specified any day at which the collision will occur, in the same way that the 2005 attack on SHA-1 doesn't restrict collisions to any particular hash.

Preimage Attacks

Unlike collision attacks (and the birthday attack), for many practical applications of hashing, the value of the input and/or the hash matters. When the input to the hash function constrains the search for a collision, the hash must be broken using a preimage attack. There are two types of preimage attack:

  1. (Preimage Attack) Given a hash, generate an input that hashes to it.
  2. (Second Preimage Attack) Given an input, generate a second input that hashes to the same value as the given input.

A preimage attack only requires that the attacker find a single input that hashes to a given hash. Even so, it is a substantially harder attack to mount than a collision attack.

A second preimage attack adds an additional constraint: given an input and its hash, find a second input that hashes to the same value. Because there are more constraints on the solution, this attack is even more difficult.

So how broken is SHA-1?

So why did I stick with SHA-1? Well, the software I'm working on is concerned with identifying public keys using a digest of the key itself. When used in this context, such a digest is called a fingerprint.

OpenPGP uses a SHA-1-based hash to generate the fingerprint for a public key. One of the worst-case scenarios is that a user, given the fingerprint for key, attempts to retrieve it from a remote source and receives the wrong key, but with a fingerprint that matches, allowing an attacker to read encrypted messages intended for someone else. Since the attacker is given both a public key and its fingerprint and needs to generate another public key that has the same fingerprint, such an attack is a second preimage attack.

So, while SHA-1 is technically "broken" because of the 2005 collision attack that reduced the search space by a factor of 2000, there are still no known feasible preimage attacks (or second preimage attacks) on SHA-1 that weaken it when it is used for key fingerprinting. I was particularly hesitant to move away from SHA-1 for key fingerprinting using OpenPGP because it is part of the standard, and I don't have any interest in rewriting well-established cryptographic routines. I'd rather reuse existing standards with an eye toward making them accessible in new contexts. So, for now, SHA-1 it is.