Killring Rick Dillon's Weblog

Diving Into RSS

This post is designed for those who have decided RSS is something to look into, but aren't sure where to start.

Get a Feed Reader

I'll address potential reader demographics in turn. This is not intended as a survey of available software. Rather, these are my picks after using more than a dozen RSS readers over the years. My prime pick, Google Reader, is no longer with us, but two worthy successors take its place if you're looking for a professionally hosted web-based reader.

Web-Based

If you're reading across multiple devices (computers, phone, tablet), a web-based reader is almost mandatory, because RSS readers are stateful: they store which articles have already been fetched as well as which articles you have read. Even if you're only reading on your laptop, web-based readers are the easiest to set up and maintain, while also providing the best user experience.

For the Casual User

The best professionally hosted options are Feedly and The Old Reader. Both rose from the ashes of the Great Google Reader Exodus of 2013. I would love to pick only one of these, but both are truly excellent. Both sites allow you to create an account very easily using your Google account.

I find that The Old Reader is the most exciting, because it builds a social network out of RSS, something I've wanted to see happen for years now.

Feedly has mobile apps for Android and iOS, as well as their flagship web interface. If you're using The Old Reader, it has a whole page of native app options for the mobile platforms and beyond.

For the Web-Savvy

If you run your own website and are comfortable deploying your own web app, there's always an advantage to running your own server. You have more control over your data, over when you upgrade, and over whether the service ever shuts down.

The superb, open-source TT-RSS is the best-in-class for such users, and I've been using it for more than a year. It has an open-source Android app, and there is extensive third-party support for it as well.

The project's open-source nature and high community involvement are easy to appreciate if you browse through TT-RSS's plugins page.

If you're not into running your own servers, hosting providers like Bitnami offer a TT-RSS based stack that you can deploy easily.

For the Offline Aficionado

If you strongly believe that using a webpage to aggregate web pages doesn't work for you, there are several offline options.

The best, in my opinion, are the Firefox-based readers. They work across platforms as extensions to Firefox. They each have their own style, but by far the best is Brief, which provides the essential river-of-news experience that makes RSS so powerful.

On OS X, there are a few paid options, but the reader I used 'back in the day' on OS X, Vienna, is open-source and still alive and kicking.

On Linux, the best reader is still Liferea. I used it for a few weeks some years back, but moved to web-based readers because of their overall superiority.

For the Emacs Hacker

If you can't bring yourself to ever leave Emacs for anything, check out Elfeed. It's amazing, and the engineering behind the database is very interesting given the limitations imposed by Emacs.

Get Feeds

OK, so you've got a feed reader. But how do you find feeds? Most browsers have removed their built-in mechanism to identify pages with RSS. Fortunately, good browsers are easy to extend.

If you're using Firefox (you should!), then the extension RSS Icon in Awesomebar will add the RSS icon back to the URL bar, giving you one-click access to RSS subscription. If you're using Google Chrome or Chromium, get the RSS Subscription Extension, which provides almost identical functionality.

Once you've added one of these extensions, come back to this site and you'll immediately see that it has an RSS feed: an orange RSS icon appears in the URL bar.

Integration With Your Browser

When you click on that orange icon to subscribe to a feed, your browser will bring up a preview page of the feed, and then ask you which program to use to read the feed.

The simplest approach at this point is to press Ctrl-l, Ctrl-c to select and copy the URL (Cmd-l, Cmd-c on OS X), then paste it into your feed reader's 'Add Feed' box.

A more sophisticated approach is to let your browser know what reader you're using, so it can add it with one click. If you're using Feedly, there is an extension for both Firefox and Chrome.

TT-RSS has a whole topic on integrating with Firefox.

If you're using Chrome or Chromium, the drop-down on the subscription page provides the option to manage your feed readers, allowing you to provide an endpoint to call with the feed URL to subscribe.

Pro Usage Tips

Each of these probably deserves a (short) article of its own, but here are some tips I've picked up in 10 years of daily RSS reading:

  • Move through a combined view that aggregates all your feeds in reverse-chronological order. This is called the river-of-news. Don't visit feeds one-by-one, because you'll forget to visit the less frequently updated feeds. Use the river-of-news.
  • Use the keyboard to move through, visit and star articles. In most good readers, you can press '?' to get a quick help screen. Don't use the mouse.
  • Star/bookmark/favorite items that you might want to come back to later. RSS is good for reading, but is also a good database of noteworthy content.

Good luck, and stay tuned for more RSS articles soon.

Why You Should Be Using RSS

RSS is a simple, easy technology that allows you to stop opening 30 tabs in your browser to check the news sites you care about. RSS unclogs your inbox of all the newsletters you subscribe to. RSS puts you in control. RSS is simple, distributed, ubiquitous, and free. You should be using it. But what is it?

RSS is a way for computers to read websites.

Consider: much of the web is designed only for humans to read. Layout, formatting, fonts, responsive design, JavaScript...it's all about designing web pages for one scenario: a reader visiting the site with a browser. It's all about the all-powerful User Experience. But in focusing so much on a user's experience on a single site, we've completely neglected the user experience of the web as a whole. That's where RSS comes in.

RSS is one of many ways websites can be made easier for computers to parse. Separating content from form is at the core of CSS. But content itself can contain markup that gives computers clues about what kind of content it is. Microformats are designed to provide exactly that kind of metadata.

RSS takes things a step further, providing a simple format that strips away all the style and layout information, and just provides raw structure, metadata, and content. An RSS parser examines the structure to identify given entries in an RSS feed. For each entry, the parser examines the metadata, like the title of the entry, the date the entry was made, and the author of the entry. Finally, the parser examines the content itself. Using this basic methodology, the parser can build a database of the website.
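To make that methodology concrete, here's a minimal sketch of such a parser, using only Python's standard library against a made-up feed (the feed contents here are hypothetical):

```python
import xml.etree.ElementTree as ET

# A minimal, made-up RSS 2.0 document, inlined for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Diving Into RSS</title>
      <pubDate>Mon, 10 Mar 2014 00:00:00 GMT</pubDate>
      <author>rick@example.com</author>
      <description>Getting started with feed readers.</description>
    </item>
  </channel>
</rss>"""

def parse_feed(xml_text):
    """Walk the feed's structure and pull out each entry's
    metadata (title, date, author) and content."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter('item'):
        entries.append({
            'title': item.findtext('title'),
            'date': item.findtext('pubDate'),
            'author': item.findtext('author'),
            'content': item.findtext('description'),
        })
    return entries

entries = parse_feed(SAMPLE_FEED)
print(entries[0]['title'])  # → Diving Into RSS
```

A real reader does more (HTTP fetching, Atom support, encoding quirks), but the structure/metadata/content walk is the heart of it.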

But what is gained by parsing RSS and building the database? To answer that, we should look at how people use the web for both search and discovery.

Search and Discovery

The decentralized nature of the web makes it difficult to jump directly to what you're looking for, so search engines sprang up to solve exactly that problem. By having computers follow all the links on all the pages and store what words are on what pages (and what those pages link to), users can quickly jump to content related to keywords they enter.

But what about data you don't know exists? News is an obvious example, and has at least three aspects.

  • News from people you don't know but from sources you may trust, like the BBC (actual news),
  • News from people you are interested in but who aren't necessarily your friends (blogs, social networks), and
  • News from friends (social networks).

There are also non-news sources:

  • A wiki that covers a topic you're interested in. What's been updated lately?
  • Blogs, either personal or professional, updated frequently or not. How do you find out when they've changed?
  • Questions posed about a particular topic in a Q&A forum. How do you discover new questions?

Search engines address none of these use cases, since search engines handle search, and these issues are about content discovery.

The Flawed Model of Email-Powered Discovery

The most common way to address these issues is via website newsletters, wherein a reader submits their email address to the website and gets notified about interesting stuff via email.

There are three problems with this approach:

  • It gives the reader's personal information (their email address) to a site needlessly.
  • Rather than the website acting as a passive source of information, it now takes control, deciding when a reader gets notified about new content. This leads to the oft-cited problems of email overload.
  • Unsubscribing becomes an error-prone transaction that puts the reader at the site's mercy. If the 'unsubscribe' link doesn't work for any reason, the reader is left little recourse besides email filters and the 'Mark as Spam' button.

The whole system is much more complex than it needs to be, and it dilutes the importance of email, since messages from your bank about your account being overdrawn carry the same importance as the latest web-comic from a friend-of-a-friend.

Introducing the Feed Reader

We can avoid the problem of email dilution and email overload by moving non-actionable content away from email and into a feed reader. A feed reader is a lot like a second email client, but rather than being a gateway to communication with humans, a feed reader is designed to be a gateway to the web.

Remember that database of information the computer gathered by parsing RSS? It's essentially a one-stop shop for all the data from web pages you care about, stored in a single location, much like email. Unlike email, though, it is non-urgent content, so you can treat it more like a magazine than an email client. A feed reader maintains a list of all the sites you're interested in getting updates about and automatically fetches new entries from them regularly, storing them in the database, and presenting you with a stream of updates.
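As a sketch of that mechanism, a feed reader's core is little more than "fetch, dedupe against the database, present unread entries newest-first". Everything here is a toy stand-in (the fetch function fakes the network entirely):

```python
from datetime import datetime

def fetch(feed_url):
    """Made-up stand-in for fetching and parsing a feed over the network."""
    return [
        {'id': feed_url + '#1', 'title': 'First post',
         'date': datetime(2014, 3, 1)},
        {'id': feed_url + '#2', 'title': 'Second post',
         'date': datetime(2014, 3, 5)},
    ]

class FeedReader:
    """Stateful by design: remembers what was fetched and what was read."""
    def __init__(self, feed_urls):
        self.feed_urls = feed_urls
        self.entries = {}   # entry id -> entry
        self.read = set()   # ids of entries already read

    def refresh(self):
        # Fetch every subscribed feed; only new entry ids are stored.
        for url in self.feed_urls:
            for entry in fetch(url):
                self.entries.setdefault(entry['id'], entry)

    def river_of_news(self):
        """All unread entries, across all feeds, newest first."""
        unread = [e for e in self.entries.values()
                  if e['id'] not in self.read]
        return sorted(unread, key=lambda e: e['date'], reverse=True)

reader = FeedReader(['http://example.com/feed'])
reader.refresh()
print([e['title'] for e in reader.river_of_news()])
# → ['Second post', 'First post']
```

Real readers persist that state server-side, which is exactly why web-based readers work so well across devices.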

This approach puts you back in control, but alleviates the burden of checking many sites manually. The browser becomes much less cluttered: rather than a tab for each site that may have interesting content, tabs are opened by the feed reader if a particular entry looks interesting enough to read through in its entirety.

Besides the reduction in email noise and enhanced organization offered by RSS, it also allows you to enjoy content that was previously inaccessible. Consider the Harvard Journal of Law and Technology. It's simply not updated often enough to warrant visiting it every day, but when it is updated it has some great content. With RSS, updates magically appear in the same place you get all your information on the web: in your RSS reader. There's tons of content on the web produced by amateurs that is excellent, but because they're not paid to write, updates can be sporadic. RSS opens the door to enjoying that entire category of content.

So, if you see an interesting site, you don't have to worry about bookmarking it, signing up for a newsletter, or remembering to visit it again. You simply click the feed icon and content from the site magically appears in your usual news stream. It's like teleporting information directly into your brain.

RSS is a Way for The Internet to Know Itself

In his 1980 TV series Cosmos, Carl Sagan said "We are a way for the cosmos to know itself." The internet exists on an unimaginably smaller scale, but RSS gives the internet a way to read itself. RSS brings a beautiful publishing, aggregating, filtering, and analyzing platform to regular people, making the vastness of the web just a bit more manageable. If you're not using it, you're spending a whole lot more time absorbing a whole lot less information.

Shortcomings of Canonical's Unity

In 2011, Canonical made Unity the default desktop environment for its market-leading distro Ubuntu. Unity has been in development since 2009, but remains the least sophisticated desktop environment available for Linux, and not only fails to innovate in any meaningful way, but represents a regression in the quality of software on Linux with respect to stability and configurability. As a result of Canonical's insistence on using Unity (which was developed in-house at Canonical), entire Ubuntu spinoffs have been created with a goal of allowing users to easily avoid using Unity. Distros such as Kubuntu, Lubuntu, and Xubuntu differ from Ubuntu only as much as necessary to provide a different default desktop experience from that provided by stock Ubuntu. Even the more distantly-related Linux Mint has taken it upon itself to move away from Unity, creating not one, but two alternative desktop environments, MATE and Cinnamon, based on Gnome 2 and Gnome 3, respectively. This has not deterred Canonical in its mission to push Unity as the de facto desktop interface in an effort to unify the user interface for Linux across desktops, laptops, netbooks, tablets, and phones.

Unity Apologists

I was reading a thread over on Hacker News in which Canonical was getting praise for not actively fighting the community's decision to switch from Upstart to systemd. In this discussion, past Canonical projects that bucked the community were discussed, including Unity and Mir. One comment read "[Unity] is a breath of fresh air compared to most alternatives on linux."

That has not been my experience with Unity, and I commented as such, but was immediately questioned as perhaps being part of a community of "power users" that "never really used Unity". Au contraire.

Many of Unity's shortcomings stem from Canonical's ongoing proclivity to attempt to reinvent common desktop interactions, regardless of the cost it imposes on Ubuntu's least experienced users. Power users can simply change environments, but new users are stuck with Unity's limitations until they gain the expertise to switch away from it.

Caveat: I'm Using Unity 5.x

It's worth noting that the machine I've most recently used Unity on is using the latest LTS release of Ubuntu, 12.04, which, as of this writing, is still recommended on Ubuntu's site as the latest stable release. Nevertheless, I realize that there's a good chance Unity 6 and Unity 7 have introduced improvements, and that not all features have been backported to the 12.04 distro, so some of my comments may be somewhat dated. That said, they do reflect the current state of a fully-patched 12.04 system. With that caveat out of the way, let us forge ahead.

Ongoing Instability

In writing this post, I fired up a Unity session on my Ubuntu box and used it for an hour to refresh my memory on exact details of Unity's behavior that disappointed me. In the first thirty minutes of usage, Unity crashed twice during execution of routine operations (opening the launcher to launch a program in both cases, actually). So Unity's stability leaves something to be desired, even in 2014. I just moved from Cinnamon to KDE4 a couple of weeks ago, and in that time, KDE hasn't crashed even once. In months of Cinnamon usage on three machines prior to that, I experienced only one crash. Having core elements of your user experience crash regularly is undesirable, to be generous.

Making Easy Things Difficult

One mistake Canonical continually makes is releasing beta software to its user base, and making that software the default. Unity is perhaps the canonical example of this (pardon the wordplay). The first commit to Unity was made in October 2009, and it was made the default environment in Ubuntu in the 11.04 release, after about 18 months of development. Not surprisingly, the lack of maturity in the codebase is evident.

How Do I Add a Program to the Dash?

In most desktop environments, it's a common and simple operation to create a menu item for an application installed outside of the usual package management mechanisms. In KDE, for example, simply right-clicking on the menu icon and selecting "Edit Applications" brings up an interface to add, remove and edit applications. It's a common operation for many users.

Despite the utility of modifying entries in the Dash and Launcher, Unity makes it difficult, simply by virtue of the fact that the functionality is not included at all. Users that wish to change an icon, description or simply add an executable that is not already present have two options:

  1. Open a text editor and navigate to a hidden directory, creating a .desktop file in a very particular format to make a new program appear to Unity.
  2. Install third-party applications like gnome-panel and alacarte to allow programs to be added to the Unity Launcher and Dash.

The fact that there is an extensive wiki page describing a series of complex contortions a user must go through to access such basic functionality is inexcusable. I don't mind steep learning curves, but a product that makes simple actions time-consuming and complex doesn't respect my time.
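For reference, the hand-written file from option 1 looks something like this (every name and path here is hypothetical), saved with a .desktop extension under ~/.local/share/applications/:

```
[Desktop Entry]
Type=Application
Name=My Editor
Comment=A program installed outside the package manager (hypothetical)
Exec=/opt/myeditor/bin/myeditor %F
Icon=/opt/myeditor/share/icon.png
Terminal=false
Categories=Utility;
```

It's not hard once you know the format, but expecting new users to discover a hidden directory and an INI-style spec just to add a launcher entry is the problem.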

How Can I Resize the Launcher?

This is more of an illustrative point -- Unity absolutely allows users to resize the launcher, even from within the GUI. But how? Can a user just right click on the launcher and select the "Resize" option? Or perhaps move the mouse to the edge of the Launcher and drag to resize it? In fact, neither option works -- the setting is buried inside of the settings application, under "Appearance". There, in a panel that allows you to change the background for the desktop, there is a slider that allows you to choose a launcher size between 32 and 64 pixels. That was only added in 2012, actually. Before that, it was a hidden option, made available through custom configuration editing or the use of the MyUnity tool.

Using KDE 4 as a counterpoint again, simply right clicking on any panel on the screen pulls up a menu that allows the user to choose "Panel Settings". From the settings interface (which is actually attached to the panel being modified), panel position, width, alignment, included widgets and auto-hide behavior are all easily accessible. Compared to Unity, it is a triumph of design. For what it's worth, similar functionality is available in Gnome 2, KDE 3, Cinnamon, MATE, LXDE and XFCE.

In fact, Unity does so poorly at affording such obvious settings like this that a whole ecosystem of tools has grown up around Unity in a community-wide effort to add back all the basic features users expect from their desktop environments, like resizing panels, adjusting transparency, tuning auto-hide behavior and many others. It's worth noting that features like this have been in Linux desktop environments since the late 1990s, and having them missing in 2014 is simply embarrassing.

What's With Reordering Programs in the Launcher?

If you want to change the order that programs appear in the launcher, you might think you could simply click on the icon of the program you want to move and drag it to the desired position. Instead of moving the program as intended, users will instead find that the entire set of programs in the launcher is pulled in the direction the user drags, only to snap back to the original position when the user releases the mouse button. To actually rearrange the ordering of the icons, users must first drag the icon out of the launcher (toward the center of the desktop) and then back into the launcher in the desired position.

Making Advanced Features Impossible

OK, perhaps not impossible; just about anything is possible if you're willing to write plugins or extensions. Unity relies on this fact extensively, pushing basic functionality out of Unity and into various plugins written by the community, each varying in terms of quality, performance and maintenance. The result is a shoddy, ad hoc ecosystem of spotty software each user is responsible for cobbling together on a per-machine basis.

Hey, Mind if I Move the Launcher?

The Launcher panel is placed on the left side of the screen. That's great for widescreen users, but it might be desirable to move it for some users, either to the right side or even to the top or bottom. It turns out that moving the launcher is impossible using the stock software (one might imagine editing a configuration file somewhere). Instead, moving the location of the Launcher requires an unofficial Compiz plugin, not because of limited developer resources, but rather by design. Here's Mark Shuttleworth himself on this exact issue:

I think the report actually meant that the launcher should be movable to other edges of the screen. I'm afraid that won't work with our broader design goals, so we won't implement that.

Shuttleworth maintains this sort of Apple-esque attitude toward dictating how users should use their computers. But it doesn't work nearly as well in the Linux ecosystem as it does among Apple's users, who have come to expect a strictly controlled experience, from the hardware all the way through the OS to the software (via app stores) and into their cloud offerings and content store. Linux poses a significantly different value proposition, and targets a different demographic.

What's My CPU Doing?

Every desktop environment on Linux supports adding some kind of system monitoring applet that sits next to the system tray and task switcher. Unity managed to not only launch without that one, but lacks most other applets common to other desktop environments as well. Common functionality that allows users to customize their environment is completely absent in Unity, with right-clicks yielding identical results to left-clicks on almost every visible UI element.

So, what does it take to get a simple CPU graph next to the system tray?

As it turns out, users must install another custom piece of software provided out-of-band from the main Unity development pipeline. Worse, it's not even included in the default repositories; the extension is only available from a PPA, which must be added manually.

The situation with the CPU monitor is hardly unique. It turns out that modifying practically anything about the top panel in Unity is extraordinarily difficult, requiring research, custom hacks or entire add-ons to obtain features built-in to KDE, Gnome, LXDE and XFCE. Simple actions like adding another panel at the bottom of the screen, adjusting auto-hide behavior, tuning transparency, and changing the order of the icons on the right side are not possible in the default Unity install.

In short, customizing Unity in common ways almost uniformly results in a project. The sad part is that there are a lot of pieces of software that afford the user a lot of customization and power, but it comes at the cost of the learning curve. Unity actually is less powerful than other desktop environments while simultaneously being harder to use. It's the worst of both worlds.

But Why?

The sad part is I can imagine how this all came to pass. The design meetings for Unity must have been tough. A product manager was charged with creating a unified interface suitable for both desktop and touch-based devices. By that point, Unity had already lost, simply because of the design constraints imposed by such a goal. Consider:

  1. Right-click had to be removed from most elements, since touch-based devices wouldn't be able to easily access the functionality.
  2. The launcher had to remain on the left side of the screen, probably because different interaction expectations were designed for the right side. You can see this clearly in their designs for Ubuntu Phone.
  3. Since the launcher could be packed full of the "favorite" programs, it might overflow. But because it had to support touch, it couldn't be shrunk to accommodate the program icons (as seen in Apple's OS X). So instead, it had to allow the user to scroll it, removing the ability to easily rearrange programs via dragging, resulting in the contorted "drag out of the launcher and back in" UX.
  4. Common panel widgets, like CPU monitoring, were put on the back burner, if considered at all, since smaller devices don't have the space to include them, or the battery life to constantly update them. (See Dan Sandler's comment on Android battery life for context.)

These are just examples, but the point is that by trying to present a unified experience across all devices, Unity seriously compromises quality as well. Touch devices don't feel quite right (why is there a program launcher on the left side?), and desktops get a dumbed-down version of a desktop (no right-click, for example).

So, What's Better?

The good news is that if you're looking for something better than Unity, you don't have to look far: just about everything is better. MATE (a fork of Gnome 2) is quite serviceable and I used it happily for nine months. Cinnamon (a reskinning of Gnome 3) is equally usable and, while light on features, perfectly serviceable. XFCE is my go-to environment on my Linux gaming box.

I'd have to say the gold standard today, though, is really KDE 4. It took years, but that team has taken all the lessons learned from KDE 3 and created a superbly powerful, well-designed and sleek desktop environment. So, if you've got the Unity blues, KDE is just a

sudo apt-get install kde-full

away.

Klout and a Broken Model of Internet Influence

Back in 2012, when I joined the startup scene in San Francisco, I was surprised to learn that so many took Klout seriously. They tracked their Klout rating over time, comparing it with others, and even had playful competitions to see who could increase their Klout score the most over a couple of months.

When I first learned of Klout shortly after it came out, I didn't think too much about it. It basically seemed like a one-number metric to determine your influence online. As the years have passed since Klout was launched, I see it more as an example of how a deeply flawed model of the internet has been popularized.

In the early- and mid-1990s, the internet was a loose federation of institutions, mostly in the .gov and .edu spaces. Email was still highly decentralized, since webmail didn't exist yet. If you were 'on the internet', it was likely through your university or research institution, and they might offer you 'web space' where you could publish some HTML files that constituted your website (much like the site you're reading right now). It was a beautifully organic system, but was still nascent, reserved mostly for the technical elite.

Fast forward to today, and we find that almost all of the power on the internet has been consolidated. Home pages have been replaced by social network profiles on corporate-controlled sites like Facebook and Twitter. Rather than providing space and bandwidth, these companies are 'identity brokers', a much broader and more lucrative role than a simple web host. I've never heard that term used before, but it seems apt.

But with this inevitable commercialization of identity online comes second-tier services that analyze the resulting structure. Where Google sought to analyze the inter-connectivity of the sites on the internet, Facebook sought to rebuild the internet from the inside out, replacing home pages and web hosts with profile pages that Facebook could track metrics on, like popularity. As competition grew and other niche networks joined the fray, an opportunity arose for companies like Klout to act as a sort of meta-analyzer, analyzing identity and profiles across social networks, delivering a satisfying number letting you know your worth online.

As I was sitting at lunch earlier this week, a coworker asked me "Hey, what do you think of Klout?" I paused only for a moment before replying "Honestly? I think it's bullshit."

There are a couple of problems with Klout, one flowing from the other.

Klout is built on the idea that a person's influence online is governed by the profiles they keep in large, corporate-controlled networks. If a person doesn't buy into the notion that social interaction online should be mediated by these identity brokers, Klout simply ignores them. To be fair, it's not the worst assumption to make in an age when even the Pope has a Twitter account.

The second problem, an inevitable outgrowth of the assumption that social networks are the source of influence, is that Klout completely ignores other sources of influence. If someone has a blog with 500k subscribers via RSS, it is invisible to Klout. If someone is a top commenter on Slashdot, a top poster on Reddit, in the top 1% of StackExchange users for C++, or has 10k followers on YouTube, they might be completely invisible to Klout, and at the very least, all those contributions online won't contribute to Klout's notion of 'influence'.

And that's the crux of the issue. My identity online is a handle I've spent years building across dozens (probably hundreds, to be honest) of sites. While it's in Facebook's and Google's and Twitter's best interest for my identity to be tied to their service, I believe that identity and influence cannot be so easily corralled.

And that's the real problem with Klout: it reinforces this notion that the identity brokers dictate reality. When I joined the startup, a senior employee told me "You don't exist online!" I was confused, since I have at least two different blogs, and have fairly active accounts with G+, Youtube, Reddit, Twitter, StackExchange, Slashdot and GitHub. When I asked her what she meant, she simply said "You have no Facebook profile!" Even in my case, where I maintain multiple profiles on sites controlled by identity brokers, it still wasn't enough; they have to be the right identity brokers.

How Broken is SHA-1?

Back in February 2005, SHA-1 was broken. The core of what "broken" means in this context is described very well by Bruce Schneier in his post announcing the attack:

If you hashed 2^80 random messages, you'd find one pair that hashed to the same value. That's the "brute force" way of finding collisions, and it depends solely on the length of the hash value. "Breaking" the hash function means being able to find collisions faster than that. And that's what the Chinese did. They can find collisions in SHA-1 in 2^69 calculations, about 2,000 times faster than brute force.

This was a major concern to me. It turns out that it's best to avoid SHA-1 in a variety of contexts, including cryptographic keys. Nevertheless, a bunch of systems that rely on cryptographic security use SHA-1, including checksums for Git objects and OpenPGP key fingerprints.

Even prior to the 2005 attack, Schneier pointed out that "It's time for us all to migrate away from SHA-1." It was on this basis that I almost switched to using SHA-224 in my latest software. Almost.

To understand why I stuck with SHA-1, let's take a look at how cryptographic hashes can be attacked.

Collision Attacks

The attack documented in 2005, like most hash attacks, was a collision attack. Schneier didn't use that phrase when describing it, but that's what he describes when he talks about finding a pair of inputs that hash to the same value.

Collision attacks are the most common kind of attack against cryptographic hashes because they are the easiest to mount: any pair of inputs that hash to the same value is acceptable, with no restriction on what those inputs are or what the shared hash value is.

Aside: The Birthday Problem

A related concept is the birthday attack, which is related to the birthday problem. The birthday problem is simple: how many people can gather at a party before it becomes probable that two share the same birthday?

You can find the number fairly easily by calculating the likelihood that, as the size of the party grows, two people won't share the same birthday. Your first guest can have any birthday at all without fear of sharing that birthday with another guest. But your second guest can have any day except the day of the first guest's birthday, leaving 364/365 possibilities. The third guest only has 363/365 possibilities, and so on. In imperative Python 3 code, it looks something like this:

# The number of guests
guestcount = 1
# The chance of all guests having unique birthdays
uniquechance = 1
# Keep adding guests until the chance of them all being
# unique is less than 50%
while uniquechance > 0.5:
    uniquechance *= (365-guestcount)/365
    guestcount += 1
print(guestcount)

This code returns 23, meaning that once you have 23 guests at your party, it's more than 50% likely that two guests will share a birthday. The number is surprisingly small because we haven't specified any day at which the collision will occur, in the same way that the 2005 attack on SHA-1 doesn't restrict collisions to any particular hash.
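The same birthday math governs hash collisions: a brute-force collision search on an n-bit hash takes roughly 2^(n/2) tries, which is where SHA-1's 2^80 brute-force figure comes from (160/2 = 80). We can watch this happen on a deliberately tiny hash. This toy sketch truncates SHA-1 to 16 bits so a collision is cheap to find:

```python
import hashlib

def tiny_hash(data, bits=16):
    """SHA-1 truncated to a deliberately tiny size so collisions are cheap."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest, 'big') >> (160 - bits)

def find_collision(bits=16):
    """Birthday search: hash distinct inputs until two share a hash value."""
    seen = {}
    i = 0
    while True:
        msg = str(i).encode()
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i  # two distinct inputs, same hash
        seen[h] = msg
        i += 1

a, b, tries = find_collision()
assert a != b and tiny_hash(a) == tiny_hash(b)
print(tries)  # on the order of sqrt(2**16) = 256, not 2**16
```

Note that nothing constrained which inputs collide or on what value, just as at the birthday party nothing specified which day the shared birthday falls on.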

Preimage Attacks

Unlike collision attacks (and the birthday attack), for many practical applications of hashing, the value of the input and/or the hash matters. When the input to the hash function constrains the search for a collision, the hash must be broken using a preimage attack. There are two types of preimage attack:

  1. (Preimage Attack) Given a hash, generate an input that hashes to it.
  2. (Second Preimage Attack) Given an input, generate a second input that hashes to the same value as the given input.

A preimage attack only requires that the attacker find a single input that hashes to a given hash. Even so, it is a substantially harder attack to mount than a collision attack.

A second preimage attack adds an additional constraint: given an input and its hash, find a second input that hashes to the same value. Because there are more constraints on the solution, this attack is even more difficult.
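The difficulty gap is easy to see with the same kind of toy truncated hash: a preimage search must match one specific value, so it costs on the order of 2^n tries instead of 2^(n/2). A sketch against a hypothetical 16-bit truncation of SHA-1 (the "key material" bytes are made up):

```python
import hashlib

def tiny_hash(data, bits=16):
    """SHA-1 truncated to a deliberately tiny size for illustration."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest, 'big') >> (160 - bits)

def second_preimage(original, bits=16):
    """Brute-force search for a *different* input with the original's hash."""
    target = tiny_hash(original, bits)
    i = 0
    while True:
        candidate = str(i).encode()
        if candidate != original and tiny_hash(candidate, bits) == target:
            return candidate, i
        i += 1

msg = b'my public key material'  # made-up stand-in
forged, tries = second_preimage(msg)
assert forged != msg and tiny_hash(forged) == tiny_hash(msg)
# `tries` is on the order of 2**16 = 65536, versus ~2**8 for a collision
```

At 16 bits both searches finish instantly; at 160 bits, that 2^(n/2)-vs-2^n gap is the difference between "broken in 2^69" and "still infeasible".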

So how broken is SHA-1?

So why did I stick with SHA-1? Well, the software I'm working on is concerned with identifying public keys using a digest of the key itself. When used in this context, such a digest is called a fingerprint.

OpenPGP uses a SHA-1-based hash to generate the fingerprint for a public key. One of the worst-case scenarios is that a user, given the fingerprint for a key, attempts to retrieve it from a remote source and receives the wrong key, but with a fingerprint that matches, allowing an attacker to read encrypted messages intended for someone else. Since the attacker is given both a public key and its fingerprint and needs to generate another public key with the same fingerprint, such an attack is a second preimage attack.
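For the curious, the fingerprint computation itself is simple. Per RFC 4880, a v4 fingerprint is SHA-1 over the octet 0x99, a two-octet length, and the public key packet body. A sketch (the key bytes below are a made-up stand-in, not a real key packet):

```python
import hashlib

def v4_fingerprint(key_packet_body):
    """OpenPGP v4 fingerprint: SHA-1 over 0x99, a two-octet big-endian
    length, then the public key packet body (RFC 4880, section 12.2)."""
    prefix = b'\x99' + len(key_packet_body).to_bytes(2, 'big')
    return hashlib.sha1(prefix + key_packet_body).hexdigest().upper()

# Made-up stand-in for real key packet bytes, just to show the shape.
fake_key = b'\x04' + b'\x00' * 16
print(v4_fingerprint(fake_key))  # 40 hex characters
```

The attacker's task is to produce a different, usable key packet whose bytes run through this exact computation to the same 160-bit value, which is precisely the second preimage problem described above.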

So, while SHA-1 is technically "broken" because of the 2005 collision attack that reduced the search space by a factor of 2000, there are still no known feasible preimage attacks (or second preimage attacks) on SHA-1 that weaken it when it is used for key fingerprinting. I was particularly hesitant to move away from SHA-1 for key fingerprinting using OpenPGP because it is part of the standard, and I don't have any interest in rewriting well-established cryptographic routines. I'd rather reuse existing standards with an eye toward making them accessible in new contexts. So, for now, SHA-1 it is.