Killring

Rick Dillon's Weblog

Programming With Units

Programs often handle numbers that are measurements. Measurements for the same dimension (time, distance, volume, pressure, etc.) can have many different units. There are two main drivers of this diversity:

  1. Granularity
  2. Fragmentation

Time has seconds, minutes, hours, days, weeks, months, and years because of the need for different levels of granularity. Distance has meters and yards, centimeters and inches because of fragmentation.

So, managing numbers with units has both intrinsic and incidental complexity.

Managing Units

The simplest approach is to ignore units.

DELAY_BEFORE_SENDING = 2

Is that days? What are we sending? A letter? Maybe it’s an email, and it’s in seconds. Maybe it’s a fax, and it’s in minutes.

An age-old way to handle this is to add a comment:

DELAY_BEFORE_SENDING = 2 # days

But this still creates ambiguity when the constant is used. If we want to reduce the delay because the letter is expedited:

EXPEDITED_AMOUNT = 12 # hours
expedited_delay = DELAY_BEFORE_SENDING - EXPEDITED_AMOUNT # Oops!

This runs without complaint, but the result is meaningless (2 days minus 12 hours is not -10 of anything), and anyone who has to maintain that code is going to be sad. We can do better.

By baking the units into the name of the constant, we eliminate ambiguity when it is both defined and used:

DELAY_BEFORE_SENDING_DAYS = 2
EXPEDITED_AMOUNT_HOURS = 12

# Wrong!
expedited_delay = DELAY_BEFORE_SENDING_DAYS - EXPEDITED_AMOUNT_HOURS

By appending units, the error becomes obvious, and the appropriate conversion can be applied. This is a fantastic, simple solution to handling units with constants. Note, though, this doesn’t work as well with variables, where values can change.
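As a minimal sketch of the conversion that the unit-suffixed names call for (HOURS_PER_DAY and expedited_delay_hours are names introduced here for illustration, not part of the original code):

```ruby
DELAY_BEFORE_SENDING_DAYS = 2
EXPEDITED_AMOUNT_HOURS = 12
HOURS_PER_DAY = 24 # hypothetical helper constant

# Convert days to hours before subtracting, and name the result for its unit.
expedited_delay_hours = DELAY_BEFORE_SENDING_DAYS * HOURS_PER_DAY - EXPEDITED_AMOUNT_HOURS
puts expedited_delay_hours # prints 36
```

Carrying the unit into the result's name keeps the next use of the value unambiguous as well.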

Aside: More Robust Solutions

It bears mentioning that there are much more complex, robust solutions to this problem. In object-oriented systems, it is possible to create a different class for every unit, each having methods to convert between various units of the same dimension. One can even implement multi-dimensional reasoning, so that when you divide Foot by Second, you get FeetPerSecond, which has a method that seamlessly converts to MilesPerHour. This is a complex system to write with lots of API surface area, but makes maximal use of polymorphism to insulate developers from unit conversion. These systems are beyond the scope of this discussion because most general-purpose programming languages lack such a facility.
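To give a flavor of what such a system involves, here is a deliberately tiny sketch in plain Ruby. The Foot, Second and FeetPerSecond classes are hypothetical, written only for this illustration; a real system would cover far more units and operations.

```ruby
# Hypothetical sketch: one class per unit, with division producing a compound unit.
class Second
  attr_reader :value

  def initialize(value)
    @value = value
  end
end

class FeetPerSecond
  FEET_PER_MILE = 5280
  SECONDS_PER_HOUR = 3600

  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Returns a plain Float for brevity; a fuller system would return a MilesPerHour object.
  def to_miles_per_hour
    value * SECONDS_PER_HOUR / FEET_PER_MILE.to_f
  end
end

class Foot
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Dividing a distance by a time yields a speed; any other operand is rejected.
  def /(other)
    raise TypeError, "cannot divide Foot by #{other.class}" unless other.is_a?(Second)

    FeetPerSecond.new(value.to_f / other.value)
  end
end

speed = Foot.new(88) / Second.new(1)
puts speed.to_miles_per_hour # 88 ft/s is 60 mph
```

Even this toy version needs three classes to support one operation, which hints at how much API surface a complete system requires.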

Enter: ActiveSupport

I work at a Rails shop, so ActiveSupport is everywhere. It uses a different approach to handling units, and it is implemented in a way that can cause confusion and errors when used with the constant naming outlined above.

ActiveSupport monkey-patches Ruby’s Numeric class with methods like seconds, minutes, hours and days. As the documentation says:

Enables the use of time calculations and declarations, like 45.minutes + 2.hours + 4.years.

This is a handy feature, though it’s not clear why Numeric gets patched for time but not for any other measurement (distance, area, speed, etc.).

When a developer who knows about this feature of ActiveSupport sees code that says:

DELAY_BEFORE_SENDING_DAYS = 2
EXPEDITED_AMOUNT_HOURS = 12

it becomes very tempting to ‘fix’ it:

DELAY_BEFORE_SENDING = 2.days
EXPEDITED_AMOUNT = 12.hours

Doing so reintroduces the problem of ambiguity during usage, though. One option is to do both:

DELAY_BEFORE_SENDING_DAYS = 2.days
EXPEDITED_AMOUNT_HOURS = 12.hours

This is where the trouble starts.

Inconsistent Semantics

What did that code actually do? Let’s check:

2.2.1 :001 > require 'active_support/all'
 => true
2.2.1 :002 > 2.days
 => 2 days
2.2.1 :003 > 12.hours
 => 43200 seconds

Note the inconsistency in how inspect is handling units. But what exactly is that days object?

2.2.1 :001 > 2.days.class
 => ActiveSupport::Duration

How does this object interact with Numeric?

2.2.1 :001 > 2.days - 5
 => 2 days and -5 seconds

ActiveSupport gives no warning at all that the code is mixing values that have units with values that do not. It seems that, internally, ActiveSupport::Duration represents durations in seconds, though that’s not stated anywhere. It’s easy enough to test, though:

2.2.1 :001 > 2.days.to_i
 => 172800
2.2.1 :002 > 1.week
 => 7 days
2.2.1 :003 > 1.week.to_i
 => 604800

The free mixing of values with and without units, combined with the use of monkey-patching, creates very confusing and incorrect semantics:

2.2.1 :001 > 2.days
 => 2 days
2.2.1 :002 > 2.days.minutes
 => 10368000 seconds
2.2.1 :003 > 2.days.minutes.days
 => 10368000 days
2.2.1 :004 > 2.days.minutes.days.weeks
 => 6270566400000 days

Methods are often named after what they produce. In Python you might see int('5') or str(3). In Elixir, you’ll find String.to_integer("5"). ActiveSupport instead names its methods after the units they consume, which is why the chaining you see above produces meaningless results. This also leads to the oddity where ActiveSupport monkey-patches Numeric with lots of methods named after units (fortnights, seconds, weeks, etc.) and then has a lone in_milliseconds method, which is named for what it produces rather than what it consumes (hence the in_ prefix). The documentation indicates this was added to work well with JavaScript time functions.
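The compounding can be reproduced without ActiveSupport. ToyDuration below is a hypothetical stand-in written for this post, not ActiveSupport's implementation: each unit-named method treats the stored number as a count of that unit and converts it to seconds, whether or not the number was already in seconds.

```ruby
# Hypothetical toy model of methods named after the unit they consume.
class ToyDuration
  SECONDS_PER_MINUTE = 60
  SECONDS_PER_DAY = 86_400

  # A bare number at first; after any unit method, a count of seconds.
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Interprets the stored number as minutes, even if it is already seconds.
  def minutes
    ToyDuration.new(value * SECONDS_PER_MINUTE)
  end

  # Interprets the stored number as days, even if it is already seconds.
  def days
    ToyDuration.new(value * SECONDS_PER_DAY)
  end
end

two_days = ToyDuration.new(2).days
puts two_days.value         # 172800 seconds, as intended
puts two_days.minutes.value # 10368000: the seconds were re-read as minutes
```

Nothing in the toy model is broken as such. Each method does what its name says, but a second call has no way to know the number has already been converted.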

Don’t Mix Systems

All of these issues arise because ActiveSupport’s implementation of durations is incompatible with any other system. It issues no warnings when the two are mixed, because doing so would violate duck typing. The end result is that when choosing how to handle units, it’s all or nothing: either use ActiveSupport throughout your entire codebase to handle durations, or use more general systems based on constant naming. If you mix them, you risk getting something like:

# In one class
DELAY_BEFORE_SENDING_DAYS = 2.days

# Somewhere in another class
EXPEDITED_AMOUNT_DAYS = 1
expedited_delay = DELAY_BEFORE_SENDING_DAYS - EXPEDITED_AMOUNT_DAYS

Which is the equivalent of:

2.days - 1

This returns a Duration that’s one second shy of two days, and, when used by code expecting a Numeric, will appear as the number of seconds in that duration, 172799.
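For contrast, here is a minimal sketch of the same calculation with constant naming used throughout, so that no Duration is involved. SECONDS_PER_DAY and the two expedited_delay_* variables are names introduced here for illustration.

```ruby
SECONDS_PER_DAY = 86_400 # hypothetical helper constant

DELAY_BEFORE_SENDING_DAYS = 2
EXPEDITED_AMOUNT_DAYS = 1

# Both operands are plain day counts, so the subtraction stays in days.
expedited_delay_days = DELAY_BEFORE_SENDING_DAYS - EXPEDITED_AMOUNT_DAYS

# Convert once, explicitly, where a consumer needs seconds.
expedited_delay_seconds = expedited_delay_days * SECONDS_PER_DAY
puts expedited_delay_seconds # prints 86400
```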