Programming for Maintainability 19, The True Hungarian

Published Jun 2, 2021 by Pillow Computing Consortium in Programming for Maintainability at /blog/2021-06-02-swe-nineteen-true-hungarian/

So there’s this variable naming convention called “Hungarian notation”, in which variables are named according to their type. The classic example is:

int iPotatoCount = 53;
const char* sPotatoName = "Russet";

With the upside that where the variable is used, you can clearly see the type:

printf("Number of potatoes is %d\n", iPotatoCount);

This is not super useful, for two reasons: one, modern text editors will just directly tell you the type if you hover over the variable, and two, your functions should all be short enough that all of your arguments and local variables appear on the same screen as whatever code you’re looking at (or the same page, if you print out your code).

Of course, this is only one kind of Hungarian notation, known as Systems Hungarian, so named because it was used by the Windows division at Microsoft. The systems team got the idea from the Microsoft Apps division, which had developed Apps Hungarian[1], with one crucial difference: instead of using the variable’s literal data type to name it, Apps Hungarian uses its semantic kind. Joel Spolsky’s example was that of “safe” (i.e., HTML-escaped) and “unsafe” strings:

const char* usCommentTitle = getUnsafeValueDirectlyFromUser();
const char* sCommentTitle = sanitize(usCommentTitle);

Both of these are const char* data types, but they have semantic differences which are not encoded in the type system.

With this in mind, let’s look at today’s code:

def wait_for_pll_lock(self, pll, timeout=2000):
    """
    Wait for the PLL to lock, with the given timeout in ms
    """
    self.send_pll_config()

    # Now poll lock status until timeout
    end_time = time.monotonic() + timeout
    while time.monotonic() < end_time:
        time.sleep(0.1)
        if self.check_plls_locked(pll):
            return True

    self.log.debug('{} not reporting locked after {} ms wait'.format(pll, timeout))
    return False

Can you see the bug? That’s right, the timeout is passed in as milliseconds (2000 is a lot of seconds), but inside the function it’s added directly to time.monotonic(), which counts in seconds. Setting aside that Python doesn’t even enforce the data type, it’s clear that even though timeout has the same type everywhere (well, integer or float), there are two different semantic types being used: seconds and milliseconds.

In Hungarian notation, the fix would be to rename the argument to something like ms_timeout, so it’s clearer that the timeout is in milliseconds. This still isn’t perfect, though - the pre-existing time.monotonic() function doesn’t follow our ms_ or s_ convention, so it’s not immediately clear without outside knowledge that time.monotonic() + ms_timeout is invalid.

Python supports classes, so one could imagine making a class specifying a duration like so:

class Milliseconds:
    def __init__(self, milliseconds):
        self._ms = milliseconds

    def to_secs(self):
        return self._ms / 1000

which you could then use alongside similarly defined Seconds and Microseconds classes, writing a wrapper around the builtin time.monotonic() to return a Seconds object, and so forth. But this costs runtime overhead, and it also just moves the complication of getting things right into your time objects. You still have to get things right.

But some languages support a so-called “newtype” idiom, such as Rust (surprise!):

struct Seconds(f32);
struct Milliseconds(f32);

fn add(a: Seconds, b: Milliseconds) -> Seconds {
  a + b
}

fn main() {
    println!("{:?}", add(1.0, 2.0));
}

Here we’ve defined two types, Seconds and Milliseconds, which are each just a single f32 under the hood (no added runtime overhead). Because these are different, new types, this code fails to compile in four different places.

The first is where we do a + b: the compiler complains that there is no implementation of Add&lt;Milliseconds&gt; for Seconds.

The second and third are where we pass the bare 1.0 and 2.0 to the add function. A bare floating-point literal is neither a Seconds nor a Milliseconds, so the compiler rejects both arguments; we have to wrap them explicitly, as Seconds(1.0) and Milliseconds(2.0).

And the last is where we print out the value, because we haven’t told Rust how to print a Seconds (the {:?} format specifier needs a Debug implementation).

In order to operate on these types at all, we have to explicitly define type-safe ways to interact with them.
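
For illustration, here’s a minimal sketch of one way to resolve all four errors. The decision that adding Milliseconds to Seconds converts the milliseconds and yields Seconds is mine, made up for this example - the compiler only insists that we decide something:

use std::ops::Add;

#[derive(Debug, Clone, Copy)]
struct Seconds(f32);

#[derive(Debug, Clone, Copy)]
struct Milliseconds(f32);

// Define what "Seconds + Milliseconds" means: convert the
// milliseconds and produce Seconds.
impl Add<Milliseconds> for Seconds {
    type Output = Seconds;

    fn add(self, rhs: Milliseconds) -> Seconds {
        Seconds(self.0 + rhs.0 / 1000.0)
    }
}

fn add(a: Seconds, b: Milliseconds) -> Seconds {
    a + b
}

fn main() {
    // The bare literals now have to be wrapped in the unit we intend.
    println!("{:?}", add(Seconds(1.0), Milliseconds(2.0)));
}

The nice part is that the unit conversion now lives in exactly one place, and the compiler holds every caller to it.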

And I could implement a perfectly valid set of these duration types, and maybe someday I will, but somebody already did that, and I don’t want to encourage re-inventing the wheel in a blog about software engineering. Rust’s standard library defines a Duration type which behaves how you’d expect; the PLL function from earlier could be written as:

fn wait_for_pll_lock(&mut self, pll: Pll, timeout: Duration) -> bool {
    self.send_pll_config();

    // Poll the lock status until the timeout elapses
    let start = Instant::now();
    while start.elapsed() < timeout {
        std::thread::sleep(Duration::from_secs_f32(0.1));
        if self.check_plls_locked(pll) {
            return true;
        }
    }
    false
}

which works like a charm: both inside the function, where the timeout parameter cannot be misused (unless you really, really try), and at the call site, where there is no ambiguity about whether to pass seconds or milliseconds. The caller should be passing a Duration, or else they’ll get a compile error. (And remember: the caller may not be you. It may be somebody in three years who’s never met you and has, until now, zero experience with your fancy PLL timeout function.)
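
To make the caller’s side concrete, here’s a sketch of what a call site could look like. The MockRadio struct, the integer pll argument, and the pretend locking behaviour are all stand-ins I’ve invented; the only real point is the Duration parameter:

use std::time::Duration;

// A minimal mock of the device, just enough to show the call site.
struct MockRadio;

impl MockRadio {
    fn wait_for_pll_lock(&mut self, _pll: u32, timeout: Duration) -> bool {
        // Pretend the PLL always locks within 100 ms.
        timeout >= Duration::from_millis(100)
    }
}

fn main() {
    let mut radio = MockRadio;

    // The units are unambiguous at the call site:
    let locked = radio.wait_for_pll_lock(0, Duration::from_millis(2000));
    println!("locked: {}", locked);

    // Whereas this would not compile - a bare 2000 is not a Duration:
    // radio.wait_for_pll_lock(0, 2000);
}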

This is, of course, a testament both to the power of strict typing and to the power of well-designed standard library APIs. A lot of this benefit comes about because the library was thoughtfully designed with this use case in mind - you can pass a Duration to sleep, Instants and Durations compose rationally (subtracting two Instants gives a Duration, adding a Duration to an Instant gives a deadline), and so forth.
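
As a quick, self-contained illustration of how these standard library pieces fit together (nothing here is specific to the PLL code):

use std::thread::sleep;
use std::time::{Duration, Instant};

fn main() {
    let timeout = Duration::from_millis(250);
    let start = Instant::now();

    // sleep takes a Duration directly, so there's no guessing about units.
    sleep(Duration::from_millis(100));

    // Subtracting two Instants yields a Duration, which compares
    // cleanly against another Duration.
    let elapsed = Instant::now() - start; // equivalent to start.elapsed()
    println!("timed out: {}", elapsed > timeout);

    // Adding a Duration to an Instant yields a new Instant (a deadline).
    let deadline = start + timeout;
    println!("past deadline: {}", Instant::now() > deadline);
}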

[1] https://en.wikipedia.org/wiki/Hungarian_notation

This blog series updates weekly at Programming for Maintainability

Lane Kolbly


© 2022 Pillow Computing Consortium