Programming for Maintainability 8, With Love

So I was working on some driver code for a device the other day, and came across this interesting bug. The device has a clock rate that a user can specify, they can choose from one of several options (call them 10MHz, 11MHz, 40MHz, and 44MHz). The user can also pre-configure a “multiplier” option on their device (by loading a different FPGA bitfile), which has two options: x1 and x4. As you might’ve guessed, the 10MHz and 11MHz selections only work when the multiplier is set to x1, and the 40MHz and 44MHz options only work when the multiplier is set to x4.

So, we added this code to where the user sets the clock rate:

PLL_FREQS = {10e6: 10, 11e6: 11, 40e6: 10, 44e6: 11}

def set_clock_rate(self, desired_clock_rate):
    if not self.is_valid_clock_rate_and_mult(desired_clock_rate, self.multiplier):
        raise RuntimeError(f"Invalid clock rate {desired_clock_rate} for multiplier {self.multiplier}!")

    self.pll.set_clock_rate(PLL_FREQS[desired_clock_rate])
    self.clock_rate = desired_clock_rate

(“PLL” stands for “phase-locked loop”, which isn’t important except to say that it’s how we configure the actual hardware)

It turns out that this function, set_clock_rate, is useful for re-setting the clock after certain events, so we also call it in:

def set_clock_source(self, source):
    ... # Setup the clock source
    self.set_clock_rate(self.clock_rate) # Re-set the clock

All’s well and good, except you say - wait a second - this is hardware, so what about when the device is initialized? And therein lies the problem:

DEFAULT_CLOCK_RATE = 10e6 # 10MHz

def init(self):
    ...
    self.clock_rate = DEFAULT_CLOCK_RATE
    self.pll.set_clock_rate(PLL_FREQS[self.clock_rate])
    ...

Uh-oh. So if the user has pre-configured their multiplier to the x4 option, this will break.

…except, it’s important to note that the actual clock settings for the corresponding frequencies (10MHz/40MHz and 11MHz/44MHz, respectively) aren’t different. If the multiplier is pre-configured to x4, this will work just fine. Unless they call set_clock_source, in which case set_clock_source(10MHz) will be called, which will then error.

It’s also important to note that set_clock_source isn’t something you call unless you have special hardware, such as an external clock. Us plebian software developers never have a need to call it.

The fix is pretty obvious - choose the default clock rate based on the multiplier setting - but this isn’t the blog Programming for Hot-Fixing This One Bug, this is Programming for Maintainability. We want to eliminate entire classes of problems.

Off the cuff, we can think about a few different avenues we could consider to solve this problem. One is that since the underlying PLL hardware is x1/x4-agnostic, the API itself could also be agnostic. That is, the user could select from the options 10MHz or 11MHz, with the preconfigured multiplier determining the actual clock frequency. This has the advantage that there is no invalid setting, obviating the need to check for the error at all. Of course, if not all combinations are supported (for example, maybe 11MHz doesn’t work with the x4 multiplier), then you will still need to check.

If you did still need or want the check, you could make the PLL class multiplier-aware. That is, make the call:

self.pll.set_clock_rate(self.clock_rate, self.multiplier)

but, this now burdens the PLL code with having to know about the multiplier, which it didn’t before.

At a high level, the problem here is that there are two ways to “set the clock rate” on the device: There’s the direct method (directly configuring the PLL) and there’s the API method (calling set_clock_source). One of the tenets of Python is that “there should be one - and preferably only one - obvious way to do it.” So a third option might be to create a middleman class:

class CheckedPll:
    def __init__(self, pll, multiplier):
        self.pll = pll
        self.multiplier = multiplier

    def set_clock_rate(self, desired_clock_rate):
        if not self.is_valid_clock_rate_and_mult(desired_clock_rate, self.multiplier):
            raise RuntimeError(f"Invalid clock rate {desired_clock_rate} for multiplier {self.multiplier}!")

        self.pll.set_clock_rate(desired_clock_rate)

and then use this class in both of the instances above. This has the advantage that there’s only one obvious way to do it, but has the glaring disadvantage that it’s just a lot of code, especially if the PLL class has a lot of other methods, many of which may not care about the multiplier.

What’s the right thing to do here? It depends. If you can change the API, that’d be my first option. But, the moral is, whenever we write code we consider whether there are multiple ways to do what we’re trying to do, and if there are, try to eliminate the incorrect contenders.

This blog series updates every week at Programming for Maintainability