Introducing: bdmCore

When I moved out, I thought I was pretty much done with hardware projects. I hadn’t done any in the year prior, but I had done a lot of software projects. So when I moved, I left behind a lot of my bulky hardware, like my oscilloscope.

As anyone who follows my blog knows, that prediction turned out wrong.

It turns out that PCB production costs have fallen, and my salary has risen, to the point where I can afford to custom-make PCBs. jlcpcb.com has a deal where you can build ten boards (under 100mm by 100mm) for $2 + shipping, which if you’re willing to take the slow boat (which took a full month for me!) comes out to a grand total of only $10, or a dollar a board.

This opens the door to a variety of possibilities. In order to learn about making PCBs, I invented a small toy project: The driver timer. The driver timer is a button and an LED that you attach to your steering wheel. You push the button, and two seconds later the LED flashes. You use it to time the distance between you and the car in front of you: You should maintain a two-second distance. I’ll discuss that in more detail in a future post.

The important thing for this post is, the driver timer is driven by a microcontroller. When I was growing up, before the “maker” movement got mass audiences, I programmed PIC microcontrollers (to this day I have a few PIC16F84s lying around). These were nice, but required Microchip’s proprietary and expensive programmers. I want to avoid proprietary systems as much as possible.

After PICs, when I next encountered microcontrollers they were in the form of assembled green boards with ARMs that plugged into your computer via USB. These don’t require expensive programmers, but it’s difficult to integrate them into applications because of their pre-assembled form.

So who else makes microchips? Turns out NXP (formerly Freescale) does. (I know someone whose dad works there, so I knew they did something involving microchips.) Mouser will sell you an 8-pin MC9RS08KA2 microcontroller with 4 I/O, 63 bytes of RAM, and 2KB of flash for just over $1.

And, best of all, it has an open specification for its in-circuit programmer/debugger. All I had to do was build it.

Introducing: bdmCore.

bdmCore is (almost) everything you need to program and debug an RS08 (or, as far as I know, any S08-derived core that speaks the Background Debug Protocol). It’s written in a platform-agnostic way, so that it can run on any FPGA. The tradeoff is that you then have to adapt it for your favorite FPGA. (As long as you can hit roughly 50MHz timing, you’re set.)

I used a Mojo v3, because that’s what I have. If you just want a Mojo bitstream, hit me up and I can send it your way.

The protocol is set up using four wires: power, ground, BKGD, and RESET/Vpp. (BKGD is the “one wire” in the “one wire protocol”.) bdmCore assumes control of all four of these wires, so there are three control lines you need in your programmer schematic: Power control, Vpp control, and BKGD. BKGD needs to have a pullup to power. Vpp should be 12V (check your datasheet).

Here’s the schematic I used:

Note that you only need Vpp for programming the flash. If you only want to debug, you don’t need that line (or a 12V supply) at all.

The bdmCore FPGA firmware speaks a serial protocol. On the Mojo v3, this means that you talk to it through the USB serial connection.

I won’t delve into the details of how all this works at the low level. But let’s instead do a walkthrough of the code for my carflasher project firmware, to give an idea of how to program and debug a device.

import itertools
from rs08asm import rs08asm
from mc9rs08kaX import *

Bah, standard include stuff to get the compiler and the device. We’ll circle back on this. In fact, let’s start much, much lower in the code, at line 91:

device = mc9rs08kaX(2)
p = rs08asm(device)

This configures what device we’re using (the “2” variant of the mc9rs08kaX device) and initializes a program for that device, “p”. The device is what gives us all of the clever string labels we’re going to use later for registers and memory locations, like “.RESET” and “.PTAPE”.

p.at(".RESET")
p.jmp("start")

RS08 devices, when they start up (“power-on reset”), begin executing the instruction at memory address 0x3FFD, which because we set the device, we can reference using the handy label “.RESET”. If you’re used to assembly, and don’t recognize this, rest assured - these commands do in fact correspond to assembly instructions. We’re just using Python to wrap the assembly (and to do other cool things, as we’ll see later).

Since 0x3FFD is only three bytes from the top of the address space (0x3FFF), there is room for just one jump instruction there, so the first thing we want to do is jump to start. Hence p.jmp(“start”).

p.at(".FLASHBASE")
p.label("start")

Here’s where “start” is defined. It’s wherever “.FLASHBASE” is for our chip.

# Set the trim NOTE! This value will vary per chip
p.movi(98, ".ICSTRM")
p.movi(0x01, ".ICSSC")

# Set the prescaler to divide by 8
p.movi(0xC0, ".ICSC2")

Curious what the possible instructions are? They’re enumerated in rs08asm.py. The first value in the tuple is the python function name, the last is the layout of the bits, and the ones in the middle are the arguments to the function. Here’s the possible argument types:

8i - 8-bit immediate value, can be a label.
4a/5a/8a - A 4/5/8-bit address, can be a label.
n - A literal value between 0 and 7, inclusive. Cannot be a label.
rel - A relative address.
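
To make those argument types a bit more concrete, here is a rough sketch of the range check each one implies. The names ARG_BITS and check_arg are hypothetical and are not taken from rs08asm.py; labels and "rel" offsets are left out to keep it short.

```python
# Hypothetical helper, not part of rs08asm.py: range-check a numeric
# argument against the argument types listed above.
ARG_BITS = {"8i": 8, "4a": 4, "5a": 5, "8a": 8}

def check_arg(kind, value):
    """Return value if it fits the given argument type, else raise ValueError."""
    if kind == "n":
        limit = 8  # a literal between 0 and 7, inclusive
    elif kind in ARG_BITS:
        limit = 1 << ARG_BITS[kind]
    else:
        raise ValueError("unknown argument type %r" % kind)
    if not 0 <= value < limit:
        raise ValueError("%d does not fit in a %s argument" % (value, kind))
    return value

print(check_arg("8i", 98))  # fits in 8 bits
print(check_arg("n", 7))    # largest allowed literal
```

Anything that fails the check (say, 32 passed as a "5a" address) raises instead of silently being truncated.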

For semantic meaning, reference your local datasheet. Note that some otherwise ambiguous instructions have an “i” or an “a” suffix, to indicate it is the “immediate” or the “accumulator” version, respectively. So:

p.mov(98, ".ICSTRM")

Moves the data at address 98 to the ICSTRM register, whereas

p.movi(98, ".ICSTRM")

Moves the value 98 to the ICSTRM register.

The above code is setting up the clock. The internal reference clock can be trimmed. But, of course, the question is, how do you get the trim?

Fortunately, with bdmCore, you can use writeByte commands to set the trim and read back the clock speed indirectly by reading the sync time. This is already packaged for you in sw/trimmer.py in that project. As a fun tangent, you can make a graph of the clock speed versus the trim value, and you get something like:

(graph made with MC9RS08KA2 chip with prescaler set to /2)

Interestingly, not linear. Also shows that at very high speeds (low trim values) the clock becomes unstable. As you can see you get fairly decent precision, with a /2 clock frequency ranging from 3.25MHz to 7.25MHz (a maximum of about 6.5MHz to 14.5MHz).

Anyway, so I ran the trimmer on my chip and got 98 as the optimal value, so that’s what I set.
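
For the curious, turning a sync time into a clock speed is simple arithmetic. The sketch below is not the code in sw/trimmer.py. It assumes the target answers a SYNC request with a low pulse that is 128 debug-clock cycles long (check your datasheet), and the measurements in it are made-up placeholders, not real readings from my chip.

```python
# Assumption: the SYNC response pulse lasts 128 target clock cycles.
SYNC_CYCLES = 128

def clock_hz(sync_seconds):
    """Estimate the target clock from the measured sync pulse length."""
    return SYNC_CYCLES / sync_seconds

def best_trim(sync_by_trim, target_hz):
    """Pick the trim value whose estimated clock is closest to target_hz."""
    return min(sync_by_trim,
               key=lambda trim: abs(clock_hz(sync_by_trim[trim]) - target_hz))

# Placeholder data: trim value -> sync pulse length in seconds.
fake_measurements = {
    96: 128 / 5.1e6,
    98: 128 / 5.0e6,
    100: 128 / 4.9e6,
}
print(best_trim(fake_measurements, 5.0e6))
```

With these placeholder numbers the search lands on whichever trim gives a clock closest to the 5MHz target.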

# Kill the watchdog, enable stop mode, enable BKGD and RESET
p.movi(0x23, pageAddr(p, ".SOPT"))

# Disable & acknowledge low voltage detect
p.movi(0x40, pageAddr(p, ".SPMSC1"))

# Set up the I/O direction and pulldown
p.movi(0x20, pageAddr(p, ".PTAPUD"))
p.movi(0x20, pageAddr(p, ".PTAPE"))
p.movi(0x13, ".PTADD")

More routine setup instructions. If you’re still reading this, it’s because you love poring over datasheets, so go pore over a datasheet!

But notice the pageAddr() call. p.movi() maps to the “mov” instruction under the hood, taking an immediate value and an address. Above we used string values to refer to registers, now we’re using a function?

So you can pass a raw number to movi if you want, but (because of the instruction set) that number has to be less than 256. “.SOPT” is in the register bank residing at 0x200, well above 256, so we can’t simply call p.movi(…, “.SOPT”).

The RS08 architecture has a “paging window”, meaning that there is a fixed set of addresses (0xC0 through 0xFF) which, when read, return the value at some offset in memory specified by the “.PAGESEL” register. Let’s go look at the pageAddr() function:

# Note: This changes .PAGESEL!
def pageAddr(p, addr):
    addr = p._labelOrLiteral(addr)
    if addr < 256:
        raise Exception("You don't need to use pageAddr to access %s"%addr)
    p.movi(addr>>6, ".PAGESEL")
    return 0xC0 + (addr&0x3F)

Aha! So this function goes and resolves the addr you pass it to a number using _labelOrLiteral. Then, it sets the “.PAGESEL” register with the high bits of the address you want, effectively setting the page you want to access, and then it returns the address within the paging window that points to where you want to access.

(Interestingly, this means that if addr is a user defined label that the user hasn’t defined yet, this will error. That’s a bug. Hmm…)

So, if you call pageAddr(p, “.SOPT”), the “.PAGESEL” register will be set so that the paging window points to 0x200, and the address returned will be the location of “.SOPT” within this paging window.
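
Here is the same arithmetic pulled out on its own, so you can see what gets written to the page register and which window address comes back. page_and_window is a hypothetical name for illustration; it does the math from pageAddr() without emitting any instructions, and the addresses are just examples in the 0x200 bank.

```python
PAGE_WINDOW_BASE = 0xC0  # first address of the paging window
PAGE_SIZE = 0x40         # the window spans 0xC0 through 0xFF, so 64 bytes

def page_and_window(addr):
    """Split an address into (page register value, address inside the window)."""
    return addr >> 6, PAGE_WINDOW_BASE + (addr & (PAGE_SIZE - 1))

for addr in (0x200, 0x201, 0x23F):
    page, window = page_and_window(addr)
    print("0x%03X -> page 0x%02X, window address 0x%02X" % (addr, page, window))
```

All three example addresses share the same page, so only the window address changes between them.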

That’s just the beginning of the clever Python code generation. Next we call the function mtimWaitSpan:

# Light each of R,G,B for 1/2 second
mtimWaitSpan(p, (1,0,0), 64, 0.5)
mtimWaitSpan(p, (0,1,0), 128, 0.5)
mtimWaitSpan(p, (0,0,1), 255, 0.5)

mtimWaitSpan lights up the given color of LED for the given duty cycle (out of 256) for the given number of seconds.

I’m going to cut to the chase and say that mtimWaitSpan is a thin wrapper around mtimWait:

mtimWaitInstances = set()
def mtimWait(p, color, brightness, timeSeconds):
    if brightness < 1 or brightness > 255:
        raise Exception("Brightness must be a reasonable value")

    numIters = int(timeSeconds * 256 / MTIM_PERIOD)
    if numIters < 1 or numIters > 255:
        raise Exception("'%s' is out of range with '%s' iters"%(timeSeconds, numIters))

    fnname = "mtimWait_%s%s%s_%s_%s"%(color[0], color[1], color[2], brightness, numIters)
    p.jsr(fnname)
    mtimWaitInstances.add((color, brightness, timeSeconds))

What’s happening here? First some error checking, but that isn’t interesting. Then we set the fnname variable to some long string. That string is basically a hash of the function call - so different invocations get different strings, unless they have the same arguments in which case they get the same string.

For example, the call:

mtimWaitSpan(p, (1,0,0), 64, 0.5)

Gives us a fnname of “mtimWait_100_64_somenumber”.
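
If you want to see an actual name come out, here is the naming logic on its own. wait_name is a hypothetical helper, and MTIM_PERIOD is set to a placeholder of 1.0 here purely so the numbers are easy to follow; the real constant lives in the project source.

```python
MTIM_PERIOD = 1.0  # placeholder value, NOT the project's real constant

def wait_name(color, brightness, time_seconds, period=MTIM_PERIOD):
    """Build the subroutine name the same way mtimWait does."""
    num_iters = int(time_seconds * 256 / period)
    return "mtimWait_%s%s%s_%s_%s" % (
        color[0], color[1], color[2], brightness, num_iters)

print(wait_name((1, 0, 0), 64, 0.5))
print(wait_name((1, 0, 0), 64, 0.5) == wait_name((1, 0, 0), 64, 0.501))
```

Note that the name depends on the iteration count, not on the raw duration, so two slightly different durations can round to the same name.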

What do we do with the fnname? We JSR (Jump SubRoutine) to it.

But it isn’t defined! Ah, but we also record the call’s arguments in the set mtimWaitInstances. Then, in this code at the bottom:

for c,b,t in mtimWaitInstances:
    mtimWaitCode(p, c, b, t)

is where we define all of the fnnames, by calling mtimWaitCode on all of the mtimWait functions we called. Let’s look at mtimWaitCode now:

def mtimWaitCode(p, color, brightness, timeSeconds):
    numIters = int(timeSeconds * 256 / MTIM_PERIOD)
    fnname = "mtimWait_%s%s%s_%s_%s"%(color[0], color[1], color[2], brightness, numIters)

    # Accumulator counts down from numIters
    p.label(fnname)

The first two lines are just repeating the logic to compute fnname, because I’m a lazy programmer. Then we have p.label(fnname) - this is the magic that defines the start of the subroutine we jump to.

This is basically a hacky homebrew linker. One could think of mtimWaitCode as generating an entire library of functions, and mtimWait is generating function calls into that library. Then the for loop at the bottom is inserting the library code (generated by mtimWaitCode) at the end of the program.
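
To show the pattern in isolation, here is a stripped-down sketch of the same "call now, define later" trick. ToyProgram, wait and wait_code are all hypothetical stand-ins; the toy assembler just records text lines instead of encoding instructions.

```python
class ToyProgram:
    """Hypothetical stand-in for the assembler: records what was emitted."""
    def __init__(self):
        self.lines = []

    def jsr(self, name):
        self.lines.append("jsr " + name)

    def label(self, name):
        self.lines.append(name + ":")

    def rts(self):
        self.lines.append("rts")

pending = set()  # every distinct subroutine that something has called

def wait(p, brightness):
    # Emit only the call and remember that the body is still owed.
    p.jsr("wait_%d" % brightness)
    pending.add(brightness)

def wait_code(p, brightness):
    # Emit the body once, under the label the calls jump to.
    p.label("wait_%d" % brightness)
    p.rts()

p = ToyProgram()
wait(p, 64)
wait(p, 128)
wait(p, 64)  # same arguments again: no second body gets generated
for b in sorted(pending):
    wait_code(p, b)
print("\n".join(p.lines))
```

Three calls go in, but only two subroutine bodies come out at the end, because the set collapses the duplicate.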

And that’s the sort of cleverness that writing assemblers with Python as a framing language allows. You’ll notice that the rest of the mtimWaitCode is coded at compile time - no RAM is used to pass arguments, for example. In fact, let’s just run through that code real fast:

    p.movi(0x04, ".MTIMCLK") # Divide the bus clock by 16
    p.ldai(numIters)

The MTIM is the modulo timer. It runs off of the clock with a prescaler. There’s a register you can set so that when the counter hits the value in the register, it sends an interrupt. This code just configures the prescaler, and loads numIters into A (notice the “i” suffix to “lda” - p.lda(numIters) has a different meaning).

Note also that numIters is calculated at compile time, so is resolved to a number when this code is compiled. As far as the assembly is concerned, it’s an immediate value.

This code has a loop, which for numIters iterations will

Turn on the LED
Wait for “brightness” counts on the modulo timer
Turn off the LED
Wait for 256-“brightness” counts on the modulo timer, so that the total time inside the loop is 256 timer counts.

Or, in code:

    p.label(fnname+"_loop")

    # Run the mtimWait inner loop once
    # Turn on the LEDs for the requested color
    setColor(p, color)

    # Configure the MTIM for the uptime
    p.movi(brightness, ".MTIMMOD")
    p.movi(0x60, ".MTIMSC")

    p.wait()

    # Clear the LEDs
    p.movi(0x00, ".PTAD")

    # Configure the MTIM for the downtime (writing to MTIMMOD also clears TOF)
    p.movi(256-brightness, ".MTIMMOD")
    p.movi(0x60, ".MTIMSC")

    p.wait()

    # Clear TOF and disable the MTIM
    p.movi(0, ".MTIMMOD")
    p.movi(0x00, ".MTIMSC")

Nothing too fancy. Note that the wait instruction halts the CPU until the MTIM interrupt fires, and execution then simply continues with the instruction after the wait.

    p.deca()
    p.cmpi(0)
    p.bne(fnname+"_loop")
    p.rts()

This is where we decrement A, check if it’s zero (again! note the “i” suffix), if it’s not zero loop around again, and if it is zero then ReTurn from Subroutine.
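
As a sanity check on the timing, here is the arithmetic for how long one of these subroutines runs and what duty cycle it produces. This is a back-of-the-envelope sketch: span_timing is a hypothetical helper, it ignores instruction overhead and any off-by-one in the timer, and the bus clock in the example is a made-up value chosen so the numbers come out round.

```python
def span_timing(brightness, num_iters, bus_hz, prescale=16):
    """Return (duty cycle, total seconds) for the PWM loop described above."""
    count_seconds = prescale / bus_hz         # one modulo timer count
    duty = brightness / 256                   # LED on for brightness of 256 counts
    total_seconds = num_iters * 256 * count_seconds
    return duty, total_seconds

# Made-up bus clock of 2**20 Hz: 256 counts at /16 take exactly 1/256 s.
duty, total = span_timing(64, 128, bus_hz=2**20)
print(duty, total)
```

With that made-up clock, 128 iterations at brightness 64 give a quarter duty cycle for half a second.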

I’ll leave parsing the rest of the assembly as an exercise for the reader.

Let’s take a brief look at the chunk at the bottom, though:

if __name__ == "__main__":
    print(p.assemble()[1])

    def doProgram():
        from programmer import Programmer
        programmer = Programmer(device)
        print(programmer._slice(p))
        programmer.program(p)

    doProgram()

After you’ve added all of your instructions to your program p, you call p.assemble(), which returns to you two things: Your memory map, and some stats. The memory map is a dictionary where the keys are addresses and the values are the memory set at that address.

The stats looks something like:

{
   'flashUsed': 347,
   'flashAvail': 2048,
   'flashRows': {
      0, 1, 2, 3, 4, 5, 31
   }
}

Basically, telling you how much flash is used and which rows are used.
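
Here is a rough sketch of how stats like these could be derived from a memory map. It is not the code inside p.assemble(). The flash base of 0x3800 and the 64-byte row size are my assumptions (2KB of flash ending at 0x3FFF, split into 32 rows, which matches row 31 showing up above), and the byte values in the toy map are arbitrary.

```python
FLASH_BASE = 0x3800  # assumption: 2KB of flash ending at 0x3FFF
FLASH_SIZE = 2048
ROW_SIZE = 64        # assumption: 2048 bytes split into 32 rows

def flash_stats(memory_map):
    """Summarize flash usage for a {address: byte} memory map."""
    flash_addrs = [a for a in memory_map
                   if FLASH_BASE <= a < FLASH_BASE + FLASH_SIZE]
    return {
        "flashUsed": len(flash_addrs),
        "flashAvail": FLASH_SIZE,
        "flashRows": {(a - FLASH_BASE) // ROW_SIZE for a in flash_addrs},
    }

# Toy map: three bytes at the start of flash, three at the reset location.
toy_map = {0x3800: 1, 0x3801: 2, 0x3802: 3, 0x3FFD: 4, 0x3FFE: 5, 0x3FFF: 6}
print(flash_stats(toy_map))
```

The jump at the reset location is why the very last row always shows up, even for a tiny program.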

The doProgram function takes this data and passes it to a Programmer instance, which contains all of the logic necessary to mass erase your device and program it. (for the curious, programmer._slice(p) returns the rows that will be programmed and the data they will be programmed with)
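
To give a feel for what slicing into rows means, here is a minimal sketch. It is not the real programmer._slice. slice_rows is a hypothetical function that assumes every address in the map is in flash, reuses the assumed 0x3800 base and 64-byte rows, and pads untouched bytes with 0xFF on the assumption that erased flash reads back that way.

```python
def slice_rows(memory_map, flash_base=0x3800, row_size=64, fill=0xFF):
    """Group a {address: byte} map into {row index: row bytes}."""
    rows = {}
    for addr, value in memory_map.items():
        index, offset = divmod(addr - flash_base, row_size)
        # Start each new row fully padded, then drop the byte into place.
        row = rows.setdefault(index, [fill] * row_size)
        row[offset] = value
    return {index: bytes(row) for index, row in rows.items()}

rows = slice_rows({0x3800: 0x01, 0x3FFD: 0x02})
for index in sorted(rows):
    print(index, rows[index].hex())
```

Only rows that actually contain data come back, so rows nothing touches never need to be programmed.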

And voila! You have a programmed chip.

Now, once it’s programmed, all you have to do is debug it…