Time. It’s an amazing song by Hans Zimmer. It’s an inexorable passage. It’s many things to many people.
Today, we’ll talk about the second one. And how’s it’s portrayed in programming languages.
In most programming languages, time is implicit, and subsequent instructions are “later” in time. Of course, compiler optimizations and hardware reordering can change when instructions actually execute, but from the perspective of the language it proceeds linearly. For example, in this C code:
int a = 5;
int b = 6;
b = a + b;
printf("%d\n", b);
Each statement executes after the other. This program prints out 11
.
This is how many languages are. But it’s not the only way to do it. Languages like Haskell and Prolog eschew the notion of time altogether, and say that there is no notion of time, not even an implicit one like in the languages we described above. This leads to confusion among many programmers when they need to make their purely functional language interact with the real world.
Because their programming language has a different representation of time than the platform (CPU) they’re targeting.
Of course, there exist targets for which neither of these representations is ideal. FPGAs are one such target. Time on an FPGA is not linearly sequential. But, also, it cannot be ignored: Much of what FPGAs do have to do with strict cycle-accurate timing. Output a pulse on this edge. Copy these bits all at the same time. Read these pins.
One of the two main languages for FPGAs is Verilog. Here’s some sample Verilog code (courtesy of Alchitry):
module servo (
input clk,
input rst,
input [7:0] position,
output servo
);
reg pwm_d, pwm_q;
reg [19:0] ctr_d, ctr_q;
assign servo = pwm_q;
always @(*) begin
ctr_d = ctr_q + 1'b1;
if (position + 9'd165 > ctr_q[19:8])
pwm_d = 1'b1;
else
pwm_d = 1'b0;
end
always @(posedge clk) begin
if (rst) begin
ctr_q <= 1'b0;
end else begin
ctr_q <= ctr_d;
end
pwm_q <= pwm_d;
end
endmodule
This code performs a fairly simple function. It takes three input parameters, clk
, rst
, and position
, and outputs a single bit servo
which is high for anywhere between roughly 50,000 to 100,000 positive edges of clk
.
That is, imagine clk
is going up and down repeatedly. If position
is 0x0
(i.e., eight zero bits), then servo
will be high for 50,000 ticks of clk
, and low for the rest of the one million tick cycle. If position
is 0xff
(i.e. eight one bits), then servo
will be high for 100,000 ticks, and low for the rest of the cycle.
Simple enough. There are two different assignment operators: <=
and =
, and these represent two different concepts of time. =
is what’s called a “blocking statement”: These statements execute one after another, like in C. <=
is non-blocking: These execute simultaneously within a block. For example:
ctr_d = ctr_q + 1'b1;
if (position + 9'd165 > ctr_q[19:8])
pwm_d = 1'b1;
else
pwm_d = 1'b0;
The if
statement uses the value of ctr_d
that equals ctr_q + 1'b1
. However, for this code:
if (rst) begin
ctr_q <= 1'b0;
end else begin
ctr_q <= ctr_d;
end
pwm_q <= pwm_d;
these assignments all happen simultaneously.
So in order to use Verilog effectively, you must constantly switch between two different concepts of time as you program. Additionally, you must consider the effect of the always
condition: always @(posedge clk)
means that the block happens only at a clock edge. always @(*)
means the block is always happening.
So you must not only switch between different types of time, you must switch between different… times. Statements can be happening one after another, on clock edges. Or they can be happening simultaneously, always. Or a combination: Some statements can be happening before other statements, while some statements are happening simultaneously.
And what do we gain for this complexity? State. This construct:
always @(*) begin
variable_d = variable_q + 1;
end
always @(posedge clk) begin
variable_q <= variable_d;
end
Is how state is implemented. At any given point, variable_d
equals variable_q + 1
. Exactly on a clock edge, variable_q
is set to be what variable_d
is (i.e., variable_q + 1
). So combined, variable_*
is incrementing.
So here’s a quiz for you. If you were to add more code to this - should you use variable_d
, or variable_q
? What are the implications of either one?
I don’t know either. Let’s compare to how C increments a variable:
int variable = 0;
while (1) {
variable += 1;
}
Simple! Just keep incrementing the variable forever! variable
always has the current value.
Or at least, so said the industry that pushes “high-level synthesis” or HLS, which is compilers which translate C or C++ (or other “high level” languages) into HDL. They want to tap into the vast pool of C programmers to write FPGA code.
Except it isn’t that easy. I maintain that any programmer worth their salt can, in the span of months, pick up a new language as long as they understand the underlying fundamentals.
But if you don’t understand the underlying fundamentals, it doesn’t matter whether you know the syntax, you won’t be an effective programmer. All that is to say, HLS is dumb.
We need something better. And we can do that. In a world of discrete, digital logic, made of clocks and their domains, we can steal syntax from the mathematics world and express our above recurrence (that the variable keeps incrementing) like this:
variable[T] = variable[T-1] + 1;
No need to track different variable_d
and variable_q
variables. Additionally, with multiple variables, there’s no confusion about which order statements “run” in:
counter[T] = counter[T-1] + 1;
variable[T] = counter[T]; // Versus counter[T-1]
Since HDLs actually have no side effects, everything can be purely functional. For example, if
statements could return values:
// This counter counts up to 100
counter[T] = if counter[T-1] == 100 {
0
} else {
counter[T-1] + 1
};
Of course, we want to retain the closeness to hardware, and the variable declarations would represent that:
// counter is a 8-bit counter
bits[7:0] counter;
counter[T] = counter[T-1] + 1;
and you would need to be able to sub-index variables:
bits[7:0] counter;
bits[15:0] variable;
counter[T] = counter[T-1] + 1;
// Duplicate counter in the high and low bytes of variable
variable[T][7:0] = counter[T];
variable[T][15:8] = {counter[T][3:0], counter[T][7:4]}; // Swap the nibbles of the high byte
So with this, how do we implement the servo code from above? Maybe something like:
module servo (
// clk is the clock
clock clk,
// rst is a single bit input
bit rst,
bits[7:0] position,
) -> ( // Outputs are specified in a functional "return" of sorts
bit servo,
);
bit pwm;
bits[19:0] ctr;
servo[T] = pwm[T-1];
ctr[T] = if rst {
0
} else {
ctr[T-1] + 1
};
// Note that pwm[T] is based on ctr[T]
pwm[T] = position[T] + 9'd165 > ctr[T][19:8];
endmodule
In terms of strict lines of code, we’re down from 31 lines to 21 lines - a 30% reduction, even with three additional comments added. We have simpler variables. All the code that affects a variable is in one place: It isn’t spread out between multiple always
blocks. There’s a single system of time used throughout: All state updates every “tick” of the clock. It’s clear whether variables depend on “this” cycle’s outputs or the previous’s.
Of course, there are potential disadvantages to this syntax too. For example, you could imagine something like this:
servo[T] = pwm[T-1000];
This is a single line of code. Yet, it generates a thousand flip-flops! So the ratio of lines-of-code to synthesized area can easily be quite large. But on the flipside, this allows writing FIR filters very intuitively:
output[T] = a[T] * input[T] + b[T] * input[T-1] + c[T] * input[T-2];
Above we defined a module. But what might using that module look like? Well, everything is an expression, so it might look like:
module main(
clock clk,
bit rst,
) -> (
bit servo_output,
);
bits[27:0] position;
position[T] = position[T-1] + 1;
servo_output[T] = servo(clk, rst, position[T][27:20]);
endmodule
if we wanted the position to ramp up and wrap around after ~268 million cycles.
Of course, there’s a lot more to be defined to make a workable hardware description language, and a lot of trial and error along the way. For example, what’s the right way to handle state machines? How should multiple clock domains be handled? What about user-defined types - can a user define a struct like this?
struct my_struct {
bits [7:0] counter;
bit is_upcounting;
}
module update_mystruct(
clock clk,
bit rst,
my_struct counter,
) -> (
my_struct counter_out,
);
counter_out[T].is_upcounter = if rst {
0
} else if counter[T] == 100 {
0
} else if counter[T].is_upcounter == 0 {
1
} else {
counter[T].is_upcounter
};
counter_out[T].counter = if rst {
0
} else if counter[T].is_upcounter {
counter[T].counter + 1
} else {
counter[T].counter - 1
};
endmodule
// Usage:
bits[7:0] count;
my_struct foo;
foo[T] = update_mystruct(clk, rst, foo[T-1]);
count[T] = foo[T-1].counter;