So I play this game, World of Warships.
It’s a fun game. You drive around and shoot at boats and such. Highly recommend it (despite the fact that their marketing department is seemingly high on drugs.)
Like many other games, it generates replay files. A small community has sprung up around building programs that parse the replays.
Mine is here. There are many like it, but this one is mine.
I was recently rewriting most of it, so I thought I’d take the opportunity to talk about something interesting I noticed. Take a peek at this C++ struct:
struct RpcValue {
    virtual ~RpcValue() = default;
};
struct Uint32 : public RpcValue {
    uint32_t _value;
};
struct String : public RpcValue {
    std::string _string;
};
struct FixedDict : public RpcValue {
    // A series of key/value pairs, indexed by string
    std::map<std::string, std::shared_ptr<RpcValue>> _elements;
};
struct RpcMethod {
    std::string _method;
    std::vector<std::shared_ptr<RpcValue>> _args;
};
Nothing too fancy. The replay files are essentially a dump of the packets sent between the game client and server. Some of those packets are RPC messages, and an RpcValue represents a value used as an argument to one of those messages (it can take many forms - here I only show Uint32, String, and FixedDict). The actual RPC endpoints (i.e., which methods take which parameters) are determined dynamically by parsing XML files that ship with the specific game version, which is why we have this dynamic polymorphic system.
As a concrete example, one RPC call is the onRibbon method, which occurs when you earn a ribbon. From the game XML, we can determine that this method takes a single argument, a uint32_t. So when we build an RpcMethod to represent this call, we set the _method field to “onRibbon” and the _args field to a vector containing a single Uint32.
Then, further down the line, we can unpack this and format it as JSON or whatever; but here, within the packet parsing code, this is the form a decoded call takes.
The astute reader will note that this is a personal project, that since I have a full-time job and only a finite amount of spare time I wouldn’t waste it dealing with C++, and that surely it must be in Rust. The astute reader who clicks on links while reading the article will have already noted that the project is, in fact, in Rust.
And good thing, too. Look at how many shared_ptrs there are. In order to hold any RpcValue, you have to hold a pointer to it; in order to collapse it back down to a specific type, you have to dereference that pointer and then dereference a different pointer to look up the vtable. This code:
struct RpcValue {
    ...
    virtual uint32_t get_u32() = 0;
};

struct Uint32 : public RpcValue {
    ...
    uint32_t get_u32() override {
        return _value;
    }
};

uint32_t value_to_u32(std::shared_ptr<RpcValue> value) {
    return value->get_u32();
}
compiles to dereferencing the shared pointer to get the object, dereferencing the object to get its vtable, then dereferencing the vtable to find the method - and only then extracting the value:
Uint32::get_u32():
        mov     eax, DWORD PTR [rdi+8]
        ret
value_to_u32(std::shared_ptr<RpcValue>):
        mov     rdi, QWORD PTR [rdi]
        mov     rax, QWORD PTR [rdi]
        mov     rax, QWORD PTR [rax+16]
        cmp     rax, OFFSET FLAT:Uint32::get_u32()
        jne     .L8
        mov     eax, DWORD PTR [rdi+8]
        ret
.L8:
        jmp     rax
(as always, huge kudos to godbolt.org)
Three pointer dereferences.
And if the programmer is wrong (and the value isn’t a Uint32), it aborts.
Let’s get back to the actual code. What I wrote in Rust was:
enum RpcValue {
    Uint32(u32),
    String(String),
    FixedDict(HashMap<String, RpcValue>),
}
Before we go any further, note how we’ve already compressed 15 lines into 5.
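For comparison with the C++ onRibbon example above, a decoded call might be built up something like this. This is just a sketch: the owned-String RpcCall struct mirrors what the pre-lifetime version of my code looked like (more on that below), and make_on_ribbon and its argument are purely illustrative:

// Sketch of an owned-String call struct (the pre-lifetime shape; see below).
struct RpcCall {
    method: String,
    args: Vec<RpcValue>,
}

// Build the call for an onRibbon packet, whose single argument is a u32.
fn make_on_ribbon(arg: u32) -> RpcCall {
    RpcCall {
        method: "onRibbon".to_string(),
        args: vec![RpcValue::Uint32(arg)],
    }
}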
Looking up this value would look something like:
fn value_to_u32(value: &RpcValue) -> u32 {
    match value {
        RpcValue::Uint32(v) => *v,
        _ => panic!("Not a u32!"),
    }
}
which compiles to a much simpler:
example::value_to_u32:
        push    rax
        cmp     dword ptr [rdi], 0
        jne     .LBB7_1
        mov     eax, dword ptr [rdi + 4]
        pop     rcx
        ret
.LBB7_1:
        call    std::panicking::begin_panic
        ud2
Dereference the argument to see the type. If it’s a Uint32, dereference the argument again to extract the value. Done - no need to pay for what you don’t want.
(The C++ enthusiast is now saying “well, that’s unfair! C++ supports unions, which could do what you want!” Writing the above code with unions - including the std::map - is left as an exercise for the reader.)
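As an aside, aborting on a type mismatch isn’t mandatory. If a mismatch is something the caller should handle, the same match can return an Option instead - a minimal sketch, not code from the project:

// Return None instead of panicking when the value isn't a Uint32.
// Same single tag check as before; the caller decides what a mismatch means.
fn value_as_u32(value: &RpcValue) -> Option<u32> {
    match value {
        RpcValue::Uint32(v) => Some(*v),
        _ => None,
    }
}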
But this isn’t even why I’m thinking about this code; I’m thinking about it because of a different observation. Note, in particular, the method name in RpcMethod. In practice there will be hundreds of thousands of packets, but only one or two hundred method names. That means there will be thousands of RpcMethods, all with the string “onRibbon” copied into them. The same goes for the keys of the FixedDict - the keys are all defined in the XML, and there are only a small number of possible values, all known at load time.
Rust has this neat feature, called lifetimes, which lets you annotate a reference with the “lifetime” of that reference. So, for example, I could change the Rust enum above to:
enum RpcValue<'a> {
    Uint32(u32),
    String(String),
    FixedDict(HashMap<&'a str, RpcValue<'a>>),
}
Notice the first type parameter of the HashMap, the &'a str. This is saying “this is a reference to a str, the reference must be valid for the lifetime 'a, and therefore this RpcValue must not outlive 'a”. Before, the HashMap key was its own heap-allocated String.
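Now every FixedDict key can simply point into the strings we already parsed out of the XML at load time. A quick sketch of what building one looks like (make_dict and key_table are my own illustrative stand-ins, and this assumes the RpcValue<'a> enum above, with its HashMap import, is in scope):

// Build a FixedDict whose keys borrow from strings parsed out of the XML
// at load time, instead of allocating a fresh String for every key.
fn make_dict<'a>(key_table: &'a [String], value: u32) -> RpcValue<'a> {
    let mut elements = HashMap::new();
    // key_table[0].as_str() is a &'a str pointing into the load-time table.
    elements.insert(key_table[0].as_str(), RpcValue::Uint32(value));
    RpcValue::FixedDict(elements)
}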
And, equivalently, for the RPC calls:
struct RpcCall<'a> {
    method: &'a str,
    args: Vec<RpcValue<'a>>,
}
When I parse the files, I first create an object that parses all of the strings out of the XML. Previously, I then constructed each RpcCall something like this:
return RpcCall {
    method: methodTable[idx].method.clone(),
    ..
}
but, with the new string references, I can simply do:
return RpcCall {
    method: methodTable[idx].method,
    ..
}
Notice how this is less code - and it also skips allocating and copying the method string for every single call.
I know what you’re thinking: is this actually helpful?
Before I made this change, parsing this file took ~140ms. Of that, ~40ms went to decrypting and decompressing the file, leaving ~100ms for the actual generation of packets.
After this change, parsing the same file took ~100ms, or only ~60ms for the packet generation. That shaves roughly a third of the total time off, for no net change in line count and with no added risk of bugs (thanks to the compiler’s ability to check lifetimes).
And there’s no real way to represent this in C++. I mean, I could use a char*:
struct RpcMethod {
    const char* _method;
};
but you, the programmer, have to make sure that whatever _method points to doesn’t get freed until you’re done with all of your RpcMethods. The compiler won’t help you with that. And if you mess it up? It might even still work, if nothing has overwritten the memory. Or maybe something has overwritten it and it’s garbage now. You just don’t know.
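For contrast, here is roughly what that same mistake looks like in Rust - a sketch, where load_method_names is a hypothetical stand-in for the real XML parsing. The borrow checker rejects it at compile time instead of letting it limp along at runtime:

// Hypothetical helper standing in for parsing the method names out of the
// game XML at load time.
fn load_method_names() -> Vec<String> {
    vec!["onRibbon".to_string()]
}

// This does NOT compile: `table` is dropped at the end of the function, so
// the compiler refuses to let an RpcCall that borrows from it escape
// ("`table` does not live long enough").
fn broken() -> RpcCall<'static> {
    let table = load_method_names();
    RpcCall {
        method: table[0].as_str(),
        args: Vec::new(),
    }
}

The fix, of course, is what the real code does: keep the table alive for as long as the RpcCalls that borrow from it, and let the lifetime annotations prove it.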
So, the moral of all this is: language features don’t just make your programmers happy. They amplify your abilities as a programmer, and a carefully crafted language can directly increase performance at zero cost (if not a net gain).
(Rust, of course, is famous for being so amenable to optimization that it has found bugs in LLVM, hitting corner cases that the optimization code never anticipated.)
This blog series updates at Programming for Maintainability