if (regex_match(portIterator->second, boost::regex("0*([0-9]{1,4}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5]);"))) {
...
}
if you look at it hard enough, you can tell what the regex does. But, if you’re not familiar, it comes with the comment:
// check the RPC port number which is an unsigned 16-bit integer. The value ranges from 0-65535
(also, I highly recommend the website regex101.com, if you don’t have it already bookmarked)
Obviously, regex isn’t anybody’s first choice to see if a number is an integer. C++ even has functions that could help:
const int port_no = std::stoi(portIterator->second);
if (port_no <= 65535) {
...
}
and we could go on all day about different ways to solve this otherwise-mundane problem.
But the really interesting question is, what set of design constraints made this seem like the right solution? It’s rare that we write code in a vacuum, usually we’re writing code in a codebase that already exists. Usually even a codebase that we didn’t ourselves write originally.
This code comes from a parameter validation section of the code, validating string parameters coming from the user before passing them (as strings!) to the layer below. The code looks something like this:
if (regex_match(parameter1, boost::regex("..."))) {
output_params["parameter1"] = parameter1;
}
if (regex_match(parameter2, boost::regex("..."))) {
output_params["parameter2"] = parameter2;
}
if (regex_match(parameter3, boost::regex("..."))) {
output_params["parameter3"] = parameter3;
}
if (regex_match(parameter4, boost::regex("..."))) {
output_params["parameter4"] = parameter4;
}
call_inner_api(output_params);
etc etc. (obviously the names aren’t numbered like that, they have actual names)
And goodness! If I were the programmer looking at that, adding a new parameter to validate, I’d want to add a regex too. I know because I was in that situation.
And that’s fair, programmers love repetition. Repetition means we don’t have to think so hard about the code. Repetition means we can rely on the fact that our new code is “like” the other code, and the other code worked and so will this code. Repitition means the reviewer doesn’t have to review so hard.
But this is a false win. Fake. Meaningless. In this case, our conformity cost the reviewer who had to prove in their head that this regex:
0*([0-9]{1,4}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5]);
actually matches if a number is less than 65536.
So the moral is: Be wary of repetition. In the worst case, it can create bugs. Often it indicates that the code can be refactored, made more abstract, so the programmer and the reviewer don’t get into a lazy rut of assuming everything is the same. And sometimes, like today, it makes you a contortionist.
This blog series updates pseudo-weekly at Programming for Maintainability