Sunday, July 24, 2016

2016-07-24: Improve research code with static type checking

The Pain of Late Bug Detection

[The web] is big. Really big. You just won't believe how vastly, hugely, mindbogglingly big it is... [1]

When it comes to quick implementation, Python is an efficient language used by many web archiving projects. Indeed, a quick search of github for WARC and Python yields a list of 80 projects and forks. Python is also the language used for my research into the temporal coherence of existing web archive holdings.

The sheer size of the Web means lots of variation and lots low-frequency edge cases. These variations and edge cases are naturally reflected in web archive holdings. Code used to research the Web and web archives naturally contains many, many code branches.

Python struggles under these conditions. It struggles because minor changes can easily introduce bugs that go undetected until much later. And later for Python means at run time. Indeed the sheer number of edge cases introduces code branches that are exercised so infrequently that code rot creeps in. Of course, all research code dealing with the web should create checkpoints and be restartable as a matter of self defense—and mine does defend itself. Still, detecting as many of these kinds of errors up front, before run time is much better than dealing with a mid-experiment crash.

[1] Douglas Adams may have actually written something a little different.

Static Typing to the Rescue

Static typing allows detection of many types of errors before code is executed. Consider the function definitions in figure 1 below. Heuristic is an abstract base class for memento selection heuristics. In my early research, memento selection heuristics required only Memento-Datetime. Subsequent work introduced selection based on both Memento-Datetime and Last-Modified. When the last_modified parameter was added, the cost functions were update accordingly—or so I thought. Notice that the last_modified parameter is missing from the PastPreferred cost function. Testing did not catch this oversight (see "Testing?" below). The addition of static type checking did.

class Heuristic(object):
     ...
class MinDist(Heuristic):
    def cost(self, memento_datetime, last_modified=None):

class Bracket(Heuristic):
    def cost(self, memento_datetime, last_modified):

class PastPreferred(Heuristic):
    def cost(self, memento_datetime):

Figure 1. Original Code
Static type checking is available for Python through the use of type hinting. Type hinting is specified in PEP 484 and is implemented in mypy. Type hints do not change Python execution; they simply allow mypy to programmatically check expectations set by the programmer. Figure 2 shows the heuristics code with type hints added. Note the addition of the cost function to the Heuristic class. Although not implemented, it allows the type checker to ensure that all cost functions conform to expectations. (This is the addition that led to finding the PastPreferred.cost bug.)

class Heuristic(object):
    def cost(self, memento_datetime: datetime,
             last_modified: Optional[datetime]) \
             -> Tuple[int,datetime]:
        raise NotImplementedError

class MinDist(Heuristic):
    def cost(self, memento_datetime: datetime,
             last_modified: Optional[datetime] = None) \
             -> Tuple[int,datetime]:

class Bracket(Heuristic):
    def cost(self, memento_datetime: datetime,
             last_modified: Optional[datetime]) \
             -> Tuple[int,datetime]:

class PastPreferred(Heuristic):
    def cost(self, memento_datetime: datetime,
             last_modified: Optional[datetime] = None) \
             -> Tuple[int,datetime]:

Figure 2. Type Hinted Code

Testing?

Many have argued that if code is well tested, the extra work introduced by static type checking out weighs the benfits. But what about bugs in the tests? (After all tests are code too—and not immune from programmer error). The code shown in Figure 1 had a complete set of tests (i.e. 100% coverage). However when Last-Modified was added, the PastPreferred tests were not updated and continued to pass. The addition of static type checking revealed the PastPreferred test bug, three research code bugs missed by the tests, and over dozen other test bugs. Remember, "Test coverage is of little use as a numeric statement of how good your tests are."

— Scott G. Ainsworth

No comments:

Post a Comment