Characterization testing, aka Golden Master testing, is a technique where you apply known inputs to a process to verify the output against a known result. I have found this to be a great technique for testing legacy code that does not have many tests. However, my first use of this technique was at a nuclear power plant. So how did I use characterization tests in nuclear power?

Clinton Power Station, Clinton, Illinois

Shipping Low-Level Radioactive Waste

After college, I started working at the Clinton Power Station as a radiological engineer shipping low-level radioactive waste. Working in the nuclear power industry required having sufficient controls in place to ensure the health and safety of the workers and the public. This was especially true when transporting low-level radioactive waste. Both the Nuclear Regulatory Commission (NRC) and the Department of Transportation (DOT) had regulations that required documentation to ensure each shipment was prepared safely.

RADMAN Software

We had a step-by-step procedure for creating the proper paperwork using a software program known as RADMAN. We could have created the paperwork manually, but doing so was time-consuming, complicated, and subject to human error.

RADMAN Software

The software program generated the appropriate NRC and DOT paperwork so we didn’t have to do it by hand. It was an early Windows application that used Fortran to do the necessary calculations.

The software was reviewed and approved by the NRC in 1984. However, the NRC and DOT frequently changed the regulations, so we received frequent updates from the vendor. After each update, we needed to know that the software still produced the proper paperwork for several different types of shipments. But how could we know that? To answer this question, I developed a procedure for regression testing the software using a characterization testing approach.

Characterization Tests

Tim Ottinger and Jeff Langr describe the steps to create a characterization test in Agile in a Flash, shown below. I followed a similar approach to test the RADMAN software: I manually prepared the paperwork for several different types of shipments and saved the results to a characterization test document. The document would be used to confirm that the software was generating the correct paperwork.

Instructional list on writing characterization tests (source: Agile in a Flash)

Subsequently, when we received a new version of the software, I created fake shipments for the scenarios in the characterization tests. Then I compared the software output to the characterization test document. If there were any discrepancies, we would notify the software vendor immediately. Using characterization tests gave me full confidence in using the software to generate the required shipment paperwork.

Legacy Software

After several years, I left nuclear power for software development, and I continue to use characterization testing. I find it very useful for working on legacy code that does not have a lot of existing tests. Before making changes to the legacy code, I write characterization tests that record what the code currently does. Rerunning them after each change gives me rapid feedback should I unintentionally alter the existing behavior.
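
The record-then-compare idea can be sketched in a few lines of plain Python. This is only an illustration: `legacy_total`, `refactored_total`, `characterize`, and the discount rules are hypothetical stand-ins, and a real golden master would normally be saved to a file rather than kept in memory.

```python
def legacy_total(quantity, is_member):
    """Hypothetical legacy function whose current behavior we want to pin down."""
    total = quantity * 10
    if quantity >= 10:
        total -= 15
    if is_member:
        total -= total // 10
    return total


def characterize(func, cases):
    """Run func on every case and record whatever it actually returns."""
    return {repr(args): func(*args) for args in cases}


CASES = [(1, False), (1, True), (10, False), (10, True)]

# Step 1: record the current behavior once, before touching the code.
golden_master = characterize(legacy_total, CASES)


def refactored_total(quantity, is_member):
    """Hypothetical rewrite that is supposed to behave exactly the same."""
    bulk_discount = 15 if quantity >= 10 else 0
    total = quantity * 10 - bulk_discount
    return total - total // 10 if is_member else total


# Step 2: after the change, rerun the same cases and compare.
assert characterize(refactored_total, CASES) == golden_master
print(golden_master)
```

The test does not judge whether the recorded values are right; it only reports when they change.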

Gilded Rose

For example, consider the Gilded Rose code kata. It is a fictitious legacy code base for an inventory system. The system works, but the code is not very clean: all of the update logic lives in one deeply nested function, as shown here in the Python version:

# -*- coding: utf-8 -*-


class GildedRose(object):

    def __init__(self, items):
        self.items = items

    def update_quality(self):
        for item in self.items:
            if item.name != "Aged Brie" and item.name != "Backstage passes to a TAFKAL80ETC concert":
                if item.quality > 0:
                    if item.name != "Sulfuras, Hand of Ragnaros":
                        item.quality = item.quality - 1
            else:
                if item.quality < 50:
                    item.quality = item.quality + 1
                    if item.name == "Backstage passes to a TAFKAL80ETC concert":
                        if item.sell_in < 11:
                            if item.quality < 50:
                                item.quality = item.quality + 1
                        if item.sell_in < 6:
                            if item.quality < 50:
                                item.quality = item.quality + 1
            if item.name != "Sulfuras, Hand of Ragnaros":
                item.sell_in = item.sell_in - 1
            if item.sell_in < 0:
                if item.name != "Aged Brie":
                    if item.name != "Backstage passes to a TAFKAL80ETC concert":
                        if item.quality > 0:
                            if item.name != "Sulfuras, Hand of Ragnaros":
                                item.quality = item.quality - 1
                    else:
                        item.quality = item.quality - item.quality
                else:
                    if item.quality < 50:
                        item.quality = item.quality + 1


class Item:
    def __init__(self, name, sell_in, quality):
        self.name = name
        self.sell_in = sell_in
        self.quality = quality

    def __repr__(self):
        return "%s, %s, %s" % (self.name, self.sell_in, self.quality)
            

The kata exercise asks you to add a feature to this system. This will be difficult unless you refactor the code first. However, you don’t have adequate test coverage to do the refactoring safely. This is where you can use characterization tests.

Approval Tests

Emily Bache has a great video explaining how you can do characterization testing on the Gilded Rose kata using a framework called Approval Tests. The Gilded Rose kata exists in many different programming languages. We will look at the Python version. The kata provides a failing test that can be modified to use Approval Tests:

    # -*- coding: utf-8 -*-
    import unittest
    
    from gilded_rose import Item, GildedRose
    from approvaltests.combination_approvals import verify_all_combinations
    from approvaltests.reporters.generic_diff_reporter_factory import GenericDiffReporterFactory
    
    
    class GildedRoseTest(unittest.TestCase):
    
        def setUp(self):
            self.reporter = GenericDiffReporterFactory().get_first_working()
    
        def test_update_quality(self):
            verify_all_combinations(self.do_update_quality, [
                ["foo", "Aged Brie", "Backstage passes to a TAFKAL80ETC concert", "Sulfuras, Hand of Ragnaros"],
                [0, 1, 49, 50],       # sell_in values
                [-1, 0, 2, 6, 11]])   # quality values
    
        @staticmethod
        def do_update_quality(name, sell_in, quality):
            items = [Item(name, sell_in, quality)]
            gilded_rose = GildedRose(items)
            gilded_rose.update_quality()
            return str(items[0])
    
    
    if __name__ == '__main__':
        unittest.main()
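
The approval workflow behind a test like this can be sketched without the framework. The snippet below is a simplified illustration, not the Approval Tests implementation: the `verify` helper is hypothetical, and the `.received.txt` and `.approved.txt` file names only mirror the naming the framework uses.

```python
import os
import tempfile


def verify(text, name, directory):
    """Return True if text matches the approved file; otherwise save it as received."""
    approved_path = os.path.join(directory, name + ".approved.txt")
    received_path = os.path.join(directory, name + ".received.txt")
    if os.path.exists(approved_path):
        with open(approved_path) as approved_file:
            if approved_file.read() == text:
                return True
    # No approved file yet, or the output changed: keep the new output for review.
    with open(received_path, "w") as received_file:
        received_file.write(text)
    return False


workdir = tempfile.mkdtemp()
output = "args: ('foo', 0, 0) => 'foo, -1, 0'\n"

# First run: nothing has been approved yet, so the check fails.
first_run = verify(output, "gilded_rose", workdir)

# A person reviews the received file and approves it by renaming it.
os.replace(os.path.join(workdir, "gilded_rose.received.txt"),
           os.path.join(workdir, "gilded_rose.approved.txt"))

# Second run: the output matches the approved file.
second_run = verify(output, "gilded_rose", workdir)

# Third run: the behavior changed, so the check fails again.
third_run = verify(output.replace("-1, 0", "-1, 1"), "gilded_rose", workdir)

print(first_run, second_run, third_run)
```

The first failure is expected: it is the moment a person looks at the recorded output and decides to accept it as the known result.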

Test Inputs

Knowing the GildedRose class is being used in a fictitious production environment gives us confidence that it is behaving as desired. Therefore, we can use Approval Tests to record its current output for 80 input combinations (4 item names × 4 sell_in values × 5 quality values). The inputs are chosen to exercise all parts of the code, as shown here for Python:

    args: ('foo', 0, -1) => 'foo, -1, -1'
    args: ('foo', 0, 0) => 'foo, -1, 0'
    args: ('foo', 0, 2) => 'foo, -1, 0'
    args: ('foo', 0, 6) => 'foo, -1, 4'
    args: ('foo', 0, 11) => 'foo, -1, 9'
    args: ('foo', 1, -1) => 'foo, 0, -1'
    args: ('foo', 1, 0) => 'foo, 0, 0'
    args: ('foo', 1, 2) => 'foo, 0, 1'
    args: ('foo', 1, 6) => 'foo, 0, 5'
    args: ('foo', 1, 11) => 'foo, 0, 10'
    args: ('foo', 49, -1) => 'foo, 48, -1'
    args: ('foo', 49, 0) => 'foo, 48, 0'
    args: ('foo', 49, 2) => 'foo, 48, 1'
    args: ('foo', 49, 6) => 'foo, 48, 5'
    args: ('foo', 49, 11) => 'foo, 48, 10'
    args: ('foo', 50, -1) => 'foo, 49, -1'
    args: ('foo', 50, 0) => 'foo, 49, 0'
    args: ('foo', 50, 2) => 'foo, 49, 1'
    args: ('foo', 50, 6) => 'foo, 49, 5'
    args: ('foo', 50, 11) => 'foo, 49, 10'
    args: ('Aged Brie', 0, -1) => 'Aged Brie, -1, 1'
    args: ('Aged Brie', 0, 0) => 'Aged Brie, -1, 2'
    args: ('Aged Brie', 0, 2) => 'Aged Brie, -1, 4'
    args: ('Aged Brie', 0, 6) => 'Aged Brie, -1, 8'
    args: ('Aged Brie', 0, 11) => 'Aged Brie, -1, 13'
    args: ('Aged Brie', 1, -1) => 'Aged Brie, 0, 0'
    args: ('Aged Brie', 1, 0) => 'Aged Brie, 0, 1'
    args: ('Aged Brie', 1, 2) => 'Aged Brie, 0, 3'
    args: ('Aged Brie', 1, 6) => 'Aged Brie, 0, 7'
    args: ('Aged Brie', 1, 11) => 'Aged Brie, 0, 12'
    args: ('Aged Brie', 49, -1) => 'Aged Brie, 48, 0'
    args: ('Aged Brie', 49, 0) => 'Aged Brie, 48, 1'
    args: ('Aged Brie', 49, 2) => 'Aged Brie, 48, 3'
    args: ('Aged Brie', 49, 6) => 'Aged Brie, 48, 7'
    args: ('Aged Brie', 49, 11) => 'Aged Brie, 48, 12'
    args: ('Aged Brie', 50, -1) => 'Aged Brie, 49, 0'
    args: ('Aged Brie', 50, 0) => 'Aged Brie, 49, 1'
    args: ('Aged Brie', 50, 2) => 'Aged Brie, 49, 3'
    args: ('Aged Brie', 50, 6) => 'Aged Brie, 49, 7'
    args: ('Aged Brie', 50, 11) => 'Aged Brie, 49, 12'
    args: ('Backstage passes to a TAFKAL80ETC concert', 0, -1) => 'Backstage passes to a TAFKAL80ETC concert, -1, 0'
    args: ('Backstage passes to a TAFKAL80ETC concert', 0, 0) => 'Backstage passes to a TAFKAL80ETC concert, -1, 0'
    args: ('Backstage passes to a TAFKAL80ETC concert', 0, 2) => 'Backstage passes to a TAFKAL80ETC concert, -1, 0'
    args: ('Backstage passes to a TAFKAL80ETC concert', 0, 6) => 'Backstage passes to a TAFKAL80ETC concert, -1, 0'
    args: ('Backstage passes to a TAFKAL80ETC concert', 0, 11) => 'Backstage passes to a TAFKAL80ETC concert, -1, 0'
    args: ('Backstage passes to a TAFKAL80ETC concert', 1, -1) => 'Backstage passes to a TAFKAL80ETC concert, 0, 2'
    args: ('Backstage passes to a TAFKAL80ETC concert', 1, 0) => 'Backstage passes to a TAFKAL80ETC concert, 0, 3'
    args: ('Backstage passes to a TAFKAL80ETC concert', 1, 2) => 'Backstage passes to a TAFKAL80ETC concert, 0, 5'
    args: ('Backstage passes to a TAFKAL80ETC concert', 1, 6) => 'Backstage passes to a TAFKAL80ETC concert, 0, 9'
    args: ('Backstage passes to a TAFKAL80ETC concert', 1, 11) => 'Backstage passes to a TAFKAL80ETC concert, 0, 14'
    args: ('Backstage passes to a TAFKAL80ETC concert', 49, -1) => 'Backstage passes to a TAFKAL80ETC concert, 48, 0'
    args: ('Backstage passes to a TAFKAL80ETC concert', 49, 0) => 'Backstage passes to a TAFKAL80ETC concert, 48, 1'
    args: ('Backstage passes to a TAFKAL80ETC concert', 49, 2) => 'Backstage passes to a TAFKAL80ETC concert, 48, 3'
    args: ('Backstage passes to a TAFKAL80ETC concert', 49, 6) => 'Backstage passes to a TAFKAL80ETC concert, 48, 7'
    args: ('Backstage passes to a TAFKAL80ETC concert', 49, 11) => 'Backstage passes to a TAFKAL80ETC concert, 48, 12'
    args: ('Backstage passes to a TAFKAL80ETC concert', 50, -1) => 'Backstage passes to a TAFKAL80ETC concert, 49, 0'
    args: ('Backstage passes to a TAFKAL80ETC concert', 50, 0) => 'Backstage passes to a TAFKAL80ETC concert, 49, 1'
    args: ('Backstage passes to a TAFKAL80ETC concert', 50, 2) => 'Backstage passes to a TAFKAL80ETC concert, 49, 3'
    args: ('Backstage passes to a TAFKAL80ETC concert', 50, 6) => 'Backstage passes to a TAFKAL80ETC concert, 49, 7'
    args: ('Backstage passes to a TAFKAL80ETC concert', 50, 11) => 'Backstage passes to a TAFKAL80ETC concert, 49, 12'
    args: ('Sulfuras, Hand of Ragnaros', 0, -1) => 'Sulfuras, Hand of Ragnaros, 0, -1'
    args: ('Sulfuras, Hand of Ragnaros', 0, 0) => 'Sulfuras, Hand of Ragnaros, 0, 0'
    args: ('Sulfuras, Hand of Ragnaros', 0, 2) => 'Sulfuras, Hand of Ragnaros, 0, 2'
    args: ('Sulfuras, Hand of Ragnaros', 0, 6) => 'Sulfuras, Hand of Ragnaros, 0, 6'
    args: ('Sulfuras, Hand of Ragnaros', 0, 11) => 'Sulfuras, Hand of Ragnaros, 0, 11'
    args: ('Sulfuras, Hand of Ragnaros', 1, -1) => 'Sulfuras, Hand of Ragnaros, 1, -1'
    args: ('Sulfuras, Hand of Ragnaros', 1, 0) => 'Sulfuras, Hand of Ragnaros, 1, 0'
    args: ('Sulfuras, Hand of Ragnaros', 1, 2) => 'Sulfuras, Hand of Ragnaros, 1, 2'
    args: ('Sulfuras, Hand of Ragnaros', 1, 6) => 'Sulfuras, Hand of Ragnaros, 1, 6'
    args: ('Sulfuras, Hand of Ragnaros', 1, 11) => 'Sulfuras, Hand of Ragnaros, 1, 11'
    args: ('Sulfuras, Hand of Ragnaros', 49, -1) => 'Sulfuras, Hand of Ragnaros, 49, -1'
    args: ('Sulfuras, Hand of Ragnaros', 49, 0) => 'Sulfuras, Hand of Ragnaros, 49, 0'
    args: ('Sulfuras, Hand of Ragnaros', 49, 2) => 'Sulfuras, Hand of Ragnaros, 49, 2'
    args: ('Sulfuras, Hand of Ragnaros', 49, 6) => 'Sulfuras, Hand of Ragnaros, 49, 6'
    args: ('Sulfuras, Hand of Ragnaros', 49, 11) => 'Sulfuras, Hand of Ragnaros, 49, 11'
    args: ('Sulfuras, Hand of Ragnaros', 50, -1) => 'Sulfuras, Hand of Ragnaros, 50, -1'
    args: ('Sulfuras, Hand of Ragnaros', 50, 0) => 'Sulfuras, Hand of Ragnaros, 50, 0'
    args: ('Sulfuras, Hand of Ragnaros', 50, 2) => 'Sulfuras, Hand of Ragnaros, 50, 2'
    args: ('Sulfuras, Hand of Ragnaros', 50, 6) => 'Sulfuras, Hand of Ragnaros, 50, 6'
    args: ('Sulfuras, Hand of Ragnaros', 50, 11) => 'Sulfuras, Hand of Ragnaros, 50, 11'
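
The 80 argument tuples listed above are the Cartesian product of three input lists. As a rough sketch of how such combinations can be enumerated, the snippet below uses `itertools.product` from the standard library; it is not a claim about how Approval Tests builds them internally, and the variable names are assumptions.

```python
from itertools import product

names = [
    "foo",
    "Aged Brie",
    "Backstage passes to a TAFKAL80ETC concert",
    "Sulfuras, Hand of Ragnaros",
]
sell_in_values = [0, 1, 49, 50]
quality_values = [-1, 0, 2, 6, 11]

# Every name paired with every sell_in value and every quality value.
combinations = list(product(names, sell_in_values, quality_values))

print(len(combinations))
for args in combinations[:3]:
    print("args: %r" % (args,))
```

The rightmost list varies fastest, which is why the quality value changes on every line of the listing while the item name changes only every 20 lines.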
    

Refactored Code

Having these characterization tests then allows you to safely refactor the GildedRose class. For example, here is one possible way to refactor it:

# -*- coding: utf-8 -*-

class GildedRose(object):

    def __init__(self, items):
        self.items = items

    def update_quality(self):
        for item in self.items:
            item.update_quality()
            item.decrement_sell_in()


class Item:
    def __init__(self, name, sell_in, quality):
        self.name = name
        self.sell_in = sell_in
        self.quality = quality

    def update_quality(self):
        raise NotImplementedError("Subclass must implement this abstract method")

    def decrement_sell_in(self):
        self.sell_in = self.sell_in - 1

    def __repr__(self):
        return "%s, %s, %s" % (self.name, self.sell_in, self.quality)


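class SulfurasHandOfRagnaros(Item):
    # The original code never changes this item's quality or its sell_in.
    def update_quality(self):
        return

    def decrement_sell_in(self):
        return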


class AgedBrie(Item):
    def update_quality(self):
        if self.quality < 50:
            self.quality = self.quality + 1
        if self.sell_in < 1 and self.quality < 50:
            self.quality = self.quality + 1


class Backstage(Item):
    def update_quality(self):
        if self.quality < 50:
            self.quality = self.quality + 1
        if self.sell_in < 11 and self.quality < 50:
            self.quality = self.quality + 1
        if self.sell_in < 6 and self.quality < 50:
            self.quality = self.quality + 1
        if self.sell_in < 1:
            self.quality = 0


class GeneralItem(Item):
    def update_quality(self):
        if self.quality > 0:
            self.quality = self.quality - 1
        if self.sell_in < 1 and self.quality > 0:
            self.quality = self.quality - 1
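
With one class per item type, a new rule can become a new subclass instead of another branch in a nested conditional. As a sketch only: the feature the kata asks for is "Conjured" items, which degrade in quality twice as fast as normal items. The class name `ConjuredItem` and the item name used below are assumptions, and `Item` is repeated in trimmed form so the snippet runs on its own.

```python
class Item:
    """Trimmed copy of the refactored base class, repeated so this runs standalone."""

    def __init__(self, name, sell_in, quality):
        self.name = name
        self.sell_in = sell_in
        self.quality = quality

    def decrement_sell_in(self):
        self.sell_in = self.sell_in - 1

    def __repr__(self):
        return "%s, %s, %s" % (self.name, self.sell_in, self.quality)


class ConjuredItem(Item):
    """Hypothetical subclass: loses quality twice as fast as a general item."""

    def update_quality(self):
        # A general item loses 1 per day, or 2 once the sell date has passed.
        steps = 4 if self.sell_in < 1 else 2
        for _ in range(steps):
            if self.quality > 0:
                self.quality = self.quality - 1


fresh = ConjuredItem("Conjured Mana Cake", 3, 6)
expired = ConjuredItem("Conjured Mana Cake", 0, 6)
for item in (fresh, expired):
    item.update_quality()
    item.decrement_sell_in()
    print(item)
```

Adding this class would also mean adding its name to the inputs of the characterization test and approving the new output lines.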

Refactored Unit Test

The unit test needs to change as well, because items are now created as subclasses. Here is one possible way to refactor it:

    # -*- coding: utf-8 -*-
    import unittest
    
    from gilded_rose import GildedRose, SulfurasHandOfRagnaros, AgedBrie, Backstage, GeneralItem
    from approvaltests.combination_approvals import verify_all_combinations
    from approvaltests.reporters.generic_diff_reporter_factory import GenericDiffReporterFactory
    
    BACKSTAGE = "Backstage passes to a TAFKAL80ETC concert"
    AGED_BRIE = "Aged Brie"
    HAND_OF_RAGNAROS = "Sulfuras, Hand of Ragnaros"
    
    
    class GildedRoseTest(unittest.TestCase):
    
        def setUp(self):
            self.reporter = GenericDiffReporterFactory().get_first_working()
    
        def test_update_quality(self):
            verify_all_combinations(self.do_update_quality, [
                ["foo", AGED_BRIE, BACKSTAGE, HAND_OF_RAGNAROS],
                [0, 1, 49, 50],
                [-1, 0, 2, 6, 11]])
    
        @staticmethod
        def create_item(name, sell_in, quality):
            if name == HAND_OF_RAGNAROS:
                return SulfurasHandOfRagnaros(name, sell_in, quality)
            if name == AGED_BRIE:
                return AgedBrie(name, sell_in, quality)
            if name == BACKSTAGE:
                return Backstage(name, sell_in, quality)
            else:
                return GeneralItem(name, sell_in, quality)
    
        def do_update_quality(self, name, sell_in, quality):
            items = [self.create_item(name, sell_in, quality)]
            gilded_rose = GildedRose(items)
            gilded_rose.update_quality()
            return str(items[0])
    
    
    if __name__ == '__main__':
        unittest.main()

Conclusion

As you can see, the refactored code is much easier to understand. Consequently, the code will be much easier to modify in the future. Likewise, there are many other possible ways to refactor the GildedRose class and unit test. This is merely one example.

In conclusion, one benefit of working in software development is that I can use a testing framework like Approval Tests to automate the characterization tests. I’m glad that I don’t have to do them manually like I did at the nuclear power plant.

Special thanks to early reviewers Josh Kerievsky and Tim Ottinger.