While I was working with Skylar Watson and some others at a client in Michigan, we were discussing metrics, along with their uses and abuses (see What Should We Measure).

Skylar suggested a metric used in manufacturing that has strong applicability to software development in a CI/CD world.

The concept is First Time Through (FTT).

[Image: Obstacle course racer climbs through pipe]

Origin of the Concept

A factory (and, yes, we know that software is not very like a factory… hold on to that objection) seeks to make a perfect instance of their product every time. The aspirational goal of Lean manufacturing is to have perfect value while producing no waste at all.

Sometimes, though, there is a problem.

Perhaps a part is out of tolerance, or something was damaged in handling. The problem must be remedied before the product can be released to customers. The product cannot be allowed to go through to delivery in a defective state.

Since remediation involves a loss of material and time, having and fixing problems are operations that involve waste rather than operations that add value.

So every time a product is assembled perfectly without remediation, the factory has done a splendid job.

The factory work is managed by considering the First Time Through rate.

If a factory had some problem to remediate once for every dozen widgets produced, you would consider it a very poor factory indeed. One would expect a factory to see no more than one failure per several hundred or several thousand units produced.

Software “Construction”?

Software is unlike factories. Factories create replica instances of a product, but in software, perfect replicas are created by downloads or file-copy commands.

Nor is the original product fabricated by programmers in a factory-like way. The compiler is the analog of fabrication: it takes a specification (source code) and builds a product according to that specification. Our compilers are bit-perfect, repeatable, and have few imperfections.

Compilers and file-copy are to software as factories are to tangible products.

Certainly, if one measured FTT of a set of compilation and file copying tools, one would find that metric consistently at 100%. Every time one compiles the same code and copies the executable, one gets an essentially perfect copy of the product.

Of course, a metric that never changes is not very helpful in managing an effort, so we don’t bother to measure the FTT rate of our nigh-perfect machinery.

But imagine that you were producing the compiler or the copying command. You would want to have a measurement of how often the machinery correctly produced the intended code and successfully deployed it, right?

If a copy command ran very quickly but the result was not byte-for-byte identical to the original file, you would consider it to be faulty and would set about uncovering and solving its problems.
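Checking whether a copy is byte-for-byte identical is easy to automate. A minimal sketch in Python (the file name and contents are invented for illustration):

```python
import hashlib
import os
import shutil
import tempfile

def sha256_of(path):
    """Hash a file's contents so copies can be compared cheaply."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Demonstrate with a throwaway file: copy it, then verify the result
# is byte-for-byte identical to the original.
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "product.bin")
copy = os.path.join(workdir, "product-copy.bin")
with open(original, "wb") as f:
    f.write(b"executable bytes")
shutil.copyfile(original, copy)
assert sha256_of(original) == sha256_of(copy)
```

A copy command whose output ever failed this check would, as described above, be considered faulty.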

Likewise, if a compiler produced different and flawed results for the same input, you would set about finding and correcting the causes of error in the compiler. You would certainly leave no stone unturned in fixing or rearchitecting the compiler to produce consistently good results.

FTT In Normal Software Development

Your development teams are continually beginning with whatever version of code currently exists, then extending the design to incorporate additional functions and features. When they are done, they have produced a new revision of code that has never existed before.

This new product is expected to solve problems for some members of the product community. Maybe it is to make the product easier to install and operate. Perhaps it makes it easier to administer various configurations for varied users. Perhaps it provides new operations for users or a better user experience. Maybe all of the above.

This is a design operation, rather than manufacturing.

It’s rather harder to tell in the abstract whether a design will meet user expectations, and the deliverable code is the first complete working model that has ever been created.

But we deliver our designs (source code) to the machinery that builds and copies the product as we’ve specified (the pipeline).

If the design cannot pass through the pipeline, it is because a defect stops the compiler, the code fails the pipeline’s quality checks (often layers of static scanning and tests), or the software fails to install and start at the deployment end of the pipeline. In any case, there is some defect that must be returned to the development team for remediation.
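That flow can be sketched as a sequence of gates. This is only an illustration; the gate names and the `run_pipeline` helper are invented, not any real CI system's API:

```python
# A pipeline as a sequence of gates. A release is First Time Through
# only if every gate passes on the first attempt.

def run_pipeline(revision, gates):
    """Return the name of the first failing gate, or None on success."""
    for name, check in gates:
        if not check(revision):
            return name  # defect goes back to the team for remediation
    return None

gates = [
    ("compile", lambda rev: rev.get("compiles", False)),
    ("static-scan", lambda rev: rev.get("scan_clean", False)),
    ("tests", lambda rev: rev.get("tests_pass", False)),
    ("deploy", lambda rev: rev.get("deploys", False)),
]

good = {"compiles": True, "scan_clean": True, "tests_pass": True, "deploys": True}
bad = {"compiles": True, "scan_clean": True, "tests_pass": False, "deploys": True}
print(run_pipeline(good, gates))  # prints None
print(run_pipeline(bad, gates))   # prints tests
```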

Ah, here we see a parallel.

We want the code to pass effectively through the pipeline every time, but sometimes it doesn’t. We want to know how often we can successfully pass through the pipeline and how often we are able to actually solve user problems with the code we produce.

Every time we fail to pass through the pipeline, we have been saved from a public failure, but we also have remediation work to perform.

This remediation costs time and money. Remediation is a waste since no user would pay extra for software where developers spent a lot of time fixing rejected builds.

The remediation waste may (or may not) be a lower cost than releasing defects to the end users in production. Our intention is to make it inexpensive to build software with no defects, or to catch all defects so quickly that they effectively never happened.

I spent a few years in which I polled developers at conferences and user groups, and corporate programmers in various companies I visited. I found that roughly 70% of all developer efforts were going to fixing defects found either in testing or in production code.

I know that sounds obscene, but between 60 and 80 percent of all development effort was waste. It’s not really surprising that such teams frequently fall behind schedule.

The “solution” (really a coping mechanism) in many companies is just to fix the errors and move on – pretend after-the-fact that they never happened and don’t think about it anymore.

What if we measured how often we could successfully deploy software, as a ratio of perfect releases to total attempts to release?

Say we created version 2.4 yesterday. If it can go to production, that gives us one try with one success: 100% First Time Through (FTT)!

Today we create version 2.5. It has problems: it takes us three tries to get it through the production pipeline. We have now attempted two releases, and only one of them made it through on the first try. That gives us 50% FTT.

That dip suggests that something different happened in 2.5. What was it? Was there a flaw in our process or our understanding that let the issues slip past our awareness until it failed in the pipeline? What does the difficulty have to teach us?

Tomorrow we create version 2.6, and it goes out without an issue. Two of three releases have now made it through on the first try, so we’re at roughly 67% FTT.
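The running tally above can be expressed as a small calculation. A minimal sketch, where each release is recorded as the number of attempts it took to pass the pipeline (the numbers are the hypothetical 2.4–2.6 history from the text):

```python
def ftt_rate(releases):
    """First Time Through: the fraction of releases that passed
    the pipeline on the first attempt."""
    if not releases:
        return 0.0
    first_time = sum(1 for attempts in releases if attempts == 1)
    return first_time / len(releases)

# 2.4 passed first try, 2.5 took three tries, 2.6 passed first try.
attempts_per_release = [1, 3, 1]
print(f"FTT: {ftt_rate(attempts_per_release):.1%}")  # prints "FTT: 66.7%"
```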

Armed with this information, and an intention to improve FTT, we start to change our procedures to create more safety and confidence.

We may add security and quality scans to our IDEs so we can find problems and correct them sooner.

We test better and earlier, increasing our supply of automated tests so we have reason to believe that our code will pass in testing (and to support refactoring).

We review each other’s work, perhaps continuously.

We remove code smells like primitive obsession, to make it harder to make common mistakes.

We eliminate duplication in the code base, so we reduce our maintenance effort.

We may take a little longer to get code into the pipeline, but now we aren’t spending 70% of our time on fixing failures and so we are delivering more often.

If you spend 20% of your time on new code and 80% on fixing broken code, then reducing defect work to only 40% of your time would let you spend 60% of your time on new code. That is a threefold increase in development time. Even if you then spent twice as much time developing each feature, you would still deliver code to production more often and more predictably.
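The arithmetic in that claim, spelled out (the percentages are the hypothetical figures above):

```python
# Hypothetical time budgets, in percent of total effort.
before_new, before_fixing = 20, 80
after_new, after_fixing = 60, 40

# Halving defect work (80% -> 40%) triples the time for new code.
print(after_new / before_new)  # prints 3.0

# Even if careful work doubled the cost of each feature, net
# delivery of new features would still be 1.5x the old rate.
print((after_new / 2) / before_new)  # prints 1.5
```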

So measuring FTT and acting to improve it can result in better code sooner.

As with any metric, measurement without focused improvement efforts is useless.

Unanswered Question

“Tim, this is helping our process of making code, but it doesn’t answer whether the products are any good – whether they satisfy user needs or not! What about validating the code with users?”

Guilty. FTT focuses on the process of making software, but doesn’t directly influence whether we are making the right software.

At least, using FTT doesn’t do anything to damage or impair your ability to produce the right solutions. It is entirely synergistic with attempts to build better products.

But consider this: if you can’t get your software to production quickly, then your ability to experiment with new solutions is stymied by your inability to release experimental code.

If your process is not very good at getting your software delivered, it doesn’t much matter whether the product is pretty good or pretty awful: either way, you are going to struggle to improve it.

Additional External References