Technical debt is one of those concepts that has gained a lot of traction over the last few years, and in some ways has been heavily misused. Nearly anything and everything that people don't like about a code base gets lumped into a broad technical debt category, so it's becoming harder to define with clarity what technical debt actually is, as the concept grows ever more muddied by people's perceptions of what they are looking at when they perceive technical debt. Contributing to this lack of clarity is the desire to quantify this debt, and to measure it in a meaningful way. The problem, however, is that technical debt isn't like a bank loan, where you accrue a fixed amount of debt and pay it off bit by bit, and I can't help but think that for many, the word "debt" suggests something very specific. For others, the issue is that everything they do incurs a measure of "debt" that negatively impacts the product, so there is a kind of underlying hysteria that leads some people to believe they need to measure every little thing about their software development, and that somehow a report will pinpoint where the problems lie. The reality, however, is both more subtle and perhaps much simpler than many people lead themselves to believe.
I'm a recent convert to LinkedIn, and I noticed a question there that prompted me to write this blog article. The truth is that I started to write an answer, and found myself composing something involved enough that it threatened to spill over the character limit. :-)
So, what is a debt? Clearly it's where you owe something tangible that you intend to repay later. In software development terms, a technical debt is a technical constraint that you will need to overcome at some point. From the LinkedIn Q&A, I really liked an answer by a guy named Gary, who defined technical debt as "...anything that impedes a team's ability to deliver changes quickly in the future." By that definition, everything from what you code to how you code, from your tool-chain to your internal development processes, can effectively be labelled a contributor to technical debt. Now, while I agree with this in principle, I also feel that such broad answers tend to muddy the definition even further, so perhaps we might look at this from a slightly simpler definition and viewpoint.
Let's start with a hypothetical example. You have a situation where there are several ways to do something, and the option you choose is the most convenient, but not necessarily the most efficient or the best practice; your intention is that it's something you can live with, but which you may need to change at a later date. Every time you see a problem and decide to apply the most expedient rather than the "best" option at the time, you increase the likelihood that your choice will force a future change, one that could prove costly if it isn't dealt with sooner rather than later. The more time that passes, and the more complex and fixed your architecture becomes, the greater the problem will be if and when you choose to address it. So just like a regular debt, your technical debt accrues interest, in the form of both task complexity and the time required to deal with the task. You can cite all sorts of technical reasons why a technical debt is so large and costly, but the reality is that each and every decision that led to the accrual of the debt is the reason the debt exists in the first place, and this particular point is, I feel, the key to understanding the reality of technical debt. That is, it isn't so much about the state of a system, or the processes involved, but rather about the decisions that are made, and how those decisions affect your ability to deal with their outcomes over time.
So yes, this does include issues such as poor code quality, poor processes, and so on, but I think these things are symptoms rather than causes. This is also why I feel technical debt can be so difficult to grasp, define, and measure: what you really want to measure is the decisions made, set against the symptoms that later arise. Simply measuring symptoms isn't enough either, because while you might see something you would call technically "wrong", it doesn't necessarily mean you have incurred technical debt.
Here's an example of what I mean. You write some software, sell it and make a profit, and you and your team look back on it and feel that you did the best possible job with the resources at hand. You can't see any way you could have done better or been more efficient, your customer was happy, and you delivered the product you intended to. Fast forward a few years, and you decide to review your code to see if it can be used for another purpose. You run a series of measurements based on your current knowledge of best practices, and perhaps you even have some software that can analyse your code against the latest rules, and you suddenly uncover a whole lot of things that, by present standards, tell you your software is of average to poor quality. Is this technical debt that has crept up on you over the years? Of course not. It's simply that things have changed over time, and your decisions today are influenced by things you have learned since you last worked on that particular code base. You had not incurred a debt, simply because you achieved all of your goals at the time and weren't impeded in any way.
But wait! What about those situations where you write software and maintain it continuously over the years? Well, that gets a little trickier to define. I tend to think of long-running legacy products as being in a permanent state of technical debt, in particular where a legacy system is used without any realistic chance of being maintained or modified later. It just gets used continuously, and eventually becomes the proverbial millstone around the neck of your latest and greatest product, simply because you can't bear the thought of throwing out the huge investment your company made in the older product. In this case, you've decided to keep something that you know can't change, and perhaps won't scale or grow, and it will create technical problems that become more difficult to deal with the longer your product relies on the legacy stuff. Again, however, it's a debt resulting from choices, and not an inherent debt based solely on task complexity, particularly because over time you will be making the same or similar decisions over and over again, compounding the problem through your decision-making processes, and effectively compounding the interest accrued on your technical debt.
So what can you measure? How can you prove that there is or isn't a technical debt issue? Does code coverage tell you whether you have a technical debt issue, or merely that you haven't written enough tests to validate that your code behaves as intended? Is there any measurement you can apply to a system that will define and quantify technical debt in a meaningful way? Personally, I believe that you can measure technical debt, but that you can't take measurements of your code because that will only give you potential symptoms, and not actual reasons... unless you have something else that can provide some sort of a meaningful context for the measurements that you take.
It's lies, damned lies, and statistics. You can draw all sorts of correlations from data measured about your code base, but if you neglect the human element in the equation, you miss the most important message of all: that it really is more a debt of decisions than a debt of technicalities that you are facing. Or perhaps it's really a combination. I don't believe that you can simply use measurements such as Cyclomatic Complexity, Code Coverage, or, the most often abused, Lines of Code to tell you whether you have a technical debt. Many others would say these do measure debt, but they don't really know. They are guessing, because there usually IS a correlation between poor-quality code and projects with a high relative level of technical debt. However, code quality itself isn't a measure of technical debt. It simply tells you that the code could be improved, and empirically we know that when the quality of the code is poor, the likelihood that there is also technical debt is high. I believe this is because teams that care about how they craft their code most often do their utmost to avoid design complexity issues by going the extra mile to implement the better solution, rather than simply choosing the expedient one. Or, when the expedient choice is made, such people come back to the problem as soon as they can, to stop a simple problem getting out of hand and becoming a costly one. In this case, decisions that result in a technical debt are met with subsequent decisions to pay off the debt early, avoiding costly technical interest.
So if you can't use any of those fancy code measuring tools, what can you do to measure technical debt? I believe that the only quantitative measure you can reasonably take is the cost of changes in terms of task complexity, which in real, measurable terms means the time to implement, the resources required, and the impact on the rest of the team's work in terms of slipped deadlines, anxiety, and other intangibles. Which is to say that you won't really be able to measure the technical debt until after the fact. There may, however, be a way to predict the costs of such debt, but those predictions would require data relevant to your particular team, gathered over time, in order to build a predictive model.
Hypothetically, you might log the number of times you make a compromise about design, and track those occasions against the number of problems you later face that resulted from an earlier compromise decision. To make such measurements more meaningful, you might also measure how your team's decision-making performs over time. A good way to do this might be to borrow a trick from Joel Spolsky's Evidence-Based Scheduling, using a Monte Carlo method to help you predict how your team's decisions and estimates pan out in actuality.
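To make that Monte Carlo idea concrete, here's a minimal sketch in the spirit of Evidence-Based Scheduling. It assumes you have recorded, for past tasks, the ratio of actual time taken to estimated time; all the figures and names here are hypothetical, and this is an illustration of the technique rather than Spolsky's exact algorithm:

```python
import random

# Hypothetical historical data: for each completed task, the ratio of
# actual time taken to the original estimate (1.0 = estimate was exact).
historical_ratios = [0.9, 1.0, 1.1, 1.3, 1.0, 2.0, 1.2, 0.8, 1.5, 1.1]

# Hypothetical estimates (in hours) for an upcoming piece of work.
new_estimates = [4, 8, 2, 16, 6]

def simulate_totals(estimates, ratios, runs=10000, seed=42):
    """Monte Carlo: in each run, scale every estimate by a randomly
    chosen historical ratio and sum the results, producing a
    distribution of plausible total durations instead of one number."""
    rng = random.Random(seed)
    totals = [sum(e * rng.choice(ratios) for e in estimates)
              for _ in range(runs)]
    return sorted(totals)

totals = simulate_totals(new_estimates, historical_ratios)
median = totals[len(totals) // 2]
p90 = totals[int(len(totals) * 0.9)]
print(f"median projected total: {median:.1f}h, 90th percentile: {p90:.1f}h")
```

The interesting signal for our purposes is the spread: if the gap between the median and the 90th percentile widens from project to project, your estimates are becoming less reliable, which is exactly the kind of drift you would expect to see as compromise decisions accumulate.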
Let's say the team on average estimates very well, and on a reasonably good project you find that you haven't really needed to compromise between design quality and expedience. On a later project, you might count many more compromises, and I suspect you would see many more design problems. You may even see a measurable difference in the accuracy of your team's estimates, or find that the velocity of your sprints slows significantly. You may also be able to count the bugs reported and map those against the estimates, and against the compromise decisions made in the code where the bugs are found.
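That last mapping could be as simple as a correlation over per-module counts. The sketch below uses a plain Pearson correlation coefficient; the data, and the idea of logging compromise decisions per module, are purely hypothetical assumptions for illustration:

```python
# Hypothetical per-module data: how many logged "expedient compromise"
# decisions each module contains, and how many bugs were later reported
# against that module.
compromises = [0, 1, 1, 3, 5, 8, 2, 0]
bugs        = [2, 3, 4, 7, 12, 19, 5, 1]

def pearson(xs, ys):
    """Pearson correlation coefficient, computed without external
    libraries: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(compromises, bugs)
print(f"correlation between compromises and later bugs: {r:.2f}")
```

A strongly positive result on your own data wouldn't prove causation, of course, but tracked over several projects it would give you exactly the kind of team-specific evidence argued for above.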
In reality, technical debt is insidious, because it usually creeps up on you quietly, and doesn't really present itself until you find yourself in some sort of costly design boggle late in a project. It's hard to measure without committing to tracking lots of non-technical information about the way your team works, and difficult to quantify objectively, as it is largely a subjective measure that can vary from team to team, and project to project, in terms of its overall cost and relevance. I believe that technical debt is something understood to exist more than it has actually been proven to exist. Your experience as a software developer tells you it's there, but it's damned hard to prove, particularly to the non-technical people who hold the purse-strings and whom you generally need to convince in order to avoid costly compromises during the early stages of software development. This is perhaps an area that requires some rigorous study, though that in itself is likely to be quite costly if it risks diverting expensive resources away from projects to deal with the additional "red tape" required to gather the data necessary to analyse the problem.
In summary, Technical Debt is any technical impediment to future change which arises from a decision to compromise between code/design quality and expedience. The degree of the Technical Debt is measured in the complexity and relative cost of applying a change in the future, with such measurements determined with any accuracy only after the technical debt has been recognised and addressed. This is perhaps why the issue of technical debt has many people worried: the costs are hidden and can prove very expensive. Yet they can be mitigated somewhat by applying best practices and lean development principles, doing the minimum needed to get the job done, testing everything, and avoiding the need to compromise between quality and expedience by dealing with problems as soon as they arise, rather than leaving them for long periods before they are properly dealt with.