TDD Proven Effective! Or is it?

by Jacob 16. January 2008 22:33

One of the most useful classes I took in college was an introductory statistics class that was intended to discourage Poli. Sci. majors from continuing in their course of study (I was one at the time). The interesting thing about the class is that it included study and analysis of research as a core part of the class. We learned how to put a study together, how to develop controls, and different formulas used to confirm whether we had statistically significant results or not.

So when Phil Haack announced that Research Supports the Effectiveness of TDD I was more than a little interested in seeing what the linked report actually contained. Phil quotes from the abstract.

We found that test-first students on average wrote more tests and, in turn, students who wrote more tests tended to be more productive. We also observed that the minimum quality increased linearly with the number of programmer tests, independent of the development strategy employed.

Phil has obviously read the rest of the report and provides his favorite pieces that seem to do as his title suggests. One of the things I worry about when I see things supporting the latest and greatest software development practices, however, is a strong tendency towards confirmation bias: looking for confirmation of current theories and overlooking counter-indicators.

So, being the curious type and since TDD is something I’m keeping an eye on to see if it’s something I might want to adopt myself some day, I went into the report.

The Report

The report itself is remarkably well thought-out. Though their sample size is small (a mere 24 students completed the exercise), it looks like they were very careful to make their study relevant to how software development actually works. They created tasks that built on and eventually altered previous functionality, communicated requirements as user stories (meaning the participants had to design the projects, including class layout and interfaces etc.) and the administrators had a large suite of black-box tests to confirm functional implementation (i.e. tested actual functionality delivered and ignored the structure, design, and tests).

It’s important to note that both the TDD and non-TDD groups created and used unit tests. Indeed, the groups are described as "Test First" and "Test Last" throughout the paper.
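For readers new to the terminology, here is a minimal sketch of what a unit test looks like. The function `apply_discount` and its tests are hypothetical, invented for illustration; they are not taken from the study's tasks. Under "Test First" the test class would be written before the function body exists; under "Test Last" it would be written right after. The resulting code can look identical either way, which is part of why the comparison is subtle.

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical production code: reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100


class ApplyDiscountTest(unittest.TestCase):
    """Unit tests for the hypothetical apply_discount function."""

    def test_typical_discount(self):
        # 25% off 200 should leave 150.
        self.assertEqual(apply_discount(200, 25), 150)

    def test_rejects_out_of_range_percent(self):
        # A discount above 100% is treated as a caller error.
        with self.assertRaises(ValueError):
            apply_discount(200, 150)
```

Saved to a file, these tests can be run with `python -m unittest <file>`.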

The authors include their data and results and even mention counter indications (as good report authors should). This is a good thing. If you have a turn for that kind of thing, I recommend reading through it. It isn’t long, particularly as these things go.

Lies, Damn Lies, and . . .

Unfortunately, the authors disappoint me when I compare the abstract to the data in the paper. Indeed, the report becomes an example of why I don’t give much credence to abstracts anymore. Clever academics have long used abstracts to support the conclusions they desire and this seems to be one such. If you’re careful (as these authors are) you can even do it without actually lying. Each of the following statements from the abstract is true:

  • The test-first students on average wrote more tests.
  • Students who wrote more tests tended to be more productive.
  • The minimum quality increased linearly with the number of tests.

Note that only the first is TDD-"Test First"-specific. The other two stand alone, though most readers won’t catch that. Here are some equally true statements based on the data (pages 9 and 10 of the report if you want to read along with me in your book):

  • The control group (non-TDD or "Test Last") had higher quality in every dimension—they had higher floor, ceiling, mean, and median quality.
  • The control group produced higher quality with consistently fewer tests.
  • Quality was better correlated to number of tests for the TDD group (an interesting point of differentiation that I’m not sure the authors caught).
  • The control group’s productivity was highly predictable as a function of number of tests and had a stronger correlation than the TDD group.

So TDD’s relationship to quality is problematic at best. Its relationship to productivity is more interesting. I hope there’s a follow-up study because the productivity numbers simply don’t add up very well to me. There is an undeniable correlation between productivity and the number of tests, but that correlation is actually stronger in the non-TDD group (which had a single outlier compared to roughly half of the TDD group being outside the 95% band).
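If you want to check a claim like "the correlation is stronger in one group" for yourself, the usual yardstick is the Pearson correlation coefficient. The sketch below is only that, a sketch: the numbers are placeholders I invented to show the calculation and are NOT the study's data, and I am not claiming this is the exact statistic the authors used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# PLACEHOLDER values invented for illustration; NOT the study's data.
tests_per_unit = [1.0, 2.0, 3.0, 4.0, 5.0]
tight_group = [2.0, 4.0, 6.0, 8.0, 10.0]  # productivity falls exactly on a line
noisy_group = [2.0, 9.0, 3.0, 10.0, 6.0]  # upward trend with a lot of scatter

r_tight = pearson_r(tests_per_unit, tight_group)
r_noisy = pearson_r(tests_per_unit, noisy_group)
print(f"tight group r = {r_tight:.2f}, noisy group r = {r_noisy:.2f}")
```

Both placeholder groups trend upward, but the coefficient for the noisy one is much lower. That is the sense in which one group's productivity can be "more predictable" from its test count than another's.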

My Interpretation

One of the things my statistics class pounded home is that correlation doesn’t equal causation. That’s particularly important to remember in reports like this one. Indeed, I think causation is a real problem with this study, at least insofar as "proving TDD effective" is concerned. It actually serves to undercut many common TDD claims for superiority.

Productivity is an example where causality is far from certain. It makes sense to me that more productive programmers write more tests if only because productive programmers feel like they have the time to do things the way they know they should. Even with "Test First" the emotional effect of "being productive" is going to have an impact on the number of tests you create before moving on to the next phase. (Note: you have to be careful here because the study measures tests per functional unit, so it’s not that productive programmers are creating more tests overall, but rather that they’re creating more tests per functional unit.) Frankly, I think it’s natural to feel (and respond to) this effect even without a boss hanging over your shoulder waiting to evaluate your work.

In other words, you create more tests because you are a productive programmer rather than being a more productive programmer because you create more tests. At least, it makes sense to me that it would be so.
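A toy model makes the point concrete. In the sketch below, a hidden "skill" score drives both the number of tests and the productivity figure, and neither of those two influences the other. Everything here is hypothetical: the scores, the multipliers, and the small wiggles are invented for illustration and have nothing to do with the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical hidden variable: one skill score per programmer.
skill = [1, 2, 3, 4, 5, 6]

# Small fixed wiggles so the two measurements are not perfectly aligned.
wiggle_tests = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
wiggle_output = [-0.5, -0.5, 0.5, 0.5, -0.5, 0.5]

# Each measurement depends ONLY on skill; neither is computed from the other.
tests_per_unit = [2 * s + w for s, w in zip(skill, wiggle_tests)]
productivity = [3 * s + w for s, w in zip(skill, wiggle_output)]

r = pearson_r(tests_per_unit, productivity)
print(f"correlation between tests and productivity: {r:.2f}")
```

The correlation comes out strongly positive even though, by construction, writing more tests does nothing for productivity in this model. A study that measures only the two visible columns cannot tell this situation apart from one where the tests really do the work.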

Truly problematic, however, are the quality results. That’s simply a disaster if you propound "Test First" as a guarantor of quality. I mean, sure, the number of unit tests per functional unit suggests a minimum quality through increased testing, but that’s only interesting in a situation where minimum quality is important (like, for example, at NASA or in code embedded in medical equipment). The lack of any other correlation here is pretty pronounced any way you care to slice the data. Having a big clump in that upper left quadrant is troubling enough but then having the "Test Last" group almost double your "Test First" group in the over 90% quality range is something that should be noticed and highlighted.

While correlation doesn’t equal causation, the lack of correlation pretty much requires a lack of causation.

Since quality is obviously not related to number of tests (at least in this study), I think it is highly disingenuous to highlight the possible connection between number of tests and minimum quality. While it’s the only positive thing you can say about quality and unit testing, it is hardly the only thing revealed. The lack of correlation between number of tests and quality is incredibly important and of broad general interest whether you are using formal TDD or not.
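It may seem odd that the floor can rise with the number of tests while quality as a whole shows no relationship to it. A contrived sketch shows how both can hold at once. The observations below are invented to make the pattern obvious; they are not the paper's numbers, and the real data is far messier.

```python
from collections import defaultdict

# CONTRIVED (tests per unit, quality %) pairs; NOT the study's data.
observations = [
    (1, 40), (1, 95), (1, 75),
    (2, 55), (2, 90), (2, 65),
    (3, 70), (3, 70), (3, 70),
]

# Group the quality scores by how many tests were written.
by_tests = defaultdict(list)
for tests, quality in observations:
    by_tests[tests].append(quality)

# Worst score and average score within each group.
floors = {t: min(qs) for t, qs in sorted(by_tests.items())}
means = {t: sum(qs) / len(qs) for t, qs in sorted(by_tests.items())}

print("minimum quality by tests:", floors)
print("mean quality by tests:   ", means)
```

In this made-up data the minimum climbs steadily with the number of tests while the mean does not move at all. Reporting only the first fact would be true and still leave out the more interesting half of the picture.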

Thoughts and Wishes

I really wish that the authors had included a third group that did no unit testing. Since they broadened their scope post-facto to make statements about the efficacy of unit testing overall, it would be nice to have a baseline that didn’t do any unit testing yet went through the same process. The study setup was ideal for including such a group as both the productivity and quality measurements were entirely independent of whatever testing was done (or not done).

That said, something occurred to me while reading this study that hasn’t before: quality can be a property of unit tests as well as of production code. That’s a huge duh, but consider what that could mean. For one, it means that I’d love to see a follow-up that considers the quality of the tests themselves. I would expect to see a relationship between the quality of the product and the quality of the tests if only because quality programmers will be competent in both arenas. This might be such a duh, though, that people assume that test quality matches production quality. Still, I wonder if that is actually the case.

More interesting still would be the correlation between "Testing First" and unit test quality. Ask yourself this, which unit tests are going to be better, the ones created before you’ve implemented the functionality or those created after? Or, considered another way, at which point do you have a better understanding of the problem domain, before implementing it or after? If you’re like me, you understand the problem better after implementing it and can thus create more relevant tests after the fact than before. I know better where the important edge-cases are and how to expose them and that could mean that my tests are more robust.

And yes, I realize after the fact that this is an inversion of section 6.2 which is quoted at length by Phil. I wish the authors had spent as much time in section 6.1 explaining reasons for the quality results instead of what looks to me like an attempt to excuse them. That’s the problem with confirmation bias: those results that run counter to expectations aren’t as interesting and receive cursory exploration at best.

Anyway, without question, testing first leads to having more tests per functional unit. The question is if this is valuable. This study would seem to indicate that this is probably not the case, at least if quality is your intended gain. But then, I’m not that surprised that number of tests doesn’t correspond to quality just as I’m not surprised that the number of lines of code doesn’t correspond to productivity.


Programming

Comments


January 23. 2008 14:23
Greg
Boy, is it ever nice to see someone who can think critically and argue cogently. Really nice work.


January 23. 2008 16:43
Shawn Garbett
What a wonderful review of the subject.

I've recently encountered TDD zealots. Two things struck me immediately in dealing with them. One was they believed writing tests first would somehow magically make the requirements clear. Second was more subtle: one must ask "What am I testing?" when writing a series of tests. Unit tests answer that question with "I'm writing the code I intended to write." Intentionality is tested, but conformance to requirements is not. I've seen a lot of code dump core that had 100%+ unit coverage from the TDD folks who were of the opinion that actually *using* the code was not required.


January 23. 2008 18:30
Ben
Excellent analysis.  Thank you for not immediately hopping on the TDD bandwagon. =)


January 23. 2008 21:23
Jeremy
One could argue that the Test-First students were more productive because they had already considered and described the implementation before writing it, therefore there was a shorter jump from required code to written code.


January 23. 2008 21:39
Curt
Nice work. Given the huge variance in quality and production between programmers, it strikes me that the quality of tests is going to correlate well to the quality of the resulting code. We probably need to understand under which problem domains test-first is most productive/produces the most quality. One thing that drives me crazy with all the TDD stuff (and I'm a proponent) is that it doesn't differentiate between problem domains. Not everything is a nail.


January 23. 2008 22:05
Bobby
Brad J Cox and Andrew J Novobilski back before 1991 discussed the "software crisis" (software engineering not keeping up with hardware engineering productivity), software ICs and their test sockets - for interchangeable parts to be interchangeable, the parts have to be made within tolerance as specified, they are tested with a gauge.
Implementation languages versus testing languages is also discussed. I'm not sure the metaphor isn't stretched a bit too much.
Much software is hand crafted or a roughly similar solution hand tailored to the new solution - this can be "creative" and rewarding, but surely hurts productivity.
I do not think there is much difference between test-first (test-early is good?) and test-last; the program is likely in the programmer's head when the tests are written. The perceived risk with writing the tests after the program is that the vain programmer will not test the areas that will likely not be much used and where the program is fragile.
I see tests-first as more like top-down programming/goal oriented programming and test-last - I'm not sure.


January 24. 2008 01:58
Nikolas Coukouma
Re: TDD zealots:
It seems that part of the motivation for moving from TDD to BDD is to (try to) address the shortcomings you mentioned (clarity of requirements and meaningfulness of tests). For testing code, there's an emphasis on encoding the requirements for how parts of the software should behave. Test suites are replaced by contexts, where the suite is unified by the setup for it. Nebulous "test methods" that assert things (and seem to often end up as suites themselves) are replaced with smaller, more-specific "should be" checks.

I'm fairly disturbed by how centered on the Sapir-Whorf-hypothesis a lot of the material is ... I simply view it as rebranding to shed luggage; everything I've read about BDD seems to say "TDD is really ... ", then defines some new terms (optionally with mappings), and then declare it BDD.


January 24. 2008 03:03
cvx
Thanks for the great post! It's difficult to separate all the hype from reality in s/w studies these days. Seems like everyone's pushing an agenda.

Your analysis is an excellent reminder that a lot of claims could bear further scrutiny.


January 24. 2008 04:34
GUI Junkie
Great post.

How about QA test first? I've been trying to do TDD and have written thousands of tests and still believe some areas of my software are under-tested. Maybe having others establish tests would improve quality...


January 24. 2008 10:28
Cosmin Lehene
" Ask yourself this, which unit tests are going to be better, the ones created before you’ve implemented the functionality or those created after? Or, considered another way, at which point do you have a better understanding of the problem domain, before implementing it or after? If you’re like me, you understand the problem better after implementing it and can thus create more relevant tests after the fact than before. I know better where the important edge-cases are and how to expose them and that could mean that my tests are more robust."

I dare you to ask yourself: which production code would be better, the one created before creating any unit test, or the one created after?

Which one is more important?

At the end of the day, what is more important, the quality of the production code, or the quality of the tests? I'd say you will build upon the production code in the future, expanding it and refactoring it, but you won't really build upon unit tests - which, by all means, should be small and simple as possible.

If you are like me, then you understand the real problem better after trying to "use" the production code by writing a unit test for it. This is what top-bottom or outside-in is really about. You really make a car seat to fit your spine, not your spine fit whatever car seat you imagined.

I'd say it's hard to measure productivity on short time intervals. The reality in real life and real systems is really different. It's hard and boring to write many unit tests after having the code and it requires a lot more discipline than having to write one test followed by one piece of code repeatedly.

Nice article though,

Cosmin


January 24. 2008 13:47
Andy
I think there are possibly many hidden factors in this study as well. For example what was the relative skill of the programmers in each group? We assume they were even but how was that judged?

Also the test later group, how later is later. Did they write a function then test it or did they write the whole program then go back and write their tests? I would be inclined to think that they did it in small chunks. Same question goes for the test first group. Really interesting blog article though. Thanks for sharing your clear thoughts.

Andy


January 24. 2008 19:06
Hakan Erdogmus

Also I should point out that our research team included three people: one was a TDD advocate, the other person was mildly skeptical, the third was highly skeptical. Plus, publications like Transactions of Software Engineering, where this study was published, scrutinize confirmation bias very carefully. And negative results are publishable and valued in software engineering research (of which there are many examples, including for TDD).

The discussion under "Lies, Damn Lies, and..." gives the impression the abstract was trying to fool the readers. The abstract says "We also observed that the minimum quality increased linearly with the number of programmer tests, independent of the development strategy employed." Notice the "independent of the development strategy employed" part, which you omit when you quote, and then imply this was deliberate on our part.

Regarding the quality results, first it's important to take into account the fact that the control group had to write tests as well, so we didn't evaluate TDD against a "testing is optional, go ahead do it in the end" alternative. Second, even if the average and median quality scores for the non-TDD group were slightly better, the effect size was very small and the statistical significance was, well, non-existent. The box plot is a bit misleading because its plotting scale zooms in on a small range to highlight the differences. Better to look at the absolute numbers. Together, small effect size and no significance point to "no discernible quality difference."

Productivity was different because the effect size was actually large.

We did catch that quality was more predictable for the TDD group: we state in the paper "Test-First programmers did not achieve better quality on average, although they achieved more consistent quality results. We attribute the latter observation to the influence of skill on quality, which Test-First tended to dampen. Writing more tests improved the minimum quality achievable and decreased the variation, but this effect does not appear to be specific to Test-First."

You conclude that TDD's relationship to quality is problematic at best. In this ONE study, it was indeed problematic, and we believe because tests were mandatory by the very design of the study.

Since 2001, 23 TDD studies were published. Of the 22 studies that evaluated some aspect of internal or external quality with vs. without TDD, 13 reported improvements of various degrees (some being significant), 4 were inconclusive, and 4 reported no discernable difference (including our study). Only one study reported a quality penalty for TDD. Studies that evaluated defect density exhibit most dramatic improvements. Notably all studies conducted with professional developers reported a quality advantage for TDD. The same however cannot be said of productivity.

Of the 17 studies that evaluated productivity, only 4 reported an improvement with TDD (ours included), while 2 reported no important difference, and the remaining studies reported a penalty ranging from minor to significant. The extent to which participants adhered to TDD and the choice of the alternative technique against which TDD is compared are likely determinants of whether TDD incurs a productivity benefit or penalty. In cases where the alternative technique doesn't involve testing or a viable substitute, or in which testing is effectively optional, a productivity penalty is almost invariably expected and observed. In addition, differences in the way productivity is measured most likely account for the differences in the observations.  

I hope this information helps.


January 24. 2008 19:09
Hakan Erdogmus
Looks like the beginning of my post (#12) was truncated... here it is...

I am one of the co-authors of this study. Good points, but not all are justified.

You are right about the relationship between observations and causality. Causality is a matter of coming up with a viable theory that explains the observations, and as such the theory remains refutable or revised as new evidence emerges. A

We never claimed "we proved TDD". In fact, such a thing is impossible, especially with a single small study like ours. Even with many studies, you can only hope to collect evidence and build strength of evidence (or lack thereof) and a viable theory that is refutable. In the end, even when you have compelling collective evidence, that's all you have and you still haven't proved anything in the mathematical sense... Not in our field, where context is so important and so multi-factorial.


January 24. 2008 22:06
Jacob
Some really good comments and I've held back a little here because I want to see how things develop.

It looks like many of those who commented didn't actually read the study. That's not unexpected and I'm not saying that to be derogatory. It was only a weird fluke that I read it myself. The study answers questions like the developer skill breakdown in the different groups (though not in enough detail to do much with, sadly) and when precisely the tests were written for the "Test Last" group (answer: after each functional unit was written. There's a handy chart on page 4). I didn't include those in the original post simply because there wasn't much I wanted to say about them. They are interesting, just not enough to have penetrated into my (admittedly limited) analysis.

I'm particularly flattered that Hakan Erdogmus (one of the study's three very handsome authors) dropped by. I started addressing some of his points, but it got kind of long and I wanted to spend more time on what I had to say. I'll post something later, but wanted to get this more general response out today.


January 25. 2008 11:23
Carl
Nice article, if nothing else it indicates that one should try something  for themselves to see if it is right for them.

I have tried TDD and it is a paradigm shift for me, as was procedural to OO programming and waterfall to agile methodology. At first I was uncertain whether these paradigm shifts were worth the effort and much like TDD I think it would be difficult to prove that OO and Agile are more productive and produce better results - but I took a leap of faith on both counts and I at least "perceive" myself to be better at software development because of OO and Agile.

I expect the TDD (and BDD) skill level is still ramping up and in the long run it will become more accepted, much as OO and Agile practices have. That said, I must admit it still tastes a bit like a bitter pill to me, but perhaps that will change as I become more skillful  using it.


January 25. 2008 18:04
Jacob
@Cosmin: You dare me? Did we regress back to the third grade while I wasn't looking? You spout a lot of tropes without feeling the need to explain how they relate to the topic in question and that's not only sloppy, it makes you (and by extension whatever you advocate) look bad. Your car seat, uh, I can't call it an analogy, really. Your car seat, um, tangent, for example has no relation to the problem set because spine meets car seat only in production and the spine isn't created by the same agent that creates the car seat. Anything you say about car seats and spines has exactly zero chance of relating usefully to the problems, benefits, and analysis of TDD.

@Carl: No, no, no. You're missing my point entirely. While I'm a pretty religious guy in my family life, I try to keep faith out of my profession as much as I possibly can. I didn't start adopting OO programming methods until I understood the theories regarding why and how adopting it improves software development. When I ran into iterative development techniques, I compared them logically to the waterfall-Rational methods I was then using and saw that there was likely some good meat on them bones. If I can't explain why then I'm no better than a shaman telling his tribe to join the bonfire boogie so the rain will fall.

Note that I'm not looking for absolute proof, and I'm not discounting peer testimony. Logic and vicarious experience are both convincing in their own way. Indeed, you can trade one for the other to some extent. If, for example, a theory is rational but weak, you can substitute the experiences of people you trust to make up the difference. Just as when a theory is strong, I'm willing to spend the resources experimenting with it on my own even if everyone else says that it doesn't work.

Oh, and I couldn't care less how implementing something makes me feel. If I perceive myself going the speed limit and a cop nabs me for 25 miles over, I'm not going to be happy with whatever it was that engendered the perception. If OO didn't deliver the goods, I wouldn't be using it no matter how I felt about my leet skillz when doing so. Any theory you understand can be used to make predictions. Those predictions should be made and then tracked.


January 27. 2008 09:44
Cosmin Lehene
@Jacob: The "dare" thing was the friendly intro and apparently a bad decision as you didn't smell the smile in the text. I thought it's obvious that my comments relate to the quote, not to your entire rant.

You talk about having a better understanding of the problem domain AFTER implementing it. ("at which point do you have a better understanding of the problem domain, before implementing it or after?")

Do you really implement the thing first and then try to understand what you did there and why? And by doing that you could test it better AFTER you did the whole thing? Is it easier to change it when it's already finished, or during development?

Anyway, I tried to explain that you should know what the problems are while implementing it. By writing a unit test for a piece of code that doesn't exist yet. You state your expectations for some behavior (which hasn't been created yet) this is in harmony with the current layer needs. This way you get an interface that fits your needs, not an implementation that dictates the interface you should use and for which you have to ADAPT your needs. (top-bottom vs bottom-up)

"spine meets car seat only in production". I doubt that. You should meet some industrial designers.

car seat analogy: you think about your spine's best position before actually doing the seat: The car seat should provide a curvature that is expressed by the S system of inequations.
This is one "unit test"
The car seat should be configurable to different people heights. We transform a requirement in a measurable set of constraints that give birth to another set of constraints and another set of unit tests.
We fix one unit test that fails, we optimize our "product" (the seat or the code) and then we move on to the next.

Eventually you'll discover that your tests should exercise each unit that is part of your car seat.

At every step, not only that I think about the problem domain, but also build my system in a way that can be measured and tested. So my unit is testable and the performance can be measured. By design. During development I have an assurance that if I break one design requirement I'll know it before advancing development - which would make it harder to fix.

If you create your system and AFTER that try to test it you eventually have to redo big parts of it just to be able to properly test it.

As for what your rant (and the study) talks about: you focus on productivity as a local optimum problem.

Advocate? No sir. I was just making a note on some of your more general thoughts.

Thanks,
Cosmin


January 28. 2008 12:24
Matt
Two groups of twelve students?  That doesn't seem like nearly a big enough sample size.  How did they split the groups?  How do you know one group had better programmers or better team players?  Or had previous experience to the tools used?  This all would greatly affect the results.  

You would need a lot more completely randomized groups to isolate this variable.


January 28. 2008 16:23
Samuel A. Falvo II
This whole article is bunk if you ask me.  First of all, 24 people is simply not large enough.  Second, many folks are responding with the usual cheerleading, "YEAH!  YOU TELL 'EM!  I encountered a bunch of TDD zealots and they didn't know squat!"

Well, let me tell you something: I have adopted TDD in everything I do, from deep-embedded work to desktop application work to systems programming, and ACROSS THE BOARD my code quality and productivity improved.  Does this mean I don't introduce bugs in my code?  Absolutely not; they still occur.  And WHEN they occur, I update my tests accordingly.  Ego doesn't enter into the picture here; I do not think my code is bug free.  I *DO* think my code does not regress EVER, however.

Let's get one thing clear above all else: A UNIT TEST IS A DOUBLE-CHECK FOR YOUR CODE.  That's all it is.  You write the tests first to make CONCRETE IN YOUR HEAD the API.  It has *NOTHING* to do with product requirements.  Those are established by integration tests.  NOT UNIT TESTS.

The evidence in support of your arguments is all biased, perhaps accidentally, but it sure seems otherwise to me, distinctly against TDD.

Let's see the same thing not with 24 programmers, but with 2400.  Yes, that's right -- two thousand, four hundred.  That represents a relatively realistic number for an average sized, multi-national software engineering firm.  NOW let's see the results.


January 29. 2008 16:29
Jacob
@Samuel A. Falvo II: I can't be the first one you've heard this from but in case repetition will drive the point home here it is from me: maybe it would be wiser for you to refrain from comment until you know what you're talking about.

I'll follow up with this thought: emotional appeal has little chance of convincing anyone not already convinced. This is true no matter how vehemently the emotional appeal is expressed. Next time, I suggest using things like logic and/or reason in presenting your case. I'm not saying that your claims aren't true. I'm just pointing out that the way you deliver your sermon isn't actually producing any converts.


February 1. 2008 13:15
Nate Kohari
Excellent post, Jacob. Your argument is very well thought-out. I myself have trouble sometimes seeing the benefit of TDD. My major argument against it is that the two places that have the most flaws in a typical application -- the interface and the database -- are the ones that are the hardest to test.


February 1. 2008 19:13
Adron
Excellent write up.

I'm all for TDD, because most developers tend to not understand the problems and instead just plummet into writing code (something developers are taught not to do in college, but seems to not matter once they get out in the world).  When a developer goes straight to code without figuring out what is actually being solved, problems grow down cycle exponentially.

TDD is a way to prevent that by forcing people to code a test of the solution, thus realizing what the solution is, before plummeting into attempting to build it.


February 1. 2008 19:42
Jacob
@Adron: Yeah, I keep hearing that as a refrain that TDD forces people to do whatever. Let's just say that I'm anti-coercion. If you have to force me to do my job right, why hire me in the first place? That's kind of why I wrote my latest post about TDD and POUT (scruffylookingcatherder.com/.../tdd-or-pout.aspx).


February 2. 2008 20:41
Rob Conery
This, my friend, is how you write a blog post. I haven't read a post as well-written, well-supported, and thought-provoking as this in quite a while.

I disagree on a few points, but that doesn't matter. This post was as sexxy as it gets.


February 3. 2008 05:35
Adron
Hey Jacob...

Ya got a really good point.  If I hire the developers, you bet yer ass I'm not going to worry about coercing anyone to do anything.  They'll most likely do it or learn to do it the best way that will work for the team I put together.

However, in many if not most scenarios I walk into, I don't get to pick the developers on the team. In many places especially, I've noticed that something like this is almost needed, because you can't just get rid of the ones who are obstructionist.

But you do have a point, one shouldn't need to force the developers to do things right, but often it literally comes down to that.

I always say something similar in another debate I commonly have: one should never have to defend themselves from another human being, ever, but it happens regardless and one should be prepared, because the consequences are too drastic to imagine.


February 4. 2008 18:44
Jacob
@Adron: You're right that a situation where you have no say in who is on your team can be frustrating. And I'll give you that sometimes a process change can be sold to an existing organization more easily than "purer" alternatives (like firing half of the department and replacing them with competent developers). I will point out that just as a good developer is going to succeed whether using POUT or TDD, a bad developer has as big a chance to screw up whether using POUT or TDD. TDD isn't going to make a bad developer good; at least, that isn't a part of the claims I've been reading so far.

That said, pushing through a process change gives you a window of disruption that you can use to teach better principles. It's a chance, but a slim one. I can see it being worth it with TDD if better unit testing is needed and you can't get traction with POUT. TDD's emphasis on unit testing could be leveraged to bring effective unit testing into an organization that currently has none. I'd prefer to be able to simply teach tenets of effective unit testing and put processes and expectations in place that encourage its use. But our preferences aren't always consulted in our individual situations. I'll keep that thought in mind in case I get into a situation where that's the case.


February 7. 2008 16:20
Russ
One facet of the debate I think warrants more discussion is this:
In "Test Last" organizations, what actually happens when teams are trying to get something out the door? I propose that inevitably they will be forced to abandon testing efforts to meet the deadline and move on to new projects (at least some of the time). Do they ever go back and finish writing the tests?
I would hypothesize that TDD organizations at a bare minimum gain something from saying "we test first" because if you test first, all the tests at least get written, and at a time when the ideas are fresh.


February 7. 2008 16:53
Jacob
@Russ: Yeah, I hear that a lot. Two points. First, Test Last is misleading because most people who use "POUT" techniques (scruffylookingcatherder.com/.../tdd-or-pout.aspx) don't actually save testing for last. I personally will put in unit tests as soon as a "functional unit" is complete ("functional unit" can vary in definition, but mostly is a class or set of closely related classes). So if crunch time comes, many of the tests have already been written.

Second, TDD doesn't provide immunity to deadline pressure. Nice if you can get your testing to stick in the face of pressure, but every argument you use to continue strict TDD in the face of slipping deadlines can and should be used for POUT as well. Effective testing is difficult enough that when pressure ratchets up, I'll bet TDD test quality and coverage drops as much as POUT testing does. It takes discipline to test well. That discipline is needed as much to make your TDD tests effective as it is to make other tests effective.


February 14. 2008 17:51
Rup
One thing to note with TDD is that it's primarily a design technique, rather than being only a testing strategy.


March 5. 2008 13:51
Patrick Wilson-Welsh
Perhaps we can side-step the TDD issue and go straight to the following:

-- for a given project, how good is your unit-test suite? I take my definition of "good" from Gerard Meszaros's excellent "xUnit Test Patterns" book from 2007.

So elements of "good" include how breakable/robust your tests tend to be, how well they cover happy paths, edge cases, and exception cases in your code, and how well they localize failures (pinpoint bugs).

Further, let's focus on this question: how well is your team continuing to improve its unit testing skill?

So, for example, do they know when and when not to mock? Do they know how to pull apart test suites into different CI targets of different kinds (isolation tests vs end-to-end tests) that run at different frequencies?

Finally, let's summarize it this way: how well does your team understand the value of feedback from healthy unit tests?

After writing thousands of tests test-first, and thousands test-second, I personally prefer TDD. But what I care most about these days really is this: how healthy are your tests, and how healthy is your design?

If you've got good coverage, good failure localization, a quick build, fast feedback, low overall cyclomatic complexity, and a good object model, I don't care as much as I once did how you got there.


January 28. 2010 15:02
Joe
OMG! I hope that no project managers are reading this blog and convincing themselves that TDD is useless and should therefore be banned from use on future projects.

I think almost anyone would agree that testing is a good thing, whether we are using a test-first or test-last approach. Can we at least agree on that?

I haven't read the study but my guess is that it had a short time frame that did not include a maintenance cycle where features were added and bugs were fixed over time. Furthermore, I would guess that the study did not use real-world, for-profit software projects, where project managers often cut corners to achieve short-term objectives and are willing to let testing slide to make a deadline.

The real value of creating tests is the ability to regression test as we change previously working code to ensure it still works. This allows developers to add features and do "fearless refactoring" because they're confident the code is not going to break severely and even if it does, the tests should catch the breakage sooner rather than later.

On a real-world project, it is often tempting to code first and engage in wishful thinking like "We can write tests later, when we have more time. Let's just get it out the door". The problem is, under management pressure, "later" often never comes.

Without tests, there will be an increasing fear of making code changes as the maintenance cycle moves along. This can end up resulting in either fewer features or buggier software.

The value of TDD or the test-first approach is that it forces you to write tests, ensuring that you can do proper regression testing over the maintenance cycle. Some developers might not like that but those are the same developers who usually don't go back and write tests after writing the code. They usually defend themselves by saying "I meant to go back and write tests but the project manager kept giving me new development tasks so I never had time". Unfortunately, some of us have to maintain the code those guys write.

The test-last approach can work... as long as the developers are disciplined enough to actually write the tests after the code has been written. In the real world, this doesn't always happen.

    Disclaimer
    The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

    © Copyright 2012 Scruffy-looking Cat Herder