Isaac Gouy, 10th June 2006.
Ron Jeffries told me "Craig Larman's book, chapter 6 IIRC, is full of studies, some quite large (tens of thousands of projects)…" and Chapter 6 Evidence is interesting, so interesting that I wanted to know more, so interesting that I actually went to a library and looked at the source material. That's when it changed from interesting to sad.
I'll refer to Craig Larman "Agile and Iterative Development: A Manager's Guide" Addison-Wesley Professional (2003) as "the book" and to the source material it references as "the source".
The book said "Timeboxing by itself" increases productivity and developer productivity at DuPont was 4x better. Wow! 4x better! I have to learn their techniques! I found some information online but it didn't match the book; and so to a library, to look at the same source material the book referenced.
Oh.
I learned that we cannot trust Chapter 6 Evidence - sometimes we are not given the evidence that's in the source, sometimes the evidence has been changed, sometimes the evidence seems not to be in the source.
These are just examples; look and you'll find more. Each example has quotations from the book, corresponding quotations (or a summary) from the source, and a quick discussion of how the book misstates the source.
"Research shows that timeboxing itself brings benefits in terms of increased productivity." (the book, page 54)
"Timeboxing by itself has been shown to have a productivity effect. DuPont, one of the earliest timebox pioneers, found developer productivity around 80 function points per month with timeboxed iterations, but only 15 to 25 function points for other methods. [Martin91]" (the book, page 77)
Unfortunately the book does not refer to a specific figure or table or page in the source, so we have to check the whole thing. The "15 to 25 function points for other methods" data can be read from the graph on page 225. Here are the labels that describe the data on that graph:
"Average function points per person-month using the Timebox methodology with the Cortex toolset…" (the source, graph on page 225)
"Function points per person-month with a traditional lifecycle, including development with the 4GL FOCUS and NATURAL, as well as COBOL and PL/1." (the source, graph on page 225)
James Martin "Rapid Application Development" 1991
(The graph is reproduced in "Rapid Application Development (RAD): Studies of Differences in Productivity" Productivity with CASE
http://sysdev.ucdavis.edu/WEBADM/document/radmanage-studies.htm).
The book provides DuPont data as the evidence that "Timeboxing by itself" benefits productivity. The book fails to mention that the higher productivity figures were for code generation using CASE tools, and the lower productivity figures were for programming with 3GLs and 4GLs.
Timebox Development has been found to produce extraordinary productivity at DuPont, where it was initially developed. DuPont averages about 80 function points per month with timeboxing, compared to 15 to 25 with other methodologies (Martin 1991).
Steve McConnell "Rapid Development" 1996, page 582
Both Steve McConnell's "Rapid Development" and the book credit James Martin as the source for the DuPont data. The evidence James Martin provides does not support the book's claim that "Timeboxing by itself" benefits productivity. We are not being given the evidence.
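For scale, the ratio implied by the quoted figures can be worked out directly. This is only a minimal sketch: it uses the 80 and 15 to 25 function points per month quoted from the book, the names are illustrative, and the arithmetic says nothing about what caused the difference (the graph labels in the source tie the higher figure to the Timebox methodology with the Cortex toolset).

```python
# Figures as quoted from the book (page 77); names are illustrative only.
TIMEBOX_FP_PER_MONTH = 80
OTHER_FP_PER_MONTH_RANGE = (15, 25)


def productivity_ratios(timebox, other_range):
    """Return the smallest and largest ratio implied by the quoted range."""
    low, high = other_range
    return timebox / high, timebox / low


smallest, largest = productivity_ratios(TIMEBOX_FP_PER_MONTH, OTHER_FP_PER_MONTH_RANGE)

# Ratio against the midpoint of the quoted 15 to 25 range.
midpoint = sum(OTHER_FP_PER_MONTH_RANGE) / 2
midpoint_ratio = TIMEBOX_FP_PER_MONTH / midpoint

print(f"{smallest:.1f}x to {largest:.1f}x (midpoint {midpoint_ratio:.1f}x)")
```

The "4x" figure corresponds to the midpoint of the quoted range; the range itself runs from a little over 3x to a little over 5x.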
'In the first study [MacCormack01, MVI01] the question, "Does evolutionary development, rather than the waterfall model, result in better success?" was explained in a two year in-depth analysis of projects. The report's conclusion?
Now there is proof that the evolutionary approach to software development results in a speedier process and higher-quality products. … The iterative process is best captured in the evolutionary delivery model proposed by Tom Gilb.' (the book, pages 65-66)
"Now there is proof that the evolutionary approach to software development results in a speedier process and higher-quality products." (the source, page 75)
"The iterative process is best captured in the evolutionary delivery model proposed by Tom Gilb." (the source, page 78)
Alan MacCormack "Product Development Practices That Work: How Internet Companies Build Software" MIT Sloan Management Review 42(2) Winter 2001, 75-84.
http://www.sloanreview.mit.edu/smr/issue/2001/winter/6/
The two sentences the book attributes to the report are identical to the source, so what's the problem? One problem is that the book has cut out ("…") three pages of text and stuck those two sentences together - that's just wrong! The bigger problem is the suggestion that those sentences are the report's conclusion, when in fact the report's concluding section, "Putting It All Together", doesn't even begin until four pages later. We are not being given the evidence.
The real conclusion in the source is very positive about evolutionary delivery; here's how it ends:
Uncertainty in the Internet-software industry dictates short microprojects - down to the level of individual features. Traditional market research has little value here, so companies need an early working version to gain feedback on the product concept. In more-mature environments, however, companies can specify more of the product design upfront, use longer microprojects and develop greater functionality before needing feedback. In a world where customer needs and the underlying technologies in a product are known with certainty, only one large microproject is necessary, and the waterfall model suffices. An evolutionary-delivery model represents a transcendent process for managing the development of all types of software, with the details tailored to reflect each project's unique challenges. (the source, page 83)
"… and the waterfall model suffices."
We are not being given the evidence.
"The study identified four practices that were statistically correlated with the most successful projects:
1. An iterative lifecycle with early release of the evolving product to stakeholders for review and feedback.
2. Daily incorporation of new software and rapid feedback on design changes (daily builds with regression testing).
3. …"
(the book, page 66)
"Four Software-Development Practices That Spell Success
Analysis of Internet-software-development projects in a recent study uncovered successful practices
• An early release of the evolving product design to customers
• Daily incorporation of new software code and rapid feedback on design changes
• …"
(the source, page 76)
Alan MacCormack "Product Development Practices That Work: How Internet Companies Build Software" MIT Sloan Management Review 42(2) Winter 2001, 75-84.
http://www.sloanreview.mit.edu/smr/issue/2001/winter/6/
The book has stuck "An iterative lifecycle" onto the practice described as "An early release of the evolving product design" - that's just wrong! Instead of reporting the evidence, the book has distorted the evidence.
Empirical research into software development is rare. It provides an opportunity to question the conventional wisdom, and when research results are misstated we lose that opportunity to learn.
Here's the nearest thing to investigating the effect of an iterative lifecycle that I can find in the source:
The process of distributing an early release, gathering feedback, updating the design and redistributing the product to customers would seem an ideal way to ensure that the evolving functionality meshes with emerging customer needs. Surprisingly, however, the data showed no relationship between the performance of the final product and the number of beta releases. (the source, page 80, emphasis added)
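The real evidence shows no relationship between the performance of the final product and the number of beta releases, which is the closest the source comes to measuring an iterative lifecycle. We are not being given the evidence.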
The book refers to "Exploiting Trade-offs between Productivity & Quality in the Selection of Software Development Practices" Working draft submitted to IEEE Software. We only have access to the final published article, and should therefore expect to see some differences.
"Similarly, in the model of productivity factors, over 50% of the variation in productivity was related to just two factors, both related to iterative practices:
• …
• The use of daily builds with integration and regression testing…" (the book, page 68)
"The significant development process measures are the use of
• An early prototype
• Daily builds" (the source, page 83)
MacCormack, A., Kemerer, C.F., Cusumano, M., and B. Crandall, "Trade-offs between Productivity and Quality in Selecting Software Development Practices." IEEE Software 20(5) 2003 78-85
http://www.cc.gatech.edu/classes/AY2005/cs6300_fall/papers/maccormack.pdf
The book has stuck "with integration and regression testing" onto the measure described as "Daily builds" - that's just wrong! Instead of reporting the evidence, the book has distorted the evidence.
…the use of integration or regression tests as code is checked in appears in our final model predicting defect rate but not in the model predicting productivity. Conversely, the use of daily builds appears in our final model predicting productivity but not in the model predicting defect rate. (the source, page 83, emphasis added)
That's why the working draft is titled "Exploiting Trade-offs between Productivity & Quality …" We are not being given the evidence.
The book refers to "Exploiting Trade-offs between Productivity & Quality in the Selection of Software Development Practices" Working draft submitted to IEEE Software. We only have access to the final published article, and should therefore expect to see some differences.
"In a follow-up study [MKCC03], MacCormack and colleagues examined the effect of eight practices on productivity and defects (reported by customers), including IID and releasing a partial system early for evaluation and evolutionary design. The projects ranged from application software to embedded systems, with median values of nine developers and a 14 month duration; 75% used iterative and evolutionary development, 25% the waterfall." (the book, pages 66 and 67)
"Specifically, the study [MKCC03] showed that IID was correlated with lower defects." (the book, page 78)
"We collected data on eight software development practices.", "Functional (or requirements) specification", "Detailed design specification", "formal design reviews", "formal code reviews", "Subcycles", "Early prototype", "daily system builds", "integration or regression test" (the source, page 80)
"Finally, although 76 percent of projects divided development into subcycles, they varied greatly in how early they showed a prototype to customers. (the source, pages 80 and 81)
MacCormack, A., Kemerer, C.F., Cusumano, M., and B. Crandall, "Trade-offs between Productivity and Quality in Selecting Software Development Practices." IEEE Software 20(5) 2003, 78-85
http://www.cc.gatech.edu/classes/AY2005/cs6300_fall/papers/maccormack.pdf
The book equates IID (and iterative and evolutionary development) with the practice identified in the source as subcycles - dividing development into separate subcycles. About 75% of the projects divided development into subcycles.
The book labels some of the other development practices iterative-related and reports that they were found to be significant in the multivariate regression models of defect rate and productivity. There's no mention of subcycles.
"…the weak relationships observed between defect rate and both the completeness of the detailed design specification and breaking development into subcycles are no longer present." (the source, page 83)
The book fails to mention that the practice it equates with IID, breaking development into subcycles, was not found to be significant in either the defect rate or productivity multivariate model. We are not being given the evidence.
"In a study of failure factors on 1,027 IT projects in the UK [Thomas 01] (only 13% didn't fail), scope management related to attempting waterfall practices (including detailed up-front requirements) was the single largest contributing factor for failure, being cited in 82% of the projects as the number one problem, with an overall weighted failure influence of 25%." (the book, page 74)
"Figure 1: Management activities contributing to failure. Poor scope management 81.6% [frequency mentioned] 24.7% [importance]"
"Figure 2: Failure stages." "Requirements definition 76.3% [frequency mentioned] 23.2% [importance]"
"Figure 3: Causes of failure." "Unclear objectives and requirements 73.7% [frequency mentioned] 18.1 [importance]%"
"Figure 4: Critical success factors." "Clear, detailed requirements 81.6% [frequency mentioned] 81.6% [importance, 19% in BCS Computer Bulletin]"
"Figure 5: Key project management characteristics. …"
(The citation given in the book is incorrect; the author of the article is Andrew Taylor.)
Andrew Taylor "IT projects sink or swim." BCS Review 2001
http://archive.bcs.org/BCS/review01/articles/itservices/itprojectssinkorswim.htm
Poor scope management was the single largest management activity contributing to failure. The single largest cause of failure was "Unclear objectives and requirements". We are not being given the evidence.
"In another study [Jones00] 47 factors that increase or decrease productivity were identified, including project complexity:
| Low Complexity | High Complexity |
|---|---|
| +13% | -35% |
This indicates a productivity advantage by organizing projects in low-complexity mini-project iterations." (the book, page 78)
Table 5.4 ranked "Low project complexity" 16th out of 24 factors on positive impact; the top 3 factors had +350%, +65% and +55% positive impact (compare to +13%) (the source, page 133)
Table 5.5 ranked "High project complexity" 10th out of 24 factors on negative impact; the top 3 factors had -300%, -90% and -87% negative impact (compare to -35%) (the source, page 134)
Capers Jones "Software Assessments, Benchmarks, and Best Practices" 2000
The book fails to mention how unimportant "Low Complexity" and "High Complexity" are compared to the other factors. We are not being given the evidence.
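To put the quoted percentages side by side, here is a minimal sketch that divides each top-ranked impact by the corresponding complexity impact. It uses only the numbers quoted above from Tables 5.4 and 5.5; the other factors in those tables are not reproduced here, the names are illustrative, and treating the percentages as directly comparable magnitudes is an assumption made for the comparison.

```python
# Percentages as quoted above from Tables 5.4 and 5.5 of the source.
TOP_POSITIVE_IMPACTS = [350, 65, 55]   # top 3 factors, positive impact (%)
LOW_COMPLEXITY_IMPACT = 13             # "Low project complexity" (%)
TOP_NEGATIVE_IMPACTS = [-300, -90, -87]  # top 3 factors, negative impact (%)
HIGH_COMPLEXITY_IMPACT = -35           # "High project complexity" (%)


def relative_size(top_impacts, factor_impact):
    """How many times larger each top impact is than the given factor."""
    return [abs(top) / abs(factor_impact) for top in top_impacts]


positive = relative_size(TOP_POSITIVE_IMPACTS, LOW_COMPLEXITY_IMPACT)
negative = relative_size(TOP_NEGATIVE_IMPACTS, HIGH_COMPLEXITY_IMPACT)

print("vs low complexity: ", [round(x, 1) for x in positive])
print("vs high complexity:", [round(x, 1) for x in negative])
```

On these figures the top-ranked positive factor is roughly 27 times the size of the low-complexity effect, and the top-ranked negative factor is between 8 and 9 times the size of the high-complexity effect.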
You should not deceive yourself into thinking that simply dividing the system into lots of pieces can solve difficult architectural and technical problems. Indeed, in the case of software, faulty decomposition into large numbers of small pieces can make the application harder to build rather than easier. (the source, pages 82 and 83)
The source explicitly warns against the conclusion given in the book. We are not being given the evidence.
"Another relevant study [Jones00] showed that as the size of the project decreases (measured in language-independent function points), the monthly productivity of staff increases (Figure 6.4). The data illustrates the motivation of organizing a project into small mini-project iterations…" (the book, pages 76 and 77)
Unfortunately the book does not refer to a specific figure or table or page in the source, so we have to check the whole thing. Figure 6.4 "Productivity vs. Size 500 Projects 1997-1999" in the book seems to have been created from the averages column in Table 3.2 "United States Productivity Rates circa 1999". (the source, page 65)
"If data representing all size ranges and all classes of software is graphed, the ranges are too broad to draw meaningful conclusions. … When the range of data is this broad, it is quite unsafe to use "averages" for any serious business purpose. This is why we segregate our benchmark results by size and type of software." (the source, page 81)
Capers Jones "Software Assessments, Benchmarks, and Best Practices" 2000
The book fails to mention the warning that the data ranges are too broad to draw meaningful conclusions. We are not being given the evidence.
An example of a simplistic and flawed conclusion that we encounter fairly often is: Because large software projects are hazardous, let us divide all large projects into smaller independent components. … Unfortunately, large software projects are large because there is no effective technology for decomposing them into smaller independent pieces. (the source, page 82)
The source warns against the notion put forward in the book. We are not being given the evidence.
"A study [Solon02] against a sample set (43,700 projects) showed the following productivity differences between IID and waterfall:
| Rigorous IID or Evolutionary Prototyping | Rigorous Waterfall |
|---|---|
| 570 function points per full-time equivalent developer | 480 |
" (the book, page 76)
"The benchmark also asks respondents to identify their generic lifecycle as one of waterfall or prototyping…" (the source, page 9)
The "570" and "480" values can be read from a graph with the data labels "Prototype" and "Waterfall". (the source, Figure 1, page 10)
Robert Solon Jr. and Joyce Statz "Benchmarking the ROI for Software Process Improvement (SPI)" Software Tech News 5(4) 2002, 6-11
http://www.softwaretechnews.com/stn5-4/stn5-4.pdf
The book has stuck "IID" and "Rigorous IID or Evolutionary Prototyping" in place of "prototyping" and "Prototype". Instead of reporting the evidence, the book has distorted the evidence.
… the software resulting from each iteration is not a prototype or proof of concept but a subset of the final system. (the book, page 11, emphasis added)
The book clearly says a prototype is not the same as the result of an iteration. We are not being given the evidence.
"A study by Deck [Deck94] also shows a statistically significant reduction in defects using an iterative method." (the book, page 79)
Unfortunately the book does not refer to a specific study or section or page in the source.
I haven't been able to find any mention in the source of "a statistically significant reduction in defects".
Michael Deck "Cleanroom Software Engineering: Quality Improvement and Cost Reduction" Proceedings 12th Pacific Northwest Software Quality Conference 1994.
http://www.cleansoft.com/cleansoft_library.html#PNSQC94
The source doesn't seem to say anything about a statistically significant reduction in defects. We are not being given the evidence.
When it looks too good to be true, it probably is.
The book says Chapter 6 Evidence provides data. There are examples where the book fails to mention data that would undermine conclusions about timeboxing and an iterative lifecycle. There are examples where the data has been changed, and the changes support conclusions about an iterative lifecycle when the real data does not. There is no easy way to tell which data in Chapter 6 Evidence is incomplete or which data has been changed.
We cannot trust Chapter 6 Evidence of "Agile and Iterative Development: A Manager's Guide" - use that chapter as a reference list and be sure to read the real evidence.