estimating software development time

What now seems like ages ago, I used to work at a small startup company - I was the sysadmin/web programmer. Long before I got there, they had purchased a shopping cart system that had integrated credit card transactions.

A few weeks after I started, they released their product, and one of the higher-ups in marketing told me to redesign the program so that it had a different flow - one click. She wanted an estimate as to when I could get it done - my initial estimate was two to three months from when I started coding.

This estimate seemed a bit high to her, and I suggested bringing it up with the CEO. The CEO of the company had previously been at a large company where he worked closely with engineering - this turned out to be a very good thing. When he asked how long it would take, my response was "It is a change to fundamental design assumptions in a one thousand line long program that was written by someone else. The choices are to rip it apart and put it back together or to re-write it. Whatever the case, it will be on the order of one thousand lines long."

The marketeer balked saying that I didn't say how long it would take. The CEO shook his head and replied that what I had said not only was an estimate of work, but a time frame on the order of months for a single programmer with other responsibilities.

Estimating the time for a project based upon lines of code is a fairly common process in the computer world. Granted, different languages have different rates - while it is possible to write 100 lines of C code a day without problem, you would be lucky to get a debugged sendmail rule set (10-20 lines) in a week (and that's with a gun at your head and several bottles of hard liquor).

One algorithm for computing the time it takes for a project is:

work-months = effort * (lines/1000)^size

This has the assumption that the actual programming language does not matter because a higher level language requires fewer lines of code and thus a shorter development cycle.

lines of code: The common term for a line of code in estimating models is 'SLOC', which stands for Source Line Of Code. This is a line that is not blank and not a comment. Also, the same line cannot be counted twice (say, if a complex if conditional or SQL statement is used twice, it is only counted once). This number is often handled in the thousands, or 'KSLOC'. This form of counting can be found in Barry Boehm's book Software Engineering Economics, which describes the 'Constructive Cost Model' (abbreviated COCOMO), the source of the 'default' values for much of the estimating here.
effort: The 'effort' for a project relates to how difficult it is to write code for it and the amount of bug fixing. It can range from difficult and rigorous (E-Commerce Development = 3.60) to much easier or tolerant (the fact that military development is listed as 2.77 scares me). The default is 2.94, and web development falls in at 3.30.
Size: As programs get bigger, they become more difficult to write. Anyone who has ever dealt with projects in the KSLOC range will know this. Projects in a lower level language have a higher penalty for size (Embedded systems = 1.110), while higher level projects and languages can handle this complexity more easily (Web development and E-Commerce = 1.030). The larger the project, the less productive it is to write code because of the increased coordination, communication (meetings), and rework due to misunderstandings. These costs grow faster than linearly with the size of the program.

So going back to the original project, it was an E-Commerce system at about 1000 lines of code. This translates into
3.60 * 1^1.030 = 3.60 months
How did I know it was about 1000 lines of code? For that case it was easy, I had someone else's code to guesstimate off of. We are not always so fortunate in estimating.
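The formula and the shopping cart numbers above can be checked with a minimal sketch. The function name work_months and its parameter names are made up for illustration; only the formula and the E-Commerce values (3.60 and 1.030) come from the text.

```python
def work_months(sloc, effort, size):
    """work-months = effort * (lines/1000)^size, as given in the text."""
    ksloc = sloc / 1000.0
    return effort * ksloc ** size


# The 1000-line E-Commerce shopping cart: effort 3.60, size exponent 1.030.
print(round(work_months(1000, 3.60, 1.030), 2))  # 3.6
```

Because the size exponent is above 1, doubling the line count more than doubles the estimate.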

The Delphi technique (described in Karl Wiegers's article Stop Promising Miracles (Feb. 2000) in Software Development Magazine, http://www.sdmagazine.com/) is one way to build this estimate. The first step is to break the project down into modules of some sort. These could be functional modules (menu code, scrollbar code, display code) or processes (front end, backend, database). From this, it is easier to make guesses as to the lines of code for a module.

For each module get three numbers - best case, worst case, and expected. From this compute a weighted mean:
(best + worst + 4*expected)/6
and a standard deviation:
(worst - best)/6
The standard deviation measures how far the final number is likely to stray from the mean. Roughly 99% of outcomes should come in under mean + 3*SD.
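As a rough sketch of the per-module arithmetic above: the module names and line counts below are invented, and adding up modules by treating them as independent (root-sum-square of the standard deviations) is an assumption of this sketch, not something the article specifies.

```python
from math import sqrt


def three_point(best, expected, worst):
    """Return (weighted mean, standard deviation) for one module's SLOC guess."""
    mean = (best + worst + 4 * expected) / 6
    sd = (worst - best) / 6
    return mean, sd


def combine(estimates):
    """Sum the module means and combine the SDs as a root-sum-square.

    Treating modules as independent is an assumption of this sketch.
    """
    means, sds = zip(*(three_point(*e) for e in estimates))
    return sum(means), sqrt(sum(sd ** 2 for sd in sds))


# Hypothetical (best, expected, worst) SLOC guesses for three modules.
modules = {
    "front end": (200, 400, 900),
    "backend": (300, 500, 1000),
    "database": (100, 150, 400),
}

total_mean, total_sd = combine(modules.values())
print(f"expected {total_mean:.0f} SLOC, "
      f"upper bound {total_mean + 3 * total_sd:.0f} SLOC")
```

The mean feeds the work-month formula; the mean + 3*SD figure is the one to quote when someone asks for a number you will not exceed.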

Another method for calculating lines of code is based upon 'Function Points'. This looks at the delivered functionality and then the cost for having that function.

Function                Cost
External input           4
External interface file  7
External outputs         5
External queries         4
Logical internal tables 10

A screen with a tabbed notebook has each tab as an external input. External files are for input or output. Multiple record formats or XML data object types count as separate files, even if they reside in the same physical file. External outputs are reports that are generated from the data. External queries are messages into or out of the program to other systems. Logical tables are the number of tables in the database that are necessary.

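A program with 10 data entry screens, 5 files, 10 reports, 2 queries, and 20 tables counts up as:
(10*4) + (5*7) + (10*5) + (2*4) + (20*10) = 333 function points.
Weighting each file at 5 rather than the table's 7 gives 323 function points, which is the figure carried forward below.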
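The count above can be sketched as a lookup against the cost table. The dictionary keys and the function_points helper are names invented for this sketch; the weights are the ones in the table, so each file counts 7 and the total comes to 333, ten more than a count that weights files at 5.

```python
# Cost per delivered function, copied from the table above.
FUNCTION_POINT_COST = {
    "external_input": 4,
    "external_interface_file": 7,
    "external_output": 5,
    "external_query": 4,
    "logical_internal_table": 10,
}


def function_points(counts):
    """Sum cost * count over every kind of function in the project."""
    return sum(FUNCTION_POINT_COST[kind] * n for kind, n in counts.items())


# 10 data entry screens, 5 files, 10 reports, 2 queries, 20 tables.
example = {
    "external_input": 10,
    "external_interface_file": 5,
    "external_output": 10,
    "external_query": 2,
    "logical_internal_table": 20,
}
print(function_points(example))  # 333
```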

For each function point, it takes some number of lines. The table proposed comes from Estimating Software Costs by Capers Jones.

Language    SLOC/function point
C++           53
Cobol        107
Delphi 5      18
HTML 4        14
Visual Basic  24
SQL           13
Java          46

To implement a 323 function point project in C++, this would be on the order of 17,000 lines of code. If this is a web project this would take as a first estimate:
3.30 * (17,000/1000)^1.030 = 3.30 * 18.5 = 61 months
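The two steps above (function points to lines, then lines to work months) chain together as follows. This is only a sketch: the helper names are made up, the conversion rates are the ones in the table, and the web development values (3.30 and 1.030) are the ones quoted earlier.

```python
# SLOC per function point, copied from the table above.
SLOC_PER_FUNCTION_POINT = {
    "C++": 53,
    "Cobol": 107,
    "Delphi 5": 18,
    "HTML 4": 14,
    "Visual Basic": 24,
    "SQL": 13,
    "Java": 46,
}


def sloc_for(points, language):
    """Convert a function point count into an estimated line count."""
    return points * SLOC_PER_FUNCTION_POINT[language]


def work_months(sloc, effort, size):
    """work-months = effort * (lines/1000)^size."""
    return effort * (sloc / 1000.0) ** size


lines = sloc_for(323, "C++")
print(lines)  # 17119
# Web development: effort 3.30, size exponent 1.030.
# The result is a little over 61 work-months.
print(work_months(lines, 3.30, 1.030))
```

Note the division by 1000: the exponent applies to KSLOC, not to the raw line count.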

Of course, this is only a first estimate and only takes into account the project type and the lines of code. Many other things can influence the speed of project development. The impact can either be 'linear' or non-linear depending on the location of impact.

On a range from 'very low' to 'nominal' to 'extra high', the non-linear impact can be seen in this table:

Factor             Very Low  Nominal  Extra High
Risk Resolution     0.0423   0.0140    -0.0284
Dev. Flexibility    0.0223   0.0020    -0.0284
Precedentedness     0.0336   0.0088    -0.0284
Process Maturity    0.0496   0.0814    -0.0284
Team Cohesiveness   0.0264   0.0045    -0.0284

Precedentedness is a term for the degree of familiarity with new technology and problem domains. So, take a team that is nominal on all counts except for process maturity and team cohesiveness, which are both very low (sounds like a startup?). The exponent for the project above moves from 1.030 to 1.030 + 0.014 + 0.002 + 0.0088 + 0.0496 + 0.0264 = 1.1308. This moves the size term from 18.5 to 24.6, and 3.30 * 24.6 gives a re-estimate of about 81 work months, some 20 months more than the first estimate. (The figure of 73.8 work months carried through the rest of this writeup corresponds to an effort multiplier of 3.0 rather than 3.30.)
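A small sketch of the exponent adjustment, under stated assumptions: the names are invented, only the ratings used in the worked example are included, and risk resolution at nominal is taken as 0.014, the value used in the sum above. With the web development effort of 3.30 the result lands near 81 work months.

```python
# Exponent adjustments used in the worked example (startup-like team).
# Risk resolution at nominal is taken as 0.014, matching the sum in the text.
ADJUSTMENTS = {
    "risk_resolution": 0.0140,      # nominal
    "dev_flexibility": 0.0020,      # nominal
    "precedentedness": 0.0088,      # nominal
    "process_maturity": 0.0496,     # very low
    "team_cohesiveness": 0.0264,    # very low
}


def adjusted_exponent(base_size, adjustments):
    """Add every non-linear adjustment to the base size exponent."""
    return base_size + sum(adjustments.values())


def work_months(sloc, effort, size):
    """work-months = effort * (lines/1000)^size."""
    return effort * (sloc / 1000.0) ** size


exponent = adjusted_exponent(1.030, ADJUSTMENTS)
print(round(exponent, 4))  # 1.1308
print(round((17000 / 1000.0) ** exponent, 1))  # 24.6
# Web development effort of 3.30: roughly 81 work-months.
print(work_months(17000, 3.30, exponent))
```

A change of a tenth in the exponent adds about a third to the estimate at this size, which is why these factors matter more as projects grow.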

There are also linear adjustments that are multiplied against the re-estimate. Many of these center around the experience of the team (programmer and management) and various constraints and tools at disposal.

Factor                       Low    High
--------------------------  ------ ------
Analyst Capability           1.42   0.71
App. Experience              1.22   0.81
Lang/Tool Experience         1.20   0.84
Personnel Continuity         1.29   0.81
Management Capability        1.18   0.87
Management Experience        1.11   0.90
Platform Experience          1.19   0.85
Programmer Capability        1.34   0.76
Execution Time Constraint    1.00   1.63
Main Storage Constraint      1.00   1.46
Platform Volatility          0.87   1.30
Effective Management Tools   1.22   0.84
Multi-site Development       1.22   0.80
Office Ergonomics            1.19   0.82
S/W Tools                    1.17   0.78
Database Size                0.90   1.28
Documentation Match Stage    0.81   1.23
Internationalization         0.97   1.35
Required Re-usability        0.95   1.24
Required Reliability         0.82   1.26
Graphics/Multimedia          0.95   1.35
Legacy Integration           1.00   1.18
Site Security                0.92   1.40
Text Content                 0.94   1.16
Tool Selection               0.95   1.14
Transaction Loads            0.96   1.59
Web Strategy                 0.88   1.45

The above table came from a study by Barry Boehm, Capers Jones, and William H. Roetzheim looking at the life cycles of 20,000 projects. If the 73.8 work month project is average in every way except that the programmers are all high capability (0.76) and are language gurus (0.84), the estimate moves to 73.8 * 0.76 * 0.84 = 47 work months. A lot can be said for the right people and the right tools.
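The linear adjustments are just a chain of multiplications, sketched below with three rows from the table. The names are hypothetical, and treating an average ('nominal') rating as a multiplier of 1.00 is an assumption implied by "average in every way" rather than stated in the table.

```python
from math import prod

# A few rows from the table above as (low, high) multipliers.
LINEAR_FACTORS = {
    "programmer_capability": (1.34, 0.76),
    "lang_tool_experience": (1.20, 0.84),
    "execution_time_constraint": (1.00, 1.63),
}


def multiplier(factor, rating):
    """Look up one multiplier; 'nominal' is assumed to mean 1.00."""
    low, high = LINEAR_FACTORS[factor]
    return {"low": low, "nominal": 1.00, "high": high}[rating]


def adjust(months, ratings):
    """Multiply an estimate by every rated factor."""
    return months * prod(multiplier(f, r) for f, r in ratings.items())


# High-capability programmers who are also language gurus.
gurus = {"programmer_capability": "high", "lang_tool_experience": "high"}
print(round(adjust(73.8, gurus), 1))  # 47.1
```

The same team rated low on both factors would push 73.8 work months well past 100, so staffing swings the estimate by more than a factor of two.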

All of this work on estimating has been on the amount of effort it will take. Optimal scheduling of the effort can be found by taking the cube root of the effort and multiplying it with a schedule multiplier.

Default    3.67
Embedded   4.00
E-Commerce 3.20
Web Devel  3.10
Military   3.80

Thus a 47 effort month E-Commerce project takes:
3.20 * 47^(1/3) = 11.5 months
Realize, this is the best case for delivery - best case being least total effort. There are some other numbers that can be computed from this.

Least Effort Delivery Time: This number is 2x the optimal delivery time and represents the schedule that takes the least total effort. The savings come from less pressure, more testing, and fewer mistakes because things are better thought out. This will have a cost reduction of roughly 50%.
Region of Impossibility: As delivery time drops below the optimal time, the amount of effort climbs steeply. At less than 75% of the optimal time things become impossible. "Impossible" you ask? Of the 20,000 projects examined above, 750 of them tried to deliver a final product in less time than the optimal. None of them met a schedule of less than 75% of the optimal time. It appears to be impossible to compress the schedule beyond this point.
Adjusted Staff Months: When a project is released faster than the optimal time, the actual amount of effort grows by a factor of about (optimal/actual)^4. Releasing a 12 optimal month project (47 effort months) in 10 months roughly doubles the amount of effort it takes. This reflects the increased number of coders and the accelerated rate of testing and debugging.
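The schedule numbers above can be sketched in a few lines. The function names are invented, and returning None inside the region of impossibility is a choice made for this sketch. The (optimal/actual)^4 rule is applied only to schedules shorter than optimal, since the text describes the longer, least-effort schedule separately.

```python
def optimal_schedule(effort_months, multiplier):
    """Schedule multiplier times the cube root of the effort."""
    return multiplier * effort_months ** (1 / 3)


def least_effort_schedule(optimal):
    """The text puts the least-effort delivery time at 2x the optimal."""
    return 2 * optimal


def compressed_effort(effort_months, optimal, actual):
    """Effort needed to ship in `actual` months instead of `optimal`.

    Returns None below 75% of the optimal time (region of impossibility).
    Schedules at or beyond the optimal are left unscaled in this sketch.
    """
    if actual < 0.75 * optimal:
        return None
    if actual >= optimal:
        return effort_months
    return effort_months * (optimal / actual) ** 4


# 47 effort months, E-Commerce schedule multiplier of 3.20.
print(round(optimal_schedule(47, 3.20), 1))  # 11.5

# A 12 optimal month project squeezed into 10 months.
print(round(compressed_effort(47, 12, 10), 1))  # 97.5

# Anything under 9 months is out of reach for that project.
print(compressed_effort(47, 12, 8))  # None
```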

So, how can you get a product out the door faster?

Reduce functionality: By reducing the number of function points, this scales back the number of lines of code necessary and thus reduces the effort required.
Decouple tasks: Instead of one large project, break it into two projects that have a well defined interaction. The fewer interactions between parts of code, the easier it is to write.
Redundant parallel development: If you happen to have the resources (people and money), simply assign multiple teams to write the same code and use the one that is done first. The other code is then kept as a backup in case there are inefficiencies or bugs in the first code. Much can be learned from another design standpoint that can assist in debugging your own code.
Increase reuse: With languages that are cleanly modular, if previous projects have been designed with reuse in mind, then later ones can build upon them. This is often a Good Thing.

Does all of this seem like black magic and voodoo? Well, it is - yet a lot of time and research has been put into getting these numbers. For software development, it is very useful to know when something will get done and how long it will take. Any clue better than 'it will be done when it is' can be helpful to a company and those depending on it - just look at how many headaches Microsoft's poor estimates cause for the people depending on them.

Primary sources:

http://www.sdmagazine.com/
Project Management Made Simple by David King
