What now seems like ages ago, I used to work at a small startup company - I was the sysadmin/web programmer.
Long before I got there, they had purchased a shopping cart system that had integrated credit card transactions.
A few weeks after I started, they released their product, and one of the higher-ups in marketing told me to redesign the program so that it had a different flow - one click. She wanted an estimate of when I could get it done - my initial estimate was two to three months from when I started coding.
This estimate seemed a bit high to her, and I suggested bringing it up with the CEO. The CEO had come from a large company where he worked closely with engineering - this turned out to be a very good thing. When he asked how long it would take, my response was "It is a change to fundamental design assumptions in a one-thousand-line program that was written by someone else. The choices are to rip it apart and put it back together or to re-write it. Whatever the case, the change will be on the order of one thousand lines of code."
The marketeer balked, saying that I hadn't said how long it would take. The CEO shook his head and replied that what I had said was not only an estimate of the work but also a time frame: on the order of months for a single programmer with other responsibilities.
Estimating the time for a project based upon lines of code is a fairly common practice in the computer world. Granted, different languages have different rates - while it is possible to write 100 lines of C code a day without problem, you would be lucky to get a debugged sendmail rule set (10-20 lines) written in a week (and that's with a gun at your head and several bottles of hard liquor).
One algorithm for computing the time it takes for a project is:
work-months = effort * (lines/1000)^size
This formula assumes that the actual programming language does not matter: a higher-level language requires fewer lines of code and thus a shorter development cycle.
- Lines of code
- The common term for a line of code in estimating models is 'SLOC', which stands for Source Line Of Code. This is a line that is not blank and not a comment, and the same line cannot be counted twice (if a complex if conditional or SQL statement is used twice, it is only counted once). This number is often handled in the thousands, or 'KSLOC'. This form of counting can be found in Barry Boehm's book Software Engineering Economics, whose 'Constructive Cost Model' (COCOMO) supplies the 'default' values for much of the estimating here.
- Effort
- The 'effort' for a project relates to how difficult it is to write code for it and the amount of bug fixing involved. It ranges from difficult and rigorous (E-Commerce development = 3.60) to much easier and more tolerant (the fact that military development is listed at 2.77 scares me). The default is 2.94, and web development falls in at 3.30.
- Size
- As programs get bigger, they become more than proportionally difficult to write; anyone who has ever dealt with projects in the KSLOC range knows this. Projects in a lower-level language pay a higher penalty for size (embedded systems = 1.110), while other project types and languages handle the complexity more easily (web development and E-Commerce = 1.030). The larger the project, the less productive each coder is, because of the increased coordination, communication (meetings), and rework due to misunderstandings. These costs go up exponentially with the size of the program.
So going back to the original project: it was an E-Commerce system at about 1000 lines of code. This translates into
3.60 * 1^1.030 = 3.60 months
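For the curious, the whole thing fits in a few lines of Python - a sketch of the formula above, where the numbers are the ones quoted and the function name is my own:

    # work-months = effort * (lines/1000)^size
    def work_months(lines, effort, size):
        return effort * (lines / 1000.0) ** size

    # the original 1000-line E-Commerce project: effort 3.60, size 1.030
    print(work_months(1000, effort=3.60, size=1.030))   # -> 3.6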
How did I know it was about 1000 lines of code? In that case it was easy: I had someone else's code to guesstimate from. We are not always so fortunate in estimating.
The Delphi technique (described in Karl Wiegers' article Stop Promising Miracles in the February 2000 issue of Software Development Magazine, http://www.sdmagazine.com/) is one way to build this estimate. The first step is to break the project down into modules of some sort. These could be functional modules (menu code, scrollbar code, display code) or processes (front end, back end, database). From there, it is easier to make guesses as to the lines of code for a module.
For each module get three numbers - best case, worst case, and expected. From this compute a weighted mean:
(best + worst + 4*expected)/6
and a standard deviation:
(worst - best)/6
The standard deviation measures the spread in the final number; 99% of all estimates come in under mean + 3*SD.
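In Python, with hypothetical guesses for a single module just to show the arithmetic:

    # weighted mean and standard deviation for one module's three guesses
    def estimate(best, expected, worst):
        mean = (best + worst + 4 * expected) / 6.0
        sd = (worst - best) / 6.0
        return mean, sd

    # hypothetical guesses (in lines of code) for a single module
    mean, sd = estimate(best=400, expected=900, worst=2000)
    print(mean, sd)        # -> 1000.0 and ~266.7
    print(mean + 3 * sd)   # -> 1800.0; 99% of estimates land under this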
Another method for calculating lines of code is based upon 'Function Points'. This looks at the delivered functionality and the cost of providing each function.
Function                  Cost
-----------------------   ----
External input            4
External interface file   7
External outputs          5
External queries          4
Logical internal tables   10
A screen with a tabbed notebook counts each tab as an external input. External interface files are files used for input or output. Multiple record formats or XML data object types count as separate files, even if they reside in the same physical file. External outputs are reports that are generated from the data. External queries are messages into or out of the program to other systems. Logical internal tables are the tables in the database that are necessary.
A program with 10 data entry screens, 5 files, 10 reports, 2 queries, and 20 tables counts up as:
(10*4) + (5*7) + (10*5) + (2*4) + (20*10) = 333 function points.
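The same count as a Python sketch, with the weights from the table above:

    # function point weights, from the table above
    WEIGHTS = {
        "external inputs":          4,
        "external interface files": 7,
        "external outputs":         5,
        "external queries":         4,
        "logical internal tables":  10,
    }

    # the example program: 10 screens, 5 files, 10 reports, 2 queries, 20 tables
    counts = {
        "external inputs":          10,
        "external interface files": 5,
        "external outputs":         10,
        "external queries":         2,
        "logical internal tables":  20,
    }

    total = sum(WEIGHTS[name] * n for name, n in counts.items())
    print(total)   # -> 333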
For each function point, it takes some number of lines of code to implement. The table below comes from Estimating Software Costs by Capers Jones.
Language       SLOC/function point
------------   -------------------
C++            53
Cobol          107
Delphi 5       18
HTML 4         14
Visual Basic   24
SQL            13
Java           46
To implement a 333 function point project in C++, this would be on the order of 17,600 lines of code. If this is a web project, a first estimate would be:
3.30 * 17.65^1.030 = 3.30 * 19.2 = 63 months
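Chaining the two tables together in the same sketch style, again with the web development numbers from above:

    # SLOC per function point, from the Capers Jones table above
    SLOC_PER_FP = {"C++": 53, "Cobol": 107, "Delphi 5": 18, "HTML 4": 14,
                   "Visual Basic": 24, "SQL": 13, "Java": 46}

    sloc = 333 * SLOC_PER_FP["C++"]            # -> 17,649 lines
    months = 3.30 * (sloc / 1000.0) ** 1.030   # web development: effort 3.30, size 1.030
    print(round(months))                       # -> 63 work months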
Of course, this is only a first estimate; it takes into account only the project type and the lines of code. Many other things can influence the speed of project development, and the impact can be either linear or non-linear depending on where it applies.
On a range from 'very low' through 'nominal' to 'extra high', the non-linear impacts can be seen in this table:

Factor              Very Low   Nominal   Extra High
------------------  --------   -------   ----------
Risk Resolution     0.0423     0.0140    -0.0284
Dev. Flexibility    0.0223     0.0020    -0.0284
Precedentedness     0.0336     0.0088    -0.0284
Process Maturity    0.0496     0.0814    -0.0284
Team Cohesiveness   0.0264     0.0045    -0.0284
Precedentedness is a term for the degree of familiarity with the technology and problem domain. So, take a team that is nominal on all counts except for process maturity and team cohesiveness (sounds like a startup?). The project above that had a 1.030 size exponent moves to 1.030 + 0.0140 + 0.0020 + 0.0088 + 0.0496 + 0.0264 = 1.1308. This moves the size factor from 19.2 to 25.7, resulting in a re-estimate of 3.30 * 25.7 = 84.8 work months -- almost two years more time.
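As a sketch:

    # size exponent, nominal on everything except Process Maturity and
    # Team Cohesiveness, which are both 'very low'
    exponent = 1.030 + 0.0140 + 0.0020 + 0.0088 + 0.0496 + 0.0264   # = 1.1308
    months = 3.30 * 17.649 ** exponent
    print(round(months, 1))   # -> 84.8 work months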
There are also linear adjustments that are multiplied against the re-estimate. Many of these center around the experience of the team (programmers and management) and the various constraints and tools at its disposal.
Factor Low High
-------------------------- ------ ------
Analyst Capability 1.42 0.71
App. Experience 1.22 0.81
Lang/Tool Experience 1.20 0.84
Personnel Continuity        1.29   0.81
Management Capability 1.18 0.87
Management Experience 1.11 0.90
Platform Experience 1.19 0.85
Programmer Capability 1.34 0.76
Execution Time Constraint 1.00 1.63
Main Storage Constraint 1.00 1.46
Platform Volatility 0.87 1.30
Effective Management Tools 1.22 0.84
Multi-site Development 1.22 0.80
Office Ergonomics 1.19 0.82
S/W Tools 1.17 0.78
Database Size 0.90 1.28
Documentation Match Stage 0.81 1.23
Internationalization 0.97 1.35
Required Re-usability 0.95 1.24
Required Reliability 0.82 1.26
Graphics/Multimedia 0.95 1.35
Legacy Integration 1.00 1.18
Site Security 0.92 1.40
Text Content 0.94 1.16
Tool Selection 0.95 1.14
Transaction Loads 0.96 1.59
Web Strategy 0.88 1.45
The above table comes from a study by Barry Boehm, Capers Jones, and William H. Roetzheim looking at the life cycles of 20,000 projects.
If the 84.8 work month project is average in every way except that the programmers are all high capability (0.76) and are language gurus (0.84), the time line moves to 84.8 * 0.76 * 0.84 = 54 work months. A lot can be said for the right people and the right tools.
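In the same sketch style:

    # the two linear cost drivers applied against the 84.8 work month re-estimate
    DRIVERS = {"Programmer Capability (high)": 0.76,
               "Lang/Tool Experience (high)":  0.84}

    months = 84.8
    for factor in DRIVERS.values():
        months *= factor
    print(round(months, 1))   # -> 54.1 work months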
All of this work on estimating has been about the amount of effort a project will take. The optimal schedule for that effort can be found by taking the cube root of the effort and multiplying it by a schedule multiplier:
Project Type   Multiplier
------------   ----------
Default        3.67
Embedded       4.00
E-Commerce     3.20
Web Devel      3.10
Military       3.80
Thus a 54 effort month E-Commerce project takes:
3.20 * 54^(1/3) = 12.1 months
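In Python, with the multipliers from the table above:

    # optimal schedule = multiplier * effort^(1/3)
    SCHEDULE_MULTIPLIER = {"Default": 3.67, "Embedded": 4.00,
                           "E-Commerce": 3.20, "Web Devel": 3.10,
                           "Military": 3.80}

    effort = 54   # work months
    print(round(SCHEDULE_MULTIPLIER["E-Commerce"] * effort ** (1 / 3.0), 1))   # -> 12.1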
Realize, this is the optimal case for delivery - optimal meaning the fastest schedule the project can realistically meet, not the schedule of least total effort. There are some other numbers that can be computed from this.
- Least Effort Delivery Time
- This number is 2x the optimal delivery time and represents the point of least total effort. The savings come from less pressure, more testing, and fewer mistakes, because things are better thought out. This carries a cost reduction of roughly 50%.
- Region of Impossibility
- As the delivery time shrinks below the optimal time, the amount of effort increases exponentially. At less than 75% of the optimal time, things become impossible. "Impossible" you ask? Of the 20,000 projects examined above, 750 of them tried to deliver a final product in less time than the optimal. None of them met a schedule of less than 75% of the optimal time. It appears to be impossible to compress the schedule beyond that point.
- Adjusted Staff Months
- The actual amount of effort it takes to release a project ahead of the optimal time is about (optimal/actual)^4 times the baseline. For a 12 optimal month project (54 effort months) to be released in 10 months roughly doubles the amount of effort it takes to release. This reflects the increased number of coders and the accelerated rate of testing and debugging (a sketch follows this list).
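A sketch of that penalty, with the 75% floor from the Region of Impossibility thrown in:

    # effort multiplier for compressing the schedule below the optimal time
    def compression_cost(optimal_months, actual_months):
        return (optimal_months / actual_months) ** 4

    optimal = 12.1
    print(round(compression_cost(optimal, 10), 2))   # -> 2.14x the effort
    print(round(0.75 * optimal, 1))                  # -> 9.1 months: the wall of impossibility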
So, how can you get a product out the door faster?
- Reduce functionality
- Reducing the number of function points scales back the number of lines of code necessary and thus reduces the effort required.
- Decouple tasks
-
Instead of one large project, break it into two projects that
have a well defined interaction. The fewer interactions between
parts of code, the easier it is to write.
- Redundant parallel development
- If you happen to have the resources (people and money), simply assign multiple teams to write the same code and use the one that is done first. The other implementation is kept as a backup in case there are inefficiencies or bugs in the first. Much can also be learned from seeing another design approach, which can assist in debugging your own code.
- Increase reuse
- With languages that are cleanly modular, if previous projects
have been designed with reuse in mind, then later ones can build
upon them. This is often a Good Thing.
All of this seems like black magic and voodoo? Well, it is - yet a lot of time and research has been put into getting these numbers. For software development, it is very useful to know when something will get done and how long it will take. Any clue beyond 'it will be done when it is done' is helpful to a company and to those depending on it - just look at how many headaches Microsoft's poor estimates cause for the people depending on them.
Primary sources: