We are often asked about the reliability of software project estimates. While the question seems simple, the answer is not. To answer it, we must introduce two essential notions of project estimation: accuracy and uncertainty.
Accuracy of the estimate
A measurement tool is more accurate when the results it gives are close to the "true value" of the quantity we are trying to estimate. Note that accuracy itself is not expressed as a number; it is a qualitative assessment.
It is easier to characterise accuracy through measurement error, which is expressed either in the unit of the measured quantity (absolute error) or as a percentage (relative error).
To understand this notion in the context of a project estimate, let's take an example:
Imagine that you have to estimate how long it will take to drive from Paris to Alençon. First, you evaluate the distance to be covered, then the average speed. Tools such as Via Michelin, or your own GPS, automate that process:
- Distance to cover: 192 km via the N12,
- Average speed: 66 km/h, from which the travel time can be deduced (about 2 h 55),
- We calculate the cost from the average consumption and gas price.
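As an illustration, here is a minimal sketch of that calculation in Python; the consumption and fuel-price figures are illustrative assumptions, not values given above.

```python
# Minimal sketch of the trip estimate: the time is derived from the distance
# and the average speed, the cost from the average consumption and the fuel
# price. The consumption and price figures are illustrative assumptions.

distance_km = 192              # Paris -> Alençon via the N12
avg_speed_kmh = 66             # average speed hypothesis
consumption_l_per_100km = 6.5  # assumed average consumption
fuel_price_eur_per_l = 1.80    # assumed fuel price

time_h = distance_km / avg_speed_kmh
cost_eur = distance_km / 100 * consumption_l_per_100km * fuel_price_eur_per_l

print(f"Estimated time: {int(time_h)} h {round(time_h % 1 * 60)} min")  # 2 h 55 min
print(f"Estimated cost: {cost_eur:.2f} EUR")                            # 22.46 EUR
```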
Let’s evaluate the accuracy of the estimate in the following cases:
Case 1:
At the end of the trip, the time spent is 3 hours and the distance covered is 193 km. The accuracy of the estimate is:
- 97% for the time,
- 99% for the distance.
The estimate turns out to be almost perfectly accurate.
Case 2:
You ran into traffic along the way: the trip took 4 hours instead of 3, on the same road. The accuracy of the estimate is:
- 63% for the time,
- 99% for the distance.
Case 3:
You decided to drive through Rambouillet, then Dreux, adding 24 km to the trip, and ended up driving for 3 hours and 41 minutes. The accuracy of the estimate is:
- 74% for the time,
- 88% for the distance.
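Concretely, these accuracy figures correspond to one minus the relative error, measured against the initial estimate (about 2 h 55 and 192 km). A minimal sketch with the actual values of the three cases:

```python
# Accuracy computed as 1 - relative error, relative to the initial estimate.
est_time_min, est_dist_km = 175, 192   # initial estimate: ~2 h 55, 192 km

cases = {
    "Case 1": (180, 193),   # 3 h 00, 193 km
    "Case 2": (240, 193),   # 4 h 00, same road, heavy traffic
    "Case 3": (221, 216),   # 3 h 41, +24 km via Rambouillet and Dreux
}

def accuracy(actual, estimate):
    return 1 - abs(actual - estimate) / estimate

for name, (time_min, dist_km) in cases.items():
    print(f"{name}: time {accuracy(time_min, est_time_min):.0%}, "
          f"distance {accuracy(dist_km, est_dist_km):.0%}")
# Case 1: time 97%, distance 99%
# Case 2: time 63%, distance 99%
# Case 3: time 74%, distance 88%
```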
Is the accuracy of the estimate good? No, it is not.
Is the estimation model poorly built? No: there would be no error if the initial hypotheses had been right.
We could also imagine that the trip would take 5 hours because we are riding a scooter instead of driving a car.
Conclusions:
Accuracy depends on:
- external events you did not foresee, such as traffic or weather (a wrong starting hypothesis),
- a choice: you decided to take a detour (a change of hypothesis),
- a poor assessment of the speed: you are riding a scooter, not driving a car.
- Note: if the estimation model does not anticipate that you might ride a scooter instead of driving a car, then the model is inadequate; otherwise, it is a hypothesis issue.
The accuracy of an estimation model for a software development project:
The accuracy of a software development project estimate is assessed the same way as in the example. To assess the accuracy of an estimation model, you compare the final value with the initial estimate. Most importantly, you have to check whether the hypotheses made at the beginning of the project were respected:
- Did the scope of the project change (is the size larger than expected)?
- Were the productivity hypotheses respected (team skills, tools, technologies, etc.)?
- Which problems affected the project (transportation strikes, flu outbreak, etc.)? These problems must be identified as possible risks that the company cannot control.
The accuracy of an estimate therefore depends on the accuracy of the following elements:
- the estimation hypotheses,
- the estimated quantity,
- the estimated productivity (speed): the value given by the abacus (benchmark table) or by the effort estimation model that was used.
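To make the analogy with the trip concrete, an effort estimate combines these same elements: a quantity (size) and a productivity (speed). A minimal sketch, where the size, the productivity hypothesis and the units (function points, person-months) are illustrative assumptions:

```python
# Effort = estimated quantity (size) / estimated productivity (speed).
# All figures and units below are illustrative assumptions.

size_fp = 400                  # estimated size, in function points
productivity_fp_per_pm = 16    # hypothesis: function points per person-month
                               # (team skills, tools, technologies...)

effort_pm = size_fp / productivity_fp_per_pm
print(f"Estimated effort: {effort_pm:.0f} person-months")        # 25

# If a starting hypothesis is not respected (e.g. the scope grows by 20%),
# accuracy degrades even though the estimation model itself is sound.
actual_effort_pm = (size_fp * 1.2) / productivity_fp_per_pm      # 30
accuracy = 1 - abs(actual_effort_pm - effort_pm) / effort_pm
print(f"Accuracy if the scope grows by 20%: {accuracy:.0%}")     # 80%
```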
Uncertainty of the estimate
The uncertainty of a software development effort estimate is synonymous with "doubt" about the validity of the estimate. This doubt is generally the consequence, among other causes, of incomplete, unclear or limited knowledge of the software project to be estimated, and of the volatility of the development environment.
Uncertainty is mostly related to the dispersion of the values that could reasonably be attributed to the quantity being estimated.
There is a fundamentally statistical dimension here, since estimation models are built on archives of past projects. Experience shows that two similar software projects may have very different costs or durations.
This is not specific to software projects:
In public transportation, we are regularly informed of significant delays, which translates into uncertainty about the duration of the ride. That uncertainty grows as delays become more frequent and larger: an exceptional one-hour delay has less impact on the uncertainty than frequent 15-minute delays.
Because of the uncertainties intrinsic to the estimation process, a realistic estimate must be presented probabilistically (or possibilistically), as a distribution of possible values. In practice, most authors indicate that representing estimates as probability distributions or prediction intervals makes them easier to understand.
The uncertainty of an effort estimate for a software project may be expressed with a "confidence interval" (also called a prediction interval).
The bounds of that confidence interval represent the minimum and maximum estimated effort corresponding to the chosen confidence level. That confidence level is written (1 − α), where α is the probability that the interval does not cover the actual effort; in other words, the actual effort should fall within the interval in a proportion (1 − α) of cases. The confidence level may vary according to the project phase; for the planning phase, the recommended confidence level is at least 90%.
The narrower the interval, the more precise the estimate: the width of the confidence interval represents the level of uncertainty.
An estimate should always be stated as follows: the cost should be between X and Y with a probability of Z%.
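For example, here is a minimal sketch of such a statement; the sample of past project efforts and the rough normal assumption used to derive the bounds are illustrative choices, not prescribed by the article:

```python
import statistics

# Effort (person-months) observed on comparable past projects -- illustrative data.
past_efforts_pm = [22, 25, 19, 31, 27, 24, 35, 21, 28, 26]

mean = statistics.mean(past_efforts_pm)
sd = statistics.stdev(past_efforts_pm)

# Confidence level (1 - alpha) = 90%, the level recommended for the planning
# phase. Under a rough normal assumption the bounds are mean +/- 1.645 * sd.
z_90 = 1.645
low, high = mean - z_90 * sd, mean + z_90 * sd

print(f"The effort should be between {low:.0f} and {high:.0f} person-months "
      f"with a probability of 90%.")
# -> The effort should be between 18 and 34 person-months with a probability of 90%.
```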
Knowledge bases and calibration
How do we estimate a project if we do not know the productivity (speed)?
We have to make hypotheses.
Consider the example of a marathon runner:
Estimating the time a participant will take to run a marathon, without knowing his or her average speed, leads to a confidence interval from 2 h 06 (the record) to 8 h (the maximum time allowed) with a confidence level of 100%. Another estimate could be the interval between 3 h and 5 h 30 with a confidence level of 80%. The greater the uncertainty, the wider the interval.
If new information is provided about the runner's average speed (for instance, over a distance of 10 km), together with hypotheses about his or her training for the marathon, the uncertainty decreases significantly (to roughly plus or minus 15 minutes).
The same applies to software development. Without any data on productivity, the uncertainty of the estimates will be very high. With data from former projects, we can calibrate the estimation model to the context under study and reduce the uncertainty of the estimate.
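A minimal sketch of that effect: the same project estimated first with a broad, uncalibrated productivity range, then with a range calibrated on the organisation's own past projects (all figures are illustrative assumptions):

```python
# Calibration reduces uncertainty: the narrower the productivity range,
# the narrower the resulting effort interval. All figures are illustrative.

size_fp = 400   # estimated size, in function points

def effort_interval(prod_min, prod_max):
    """Effort interval (person-months) from a productivity interval (FP per person-month)."""
    return size_fp / prod_max, size_fp / prod_min   # higher productivity -> lower effort

# Without local data: only a broad industry-wide productivity range is available.
print("Uncalibrated: %.0f to %.0f person-months" % effort_interval(5, 30))   # 13 to 80

# With data from former projects: productivity calibrated to the studied context.
print("Calibrated:   %.0f to %.0f person-months" % effort_interval(14, 18))  # 22 to 29
```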
In organizations with a mature software development process, effective risk management and appropriate planning reduce the productivity variation between projects. In other words, as uncertainty decreases, the accuracy of the estimates increases.
Conclusions:
A project estimate is an evaluation of the effort and duration of a project, based on partially unknown data, hypotheses and estimation models.
The quality of the data, known and unknown, strongly influences the results. It is therefore necessary to verify that the initial hypotheses remain valid and, for the most critical ones, to treat them as risks.
To improve estimation models, there is no substitute for post-project reviews and calibration based on actual data.
Finally, when presenting an estimate, it is essential to state the level of uncertainty by using a 3-point estimate: Min, Probable, Max.
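As an illustration, a 3-point estimate can also be summarised with the classic PERT weighted mean; that formula is a standard technique rather than something this article prescribes, and the figures below are illustrative:

```python
# Summarising a 3-point estimate (Min, Probable, Max) with the classic PERT
# formulas. The figures are illustrative assumptions.

e_min, e_probable, e_max = 20, 25, 40   # person-months

expected = (e_min + 4 * e_probable + e_max) / 6   # PERT expected effort
spread = (e_max - e_min) / 6                      # PERT standard deviation, reflects uncertainty

print(f"Expected effort: {expected:.1f} person-months (+/- {spread:.1f})")
# -> Expected effort: 26.7 person-months (+/- 3.3)
```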