
Software Engineering Practices

Unit 10 Software Cost Estimation

Lecture


Keywords

effort, size of the product, COCOMO (COnstructive COst MOdel), organic project, embedded project, semidetached project, function point analysis (FPA), input types (EI), output types (EO), inquiry types (EQ), logical internal files (ILF), interfaces (EIF), unadjusted function points (UFP), degree of influence (DI), technical complexity factor (TCF), adjusted function point measure (FP), application composition model, early design model, post-architecture model, source lines of code (SLOC), scale factor, software understanding increment (SU), degree of assessment and assimilation (AA)

Software development takes not only time, but also money, and reliable cost and schedule estimates remain hard to obtain at an early stage. Since progress is difficult to see, schedule slippages often go undetected for quite a while, and schedule overruns are the rule rather than the exception.

Estimating the cost of a software development project all too often relies on mere guesstimates. Fortunately, there are exceptions: a number of algorithmic models now exist that estimate the total cost and development time of a software development project from estimates of a limited number of relevant cost drivers.

In most cost estimation models, a simple relation between cost and effort is assumed. The effort may be measured in man-months, for instance, and each man-month is taken to cost a fixed amount. The total estimated cost is then obtained by multiplying the estimated number of man-months by this constant factor. In this unit, we freely use the terms cost and effort as if they were synonymous.

The notion of total cost is usually taken to indicate the cost of the initial software development effort, i.e. the cost of the requirements engineering, design, implementation and testing phases. Thus, maintenance costs are not taken into account. Unless explicitly stated otherwise, this notion of cost is the one used here. In the same vein, development time is taken to mean the time between the start of the requirements engineering phase and the point in time when the software is delivered to the customer. Lastly, the notion of cost as used here does not include possible hardware costs either; it concerns only the personnel costs involved in software development.

Research in the area of cost estimation is far from mature. Different models use different measures, which makes mutual comparison very difficult.

General algorithmic models assume a relation between the effort needed (E, measured, for example, in man-months) and the size of the product (in KLOC, Kilo Lines Of Code = Lines Of Code / 1000) of the form:

[TEX]E = b \cdot KLOC^{c}[/TEX]   (10.1)

Table 10.1 - Base formulae for the relation between size and effort

Origin          b     c
Halstead        0.7   1.5
Boehm           2.4   1.05
Walston-Felix   5.2   0.91
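To see how strongly these parameter sets disagree, the base relation can be sketched in Python; the 50 KLOC size is an arbitrary example:

```python
def effort(b, c, kloc):
    """Effort in man-months from the base relation E = b * KLOC**c."""
    return b * kloc ** c

# Parameter sets from table 10.1.
MODELS = {
    "Halstead":      (0.7, 1.5),
    "Boehm":         (2.4, 1.05),
    "Walston-Felix": (5.2, 0.91),
}

for name, (b, c) in MODELS.items():
    # The same 50 KLOC product yields quite different estimates per model.
    print(f"{name}: {effort(b, c, 50):.1f} man-months")
```

The spread of the results for one and the same product size illustrates why these formulae cannot be transferred between environments without recalibration.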

Several questions come to mind immediately: What is a line of code? Do we count machine code, or the source code in some high-level language? Do we count comment lines, or blank lines that increase readability? Different models use different definitions of these notions.

The actual numbers used in those models result from an analysis of real project data. If these data reflect different project types or development environments, so will the models. Thus, differences between the formulae reflect differences between the sets of projects on which the various models are based.

These models reflect factors that bear on development cost and effort, and they allow software developers to identify strategies for improving software productivity.

10.1 COCOMO

COCOMO (COnstructive COst MOdel) is an algorithmic software cost estimation model developed by Barry W. Boehm, and one of the best-documented cost estimation models. In its simplest form, often called Basic COCOMO or COCOMO 81, the formula that relates effort to software size is (10.1), where b and c are constants that depend on the kind of project being executed.

COCOMO distinguishes three classes of project:

  - organic: a relatively small team develops familiar types of software in a familiar, in-house environment;
  - embedded: the product must operate within tightly coupled hardware, software and operational constraints;
  - semidetached: an intermediate form between organic and embedded.

Table 10.2 - Parameters of Basic COCOMO

Class          b     c
Organic        2.4   1.05
Semidetached   3.0   1.12
Embedded       3.6   1.20

The COCOMO formulae are based on a combination of expert judgment, an analysis of available project data, earlier models, and so on. Even for the projects on which the model was based, the basic model does not give very accurate results.

You can use the implementation of the Basic COCOMO model in the program USC COCOMO 81.
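The Basic COCOMO computation is small enough to sketch directly; the function and dictionary names below are ours, and the parameters are those of table 10.2:

```python
# Basic COCOMO parameters (b, c) per project class, from table 10.2.
COCOMO81 = {
    "organic":      (2.4, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded":     (3.6, 1.20),
}

def basic_cocomo(project_class, kloc):
    """Estimated effort in man-months for a project of the given class and size."""
    b, c = COCOMO81[project_class]
    return b * kloc ** c
```

For a 32 KLOC system, for example, the organic estimate is about 91 man-months while the embedded estimate is about 230: because of the larger exponent, the gap widens quickly as size grows.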

10.2 Function Point Analysis

Function point analysis (FPA) is a method of estimating costs in which the problems associated with determining the expected amount of code are circumvented. FPA is based on counting the number of different data structures that are used. In the FPA method, it is assumed that the number of different data structures is a good size indicator. FPA is particularly suitable for projects aimed at realizing business applications, because in these applications the structure of the data plays a very dominant role. The method is less suited to projects in which the structure of the data plays a less prominent role and the emphasis is on algorithms (such as compilers and most real-time software).

The following five entities play a central role in the FPA model: the number of input types (EI), the number of output types (EO), the number of inquiry types (EQ), the number of logical internal files (ILF), and the number of interfaces (EIF).

The number of unadjusted function points (UFP) is a weighted sum of these five entities:

Table 10.3 - Counting rules for UFP

Type   Complexity level
       Simple   Average   Complex
EI     3        4         6
EO     4        5         7
EQ     3        4         6
ILF    7        10        15
EIF    5        7         10

Table 10.4 - Determining complexity levels

Number of      Number of data elements
file types     1-4       5-15      > 15
0-1            Simple    Simple    Average
2-3            Simple    Average   Complex
> 3            Average   Complex   Complex

As in other cost estimation models, the unadjusted function point measure is adjusted by taking into account a number of application characteristics that influence development effort. Figure 10.1 contains the 14 characteristics used in the FPA model. The degree of influence of each of these characteristics is valued on a six-point scale, ranging from zero (no influence, not present) to five (strong influence). The total degree of influence (DI) is the sum of the scores for all characteristics.

Figure 10.1 Application characteristics in FPA

Then this number is converted to a technical complexity factor (TCF) using the formula:

[TEX]TCF = 0.65 + 0.01 \cdot DI[/TEX]

The adjusted function point measure (FP) is then obtained as [TEX]FP = UFP \cdot TCF[/TEX].

In applying the FPA cost estimation method, it still remains necessary to calibrate the various entities to the development environment used.
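The UFP and FP computations described above can be sketched as follows; the weights come from table 10.3, while the entity counts in the test are invented for illustration:

```python
# Weights from table 10.3, indexed (simple, average, complex).
UFP_WEIGHTS = {
    "EI":  (3, 4, 6),
    "EO":  (4, 5, 7),
    "EQ":  (3, 4, 6),
    "ILF": (7, 10, 15),
    "EIF": (5, 7, 10),
}
LEVEL = {"simple": 0, "average": 1, "complex": 2}

def unadjusted_fp(counts):
    """counts maps (entity type, complexity level) to a number of occurrences."""
    return sum(UFP_WEIGHTS[t][LEVEL[lvl]] * n for (t, lvl), n in counts.items())

def adjusted_fp(ufp, di):
    """Apply the technical complexity factor TCF = 0.65 + 0.01 * DI,
    where DI is the total degree of influence (0..70 over the
    14 application characteristics of figure 10.1)."""
    tcf = 0.65 + 0.01 * di
    return ufp * tcf
```

Note that since each characteristic scores 0 to 5, DI ranges from 0 to 70, so TCF ranges from 0.65 to 1.35; a DI of 35 leaves the UFP count unchanged.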

10.3 COCOMO II

COCOMO II is a revision of the basic COCOMO model, tuned to the life cycle practices of the 1990s and 2000s. It reflects our cumulative experience with and knowledge of cost estimation.

COCOMO II provides three increasingly detailed cost estimation models. These models can be used for different types of projects, as well as during different stages of a single project: the Application Composition model, the Early Design model, and the Post-Architecture model.

The Post-Architecture model can be considered an update of the original COCOMO model; the Early Design model is an FPA-like model; and the Application Composition model is based on counting system components of a large granularity, such as screens and reports.

Total effort is estimated in the Application Composition model as follows:

  1. Estimate the number of screens, reports, and 3GL components in the application.

  2. Determine the complexity level of each screen and report (simple, medium or difficult); 3GL components are assumed to be always difficult. The complexity of a screen depends on the number of views and tables it contains; the complexity of a report depends on the number of sections and tables it contains. Table 10.6 is used to determine these complexity levels for screens.

  3. Use the numbers given in table 10.5 to determine the relative effort (in Object Points) to implement each object.

  4. The sum of the Object Points for the individual objects yields the number of Object Points (OP) for the whole system.

  5. Estimate the reuse percentage r, resulting in the number of New Object Points (NOP):

     [TEX]NOP = OP \cdot (100 - r) / 100[/TEX]

  6. Determine a productivity rate PROD, the number of New Object Points that can be realized per man-month.

  7. Estimate the number of man-months needed for the project:

     [TEX]E = NOP / PROD[/TEX]

Table 10.5 - Counting Object Points

Object type      Complexity level
                 Simple   Medium   Difficult
Screen           1        2        3
Report           2        5        8
3GL component    -        -        10

Table 10.6 - Complexity levels for screens

Number      Number and source of data tables
of views    total < 4          total < 8          total >= 8
            (< 2 on server,    (2-3 on server,    (> 3 on server,
             < 3 on client)     3-5 on client)     > 5 on client)
< 3         simple             simple             medium
3-7         simple             medium             difficult
>= 8        medium             difficult          difficult
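The Application Composition steps above can be sketched as follows; the productivity rate PROD is assumed to be given (COCOMO II derives it from developer experience and tool maturity, which is not shown here), and the object counts in the test are invented:

```python
# Object Point weights from table 10.5; 3GL components are always difficult.
OP_WEIGHTS = {
    "screen": {"simple": 1, "medium": 2, "difficult": 3},
    "report": {"simple": 2, "medium": 5, "difficult": 8},
    "3GL":    {"difficult": 10},
}

def app_composition_effort(objects, reuse_pct, prod):
    """objects: iterable of (object type, complexity level) pairs.
    reuse_pct: estimated reuse percentage r.
    prod: productivity rate in New Object Points per man-month.
    Returns the estimated effort in man-months."""
    op = sum(OP_WEIGHTS[t][lvl] for t, lvl in objects)   # steps 3-4
    nop = op * (100 - reuse_pct) / 100                   # step 5: New Object Points
    return nop / prod                                    # step 7
```

For example, 10 simple screens, 4 medium reports and one 3GL component give OP = 40; with 20% reuse and a productivity rate of 8, the estimate is 4 man-months.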

The Early Design model uses unadjusted function points (UFPs) as its basic size measure, counted in the same way as in FPA. Next, the unadjusted function points are converted to Source Lines Of Code (SLOC), using a SLOC/UFP ratio that depends on the programming language used. These ratios are given in the Function Point Languages Table.

The Early Design model does not use the FPA scheme to account for application characteristics. Instead, it uses a set of seven cost drivers, which are combinations of the full set of cost drivers of the Post-Architecture model. These cost drivers are rated on a seven-point scale, ranging from extra low to extra high. The values assigned are similar to those in table 10.7.

Table 10.7 - Cost drivers and associated effort multipliers in COCOMO II

Cost driver                     Very low   Low    Nominal   High   Very high   Extra high

Product factors
  Reliability required            0.75     0.88    1.00     1.15     1.39         -
  Database size                     -      0.93    1.00     1.09     1.19         -
  Product complexity              0.75     0.88    1.00     1.15     1.30        1.66
  Required reusability              -      0.91    1.00     1.14     1.29        1.49
  Documentation needs             0.89     0.95    1.00     1.06     1.13         -

Platform factors
  Execution time constraints        -        -     1.00     1.11     1.31        1.67
  Main storage constraints          -        -     1.00     1.06     1.21        1.57
  Platform volatility               -      0.87    1.00     1.15     1.30         -

Personnel factors
  Analyst capability              1.50     1.22    1.00     0.83     0.67         -
  Programmer capability           1.37     1.16    1.00     0.87     0.74         -
  Application experience          1.22     1.10    1.00     0.89     0.81         -
  Platform experience             1.24     1.10    1.00     0.92     0.84         -
  Language and tool experience    1.25     1.12    1.00     0.88     0.81         -
  Personnel continuity            1.24     1.10    1.00     0.92     0.84         -

Project factors
  Use of software tools           1.24     1.12    1.00     0.86     0.72         -
  Multi-site development          1.25     1.10    1.00     0.92     0.84        0.78
  Required development schedule   1.29     1.10    1.00     1.00     1.00         -

After the unadjusted function points have been converted to KLOC, the cumulative effect of the cost drivers is accounted for by the formula:

[TEX]E = b \cdot KLOC^{c} \cdot \prod_{i=1}^{7} EM_{i}[/TEX]

where [TEX]EM_{i}[/TEX] is the effort multiplier associated with the i-th cost driver.
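A minimal sketch of this computation follows; b and c are calibration constants and the multipliers passed in are placeholders taken from the cost driver table, not calibrated values:

```python
from math import prod  # product of an iterable, available since Python 3.8

def early_design_effort(ksloc, b, c, multipliers):
    """Effort as the base size relation scaled by the product of the
    effort multipliers: E = b * KSLOC**c * EM_1 * ... * EM_n."""
    return b * ksloc ** c * prod(multipliers)
```

With all ratings nominal (every multiplier 1.00) the base estimate is unchanged; a penalty such as very high product complexity (1.30) can be largely offset by very high programmer capability (0.74), since 1.30 * 0.74 is roughly 0.96.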

Finally, the Post-Architecture model is the most detailed of the three. Its effort equation is very similar to that of the basic COCOMO model:

[TEX]E = b \cdot KLOC^{c} \cdot \prod_{i} EM_{i}[/TEX]

where the product now runs over the full set of cost drivers.

It differs from the original COCOMO model in its set of cost drivers, the use of lines of code as its base measure, and the range of values of the exponent b. The differences between the COCOMO and COCOMO II set of cost drivers (see table 10.7) reflect major changes in the field.

In COCOMO II, the user may use both KSLOC and UFP as a base measure. It is also possible to use UFP for part of the system. The UFP counts are converted to KSLOC counts as in the Early Design model, after which the effort equation applies.

The COCOMO II model uses five scale factors [TEX]W_{i}[/TEX], each of which is rated on a six-point scale from very low (5) to extra high (0). The exponent b for the effort equation is then determined by the formula:

[TEX]b = 1.01 + 0.01 \sum_{i=1}^{5} W_{i}[/TEX]

where the scale factors are: precedentedness (how novel this type of project is to the organization), development flexibility, architecture/risk resolution, team cohesion, and process maturity.
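The exponent computation can be sketched as follows; the additive constant 1.01 is the value used in this formulation of COCOMO II and should be treated as an assumption of the sketch:

```python
def exponent_b(scale_factors):
    """Exponent for the COCOMO II effort equation from the five scale
    factors W_i, each rated 0 (extra high) .. 5 (very low)."""
    assert len(scale_factors) == 5
    assert all(0 <= w <= 5 for w in scale_factors)
    return 1.01 + 0.01 * sum(scale_factors)
```

The exponent thus ranges from 1.01 (all factors extra high) to 1.26 (all factors very low), so diseconomies of scale grow as the project becomes less precedented, less flexible, and less mature.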

The Basic COCOMO model has only the first two of these factors. Basic COCOMO allows us to handle reuse in the following way. The three main development phases, design, coding and integration, are estimated to take 40%, 30% and 30% of the average effort, respectively. Reuse can be catered for by separately considering the fractions of the system that require redesign (DM), recoding (CM) and re-integration (IM). An adjustment factor AAF is then given by the formula:

[TEX]AAF = 0.4 \cdot DM + 0.3 \cdot CM + 0.3 \cdot IM[/TEX]

An adjusted value AKLOC, given by

[TEX]AKLOC = KLOC \cdot AAF / 100[/TEX]

is next used in the COCOMO formulae, instead of the unadjusted value KLOC.

In this way, a lower cost estimate is obtained if part of the system is reused. By treating reuse this way, it is assumed that developing reusable components does not require any extra effort. This assumption is not realistic.

COCOMO II uses a more elaborate scheme to handle reuse effects. This scheme reflects two additional factors that impact the cost of reuse: the quality of the code being reused and the amount of effort needed to test the applicability of the component to be reused.

The extra effort needed for reuse is denoted by the software understanding increment (SU). If the software to be reused is strongly modular, closely matches the application in which it is to be reused, and is well-organized and properly documented, then SU is estimated at 10%. If the software is poorly structured and poorly documented, SU may be as high as 50%.

The degree of assessment and assimilation (AA) denotes the effort needed to determine whether a component is appropriate for the present application. It ranges from 0% (no extra effort required) to 8% (extensive test, evaluation and documentation required).

Both these percentages are added to the adjustment factor AAF, yielding the equivalent kilo number of new lines of code, EKLOC:

[TEX]EKLOC = KLOC \cdot (AAF + SU + AA) / 100[/TEX]

You can use the implementation of the COCOMO II Post-Architecture model in the program USC COCOMO II.
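The reuse adjustment described above can be sketched as follows; DM, CM, IM, SU and AA are all percentages, and the function names are ours:

```python
def aaf(dm, cm, im):
    """Adjustment factor from the percentages of the reused part needing
    redesign (DM), recoding (CM) and re-integration (IM), weighted by the
    40/30/30 split of effort over design, coding and integration."""
    return 0.4 * dm + 0.3 * cm + 0.3 * im

def ekloc(kloc, dm, cm, im, su=0.0, aa=0.0):
    """Equivalent new KLOC under the COCOMO II reuse scheme: the software
    understanding increment SU and the assessment-and-assimilation effort AA
    are added to AAF before scaling the reused size."""
    return kloc * (aaf(dm, cm, im) + su + aa) / 100
```

For example, reusing a 100 KLOC component that needs half of its design, code and integration redone (AAF = 50), with SU = 10 and AA = 4, counts as 64 KLOC of new code.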

 

 

References
  1. Barry Boehm. Software Engineering Economics. Englewood Cliffs, NJ: Prentice-Hall, 1981.
  2. Barry Boehm et al. Software Cost Estimation with COCOMO II. Englewood Cliffs, NJ: Prentice-Hall, 2000.

Part of the material was taken from:

  1. Hans van Vliet. Software Engineering: Principles and Practice. 2007.


© 2006—2023 Sumy State University