Multiple-parameter coupling metrics for layered component-based software

Abstract

Coupling represents the degree of interdependence between two software components. Understanding software dependency is directly related to improving software understandability, maintainability, and reusability. In this paper, we analyze the difference between component coupling and component dependency, and introduce a two-parameter component coupling metric and a three-parameter component dependency metric. An important parameter in both metrics is coupling distance, which represents the relevance of two coupled components. These metrics are applicable to layered component-based software and can be used to represent the dependencies induced by all types of software coupling. We show how to determine the coupling and dependency of software components of all scales using these metrics. The metrics are then applied to Apache HTTP, an open-source web server. The study shows that coupling distance is related to the number of modifications of a component, which is an important indicator of component fault rate, stability and, subsequently, component complexity.

1 Introduction

Component-based software development is a popular approach to improving the practice of software engineering. Potential benefits of the approach include increased productivity and quality, and decreased cost and time-to-market. Quite frequently, existing components are not ready-to-use building blocks, especially in the case of large-scale design-level reuse. Instead, these components need to be adapted and/or modified to meet the specific requirements of the new product being developed. Furthermore, just as the software product as a whole needs to be maintained, reused software components also need to be periodically updated to meet new requirements or changes in the operating environment.
Therefore, software components also need to be designed with maintainability as an important consideration. Hence, reusability and maintainability are two important properties of a software component (Lim 1994; Frakes and Succi 2001). Components are building blocks of a software system. A product contains components of different composition scales (Jonge 2004). A considerable amount of research has been done to try to characterize reusable (Price and Demurjian 1997; Biggerstaff and Perlis 1989; Briand et al. 1994; Card and Glass 1990; Dandashi 2002) and maintainable (Berns 1984; Gibson and Senn 1989; Banker et al. 1993) software components. Both reusability and maintainability are related to software dependency. From the dependency point of view, if a software component is relatively independent, that is, if there are only a few dependencies of this component on other components, it would be easy to understand, maintain, and reuse. Coupling represents the degree of dependencies between two software components. Strong coupling between components strengthens the dependency of one component on the others and increases the probability that changes in one component may affect the other components and introduce regression faults (Kafura and Henry 1981; Selby and Basili 1991; Troy and Zweben 1981), and accordingly have detrimental effects on
software maintenance. Strong coupling can also hamper software reuse. For example, if a software component has many dependencies on other components, it may be impossible to reuse this component in a new product without either (1) incorporating it together with the dependent components, or (2) redesigning and reimplementing this component to remove these dependencies. While option 1 may result in redundant reuse, option 2 may result in changes to the component functionality. Hence, both approaches defeat the intended purpose of component reuse (Harrison et al. 2000).

Dependencies between software components are determined not only by the type of coupling between the components, but also by the relevance of the two components. Although the idea of interaction locality (increasing the coupling of relevant components and decreasing the coupling of irrelevant components) is widespread and longstanding, it has not been formalized and thoroughly studied. In this paper, we consider the relevance (signified by the coupling distance measure) between two components as a factor that affects the dependencies between them and propose two multiple-parameter coupling metrics for layered component-based software systems.

The remainder of the paper is organized as follows: Sect. 2 reviews software coupling, interaction locality, and coupling metrics. Section 3 describes the representation of component dependency. We describe layered component structure in Sect. 4. Section 5 presents our coupling and dependency metrics. In Sect. 6, we show how to determine component dependency for various kinds of coupling. Section 7 presents our application studies on Apache HTTP. The conclusions, threats to validity, and future work appear in Sect. 8.

2 Software coupling, interaction locality, and coupling metrics

Coupling represents the degree of interaction between two software components (classes, modules, packages, or the like).
There are many different types of coupling (Stevens et al. 1974; Page-Jones 1980; Offutt et al. 1993). All of them can be shown to fall into one of the following four types: parameter coupling, external/file coupling, inheritance coupling, and common coupling (Abdurazik 2007). This categorization was presented in the context of object-oriented software systems. However, because, in view of component dependency, the constructs of structured software are a proper subset of those of object-oriented software, coupling in structured software systems can be represented using parameter coupling, external/file coupling, and common coupling; inheritance coupling is specific to object-oriented software systems. The definitions of these types of coupling are listed in Table 1. In object-oriented software, a class is considered to be the basic manageable unit, while in structured software, a module is considered to be the basic manageable unit. In Table 1, we refer to both of these basic units, in general, as modules.

Table 1 Definitions of various kinds of coupling (Abdurazik 2007)

Name                     Definition
Parameter coupling       Two modules have parameter coupling if one module invokes a method of another module via parameter passing.
External/file coupling   Two modules have external/file coupling if they access the same external medium, including external files.
Inheritance coupling     Two modules have inheritance coupling if one module is a descendant of another module.
Common coupling          Two modules have common coupling if they access the same global variable.

Table 1 lists the definitions of coupling between two modules, the smallest-scale components. Usually, coupling between modules of two large-scale software components is also used to represent the coupling between the large-scale components (Bruegge and Dutoit 2004). For example, if component C1 contains module A and module B, component C2 contains module E and module F, and module A is parameter coupled to module E, we can say component C1 is parameter coupled to component C2.

Strong coupling means a high degree of dependency between software components. Common coupling has been considered to be a strong form of coupling because it induces strong dependencies between software components, making software components difficult to understand, maintain, and reuse (Offutt et al. 1993; Yu et al. 2004). Inheritance coupling is also considered a strong form of coupling (Bruegge and Dutoit 2004; Hassoun et al. 2004) in the context of software maintenance and white-box reuse, because any changes to a base class will affect all its derived classes. Parameter coupling is usually considered a weak form of coupling. Therefore, the degree of dependency increases from top to bottom in Table 1 (Abdurazik 2007).

It has been observed that most of the complex systems in the world, from physical systems such as atoms and stellar galaxies to social systems such as organizations and governments, are modular and hierarchically structured. A large system may consist of subsystems, which in turn consist of subsystems, and so on, through several multiresolutional layers. The interactions between subsystems tend to decrease as we go upward in the hierarchy. This is called interaction locality (Simon 1969).
Generally speaking, interaction locality can minimize the energy needed for the system to operate and accordingly stabilize the system. In software systems, interaction locality is expressed via a widely accepted design principle: increasing the coupling of relevant components and decreasing the coupling of irrelevant components. Interaction locality should not be used in isolation. Instead, it should be used together with two other design principles, modularity and hierarchy (Yu and Ramaswamy 2007). Design modularity and hierarchy mean the decomposition of the software system into different layers of components in order to separate concerns and reduce system complexity. Interaction locality is then applied to assign interactions between these components. Consider an ideal system that consists of components C1 and C2, which in turn contain modules m1 through m4. Figure 1a depicts the modular and hierarchical structure of the system. Figure 1b depicts the interaction locality: high interactions exist between relevant (lower level) modules and low interactions exist between irrelevant (higher level) components.
Fig. 1 An ideal system with (a) hierarchical structure; and (b) interaction locality (Yu and Ramaswamy 2007)

On the one hand, because different types of coupling have different effects on software complexity, we can use the definitions of coupling in Table 1 to compare the degrees of dependency between software components. Considerable research has been done in this area to derive software dependency metrics, including that of Briand et al. (1999), Chidamber and Kemerer (1994), Basili et al. (1996), and Card and Glass (1990). In these studies, software dependency and complexity metrics are proposed and validated for both structured software and object-oriented software. These metrics consider different types of interactions between classes/modules, methods/functions, and attributes/variables.

On the other hand, the interaction locality design principle has also been widely accepted. For example, Basili et al. (1996) validated the speculation made by Chidamber and Kemerer (1994) that deep inheritance is more of a complication than shallow inheritance. Lüer et al. (2001) proposed to increase component distance (reduce component interactions) to increase component evolvability. Yu and Ramaswamy (2007) presented a method to verify modularity, hierarchy, and interaction locality of a software design. However, to the best of our knowledge, interaction locality has not been formalized and generally used in the derivation of software metrics.

3 Component dependency representation

While coupling represents the degree of interactions between two components, “coupling” by itself does not explicitly express the directionality of the dependency between the two components. For example, the statement “Component C1 is parameter coupled with component C2” does not explicitly specify whether component C1 depends on component C2 or component C2 depends on component C1. Furthermore, the definition of coupling is often associated with the relationship between only two components.
However, it frequently happens that one component may be coupled with several components. Therefore, there is a need to explicitly define and formalize the dependency relationship between components. In our previous work (Yu 2007), we extended the concept of coupling and defined component dependency as follows: Component Ci depends on component Cj if changes made to Cj have a direct effect on the behavior of Ci (the word “direct” means that the dependency is not via some third component). Component Ci is called the dependency-inducing component and component Cj is called the dependent component. In this paper, we continue to use this terminology. With this notation, the dependency of a component can be represented with all its dependent components.

Here we utilize two notations to represent the dependency of one component. The first is a graphical representation, first introduced in (Yu 2007): a dashed arrow from component Ci to component Cj denotes that component Ci is dependent on component Cj. The second notation is a matrix representation. A two-column matrix is used; the first column lists the dependent component names and the second column lists the coupling types. Suppose component C1 is parameter coupled to (dependent on) components C2 and C3. Figure 2 is a graphical representation of the dependency of C1. Table 2 is a matrix representation of the dependency of C1.

Fig. 2 Dependency of component C1 (graphical representation)

Table 2 Dependency of component C1 (matrix representation)

Dependent component   Coupling type
C2                    Parameter coupling
C3                    Parameter coupling

4 Layered component structure

While there have been several definitions of software components (Brown 1997; Leavens and Sitaraman 2000), in this paper, we consider a component from a logical perspective and define it as an integral logical constituent (Mei et al. 2001). According to this definition, all artifacts (classes, programs, packages, and so on) can be considered as components. In a software system, there are two types of components: primitive components and compound components. A primitive component is defined as the smallest manageable unit (a class in object-oriented software and a module in structured software). A compound component is composed of primitive components and/or other compound components. Therefore, a software system can be represented by a component tree: the leaf nodes are primitive components and the internal nodes are compound components, with the primitive components at height 1. The height of a compound component can be recursively defined as one plus the maximum height of its descendant components.

Consider the following example: a software product consists of seven modules: C1, C2, …, and C7. Modules C1, C2, and C3 are composed to form one component called Main. Modules C4 and C5 are composed to form a component called Input. Modules C6 and C7 are composed to form a component called Output.
Input and Output are composed to form a component called I/O. Main and I/O are composed to form the product. This product can be represented as a component tree, as shown in Fig. 3. There are seven components at height 1 (C1, C2, …, and C7), three components at height 2 (Input, Output, and Main), one component at height 3 (I/O), and the root, Product, is at height 4.

Fig. 3 A component tree example

In general, a software product can be represented as a component tree of height h with m leaf nodes or primitive components PC1, PC2, …, PCm. In the case of an object-oriented
software product, the primitive components are classes; in a structured software product, they are modules. Each internal node or compound component CCi, 1 ≤ i ≤ n, is a composition of primitive and/or compound components situated lower in the tree. A generic component tree is shown in Fig. 4, in which PC represents a primitive component and CC represents a compound component. The height of a component represents its composition scale.

Fig. 4 Generic layered component tree structure

5 Coupling and dependency in layered component-based software

Figure 5 is a coupling and dependency example in a layered component-based software product. It shows that the primitive component PC2 is dependent on primitive components PC1, PC3, and PC4. It also shows that the compound component CC1 is dependent on compound components CC2 and CC3.

Fig. 5 Graphical representation of component dependency

As mentioned in Sect. 2, the couplings defined in Table 1 reflect the interactions of two software components. However, they do not reflect the structure of the product and cannot accurately represent the dependency of compound components. In Fig. 5, PC2 is dependent on PC1, PC3, and PC4. Now suppose that these couplings are of the same type (say, parameter coupling). If we consider the dependency of PC2 itself, there is no difference among these couplings. However, if we consider the dependency of CC1, the coupling between PC2 and PC1 and the coupling between PC2 and PC3 must be handled differently. In a properly designed software system, related modules are composed into the same component. The coupling between PC2 and PC1 does not affect the dependency of CC1, but the couplings between PC2 and PC3 and between PC2 and PC4 may affect the dependency of CC1. Similarly, the couplings between PC2 and PC3 and between PC2 and PC4 have different effects on the dependency of CC6.
Therefore, traditional coupling definitions that consider only the type of dependency between primitive components are insufficient to describe dependencies between compound components. To consider the dependency between compound components, we present a metric C (t, d) to measure the coupling between two components, where C stands for coupling, with t representing the coupling type and d the coupling distance. Thus, this metric has two parameters: coupling type and coupling distance. While coupling type is determined by the nature of interactions between two software components as defined in Table 1, coupling distance is determined by the relative location of the two components in the component tree. Hence, any coupling between two components has an associated coupling type and coupling distance. In the following subsections, we discuss how to represent component coupling and component dependency by the coupling distance parameter.

5.1 Coupling representation of primitive components
In this subsection, we consider only primitive components, that is, the leaf components in the component tree. First, we explain how to represent the coupling metric C (t, d). As mentioned before, the coupling type is determined by the interactions between the two modules. The coupling distance is defined below.

Definition 1 The coupling distance between two primitive components PCi and PCj is the height of the lowest common ancestor of PCi and PCj in the component tree.
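
As a rough illustration of Definition 1, the following Python sketch encodes the seven-module example of Fig. 3 as a child-to-parent map and computes component heights and coupling distances. The names PARENT, children, height, ancestors, and coupling_distance are chosen here for illustration only and do not come from the original method description.

```python
# Component tree of the seven-module example (Fig. 3), stored as child -> parent.
PARENT = {
    "C1": "Main", "C2": "Main", "C3": "Main",
    "C4": "Input", "C5": "Input",
    "C6": "Output", "C7": "Output",
    "Input": "I/O", "Output": "I/O",
    "Main": "Product", "I/O": "Product",
}


def children(node):
    """Return the direct children of a component."""
    return [child for child, parent in PARENT.items() if parent == node]


def height(node):
    """Primitive components have height 1; a compound component has
    one plus the maximum height of its children."""
    kids = children(node)
    if not kids:
        return 1
    return 1 + max(height(kid) for kid in kids)


def ancestors(node):
    """Return the node followed by its ancestors, up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def coupling_distance(a, b):
    """Height of the lowest common ancestor of a and b (Definition 1)."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return height(node)
    raise ValueError("components are not in the same tree")


if __name__ == "__main__":
    for pair in [("C1", "C2"), ("C4", "C6"), ("C1", "C4")]:
        print(pair, coupling_distance(*pair))
```

Under this encoding, C1 and C2 share the parent Main and have coupling distance 2, C4 and C6 meet at I/O and have distance 3, and C1 and C4 meet only at the root and have distance 4.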
For example, in Fig. 5, supposing that all the coupling types are parameter coupling, the lowest common ancestors for primitive components PC2 and PC1, PC2 and PC3, and PC2 and PC4 are CC1, CC6, and CC8, respectively. The couplings between these primitive components can be represented as C (parameter coupling, 2), C (parameter coupling, 3), and C (parameter coupling, 4), respectively. Note that there may exist more than one coupling between two modules. In this case, each of them is represented by the corresponding C (t, d).

5.2 Dependency representation of primitive components

Next, we explain how to represent the dependency of a primitive component. In Sect. 3 we introduced the dependency of a component and represented it by a two-column matrix without considering coupling distance. Here, we introduce a three-parameter matrix to represent the dependency of a primitive component. The three parameters are dependent component, coupling type, and coupling distance. Accordingly, the dependency of PC2 in Fig. 5 can be represented by the matrix in Table 3.

Table 3 Dependency of primitive component PC2

Dependent component   Coupling type        Coupling distance
PC1                   Parameter coupling   2
PC3                   Parameter coupling   3
PC4                   Parameter coupling   4

5.3 Dependency representation of compound components

As described in Sect. 2, the dependency of a compound component is determined by the dependencies of its underlying primitive components. Therefore, we can represent the dependency of a compound component by using the dependencies of its primitive components. To calculate the coupling distance of a compound component dependency, we only need to consider the dependencies of its underlying primitive components whose coupling distances are larger than the height of this compound component. If the coupling distance of a dependency is smaller than or equal to the height of a compound component, this dependency is within the same compound component.
It does not affect the dependency of the compound component. Using this approach, the dependency of compound component CC1 in Fig. 5 can be represented as a three-parameter matrix shown in Table 4. Coupling between PC1 and PC2 is not of concern here.

Table 4 Dependency of compound component CC1 (primitive component representation)

Dependent component   Coupling type        Coupling distance
PC3                   Parameter coupling   3
PC4                   Parameter coupling   4

Sometimes, we are interested in the coupling between two compound components. For example, the primitive components may be opaque, so that we cannot access (or are not interested in) the lower level components. In this case, we treat the two coupled compound components as the basic units, the height-1 components. Then, we need to redetermine the tree structure of the software product. For example, in Fig. 5, if we consider CC1, CC2, CC3, CC4, and CC5 as the basic units, that is, height-1 components, we can redraw the component tree as shown in Fig. 6. We can then recalculate the coupling distances. The coupling between CC1 and CC2 can be represented as C (parameter coupling, 2), and the coupling between CC1 and CC3 can be represented as C (parameter coupling, 3). The dependency of CC1 can be represented by the three-parameter matrix shown in Table 5.

Fig. 6 Dependency tree with redefined height-1 components

Table 5 Dependency of compound component CC1 (specified dependent component representation)

Dependent component   Coupling type        Coupling distance
CC2                   Parameter coupling   2
CC3                   Parameter coupling   3

The first column in Table 5 (dependent component) contains specified components, which could be either primitive or compound. Both Table 4 and Table 5 contain the same information about the dependency of component CC1. We could use either representation under appropriate situations. In the remainder of this paper, we use the primitive component representation (Table 4) to represent the compound component dependency.

5.4 Discussion

C (t, d), the two-parameter representation of component coupling, has one advantage over the traditional one-parameter representation. The second parameter, coupling distance, represents the relevance of two coupled components.
Usually, in software design, relevant functions (methods) are grouped into one module (class) and relevant modules (classes) are grouped into one package, and so on. Therefore, larger coupling distances are less favorable than smaller coupling distances, because a larger value normally represents the presence of coupling between two relatively unrelated components. With respect to program comprehension and understandability, coupling between related components is easier to understand. For component maintenance, changes to a software component may have effects on other components due to component coupling; a smaller coupling distance is preferable to a larger coupling distance, because a smaller coupling distance indicates that adverse effects are localized within a smaller-scale component, which is easier to manage. A distance-1 coupling implies that the coupling is within a component and does not affect the independence of the component, which makes this component highly independent of other components. A distance-2 coupling indicates that the coupling is between components that have the same parent (one-height-up) component and hence can be more relevant than couplings with larger distances.

Therefore, coupling distance, together with the coupling type specified in Table 1, composes a valuable two-parameter coupling metric, C (t, d). This metric not only can be used to compare the degree of dependencies brought about by different types of coupling, but can also be used to compare the degree of relevance of the same type of coupling. For example, considering component CC1 in Fig. 5, we can infer that the coupling between CC1 and CC2 is viewed more favorably than the coupling between CC1 and CC3, even though they have the same coupling type (parameter coupling), because they have different coupling distances.

In summary, our two-parameter coupling metric and three-parameter dependency metric reveal the deep dependency relationships among components and are applicable to all types of coupling between components of any scale.

6 Determination of dependencies

It is clear from the above discussions that software dependency is largely induced by the presence of software coupling. It is easy to automatically determine parameter coupling and inheritance coupling. Parameter coupling is induced via function calls or message passing. For example, if module m1 invokes a function (method) implemented in module m2, we say m1 is parameter coupled to (dependent on) m2. Dependencies induced by inheritance coupling can be identified by language-specific keywords or semantics.
For example, Java uses the keyword extends to represent class inheritance. If module m1 is inherited from module m2, we say m1 is inheritance coupled to (dependent on) m2. In contrast, dependencies induced by common coupling and external coupling are more complicated. Common coupling between two modules is identified with the definition and use of a global variable: a definition of a variable x is a statement that assigns a value to x, such as x = 5; a use of a variable x is a statement that utilizes the value of x, such as if (x > 6) return. Because definitions can affect uses but uses cannot affect definitions, dependencies between components induced by global variables are induced by the definition–use relationship (Yu et al. 2004). For example, if module m1 uses a global variable that is defined in module m2, we say m1 is common coupled to (dependent on) m2. External coupling between two modules is identified with the write and read operations to the same external medium, including file, database, and so on. A write operation changes the content of the external medium and a read operation utilizes the content of
the external medium. Because write operations might affect modules that read the same external medium but read operations cannot affect modules that read/write to the same external medium, dependencies between components induced by an external medium are induced by the write–read relationship. For example, if module m1 reads a file and utilizes the content that is written by module m2, we say m1 is external coupled to (dependent on) m2.

7 Case studies of Apache HTTP

7.1 Overview

In this paper, we analyze Apache HTTP, an open-source software product. Apache HTTP is a project to develop and maintain an open-source web server for modern operating systems including UNIX, Linux and Windows. Because Apache HTTP is designed to run on different platforms, some code must be easily extensible and customizable. However, to make the project manageable and the product maintainable, the amount of code to be rewritten for different platforms must be minimized. In order to solve this maintenance and reuse issue, Apache HTTP is created as a series of code modules. Figure 7 shows the tree structure of Apache version 2.2, which can be considered approximately as a height-4 layered system in which the primitive components (modules) are the “.c” files.

Fig. 7 Component tree structure of Apache HTTP 2.2

In Apache HTTP, there are six height-3 compound components, among which modules is the most important one; it contains 17 height-2 compound components and 107 primitive components. These components are expected to be reused on different platforms. Data regarding the number of primitive components of Apache HTTP version 2.2 is provided in Table 6.

Table 6 The data about Apache version 2.2

Height-3 component   os   test   modules   support   srclib   server
Number of modules    14   8      107       10        299      44

As mentioned before, the most important height-3 compound component in Apache is modules, which contains platform-independent functions.
Therefore, to facilitate software maintenance and reuse, we expect that modules would be designed with weak dependencies. In this research, we use our method to study coupling within modules and between modules and other components in order to understand the component dependency of the entire Apache system. Because Apache HTTP is structured software written in the C language, there is no inheritance coupling (structured software does not have inheritance coupling). External coupling is related to the write and read operations to the same file/database. Modern programming uses it not to control program flow but to store and retrieve permanent data. Therefore, we decided to study parameter coupling and common coupling in Apache HTTP. These two types of coupling are the ones most commonly found in structured software. The coupling data is obtained via the source code cross-reference tool lxr.

7.2 Component dependencies induced via parameter coupling

Apache version 2.2 is a height-4 tree structure and modules is a height-3 compound component. Table 7 shows the dependency of modules induced by parameter coupling. Multiple occurrences of the same dependent component are counted as one unique dependent component. The coupling distances 1, 2, and 3 exist within primitive components of modules; the coupling distance 4 exists between primitive components of modules and primitive components of other height-3 compound components.

Table 7 Dependency of compound component modules induced by parameter coupling

Coupling distance   Number of dependent components   Unique number of dependent components
1                   23                               19
2                   86                               21
3                   13                               3
4                   881                              67

Consider the dependency of a single primitive component in modules. A total of 107 primitive components in modules are dependent on 1,003 components, i.e., on average, each primitive component invokes functions implemented in 9.37 components, most of which belong to other height-3 compound components. According to the interaction locality design principle, these distance-4 parameter couplings should be considered for restructuring to reduce the coupling distance and thereby the system complexity.

Consider the dependency of modules as a whole. The 107 primitive components in modules are dependent on 67 components (distance-1, distance-2, and distance-3 couplings do not affect the dependency of modules as a whole). The fact that modules has fewer dependent components (67) than components of its own (107) indicates that modules is well designed with respect to component dependency as a whole.

7.3 Component dependencies induced via common coupling

Because the determination of common coupling is associated with the definition–use analysis of global variables, to simplify the representation, we define a layer-L global variable that induces distance-L common coupling for a height-H component tree, in which 1 ≤ L ≤ H.
Definition 2 A layer-L (1 ≤ L ≤ H) global variable appears in primitive components with the same height-L ancestor, thereby inducing a distance-L common coupling. Because Apache version 2.2 is a height-4 tree structure, the highest layer of a global variable could be layer-4. Table 8 summarizes the global variables appear in modules . A layer-1 global variable appears within a single primitive component and induces a distance-1 dependency. The functions that use the layer-1 global variable depend on the functions that define the global variable within the same primitive component. However, a distance-1 dependency does not induce dependencies between components. Similarly, a distance-2 coupling does not affect the independence of the height-2 compound
component as a whole. Therefore, lower layer global variables are more favorable than higher layer global variables.

Table 8 Global variables in Apache compound component modules

Layer-number                  1     2     3     4
Number of global variables    120   7     0     20

First, we evaluate the dependency of the compound component modules. Since modules is a height-3 component, only layer-4 global variables in Table 8 can affect the dependency of modules. Among the 20 layer-4 global variables, only three variables induce dependency of modules on other components (because only the definition of a global variable can affect the use of that global variable). Table 9 presents the dependency of component modules induced by global variables using matrix representation.

Table 9 Dependency of component modules induced by common coupling

Dependent component    Coupling type      Coupling distance
leader.c               Common coupling    4
mpm_winnt.c            Common coupling    4
worker.c               Common coupling    4
perchild.c             Common coupling    4
threadpool.c           Common coupling    4
prefork.c              Common coupling    4

Table 10 shows the dependencies of all primitive (height 1) components in modules induced by common coupling. Among 107 primitive components, only eight have dependencies on other components via global variables. The first column lists the eight dependency-inducing components in modules; the other three columns are the matrix representations of the dependencies of each dependency-inducing component.

Table 10 Dependency of primitive components in modules via common coupling

Dependency-inducing component    Dependent component     Coupling type      Coupling distance
mod_win32.c                      mpm_winnt.c             Common coupling    4
cache_storage.c                  mod_cache.c             Common coupling    2
cache_util.c                     mod_cache.c             Common coupling    2
repos.c                          locks.c                 Common coupling    2
                                 dbm.c                   Common coupling    2
mod_dav_lock.c                   locks.c                 Common coupling    2
mod_case_filter.c                mod_case_filter_in.c    Common coupling    2
mod_case_filter_in.c             mod_case_filter.c       Common coupling    2
mod_status.c                     leader.c                Common coupling    4
                                 worker.c                Common coupling    4
                                 mpm_winnt.c             Common coupling    4
                                 perchild.c              Common coupling    4
                                 threadpool.c            Common coupling    4
                                 prefork.c               Common coupling    4
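As an illustration of Definition 2 and of how the rows of Table 10 can be tallied by coupling distance, the following Python sketch may help. It is a minimal sketch under stated assumptions, not the tooling used in the study: the ancestor chains in ANCESTORS (directory names such as cache, generators and server) are hypothetical and only illustrate a height-4 tree, the function names are illustrative, and the TABLE_10 rows are a reading of the flattened table above.

```python
from collections import Counter

# Hypothetical ancestor chains: for each primitive (height-1) component, the
# names of its height-2, height-3 and height-4 ancestors, in that order.
# These directory names are illustrative assumptions, not data from the paper.
ANCESTORS = {
    "mod_cache.c":     ("cache", "modules", "httpd"),
    "cache_storage.c": ("cache", "modules", "httpd"),
    "mod_status.c":    ("generators", "modules", "httpd"),
    "worker.c":        ("mpm", "server", "httpd"),
}


def variable_layer(users, ancestors=ANCESTORS):
    """Layer of a global variable in the sense of Definition 2: the smallest L
    such that all primitive components using it share one height-L ancestor."""
    users = set(users)
    if len(users) == 1:
        return 1  # confined to a single primitive component
    chains = [ancestors[name] for name in users]
    for index in range(len(chains[0])):
        if len({chain[index] for chain in chains}) == 1:
            return index + 2  # position 0 of a chain is the height-2 ancestor
    raise ValueError("components do not share a common root")


# Rows of Table 10: (dependency-inducing component, dependent component, distance).
TABLE_10 = [
    ("mod_win32.c", "mpm_winnt.c", 4),
    ("cache_storage.c", "mod_cache.c", 2),
    ("cache_util.c", "mod_cache.c", 2),
    ("repos.c", "locks.c", 2),
    ("repos.c", "dbm.c", 2),
    ("mod_dav_lock.c", "locks.c", 2),
    ("mod_case_filter.c", "mod_case_filter_in.c", 2),
    ("mod_case_filter_in.c", "mod_case_filter.c", 2),
    ("mod_status.c", "leader.c", 4),
    ("mod_status.c", "worker.c", 4),
    ("mod_status.c", "mpm_winnt.c", 4),
    ("mod_status.c", "perchild.c", 4),
    ("mod_status.c", "threadpool.c", 4),
    ("mod_status.c", "prefork.c", 4),
]


def tally_by_largest_distance(rows):
    """Count dependency-inducing components by their largest coupling distance."""
    largest = {}
    for inducing, _dependent, distance in rows:
        largest[inducing] = max(distance, largest.get(inducing, 0))
    return Counter(largest.values())


tally = tally_by_largest_distance(TABLE_10)
independent = 107 - sum(tally.values())  # 107 primitive components in modules
print(dict(tally), independent)
```

With these rows, six components have a largest common coupling distance of 2, two have a largest distance of 4, and the remaining 99 of the 107 primitive components have no common-coupling dependency.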
To summarize, in the height-3 compound component modules of Apache version 2.2, there are 107 primitive components. Among them, 99 components are independent with respect to common coupling, six components have distance-2 common coupling, and two components have distance-4 common coupling (Table 10). Since common coupling, especially larger distance common coupling, is a barrier to software maintenance, these eight components are potential targets for restructuring to improve the maintainability of Apache. Considering the design quality of modules, 93% (99/107) of its primitive components are well designed from the viewpoint of common coupling: changes to other components will not affect any of them via global variables, and reuse of any of these 99 components does not need to consider their dependencies on other components via global variables.

7.4 Validation study

We have presented a 2-parameter coupling metric and a 3-parameter component dependency metric. Both metrics include coupling distance as an important parameter. It was assumed that larger distance coupling values have more detrimental effects than smaller distance coupling values with respect to software understandability, maintainability and reusability. To further validate this assumption, in this section we perform an empirical study on Apache HTTP to investigate the relationship between component dependency (represented with coupling distance) and the external properties of software products. The empirical studies were performed on the primitive components of modules in Apache HTTP version 2.2.
The coupling metrics presented in this paper are two-dimensional and contain both coupling type and coupling distance. To avoid the cross-effects of coupling type on the results, we studied parameter coupling and common coupling separately.

First, we define two evaluation metrics, D_parameter (parameter coupling distance) and D_common (common coupling distance). D_parameter of a component equals the sum of the parameter coupling distances of all its dependent components:

$$D_{parameter} = \sum_{i=1}^{n} d_{i}^{parameter}$$

where n is the number of components that depend on the component through parameter coupling and d_i^{parameter} is the parameter coupling distance to the i-th of them. D_common of a component equals the sum of the common coupling distances of all its dependent components:

$$D_{common} = \sum_{i=1}^{n} d_{i}^{common}$$

where n and d_i^{common} are defined analogously for common coupling. The D_parameter and D_common values of all 107 primitive components of modules in Apache HTTP are calculated based on inspection of the source code of version 2.2 using lxr.

Second, we count the M value, which is the number of times these dependency-inducing components have been modified, based on the change history of Apache HTTP. For this measurement, the CVS log is used: Apache HTTP records a complete revision history of all the components, which is available online and supports easy data extraction. We used a self-written Perl program to obtain the change record information for each of the 107 primitive components and count the number of times it was modified from its first version to the current version 2.2.

Finally, we test the correlation between D_parameter and M and between D_common and M. We expect to find that a component with a larger dependency value also has a larger number of modifications; therefore, we test the following null hypotheses.

H01: There is no linear relationship between the parameter coupling distance of a component and the number of modifications made on this component.
H02: There is no linear relationship between the common coupling distance of a component and the number of modifications made on this component.

To test these hypotheses, we need to calculate the correlation coefficient, which indicates the strength of the relationship between two variables: the independent variable, component dependency value (D_parameter or D_common), and the dependent variable, the number of modifications (M) made to the component in the software revision history. Several different correlation coefficients have been put forward, including Pearson’s correlation coefficient and Spearman’s rank correlation coefficient (Nolan 1994). For Pearson’s correlation coefficient to be valid, the two variables should be normally distributed.
However, in this case, it is unlikely that either of these two variables has a normal distribution. Therefore, we use Spearman’s rank correlation coefficient. If the rank correlation coefficient proves to be statistically significant at the 0.05 level, we reject the null hypothesis.

The results of the hypothesis tests are in Table 11. The scatter plots showing the relationship between parameter/common coupling distance and the number of modifications are in Figs. 8 and 9. Figure 8 shows the measurements and Fig. 9 shows the ranks of the measurements. Dashed linear trendlines are displayed in Fig. 9. In both tests, the correlations are significant at the 0.01 level (two-tailed). Therefore, we reject the null hypotheses and conclude that there is a significant linear correlation between the dependency value (D_parameter or D_common) of a component and the number of modifications (M) made to this component.

Table 11 The results of hypothesis tests

Hypothesis    Number of pairs of data    Correlation coefficient    Significance
H01           107                        0.402                      0.01
H02           107                        0.299                      0.01

Fig. 8 The scatter plot of the number of modifications of a component versus (a) parameter coupling distance; and (b) common coupling distance

Fig. 9 The scatter plot of rank of number of modifications of a component versus (a) rank of parameter coupling distance; and (b) rank of common coupling distance

It is worth noting that statistically significant correlations between the independent variable, coupling distance, and the dependent variable, number of modifications, do not demonstrate a cause–effect relationship. They only give empirical evidence that these two variables are related (either directly or through a third variable). Only an experiment where all other factors are fixed can help us derive the cause–effect relationship. Such an experiment would require that the only difference between components is coupling distance, and that all other factors, such as code size and code structure, remain the same. This can be done only through a controlled experiment and not on a real-world software product. Therefore, we were not able to verify the causation in this study.

The number of modifications made to a component is related to the quality and the complexity of the component. A component may be modified for various reasons. For instance, an error found in a component, or a requirement to improve its functionality, could result in a direct modification to the component. Moreover, because components are interdependent, changes made to other components could indirectly require modifications to this component.
Therefore, we can assume that the number of direct modifications represents the quality of the component, such as stability and fault density, and that the number of indirect modifications represents the complexity of the component. Note that a component is said to be complex when it is interrelated with many other components, so that changes made to other components require corresponding changes to this component. The fault density and complexity of a component are directly related to its dependency. Similar conclusions have been reached in earlier work (Kafura and Henry 1981; Selby and Basili 1991; Troy and Zweben 1981). In these studies, the relationship between coupling type and software quality was established. Our empirical study further reveals the relationship between coupling distance and software quality measures: larger distance coupling values have more detrimental effects than smaller distance coupling values on software quality, including understandability, maintainability and reusability.
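The procedure of this validation study can be sketched in Python. This is a minimal illustration rather than the Perl and lxr tooling used in the study: the component names, distance lists and modification counts below are hypothetical toy data, the function names are illustrative, and Spearman's coefficient is computed as Pearson's coefficient on average ranks (tied values share the mean of their ranks). No significance test is included.

```python
def dependency_value(distances):
    """D value of a component: the sum of the coupling distances
    to all of its dependent components."""
    return sum(distances)


def average_ranks(values):
    """Rank values from 1 upward; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson's correlation coefficient (undefined if either input is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / (spread_x * spread_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson's coefficient applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy data: coupling distances to dependent components, and the
# number of modifications M taken from a revision history.
toy_distances = {"a.c": [2, 2], "b.c": [], "c.c": [4, 4, 4], "d.c": [2]}
toy_modifications = {"a.c": 10, "b.c": 3, "c.c": 25, "d.c": 8}

names = sorted(toy_distances)
d_values = [dependency_value(toy_distances[name]) for name in names]
m_values = [toy_modifications[name] for name in names]
rho = spearman(d_values, m_values)
print(d_values, m_values, rho)
```

In this toy data the components with larger D values also have more modifications, so the ranks agree exactly. For real data, a statistics package such as SciPy (scipy.stats.spearmanr) also reports the p-value needed to test H01 and H02.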
8 Conclusions, threats to validity, and future research

In this paper, we proposed a coupling metric and a dependency metric for component-based software. In both metrics, a new and potentially important parameter, coupling distance, which measures the relevance between two coupled components, is used. If a software system can be represented as a layered component tree structure, the coupling distance can be determined easily from the heights of the two components in the tree. As a case study, we evaluated the dependency of Apache version 2.2 based on parameter coupling and common coupling. A validation study was performed and found that linear relations exist between coupling distance and component quality.

There are several threats to the validity of our study. One threat to internal validity is the accuracy of data. To reduce this threat, we used both open-source and self-written tools to extract coupling data and modification data in order to avoid manual counting mistakes. Another internal threat is that we only investigated the parameter coupling distance and common coupling distance. Due to the limitations of Apache HTTP, we did not investigate the inheritance coupling distance and external coupling distance. Therefore, to reduce this threat, we plan to study other software products, including object-oriented software, to validate the relationship between external/inheritance coupling distance and software qualities. The third internal threat comes from the measurement: coupling data is obtained from one specific version of Apache HTTP (version 2.2), while the modification data of a component is obtained from all versions of Apache HTTP. To reduce this internal threat, more coupling data on different versions of Apache HTTP should be obtained and examined against the modification data. One construct threat to validity is the construction of the tree structure of Apache HTTP.
Currently, we use the package structure to represent component structure, which might not be representative of the system architecture. Another construct threat to validity is that our coupling analysis is based only on static analysis, and we did not consider dynamic run-time coupling/dependencies. In static analysis, we only considered acyclic dependencies, i.e., sets of dependencies with no recursive references. During run-time, recursive or cyclic dependencies could exist between software components. The external threat to validity is that the study performed on Apache HTTP may not be representative of other component-based software products. To reduce these threats, more studies with dynamic analysis should be performed on other software systems.

Due to the observed importance of coupling distance, our studies have the following impacts on software design metrics, which also capture our future research directions:

1. Object-Oriented Design: the measurement of class coupling could be refined by integrating coupling distance. For example, the CBO metric presented by Chidamber and Kemerer (1994) only considers the number of objects coupled to a specified object. In fact, different objects may have different relevancies to a specified object, and by applying the coupling distance parameter, the CBO metric could be refined and revalidated.
2. Structured Design: the measurement of architecture design could also be refined. For example, Card and Glass (1990) defined the structural complexity of a specified module as the square of the fan-out of the module, where fan-out is the number of modules that are directly invoked by the specified module. In this paper, we show that different modules may have different relevancies to a specified module. Therefore, new metrics for structural complexity, data complexity, and system complexity could be derived if the coupling distance parameter is introduced within these measurements.

Acknowledgements This work was based, in part, upon research supported by the National Science Foundation (CNS-0619069, EPS-0701890 and OISE 0650939), Acxiom Corporation (# 281539) and NASA EPSCoR Arkansas Space Grant Consortium (# UALR 16804). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies. The authors would like to thank Professor Stephen R. Schach of Vanderbilt University for his many suggestions. The authors would also like to thank the anonymous reviewers for their valuable comments and suggestions, which greatly improved the earlier version of this paper.
References

Abdurazik, A. (2007). Coupling-based analysis of object-oriented software. Ph.D. Dissertation, George Mason University. Available at: http://www.ise.gmu.edu/~ofut/rsrch/aynur-dissertation.pdf.
Banker, R. D., Datar, S. M., Kemerer, C. F., & Zweig, D. (1993). Software complexity and maintenance costs. Communications of the ACM, 36(11), 81–94.
Basili, V. R., Briand, L. C., & Melo, W. L. (1996). A validation of object-oriented design metrics as quality indicators. IEEE Transactions on Software Engineering, 22(10), 751–761.
Berns, G. M. (1984). Assessing software maintainability. Communications of the ACM, 27(1), 14–23.
Biggerstaff, T. J., & Perlis, A. J. (1989). Software reusability: Concepts and models (Vol. 1). New York, NY: ACM Press.
Briand, L. C., Daly, J. W., & Wüst, J. K. (1999). A unified framework for coupling measurement in object-oriented systems. IEEE Transactions on Software Engineering, 25(1), 91–121.
Briand, L. C., Morasca, S., & Basili, V. R. (1994). Defining and validating high-level design metrics. Computer Science Technical Report Series, Vol. CS-TR-3301, University of Maryland at College Park, College Park, MD.
Brown, A. W. (1997). Background information on CBD. SIGPC, Vol. 1, No. 18.
Bruegge, B., & Dutoit, A. H. (2004). Object-oriented software engineering using UML, patterns, and Java. Upper Saddle River, NJ: Pearson Prentice Hall.
Card, D. N., & Glass, R. L. (1990). Measuring software design quality. Upper Saddle River, NJ: Prentice-Hall.
Chidamber, S., & Kemerer, C. (1994). A metrics suite for object oriented design. IEEE Transactions on Software Engineering, 20(6), 476–493.
Dandashi, F. (2002). Software engineering: theory, application and practice: A method for assessing the reusability of object-oriented code using a validated set of automated measurements. In Proceedings of the 2002 ACM Symposium on Applied Computing, pp. 997–1003.
Frakes, W. B., & Succi, G. (2001). An industrial study of reuse, quality, and productivity. Journal of Systems and Software, 57(2), 99–106.
Gibson, V. R., & Senn, J. A. (1989). System structure and software maintenance performance. Communications of the ACM, 32(3), 347–358.
Harrison, R., Counsell, S., & Nithi, R. (2000). Experimental assessment of the effect of inheritance on the maintainability of object-oriented systems. Journal of Systems and Software, 52(2–3), 173–179.
Hassoun, Y., Johnson, R., & Counsell, S. (2004). A dynamic runtime coupling metric for meta-level architectures. In Proceedings of the Eighth Euromicro Working Conference on Software Maintenance and Reengineering (CSMR’04), pp. 339–346.
Jonge, M. D. (2004). Multi-level component composition. 2nd Groningen Workshop on Software Variability Modeling (SVM’04).
Kafura, D., & Henry, S. (1981). Software quality metrics based on interconnectivity. Journal of Systems and Software, 2(2), 121–131.
Leavens, G., & Sitaraman, M. (2000). Foundations of component-based systems. Cambridge, UK: Cambridge University Press.
Lim, W. (1994). Effects of reuse on quality, productivity, and economics. IEEE Software, 11(5), 23–30.
Lüer, C., Rosenblum, D. S., & van der Hoek, A. (2001). The evolution of software evolvability. In Proceedings of the 4th International Workshop on Principles of Software Evolution, Vienna, Austria, September 2001, pp. 134–137.
Mei, H., Zhang, L., & Yang, F. (2001). A software configuration management model for supporting component-based software development. ACM SIGSOFT, 26(2), 53–58.
Nolan, B. (1994). Data analysis, an introduction. Cambridge, MA: Polity Press.
Offutt, J., Harrold, M. J., & Kolte, P. (1993). A software metric system for module coupling. Journal of Systems and Software, 20(3), 295–308.
Page-Jones, M. (1980). The practical guide to structured systems design. New York: Yourdon Press.
Price, M. W., & Demurjian, S. A. (1997). Analyzing and measuring reusability in object-oriented design. In Proceedings of the 12th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, pp. 22–33.
Selby, R. W., & Basili, V. R. (1991). Analyzing error-prone system structure. IEEE Transactions on Software Engineering, 17(2), 141–152.
Simon, H. A. (1969). The architecture of complexity, the sciences of the artificial. Cambridge, MA: MIT Press.
Stevens, W. P., Myers, G. J., & Constantine, L. L. (1974). Structured design. IBM Systems Journal, 13(2), 115–139.
Troy, D. A., & Zweben, S. H. (1981). Measuring the quality of structured design. Journal of Systems and Software, 2(2), 113–120.
Yu, L. (2007). Understanding component co-evolution with a study on Linux. Empirical Software Engineering, 12(2), 123–141.
Yu, L., & Ramaswamy, S. (2007). Verifying design modularity, hierarchy, and interaction locality using data clustering techniques. In Proceedings of the 45th ACM Southeast Conference, Winston-Salem, NC, March 2007, pp. 419–424.
Yu, L., Schach, S. R., Chen, K., & Offutt, J. (2004). Categorization of common coupling and its application to the maintainability of the Linux Kernel. IEEE Transactions on Software Engineering, 30(10), 694–706.