Identifying Clones in the Linux Kernel
By Gaurav Taywade
History • Unix: 1969, Thompson & Ritchie, AT&T Bell Labs. • BSD: 1978, Berkeley Software Distribution. • GNU: 1984, Richard Stallman, FSF. • Minix: 1987, Andrew Tanenbaum. • Linux: 1991, Linus Torvalds, Intel 386 (i386).
Linux Features • UNIX-like operating system. • Preemptive multitasking. • Virtual memory (protected memory, paging). • Demand loading, dynamic kernel modules. • TCP/IP networking. • Open source.
What’s a Kernel? • System monitor. • Controls and mediates access to hardware. • Schedules / allocates system resources: • Enforces security and protection. • Responds to user requests for service (system calls).
Kernel Design Goals • Performance: efficiency, speed. • Stability: robustness, resilience. • Capability: features, flexibility, compatibility. • Security, protection. • Portability
Linux kernels • Consists of 538 .c and .h files, 279,118 LOC. • 42 file system implementations. • Layered design.
Clone Code • The practice of copying code promotes the appearance of duplicated code snippets, called clones.
• Clones typically account for 5% to 10% of a system's code, and up to 50% in some cases.
Associated Problems • Errors can be difficult to fix. • Change in requirements may be difficult to implement. • Code size unnecessarily increased. • Can lead to unused, dead code. • Can be indicative of design problems. • Bugs may be copied as well.
Where do clones occur? • Duplicated blocks within the same function. • Cloned blocks across functions, files and directories. • Similar functions in the same file. • Functions cloned between files in the same directory. • Functions cloned across directories. • Cloned files.
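The location categories above can be sketched as a small classifier. This is only an illustration: the `Fragment` record, the `clone_scope` name and the file paths are hypothetical, and a fragment is assumed to be identified by its file path and enclosing function.

```python
import posixpath
from typing import NamedTuple


class Fragment(NamedTuple):
    """One side of a clone pair (hypothetical record for this sketch)."""
    path: str      # source file, relative to the kernel tree root
    function: str  # name of the enclosing function


def clone_scope(a: Fragment, b: Fragment) -> str:
    """Classify a clone pair by how far apart its two fragments are."""
    if a.path == b.path:
        # Same file: either inside one function or across two functions.
        return "same function" if a.function == b.function else "same file"
    if posixpath.dirname(a.path) == posixpath.dirname(b.path):
        return "same directory"
    return "across directories"


# Illustrative (made-up) fragments.
x = Fragment("fs/ext2/inode.c", "ext2_read_inode")
y = Fragment("fs/ext2/super.c", "ext2_read_super")
z = Fragment("fs/minix/inode.c", "minix_read_inode")
print(clone_scope(x, y))
print(clone_scope(x, z))
```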
Frequency of Clone Types
The Clone Identification process
Case Study
Duplication Detection Techniques • String based • Token based • Parse-tree based
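As a rough illustration of the string-based technique, the sketch below reports runs of consecutive lines that occur more than once after whitespace normalization. The function name `duplicated_windows` and the fixed window size are assumptions made for this example, not part of any tool named in these slides.

```python
from collections import defaultdict


def duplicated_windows(lines, window=3):
    """Return {window text: [1-based start lines]} for repeated line runs."""
    # Drop blank lines and collapse whitespace so layout differences
    # do not hide a textual duplicate.
    norm = [(i + 1, " ".join(line.split()))
            for i, line in enumerate(lines) if line.strip()]
    seen = defaultdict(list)
    for k in range(len(norm) - window + 1):
        key = tuple(text for _, text in norm[k:k + window])
        seen[key].append(norm[k][0])
    # Keep only the windows that appear at least twice.
    return {key: starts for key, starts in seen.items() if len(starts) > 1}


code = """\
a = 1;
b = 2;
c = a + b;
x = 0;
a  =  1;
b = 2;
c = a + b;
""".splitlines()

dups = duplicated_windows(code, window=3)
print(list(dups.values()))
```

Plain line matching of this kind misses clones whose identifiers were renamed, which is what the token-based and parse-tree-based techniques address.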
The Method Two feasible approaches to obtain information on several platforms: • Pre-process and parse the source code with different configurations. • Adopt a fictitious reference configuration.
The Method Values that a pre-processor switch can assume: • Y: the code is included in the compiled kernel. • N (commented-out switch): the code is excluded. • M: a dynamically loadable module is produced.
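The three switch values can be read from a kernel `.config` file, where enabled options appear as `CONFIG_X=y` or `CONFIG_X=m` and disabled ones as the comment `# CONFIG_X is not set`. The following is a minimal sketch; the helper name `parse_switches` and the sample values are assumptions for illustration only.

```python
import re

# Tristate switches only; options with numeric or string values are skipped.
SET = re.compile(r"^(CONFIG_\w+)=(y|m)$")
UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_switches(config_text):
    """Map each tristate switch to 'Y', 'M' or 'N'."""
    switches = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        match = SET.match(line)
        if match:
            switches[match.group(1)] = match.group(2).upper()
            continue
        match = UNSET.match(line)
        if match:
            switches[match.group(1)] = "N"
    return switches


# Illustrative configuration fragment (values chosen for the example).
sample = """\
CONFIG_EXT2_FS=y
CONFIG_MINIX_FS=m
# CONFIG_NTFS_FS is not set
CONFIG_LOG_BUF_SHIFT=17
"""

switches = parse_switches(sample)
print(switches)
```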
The Method • 1) Function identification. • 2) Clone identification.
The Method 1) Function Identification
The Method 2) Clone Identification • The number of passed parameters. • The number of LOC. • The cyclomatic complexity. • The number of used/defined local variables. • The number of used/defined non-local variables. • The number of arithmetic and logical operators. • The number of function calls and return/exit points. • The number of structure pointer access fields.
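A metrics-based comparison along these lines might look like the sketch below. It assumes that two functions are clone candidates when their metric vectors are identical, and it computes only a subset of the listed metrics with crude regular expressions rather than a real C parser. All function names and sample bodies are hypothetical.

```python
import re
from collections import defaultdict

DECISION = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\||\?")
CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
NOT_CALLS = {"if", "for", "while", "switch", "return", "sizeof"}


def metrics(params, body):
    """Approximate metric vector for one C function (text-based, crude)."""
    loc = len([ln for ln in body.splitlines() if ln.strip()])
    # Counts commas, so function-pointer parameters would be miscounted.
    n_params = 0 if params.strip() in ("", "void") else params.count(",") + 1
    cyclomatic = 1 + len(DECISION.findall(body))
    calls = sum(1 for name in CALL.findall(body) if name not in NOT_CALLS)
    returns = len(re.findall(r"\breturn\b", body))
    ptr_fields = body.count("->")
    return (n_params, loc, cyclomatic, calls, returns, ptr_fields)


def clone_candidates(functions):
    """Group function names whose metric vectors are identical."""
    groups = defaultdict(list)
    for name, (params, body) in functions.items():
        groups[metrics(params, body)].append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Made-up functions: name -> (parameter list, body).
functions = {
    "ext2_sync": ("struct inode *inode, int wait",
                  "if (wait)\n    flush(inode->i_sb);\nreturn 0;"),
    "minix_sync": ("struct inode *ino, int w",
                   "if (w)\n    flush(ino->i_sb);\nreturn 0;"),
    "other": ("void", "return 1;"),
}
print(clone_candidates(functions))
```

Equal metric vectors only flag candidates; two unrelated functions can share the same values, so candidates still need inspection.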
Results: Accuracy • Number of false matches: • Parameterized suffix tree matching and simple line matching find no false matches. • Parameterized line matching finds few false matches. • Metrics based matching finds many false positives when applying metrics to block fragments, only a few when applying to methods.
Results: Accuracy • Number of useless matches: • Both parameterized methods returned few useless matches. • Metrics found more useless matches: 133 out of 138 in TextEdit when applying metrics to methods. • Simple line matching finds many: 229 useless matches in TextEdit.
Results: Accuracy • Number of recognizable matches: • Parameterized matching techniques return fewer recognizable matches. • Simple string matching returns the fewest.
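The idea behind parameterized matching can be sketched as follows: two fragments match if one can be turned into the other by a consistent renaming of identifiers. This simplified version compares two whole fragments only; it does not build the parameterized suffix tree mentioned above. The names `parameterize` and `p_match` and the abbreviated keyword list are assumptions for this example.

```python
import re

# Numbers first so that hex literals such as 0x1F stay a single token.
TOKEN = re.compile(r"\d\w*|[A-Za-z_]\w*|\S")
C_KEYWORDS = {"if", "else", "for", "while", "do", "switch", "case", "break",
              "continue", "return", "int", "char", "long", "void", "struct",
              "unsigned", "static", "const", "sizeof"}  # abbreviated list


def parameterize(fragment):
    """Replace each identifier by P0, P1, ... in order of first occurrence."""
    names = {}
    out = []
    for tok in TOKEN.findall(fragment):
        is_ident = tok[0].isalpha() or tok[0] == "_"
        if is_ident and tok not in C_KEYWORDS:
            out.append(names.setdefault(tok, f"P{len(names)}"))
        else:
            out.append(tok)
    return out


def p_match(a, b):
    """True if b equals a up to a consistent renaming of identifiers."""
    return parameterize(a) == parameterize(b)


a = "for (i = 0; i < n; i++) sum += buf[i];"
b = "for (k = 0; k < len; k++) total += data[k];"
c = "for (i = 0; i < n; i++) sum += buf[n];"  # index variable differs
print(p_match(a, b), p_match(a, c))
```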
Kernel analysis
Cloning evolution • Levels of granularity: 1) the overall cloning in the entire Linux kernel; 2) the cloning among major subsystems; and 3) the cloning among architecture-dependent code of some subsystems.
Results • 12% of the Linux kernel file-system code is involved in code duplication. • Detected 3116 clone pairs, with an average length of 13.5 lines. • 78% of cloning occurs in the same directory.
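Summary figures of this kind (number of pairs, average length, share of same-directory clones) could be derived from a list of detected clone pairs as sketched below. The record layout `(path_a, path_b, length_in_lines)` and the sample data are assumptions and do not reproduce the study's data.

```python
import posixpath


def summarize(pairs):
    """Summarize clone pairs given as (path_a, path_b, length_in_lines)."""
    if not pairs:
        raise ValueError("no clone pairs to summarize")
    same_dir = sum(
        1 for a, b, _ in pairs
        if posixpath.dirname(a) == posixpath.dirname(b)
    )
    return {
        "pairs": len(pairs),
        "avg_length": sum(length for _, _, length in pairs) / len(pairs),
        "same_dir_share": same_dir / len(pairs),
    }


# Made-up clone pairs for illustration.
pairs = [
    ("fs/ext2/inode.c", "fs/ext2/super.c", 10),
    ("fs/ext2/inode.c", "fs/ext2/inode.c", 20),
    ("fs/minix/dir.c", "fs/minix/namei.c", 12),
    ("fs/ext2/dir.c", "fs/minix/dir.c", 14),
]
summary = summarize(pairs)
print(summary)
```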
Conclusion • We have begun to build a taxonomy of code clones in software. • Cloning activity in the Linux kernel filesystem subsystem is at a non-trivial rate. • Cloning most commonly occurs within a subsystem.
Conclusion • Parameterized string matching provides an interesting and powerful method for function duplication detection. • 3D visualization provided an interesting method of viewing clones amongst subsystems
Visualization of Cloning Without Showing Same Directory Clones
References • G. Casazza, G. Antoniol, U. Villano, E. Merlo, M. Di Penta, Identifying Clones in the Linux Kernel, IEEE Computer Society Press, 2001. • Christopher Negus, Linux Bible, 2005 Edition. • I. Bowman, Conceptual Architecture of the Linux Kernel, Technical Report, University of Waterloo. • J. Mayrand, C. Leblanc, and E. Merlo, Experiment on the automatic detection of function clones in a software system using metrics, in Proceedings of the International Conference on Software Maintenance, IEEE Computer Society Press, Nov 1996. • G. Antoniol, U. Villano, E. Merlo, M. Di Penta, Analyzing cloning evolution in the Linux kernel, Information and Software Technology 44 (2002).
References • Gary Nutt, Kernel Projects, Addison Wesley, 2001 edition. • M. Beck, H. Bohme, Linux Kernel Programming, Pearson Education, 2004. • Neil Matthew, Richard Stones, Wiley Publication, 2004 edition.