Migrating From C++ To C# Introduction Since its beginning in the 1980s, C++ has come a long way. It has a large established user base, tested software, its own tools (compilers, etc), and lots of experienced programmers. It has also developed its own idioms and techniques for programmers to write effective software. C++ programmers are comfortable in getting things done with the facilities that are provided with it in an efficient manner. .NET is a powerful new platform with a great deal of promise. C# is designed from the ground up to harness the power of this new framework. It provides a whole host of features and is strongly based on C++. C# is an object oriented language and is the first component-oriented language in the C family. It also makes writing Windows and Web applications faster and easier. C# is gaining wide acceptance and it is clear that it is here to stay for a long time. C# is not a replacement for C++, and it is more than likely that both will be used widely for the foreseeable future. However, there are many practical cases where there is a necessity to migrate from C++ to C#. For instance, your company's policy may be to change all existing code to .NET, or perhaps you wish to take advantage of some of the facilities made available in .NET. The question is, how do we make the transition as smooth as possible while getting the best results? Adopting a new language doesn't just mean converting the existing code from C++ to C#. By just knowing the syntax, a C++ programmer cannot straightaway start programming in C#. These two languages differ largely by their design and approach towards problem solving, which makes the language transition harder. System Requirements It is preferable that the reader has access to the C# compiler available in Microsoft Visual C#.NET. This case study is for those programmers coming from a C++ background, who are new to C# or have just started programming in it. 
The programmers with a good understanding of C# are in a better position to understand the approaches taken in the conversion process. Case Study Structure The case study consists of three main sections:
• The approach: In this section, we briefly cover the basic theory that is necessary for understanding the issues in conversion. It is possible that you may not be clear about a few of the C# features mentioned here - they are covered in the following section.
• Comparing C++ and C# features: In this section we look at the different features of the two languages which are necessary to make the conversion possible.
• Steps in converting existing code:
The steps that are required for converting the existing code from C++ to C# are covered in this section. An example of converting class hierarchies from C++ to C# is also covered. The Approach What is the best approach for getting equipped for a smooth transition from C++ to C#? Understanding! Migrating from one language to another involves a considerable effort. This is not because of a change in syntax, rather because of changes in methodology - design approach, underlying technology, and the approach towards problem solving. Understanding that there is such a fundamental shift, and having the knowledge of where the major differences lie, will help a lot. The underlying translation models for C++ and C# are quite different. C++ follows a static linkage model, meaning that the source code is compiled by the compiler to result in object code. The object files are linked to result in an executable file. The operating system loads and controls the execution of the program. The language features are designed with this approach in mind. For example, there is no support for reflection. Moreover, the code is only source code portable and not much runtime support is available. C# follows an entirely different translation model - it combines compilation and interpretation. The source code is converted to an intermediate format known as MSIL (Microsoft Intermediate Language). A virtual machine, referred to as the CLR (Common Language Runtime), takes over to execute the instructions. The execution is thus in the hands of CLR and the code executed is referred to as managed code. This change in translation model is reflected in the language features as well.
A very important difference is in the area of memory management: the programmer no longer has complete control over the lifetime of objects on the heap. The garbage collector takes care of deleting objects whose lifetime is over, so there is no need for the keyword "delete". However, there are destructors in C#. If the "delete" keyword is not available in C#, then what is the use of destructors? In reality, the destructor syntax in C# is very misleading, especially for programmers from a C++ background. They are actually finalizers that are called before an object is garbage collected.

Another issue to understand is the change in design criteria. C++ is designed with experienced programmers in mind and 'trusts the programmer' (in the C tradition). So no extensive runtime checking is done, and there are implicit casts and promotions in function calls. These features have proven to be very useful, but also very bug-prone. Therefore, only experienced programmers should use them. C#, however, is designed so that even novice users can learn it fairly easily, and is also designed with robust software in mind. It performs extensive runtime checking, allows very few implicit conversions, and tries to make the life of the programmer easier.

How can this understanding of the change in language design help in the transition from C++ to C#? Let us use an example. A single argument constructor also serves the purpose of a conversion operator in C++. When a conversion is required, that constructor is called implicitly because the language 'trusts the programmer': it is assumed that the C++ programmer is aware of it. Such implicit calls may lead to subtle bugs, like:

class Stack{
public:
    Stack(int initialCapacity); // constructor that takes int as an argument
    // other members
};
// now consider the code
Stack s(10);
s = 25; // implicit conversion: a new Stack object is created with int as argument
// equivalent to: s = Stack(25);
// beware! the programmer may have written this without being aware that the
// constructor with the int argument is called for the conversion
// from int to Stack
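On the C++ side, the usual guard against this kind of accident is the explicit keyword on the constructor. The following is a minimal sketch, assuming a hypothetical Stack reduced to a vector of ints; the member names and the demoExplicitConversion helper are illustrative and not part of the original example.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical Stack, reduced to what the illustration needs.
class Stack {
public:
    // 'explicit' stops the compiler from using this constructor
    // for implicit int -> Stack conversions.
    explicit Stack(std::size_t initialCapacity) { items_.reserve(initialCapacity); }

    void push(int value) { items_.push_back(value); }
    std::size_t size() const { return items_.size(); }

private:
    std::vector<int> items_;
};

// With 'explicit', an int no longer converts to a Stack implicitly.
static_assert(!std::is_convertible<int, Stack>::value,
              "int must not convert implicitly to Stack");

// Returns the size of the stack after it has been explicitly replaced.
std::size_t demoExplicitConversion() {
    Stack s(10);
    s.push(1);
    assert(s.size() == 1);
    // s = 25;       // would not compile: no implicit conversion from int
    s = Stack(25);   // the conversion has to be spelled out
    return s.size(); // the freshly created Stack is empty
}
```

Calling demoExplicitConversion() returns 0, because the assignment replaces the old contents with a new, empty Stack. Any constructor marked this way in the C++ code is a hint that the C# equivalent, if a conversion is needed at all, should probably be an explicit conversion operator.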
To avoid such problems, you cannot use single argument constructors as conversion operators in C#. You have to support explicit conversion operators for that. Also, you can appreciate the use of implicit and explicit keywords in C#
better. With this knowledge you are better equipped now. When you write equivalent C# code, you will also need to examine if a conversion operator needs to be implemented and decide if it should be declared as explicit or implicit if the original C++ code had any single argument constructors. The problem solving approach also differs considerably in these two languages. Consider writing a simple calculator program. You require a postfix expression evaluator, and for that you may prefer to have your own reusable version of Stack. The interface for Stack is well-known and the logic is pretty straight forward. Still, your approach towards solving such problems may be entirely different depending on the language you use. In C++, you would write a template class for the stack. If you want to evaluate an integral expression then you will instantiate an integer version from that Stack template class. It has its own benefits like static type checking. You can use this same implementation for any type of expression, for example floating point expression, without any changes. It is also extensible. In C#, all objects come from the common base class 'object', and so you can write a Stack class which stores 'objects'. Since all the objects inherit from this class, you can store virtually any object in that Stack. When you retrieve the elements, you have to employ dynamic type checking to make sure that the types don't mix-up. As you can see, even for the same well-defined problem, the problem solving approach differs considerably and you make a different set of decisions and an entirely different implementation depending on the language you are using! Another important factor in the transition from C++ to C# is that it is a transition from an unmanaged environment to a managed environment. In C++ there is only trivial support from the runtime available, whereas C# has the sophisticated .NET runtime environment. 
C++ programmers need to make special efforts to understand the advantages with the managed environment. For example, reflection is a powerful feature which can be used to generate and execute assemblies dynamically. Runtime checks ensure that the security privileges are available for providing access to resources. You have array bounds checking, versioning support and most important of all - components that are created from any language can interact freely. However, it should be noted that the managed environment also comes with restrictions: you can no longer allocate objects anywhere you wish - you can only allocate to the heap. Also, you cannot do generic programming with templates as you could in C++, as .NET doesn't support it yet. The concept and benefits of a managed environment are new to C++ programmers, and hence exposure to the facilities with the underlying framework is essential to get the most out of C#.
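The two Stack designs discussed in this section can be put side by side in C++ itself. This is only a sketch under stated assumptions: TypedStack and ObjectStack are hypothetical names, and std::any (C++17) stands in for the C# 'object' base class, which it resembles only in that the stored type is checked when the value is retrieved.

```cpp
#include <any>
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Template style: the element type is fixed and checked at compile time.
template <typename T>
class TypedStack {
public:
    void push(const T& value) { items_.push_back(value); }
    T pop() {
        if (items_.empty()) throw std::out_of_range("pop on empty stack");
        T top = items_.back();
        items_.pop_back();
        return top;
    }
private:
    std::vector<T> items_;
};

// 'object' style: anything can be pushed; the type is checked on retrieval.
class ObjectStack {
public:
    void push(std::any value) { items_.push_back(std::move(value)); }
    std::any pop() {
        if (items_.empty()) throw std::out_of_range("pop on empty stack");
        std::any top = std::move(items_.back());
        items_.pop_back();
        return top;
    }
private:
    std::vector<std::any> items_;
};

// Evaluates the postfix expression "2 3 +" with the typed stack.
int evaluateTyped() {
    TypedStack<int> s;
    s.push(2);
    s.push(3);
    return s.pop() + s.pop();
}

// Pushes mixed types and reports whether retrieval detected the mismatch.
bool mixedTypesDetected() {
    ObjectStack s;
    s.push(2);
    s.push(std::string("three"));
    try {
        int value = std::any_cast<int>(s.pop()); // top element is a string
        (void)value;
    } catch (const std::bad_any_cast&) {
        return true; // the dynamic type check failed at run time
    }
    return false;
}
```

With the template version, pushing a string onto TypedStack<int> is rejected by the compiler; with the object-style version the same mistake only shows up when the element is popped.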
In essence, having a broad picture of these two languages and understanding the differences in the underlying technology and approaches to design and problem solving are essential for migrating from C++ to C#.

Comparing C++ and C# Features
The first requirement for moving from C++ to C# is a shift in your mindset. C++ is a language which trusts the programmer. This gives the programmer the ability to do whatever he wants. This power does have drawbacks though - it can be misused and can end up causing major headaches. C#, on the other hand, doesn't trust the programmer as much. It takes over many of the programmer's responsibilities and enables him to concentrate on the bigger picture. It removes a few features that were error prone, and introduces new ones that simplify programming. Let us now compare the features available in the two languages.

Data Types
The types in C++ can be subdivided into three categories: primitive types, aggregate types, and pointer types. The primitive types are: bool, char, int, float, double, wchar_t. The aggregate types are those that are composed of other types. These include arrays, structures, unions and enums. Both pointers and references are classed as pointer types. In C#, things are a little different, as it only has value types and reference types. A variable of a value type stores its data directly, whereas a variable of a reference type stores a reference, which points to the actual data. The value types can be thought of as equivalent to the primitive types in C++. They are derived from the class System.ValueType. These types can be stored in the stack frame of a method. Objects of reference types cannot be stored in the stack frame, only on the heap. Another difference between C++ and C# data types is their size. While the size of most types is implementation-dependent in C++, C# types have fixed sizes. We need to be cautious while converting between the available types.
For example, many C++ implementations provide a long double that occupies 10 bytes (the exact size is implementation-dependent). There is no long double type in C#, and double occupies 8 bytes. There is a new type, decimal, available in C# that occupies 16 bytes. Since decimal occupies 6 more bytes than such a long double, you may think that decimal can hold anything a long double can. However, decimal does not give a wider range; it gives more precise decimal values, as needed for currency. If you were using long double for higher precision, decimal may serve; if you were using it for a wider range, you may have trouble.
Unsigned types are supported in C#, but most of them (byte is the exception) are better avoided in public interfaces because using them makes the code non CLS-compliant (Common Language Specification compliant).

References
We have seen many C# programmers considering C++ references equivalent to C# references. This is wrong! Actually C# references are closer to C++ pointers. Remember that references in C++ serve as a name alias. They are sure to refer to an object, and sure to refer to the same object throughout the scope of the reference. It's different in C#. Just like pointers, C# references can be declared without initializers, they can refer to different objects at different times, and they can even refer to nothing - null (attempting to use a null reference throws NullReferenceException). So:

//C++
MyClass &r1 = NULL; // error, references cannot be null
MyClass &r2; // error, needs an initializer
MyClass &r = obj; // OK
r = anotherObj; // does not rebind r: copies anotherObj's value into obj
//C# (ref is a keyword in C#, so the variable is called r here)
MyClass r; // OK, initializer not needed
r = anObj;
r = anotherObj; // OK, change the reference
r = null; // allowed
You can think of C# references as 'restricted and safe C++ pointers':
// C++
string * s;
s = new string;
// C#
string s;
s = new string('x', 3); // System.String has no parameterless constructor
Declarations and Definitions
With the discussion of C# references, you may have noticed one drastic difference in the semantics of the following statement:
string str;
This same statement is interpreted in different ways by the two languages. A C++ compiler sees it as a definition of a variable called str. It allocates a new stack object (or data area if declared globally) and calls the default constructor on the allocated object (a string object in this case). A C# compiler sees the same statement as a declaration of the reference variable str. It allocates space for the reference alone. It neither allocates space for an object nor calls a constructor. This must be done explicitly by the programmer:
str = new string('x', 3);
This statement allocates memory for the object on the heap and then calls the constructor (System.String has no parameterless constructor, so arguments are required). C# combines declarations and definitions together, whereas C++ clearly distinguishes between the two. For this reason, there are no function prototypes and forward declarations in C#. The C# compiler carefully checks for definite assignment - you cannot use a local variable without initializing it. Such facilities help avoid bugs, and greatly simplify the life of the programmer.

Structs
Except for the default access specifier, C++ never differentiates between structs and classes. Both are functionally the same: a struct can contain methods and can be inherited by a class. C# takes a different path. Here structs are value types intended for small aggregates. They can contain methods, properties and constructors and can implement interfaces, but they cannot inherit from other structs or classes, and no classes can inherit from them. The advantage is that, as they are value types, they can be stored in the stack frame. They do not require any indirection and so are more efficient than classes. When you want to group some related data with little or no behavior attached, structs are the best solution. When we want to model a real world entity with both data and substantial behavior, classes have to be used. For example, 'Point' in a graph is a simple aggregate type, and for that a struct can be used. Implementing a 'Vehicle' type may require encapsulating lots of data and methods operating on it, and for that, classes are better suited. Actually there are no hard-and-fast rules for deciding between structs and classes. A good rule-of-thumb is to use structs for the simplest aggregate types and classes for any nontrivial types. One notable advantage of structs is that local struct variables are allocated on the stack and struct array elements are stored inline, so there is no per-object memory overhead. Lots of memory will be saved when hundreds of objects are created, for example a big array of struct type.
When you use a class type, the objects will be allocated on the heap and hence a lot of memory overhead is involved (In the current version of .NET, 10 more bytes are occupied for each heap object compared to an equivalent stack object!). So, using structs for small types can lead to saving significant amounts of memory.
MyStruct [] sArr = new MyStruct[10];
Whereas for the class type: MyClass [] oArr = new MyClass[10]; for(int i = 0; i < 10; i++) oArr[i] = new MyClass();
Arrays
Arrays are the simplest data structures that are widely used in programming. In C++, an array is treated as a contiguous block of memory. The low-level nature of arrays creates problems with object-oriented programming. A base class pointer cannot be used to iterate through an array of derived class objects:

class Base{
public:
    // Base class data members
    virtual void boo();
};
class Derived: public Base{
public:
    // Derived class data members
    virtual void boo();
};
void foo(){
    Derived dArr[10];
    Base * bPtr = dArr;
    for(int i = 0; i < 10; i++)
        bPtr[i].boo(); // will not work properly (undefined behavior)
}
This is because the size of a base class object may not be equal to the size of a derived class object: the compiler computes the address of bPtr[i] using the size of Base, so it lands on the wrong location. C# has fully-fledged support from the runtime. An array of a class type holds references to the objects rather than the objects themselves, and every array knows its element type, which makes such operations legal and safe. C# does not treat arrays as mere contiguous memory locations. It adds object-oriented characteristics by providing the class System.Array, from which all arrays inherit. This class abstracts the operations on an array, and an array of any type can be referred to through it. As arrays are instances of a class, they are always reference types, and this holds even for arrays of value types. This allows bounds checking on every array access, but the drawback is that arrays have to be allocated on the heap.
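In C++, the customary way around the base-pointer problem described above is to store pointers to the objects rather than the objects themselves, which is close to what a C# array of a class type does. The sketch below assumes hypothetical Base and Derived types whose boo() returns a string so that the dispatch can be observed; callAll is an illustrative helper.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Base {
    virtual ~Base() = default;
    virtual std::string boo() const { return "Base"; }
    int baseData = 0;
};

struct Derived : Base {
    std::string boo() const override { return "Derived"; }
    double derivedData[4] = {};
};

// The objects differ in size, which is why indexing a Derived[] through
// a Base* computes the wrong element addresses.
static_assert(sizeof(Derived) > sizeof(Base), "Derived carries extra data");

// Stores pointers, so every element has the same size and virtual
// dispatch picks the right boo() for each object.
std::string callAll() {
    std::vector<std::unique_ptr<Base>> items;
    for (int i = 0; i < 3; ++i)
        items.push_back(std::make_unique<Derived>());

    std::string result;
    for (const auto& item : items)
        result += item->boo();
    return result;
}
```

The price is the same as in C#: each element access goes through one extra indirection, and the objects live on the heap.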
Both languages support rectangular and jagged arrays. For rectangular arrays, a single chunk of memory is allocated and indexing is done on it. In C++, jagged arrays can be implemented by having an array of pointers and allocating memory dynamically for each sub-array. The same idea is followed in C#, but references are used instead of pointers. This makes optimal use of space, since the sub-arrays may be of varying length. The compromise is that an additional indirection is needed to access the sub-arrays. This access overhead is not there in rectangular arrays, since all the sub-arrays are of the same size.

// C++ language example for 'rectangular arrays'
float rectArr[5][20];
// C# rectangular arrays, note the difference in syntax
float [,] rect = new float [5,20];
// C++ language example for 'jagged arrays'
float **ptr;
ptr = new float *[5];
for (int i = 0; i < 5; i++)
    ptr[i] = new float [20];
// C# example for 'jagged arrays'
float [] [] ptr;
ptr = new float[5][];
for (int i = 0; i < 5; i++)
    ptr[i] = new float[20];
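The jagged examples above give every row the same length; the space saving only appears when the rows differ. As a rough illustration in C++, a vector of vectors can play the role of the pointer array and also takes care of releasing the memory. The function names makeTriangle and totalElements are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a jagged array in which row i holds i + 1 elements.
std::vector<std::vector<float>> makeTriangle(std::size_t rows) {
    std::vector<std::vector<float>> jagged(rows);
    for (std::size_t i = 0; i < rows; ++i)
        jagged[i].assign(i + 1, 0.0f);
    return jagged;
}

// Counts the elements actually allocated across all rows.
std::size_t totalElements(const std::vector<std::vector<float>>& jagged) {
    std::size_t count = 0;
    for (const auto& row : jagged)
        count += row.size();
    return count;
}
```

A five-row triangle built this way stores 15 floats, whereas a 5 x 5 rectangular array would store 25.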
When more than one method of representation is supported, at some point the user will need to switch from one representation to another. C# has no built-in conversion between rectangular and jagged arrays, so the elements have to be copied explicitly. It should also be noted that C# supports 'Indexer' members that allow array-like access to data structures.

Enums
Enumerations are of type int in C; in C++, the underlying type depends on the range of the enumeration constants declared. C#, as an improvement over the old enumeration, allows you to specify the underlying type of the enumeration:
enum holidays : byte{ Sunday = 0, Saturday = 1 }
C# enums differ from C/C++ enums in that the enumerated constants need to be qualified by the name of the enumeration when they are used. enum workingDay { mon,tue,wed,thur,fri }; workingDay today;
today = workingDay.mon; //note that mon is qualified by workingDay
This name.member syntax helps the enumeration constants to remain in a separate namespace, thus preventing them from polluting the global namespace. Furthermore, it prevents name clashes between two different enums: // C#: no name clashes with other enum members enum Days { mon, tue, wed, thur, fri, sat, sun }; enum CosmicObjs { earth, mars, jupiter, sun, moon}; enum Companies {sun, microsoft, dell, digital, compaq}; myDay = Days.sun; computer = Companies.sun; cosmicObject = CosmicObjs.sun;
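For readers maintaining the C++ side of a migration, it may help to know that C++11 added scoped enumerations (enum class), which behave much like the C# enums shown above: enumerators must be qualified and the underlying type can be specified. The sketch below reuses the enum names from the example; isWeekend and ordinal are hypothetical helpers.

```cpp
#include <cassert>

// Scoped enumerations: each 'sun' lives in its own scope, so there is no clash.
enum class Days { mon, tue, wed, thur, fri, sat, sun };
enum class CosmicObjs { earth, mars, jupiter, sun, moon };
// The underlying type can be chosen, as in C#.
enum class Companies : unsigned char { sun, microsoft, dell, digital, compaq };

inline bool isWeekend(Days d) {
    return d == Days::sat || d == Days::sun;
}

// Scoped enums do not convert to int implicitly; a cast is required.
inline int ordinal(CosmicObjs c) {
    return static_cast<int>(c);
}
```

Code already written in this style maps almost mechanically onto C# enums, with :: replaced by the dot.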
Variable Length Argument Lists Experience has shown that programmers prefer C style printf format, because it is convenient for exact format specification and is easy to use. C# provides 'params' for the support of variable length argument lists. So you can write your functions using this facility as in: int MyPrintf(string format, params object [] args);
For printing, C follows the format string with a variable-length argument list; C++ uses << with cout; Java overloads the + operator for string concatenation. In C#, the arguments are referred to by their zero-based position in the format string:
Console.WriteLine("{0} {1} {2}", i, obj, "someString");
Writing 'Unsafe' Code
C++ is good for writing low-level code, with features like pointer arithmetic that are useful for systems programming. C# understands the importance of that, and allows 'unsafe casts', pointers, and pointer arithmetic in code segments that are explicitly labeled as unsafe. Note that the keyword 'unsafe' may be misleading - the code still runs under the CLR, but the runtime cannot verify it, and it may perform low-level operations. Also, it is neither as easy nor as powerful as in C++.

Argument Passing
When we pass a variable to a method in C++, we are not sure whether it will get modified or not. To ensure that the variable is not modified, the programmer should use the const qualifier for that argument in the method. The absence of a const qualifier indicates that the argument could be used to return a value to the caller. C# makes this explicit. It introduces two new kinds of arguments, ref and out, and a method that returns values through its arguments must mark them with one of these keywords. When we pass an argument to a method, the caller should be aware that the parameter may be modified. The ref keyword indicates this. As well as in the method definition, the ref keyword is also used in the method invocation:

//C++
void foo(MyClass & arg1, MyClass & arg2){
    // other code;
    arg1 = newValue1;
    arg2 = newValue2;
}
foo(obj1, obj2); // Note: the caller may not expect obj1 and obj2 will change
//C#
void foo(ref MyClass arg1, ref MyClass arg2){
    arg1 = newValue1;
    arg2 = newValue2;
}
foo(ref obj1, ref obj2); // Now the programmer is aware that obj1 & obj2 may be changed
In a few cases, we may want to initialize the arguments only in the method. The use of the ref keyword will be flagged as an error by the compiler as a definite assignment has to be done before the first use. One elementary way to avoid the error is to initialize the variable with the default value and then to pass it to the method. C# introduces a new keyword for this situation. Instead of ref, we can use out, which doesn't force the caller to initialize the variable. However, it is mandatory for the method to assign some value to it. //C# void foo(ref MyClass arg1, out MyClass arg2){ // other code; arg1 = someValue; // optional arg2 = someValue; // need to assign some value } MyClass obj1, obj2; obj1 = aValue; // need to initialize foo(ref obj1, out obj2); // note obj2 is not initialized
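When looking for candidates for out in existing C++ code, a common shape to watch for is a function that reports success through its return value and delivers the result through a reference parameter. The following is a small sketch of that shape, assuming C++17 for std::from_chars; tryParseInt is a hypothetical name, not something from the original text.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// 'result' plays the role of a C# out parameter: the caller need not
// initialize it, and it is written only when parsing succeeds.
bool tryParseInt(const std::string& text, int& result) {
    int parsed = 0;
    const char* first = text.data();
    const char* last = first + text.size();

    std::from_chars_result r = std::from_chars(first, last, parsed);
    if (r.ec != std::errc() || r.ptr != last)
        return false; // not a number, out of range, or trailing characters

    result = parsed;
    return true;
}
```

Nothing at the call site, tryParseInt(text, value), tells the reader that value will be overwritten, which is exactly the information the C# out keyword adds.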
Class Abstraction
Just like C++, the basic unit of abstraction is a class. The access specifiers public, protected and private have the same meaning in both the languages. In addition, C# provides internal and protected internal access specifiers. The internal members are available to the whole assembly and the protected internal to the assembly and the derived classes. Why do you ever need these access specifiers? There are few cases where you need to access members of other classes in the same assembly but shouldn't be exposed to the external classes. Since friend access is not there in C#, this can be a useful feature particularly when you are designing libraries. Inheritance C# doesn't support multiple class inheritance. It only supports single inheritance, but you can still inherit from multiple interfaces. Pure abstract classes in C++ can be treated as interfaces in C#. There are many restrictions in using interfaces for inheritance. You can only have public abstract methods, and no fields are allowed (not even const fields). However, one interface can inherit from another interface. C# only supports public inheritance. Not having private or protected inheritance doesn't affect the functionality as such. There are a few inconveniences with this approach, for example, once you implement ICloneable, all the classes that inherit from that class becomes automatically cloneable, as only public inheritance is available. The Object Base Class C# doesn't support templates as .NET doesn't support it yet. However, a weaker form of generic programming is supported in C# through the System.Object base class. This is the apex class for all the objects. This includes the value types like structs and ints and reference types like arrays and strings. This property is exploited in the Collections provided in the framework that works in terms of Objects. The standard libraries of both C++ and C# provide support for the container classes. 
Consider this example of using the vector class: MyClass obj; string str = "string object"; const int size = 5; vector<MyClass> vect(size); vect[0] = obj; vect[1] = str; // Compiler Error: vect can store only MyClass and not others
// insert more elements
// iterator provides a pointer-like syntax for
// traversing the container
vector<MyClass>::iterator iter = vect.begin();
while(iter != vect.end()){
    cout << *iter; // calls overloaded << operator of MyClass
    iter++;
}
Thus, you can have elements of only one type, and the traversing and accessing is done through iterators. With C#, .NET provides an equivalent container class for vector - the ArrayList container:

// Creates and initializes a new ArrayList
MyClass obj = new MyClass();
string str = "string object";
ArrayList arrLst = new ArrayList();
arrLst.Add(obj);
arrLst.Add(str);
// can simply use foreach statement for traversing the collection
foreach(MyClass elem in arrLst){
    Console.WriteLine(" {0} ", elem);
}
// throws 'InvalidCastException' as the second element is a string
Operator Overloading
In C++ almost all the operators can be overloaded - only a few, such as the conditional operator (?:), the member access operator (.), the pointer-to-member operator (.*) and the scope resolution operator (::), cannot be overloaded. C# provides support for operator overloading but to a limited extent. The syntax for overloading the operators is:

// C++
return_type ClassName::operator op (arguments)
// usage example
class MyClass{
public:
    MyClass operator + (MyClass &rhs);
};
// C#
public static return_type operator op (arguments)
// usage example
class MyClass{
    public static MyClass operator + (MyClass lhs, MyClass rhs){ /* ... */ }
}
The main difference is that while you can have member or global (mostly friend) functions in C++, you have static methods for overloading in C#. Although the syntax looks similar there are a few constraints imposed by C# for operator overloading. The most important are:
• The methods should be declared as public and static.
• Many of the operators are required to be overloaded in pairs. For example, if you define == you should overload the != operator also.
• If you define the + operator, the compiler defines the += operator for you to make things easier.
//C++
class CPPClass{
protected: // can be public or protected or private
    bool operator ==(CPPClass &rhs);
    // the other argument is passed implicitly by the 'this' pointer
    // note no != operator defined
    bool operator ++(); // type of the return value is not forced
    bool operator +(CPPClass &rhs);
    bool operator +=(CPPClass &rhs); // += is not implicitly defined
    friend int operator-(CPPClass &lhs, CPPClass &rhs);
    // both member and non-member (friend) functions are allowed
};
//C#
class CSharpClass{
    // note that all the operators are public and static
    public static bool operator ==(CSharpClass lhs, CSharpClass rhs){ /* ... */ }
    public static bool operator !=(CSharpClass lhs, CSharpClass rhs){ /* ... */ }
    // relational operators should be overloaded in pairs
    public static CSharpClass operator ++(CSharpClass arg){ /* ... */ }
    // return type and argument types are forced for a few operators
    public static CSharpClass operator +(CSharpClass lhs, CSharpClass rhs){ /* ... */ }
    // += is implicitly defined by the compiler when
    // binary + is defined
}
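C++ code that already follows the conventions C# enforces is the easiest to convert: == and != defined together, and + expressed in terms of +=. The sketch below shows that style on a hypothetical Money type; it is one common idiom, not the only way to write these operators.

```cpp
#include <cassert>

// Hypothetical value type used only for illustration.
struct Money {
    long cents = 0;

    // Compound assignment is the primitive operation...
    Money& operator+=(const Money& rhs) {
        cents += rhs.cents;
        return *this;
    }

    // ...and binary + is written in terms of it (lhs is taken by value).
    friend Money operator+(Money lhs, const Money& rhs) {
        lhs += rhs;
        return lhs;
    }

    // == and != are defined as a pair, as C# would require.
    friend bool operator==(const Money& a, const Money& b) { return a.cents == b.cents; }
    friend bool operator!=(const Money& a, const Money& b) { return !(a == b); }
};
```

In the C# version only + would be written by hand, since the compiler derives += from it, and the comparison pair would become two public static methods.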
Exceptions
Exception handling in C# is similar to C++, with try, catch and throw. In C++, the exception specification of a method lists the exceptions that the method might throw (such dynamic exception specifications were later deprecated in C++11 and removed in C++17 in favor of noexcept). When a method doesn't list any exceptions, beware that it is then allowed to throw any exception, and there is no constraint on a caller to catch the exceptions thrown. Further, exceptions are not only thrown in the form of classes, but also in the form of primitive types. The C# exception handling mechanism is simpler. There are no exception specifications at all: any method may throw any exception, and catching exceptions is not mandatory (unlike the checked exceptions of Java). Only objects of System.Exception (or of classes derived from it) can be thrown, and C# adds a finally block for cleanup code.

//C++
void foo(){
    // any of these is allowed
    throw 10;
    throw MyException();
    throw "This is an Error";
}
void boo() throw (int, Exception){
    throw "Something is wrong"; // violates the specification: std::unexpected() is called at run time
}
void doo() throw (){
    // promises that no exceptions will be thrown
}
//C#
void foo(){
    throw new IOException(); // OK: no exception specification is needed
}
void boo(){
    throw new MyException(); // OK if MyException derives from System.Exception
    // throw 10; // Error: only Exception objects can be thrown
}
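To see what has to be untangled when C++ code throws values of unrelated types, the following sketch catches an int, a class derived from std::exception, and a string literal in one place. MyException, mayThrow, classify and safeAdd are hypothetical names; noexcept is shown as the modern replacement for throw().

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception class deriving from the standard hierarchy.
struct MyException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Throws a different kind of object depending on 'code'.
void mayThrow(int code) {
    if (code == 1) throw 10;                           // primitive type
    if (code == 2) throw MyException("custom failure"); // class type
    if (code == 3) throw "This is an Error";            // decays to const char*
}

// Modern spelling of the 'throw ()' promise.
int safeAdd(int a, int b) noexcept { return a + b; }

// Reports which kind of exception, if any, was thrown.
std::string classify(int code) {
    try {
        mayThrow(code);
        return "no exception";
    } catch (int) {
        return "int";
    } catch (const std::exception& e) {
        return std::string("exception: ") + e.what();
    } catch (const char*) {
        return "C string";
    }
}
```

When converting, the int and string cases have no C# counterpart and would each need an exception class of their own, while the std::exception branch maps naturally onto catching System.Exception.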
Namespaces Namespaces are supported in C++ for better organizing the code and are valuable in large-scale programming. In C#, the syntax for declaring and organizing classes in a namespace is similar to that of C++. There is no concept of header files (C# design is such that there is no need for header files, for example, it combines declarations and definitions) and you have to use the using directive to open up the members in the namespace for access in the code. You can also have aliases: using alias_name = namespace_or_type;
Just like in C++, you can have nested namespaces. The syntax is a bit different:
namespace outer.inner{ // some members }
Note that you have to nest one namespace within another for a similar goal in C++:
namespace outer{
    namespace inner{
        // some members
    }
}
There is an important difference between the namespaces in C++ and C#. In C++, namespaces are purely logical entities within the source code. In C#, namespaces are logical too, but the types in them are packaged into assemblies, which are the physical units of deployment and versioning. A namespace can span several assemblies, and access rules such as internal are enforced at the assembly boundary.

Properties
It is common for a C++ programmer to provide get and set methods for data members. Not only does this help in abstracting the details, but it also gives a few advantages: the user cannot assign illegal values to a field (such as 500 to a field called age), and the programmer can give a read-only version of a member (such as the size of a container).

class MyClass{
private:
    int someInt;
    int length;
public:
    inline int getLength(){ return length; }
    inline int getSomeInt(){ return someInt; }
    inline void setSomeInt(int arg){
        if(arg >= minValue && arg <= maxValue)
            someInt = arg;
        else
            error("illegal value");
    }
};
//usage:
MyClass anObj;
anObj.setSomeInt(100);
int len = anObj.getLength();
As most of these methods are inlined, the performance isn't affected. However, there are two problems with the usage of such functions. The first is that the syntax of accessing them is a bit unwieldy. The second is that the approach itself goes against object-oriented programming guidelines: an object is supposed to expose behavior and not implementation, and with these methods the object effectively exposes its private fields to the user. C# provides a whole new way to handle this situation through properties. Properties are very much like the get-set methods, but syntactically different. Consider this example written with properties in C#:

class MyClass{
    private int someInt;
    private int length;
    public int Length{
        get{ return length; }
    }
    public int SomeInt{
        get{ return someInt; }
        set{
            if(value >= minValue && value <= maxValue)
                someInt = value;
            else
                error("illegal value");
        }
    }
}
//usage:
MyClass anObj = new MyClass();
anObj.SomeInt = 100; // set the value of the field through mutator property
int len = anObj.Length; // get the value of the field through accessor property
Note that a variable named value is used in the set accessor. It is the implicit parameter passed to the accessor by the compiler, and its type is the same as that of the property. As we can see, the syntax is more intuitive to use.

Indexers

We tend to have many container classes that are used to hold a set of objects. Stacks, Queues, Maps and Hashtables are just a few such important containers. Many other objects can also be viewed as containers. For example, a menu can be thought of as a container of menu items. In most cases we need to access the objects in a container through an index. In C++ this can be done by overloading the array subscript operator []. We can overload it not only with integers, but with any type we want, which sometimes makes the subscripting more meaningful:

    class EmployeeContainer{
    private:
        Employee emp[100];
    public:
        Employee& operator[](int empNo){
            // return the employee with the empNo (implementation omitted)
        }
        Employee& operator[](string name){
            // return the employee with the name (implementation omitted)
        }
    };

    void foo(){
        EmployeeContainer empCont;
        // add the employees to the container
        Employee emp1 = empCont[5];
        empCont["Pranni"].age = 24;
    }
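The subscript overloads above leave their bodies out. As a rough, self-contained sketch of how they might be filled in, the version below stores the employees in a std::vector and keeps two std::map lookups. The Employee fields (empNo, name, age), the add method and the ageAfterUpdate helper are assumptions made for this illustration and are not part of the original example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical Employee record; the field names are assumptions for this sketch.
struct Employee {
    int empNo;
    std::string name;
    int age;
};

class EmployeeContainer {
public:
    // Store the employee and record its position for both kinds of lookup.
    void add(const Employee& e) {
        byNumber_[e.empNo] = employees_.size();
        byName_[e.name] = employees_.size();
        employees_.push_back(e);
    }

    // Subscript by employee number; throws std::out_of_range if unknown.
    Employee& operator[](int empNo) {
        return employees_.at(byNumber_.at(empNo));
    }

    // Subscript by name; throws std::out_of_range if unknown.
    Employee& operator[](const std::string& name) {
        return employees_.at(byName_.at(name));
    }

private:
    std::vector<Employee> employees_;
    std::map<int, std::size_t> byNumber_;
    std::map<std::string, std::size_t> byName_;
};

// Updates one employee through the name subscript and reads the
// same employee back through the number subscript.
int ageAfterUpdate() {
    EmployeeContainer empCont;
    empCont.add({5, "Ganni", 30});
    empCont.add({7, "Pranni", 23});
    assert(empCont[5].name == "Ganni");
    empCont["Pranni"].age = 24;
    return empCont[7].age;
}
```

Calling ageAfterUpdate() from main exercises both overloads. Returning a reference is what allows the assignment through empCont["Pranni"] to modify the stored object rather than a copy.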
C# introduces indexers to solve this problem of indexing a container. The equivalent EmployeeContainer class can be written in C# as:

    class EmployeeContainer{
        private Employee[] emp = new Employee[100];
        public Employee this[int empNo]{
            // implement it like a property
            get{ return emp[empNo]; }
            set{ emp[empNo] = value; }
        }
        public Employee this[string name]{
            // implement it like a property
            get{
                // getting Employee index mapped by string info
            }
            set{
                // code for setting Employee detail at index position
            }
        }
    }

    class Test{
        static void foo(){
            EmployeeContainer empCont = new EmployeeContainer();
            // add the employees to the container
            Employee emp1 = empCont[5];
            empCont["Pranni"].age = 24;
        }
    }
Attributes

Attributes are a significant addition to C#. When you create your own types or components, you often need to associate descriptive details with the components and their elements. In COM you used type libraries to achieve this. In C++ programming, comments and macros are traditionally used to store metadata about a class and its members. C#'s attributes are far more powerful, and you can attach meta-information to many language elements: fields, methods, events, etc. You can retrieve and examine this meta-information at runtime using reflection (discussed later). There are two types of attributes: intrinsic (predefined) and custom attributes.

C# supports a preprocessing facility, but there is no separate tool - it is handled by the compiler itself. The preprocessor support is restricted, though; for example, you cannot define macros. One use of the preprocessor is conditional methods, which are achieved through the Conditional attribute. It is an intrinsic attribute: calls to a method marked with it are compiled only when the named symbol is defined. In C++ you use the preprocessor facilities directly.

    //C# code
    #define DEBUG   // such definitions must appear at the very beginning of the file
    using System;
    using System.Diagnostics;

    class MyClass{
        [Conditional("DEBUG")]
        public static void debugFunction(string message){
            Console.WriteLine(message);
        }
        // other members
    }
C#'s conditional methods are very powerful when used with the Debug and Trace classes available in the System.Diagnostics namespace. There are many such useful attributes; one is Serializable, which is discussed later.

You can define your own custom attributes. You have to derive your class from the Attribute class and mark it with the AttributeUsage attribute. Here is one simple example for maintaining code comments from the author of the code:

    using System;

    [AttributeUsage(AttributeTargets.All, AllowMultiple=true)]
    // tells that this attribute can be used on any program element
    // and there can be multiple entries for each use of attribute
    public class CommentAttribute : Attribute{
        private string commentText;
        public CommentAttribute(string comment){
            this.commentText = comment;
        }
        public string CommentText{
            get{ return commentText; }
        }
    }

    [Comment("Written by Ganni and Pranni")]
    class GuineaPig {
        // ...
    }

    class Test {
        public static void Main(){
            // the static method GetCustomAttributes
            // is used to retrieve the attribute info
            Attribute[] attributes = Attribute.GetCustomAttributes(typeof(GuineaPig));
            foreach(CommentAttribute attribute in attributes)
                Console.WriteLine(attribute.CommentText);
        }
    }
You can use custom attributes with the same special syntax as intrinsic attributes, and there is no need to call the constructor explicitly - you can initialize the attribute directly. The static method GetCustomAttributes of the Attribute class retrieves the attributes of the type passed to it.

Callback Functions

Function pointers are a useful facility in C/C++. The following shows a real-world example of using them. Say you want to write a menu program. The aim is to call, at runtime, the function that corresponds to the item selected in the menu. Therefore, we declare a function pointer whose signature matches the functions written for the menu:

    void (*menuSelector)( );
    // get the input from the user - selection of the menu item
    switch(select){
        case NEW : menuSelector = &New; break;
        case OPEN : menuSelector = &Open; break;
        // assign the address of the corresponding function to menuSelector
    }
    menuSelector( ); // now call the selected functionality
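The fragment above can be turned into a small compilable sketch. In this version the handlers return a message instead of printing it, purely so the result is easy to check; the MenuItem enumeration and the selectHandler and runMenu helpers are assumptions introduced for the illustration.

```cpp
#include <cassert>
#include <string>

enum MenuItem { NEW, OPEN };

// Menu handlers; they return their message so the sketch is easy to verify.
std::string New()  { return "You selected 'New' option"; }
std::string Open() { return "You selected 'Open' option"; }

// Pointer-to-function type whose signature matches the handlers.
typedef std::string (*MenuSelector)();

// Returns the handler for the selected item, or a null pointer if none matches.
MenuSelector selectHandler(MenuItem select) {
    switch (select) {
        case NEW:  return &New;
        case OPEN: return &Open;
    }
    return nullptr;
}

// Picks the handler at runtime and calls it through the pointer.
std::string runMenu(MenuItem select) {
    MenuSelector menuSelector = selectHandler(select);
    assert(menuSelector != nullptr);
    return menuSelector();
}
```

Note that the address of a function is taken with &New (or simply New); writing &New() would try to take the address of the call's result instead.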
Calling a function through such a pointer, whose value is determined at runtime, is known as a 'callback'. C# supports callback functions through 'delegates' (you can also consider them an improved version of 'function objects' in C++). Delegates closely resemble function pointers, and C# promises that delegates are type-safe, secure, and object-oriented. A delegate is capable of holding a reference to a method so that the method can be called later. Multiple methods can even be installed in one delegate. Callbacks are valuable for event handling. C# also supports events, which are useful in event-driven programming such as Windows Forms:

    public delegate void Selector();
    // Selector is the type that can be used to instantiate
    // delegates that take no arguments and return nothing

    class Test{
        public Selector menuSelector;
        public void New(){
            Console.WriteLine("You selected 'New' option");
        }
        public void Open(){
            Console.WriteLine("You selected 'Open' option");
        }
    }

    // in the calling code:
    string select;
    // get the value of select from calling the menu...
    Test t = new Test();
    switch(select){
        case "New" : t.menuSelector = new Selector(t.New); break;
        case "Open" : t.menuSelector = new Selector(t.Open); break;
        // ...
        // register the selected method to menuSelector
    }
    t.menuSelector( );
    // call the delegate and it will in turn call the registered method
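For readers coming from C++, the closest standard-library analog of a delegate bound to an object is probably std::function holding a lambda. The following is only a loose sketch of the "multiple functions can be installed" idea. The Selector class here is a hypothetical stand-in written for this illustration, not a model of how C# delegates are implemented, and Menu and invokeBoth are likewise assumed names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical multicast "delegate": a list of callables sharing one signature.
class Selector {
public:
    void add(std::function<void()> handler) {
        handlers_.push_back(std::move(handler));
    }
    // Invoke every registered handler in the order it was added.
    void operator()() const {
        for (const auto& handler : handlers_) handler();
    }
private:
    std::vector<std::function<void()>> handlers_;
};

// Target object whose member functions are registered as handlers.
struct Menu {
    std::vector<std::string> log;
    void New()  { log.push_back("New"); }
    void Open() { log.push_back("Open"); }
};

// Registers two member functions of one object and fires them with a single call.
std::vector<std::string> invokeBoth() {
    Menu menu;
    Selector menuSelector;
    menuSelector.add([&menu] { menu.New(); });
    menuSelector.add([&menu] { menu.Open(); });
    menuSelector();
    assert(menu.log.size() == 2);
    return menu.log;
}
```

The lambdas capture the target object explicitly, which is roughly what a C# delegate does implicitly when it is created from t.New.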
Reflection and RTTI

When doing object-oriented programming, we often treat an object as if it were of a more general type. For example, we can view a Dog as a mammal, an animal, or even simply a living thing. When we hold the more general version, we sometimes want to know the exact type and act accordingly. Given a living thing, we might perform some operations on a mammal that we wouldn't on an amphibian, and even more specific operations if it were a Dog. In such cases, RTTI (Run Time Type Identification) comes into the picture. C++ provides the typeid operator and a set of classes that enable querying the type of an object at runtime. The operator reports the exact (dynamic) type of the object only if the static type of the expression has at least one virtual function. In the declarations below Base has no virtual methods, so typeid applied through a Base pointer reports Base, not the derived type:

    class Base{
        // no virtual methods
        void Base1();
        void Base2();
    };
    class Derived1 : public Base{
        virtual void vMethod();
    };
    class Derived2: public Derived1{ };

    void foo(){
        Derived2 d2Obj;
        Base* bPtr;
        bPtr = &d2Obj;
        cout<
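As a separate, self-contained sketch of the typeid rule described above, the following compares a hierarchy without virtual functions against one with a virtual destructor. All type and function names are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <typeinfo>

// Non-polymorphic hierarchy: no virtual functions anywhere.
struct PlainBase { };
struct PlainDerived : PlainBase { };

// Polymorphic hierarchy: the base declares a virtual function.
struct PolyBase { virtual ~PolyBase() { } };
struct PolyDerived : PolyBase { };

// Does typeid see the derived type through a non-polymorphic base pointer?
bool plainReportsDerived() {
    PlainDerived d;
    PlainBase* p = &d;
    return typeid(*p) == typeid(PlainDerived);
}

// Does typeid see the derived type through a polymorphic base pointer?
bool polyReportsDerived() {
    PolyDerived d;
    PolyBase* p = &d;
    return typeid(*p) == typeid(PolyDerived);
}

// Sanity check of the rule: only the polymorphic base reveals the dynamic type.
void checkTypeid() {
    assert(!plainReportsDerived());
    assert(polyReportsDerived());
}
```

The comparison uses operator== on std::type_info rather than name(), because the string returned by name() is implementation-defined.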
Reflection is a feature traditionally found in dynamic (interpreted) languages, and C# provides it as well. Reflection is a powerful facility, as we can dynamically load classes, create objects, change their properties, and invoke methods on them. Although fully exploiting the power of reflection will not be explored in this case study (see http://www.csharptoday.com/content.asp?id=1852&WROXEMPTOKEN=1518115ZIn19JBRkpiV5wX71qk for a whole piece on the topic), here is a sample that looks up a type by name, creates an instance, and invokes one of its methods dynamically:

    using System;
    using System.Reflection;

    class ReflectionTest{
        // this method will be called dynamically
        public void InvokeDynamic(){
            Console.WriteLine("Hello, dynamic world!");
        }
        public static void Main(){
            Type t = Type.GetType("ReflectionTest");
            // get the type by passing the name of this class
            MethodInfo m = t.GetMethod("InvokeDynamic");
            object o = Activator.CreateInstance(t);
            // Activator is a class defined in System namespace
            // you can use it to create objects (remote or local)
            m.Invoke(o, null);
            // the second argument is the list of arguments passed
            // to Invoke - null in this case
        }
    }
    // output:
    // Hello, dynamic world!
Memory Management

Moving from C++ to C# takes away a lot of the programmer's freedom. C++ allows you to decide whether to create an object on the stack or on the heap, whereas C# doesn't. The change from an unmanaged to a managed environment has drawbacks too. In C#, instances of reference types (classes) are always allocated on the managed heap; only value types are allocated on the stack. So you have to allocate every class object explicitly with 'new', even those objects you used to allocate statically in C++. In C#, in addition to using 'new' for dynamic allocation of heap objects, you can use it for stack objects (structs) to call their constructors.

The difference in where objects are allocated is significant. For example, when a value type is converted to a reference type, memory needs to be allocated on the heap and initialized with a copy of the value. This process is referred to as 'boxing'. For example:

    int i = 10;
    object o = i;
Note that you don't need an explicit cast here, as it is an 'upcast'. When the conversion is done from reference type to value type, it is referred to as 'unboxing'. However, you need an explicit cast to do that, as it is a 'downcast':

    int i = 10;
    object iRef = i;
    int j = iRef + 100;        // doesn't compile, needs explicit cast
    int k = (int)iRef + 100;   // now OK
Such conversions are not possible in C++, as there is no common base class. Boxing and unboxing are costly operations and should be avoided whenever possible, as they involve allocating and copying objects.

Garbage Collection

The burden of managing memory is greatly reduced in C#, as the garbage collector automatically reclaims unused/unreferenced objects. With garbage collection, most memory management problems such as dangling pointers and memory leaks are gone. Garbage collection covers only memory, but there are other resources, like network connections, that need to be released when the object is reclaimed. This is done in the Finalize method. C# still supports C++'s destructor syntax, but C# destructors are 'syntactic sugar' for finalizers.

    ~MyClass(){
        // release resources like database connections
    }

is expanded by the compiler into the equivalent of the following (which you cannot write directly yourself, as the compiler rejects an explicit override of Finalize):

    protected override void Finalize(){
        try{
            // release resources like database connections
        }
        finally{
            base.Finalize();
        }
    }
which is a little tedious to type, and hence the destructor syntax is convenient. The meaning of destructors is not the same in the two languages even though the syntax is. There is no assurance that the object will be garbage collected, or that its finalizer will be called, immediately when there are no more references to it. If important resources like file handles or database connections are released in your C++ destructor code, you shouldn't rely on Finalize in C#. Rather, you have to implement the IDisposable interface, provide the Dispose method, and write the code for releasing such connections or handles there.

    using System;

    class MyClass : IDisposable{
        MyClass(){
            // get resources
        }
        public void Deallocate(){
            // code for releasing resources here
        }
        public void Dispose(){
            Deallocate();
            GC.SuppressFinalize(this);
            // since Dispose is called, the Finalize method should
            // not be called... so tell GC to suppress call to
            // Finalizer method
        }
        ~MyClass(){
            Deallocate();
        }
        public static void Main(string[] args){
            MyClass obj = new MyClass();
            // use obj
            obj.Dispose();
        }
    }
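For contrast, here is a minimal C++ sketch of the deterministic behavior that C++ programmers are used to and that a C# finalizer does not provide: the destructor runs at the closing brace of the scope, before the next statement executes. The Connection class and the scopedUse function are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical resource holder that records when it acquires and releases.
class Connection {
public:
    explicit Connection(std::vector<std::string>& log) : log_(log) {
        log_.push_back("acquire");
    }
    ~Connection() { log_.push_back("release"); }
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;
private:
    std::vector<std::string>& log_;
};

// Records the order of events around a scoped object.
std::vector<std::string> scopedUse() {
    std::vector<std::string> log;
    {
        Connection c(log);
        log.push_back("use");
    }   // c's destructor runs here, before the next statement
    log.push_back("after scope");
    assert(log.size() == 4);
    return log;
}
```

In C#, the explicit call to Dispose plays the role that the closing brace plays here.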
To be more precise: it is not possible to determine exactly when the garbage collector will run, and so C# doesn't have deterministic finalization. To overcome this, you have to implement the IDisposable interface and provide the implementation of the Dispose method. After you use the object, you can release its resources by calling the Dispose method explicitly. Who is responsible for calling this method for objects that come from various sources? The time-honored C++ principle for disposing of heap objects applies here too: 'whoever allocated the memory has to release it'.

Steps in Converting Existing Code

There are cases where systems written in C++ need to be ported to C#. The .NET environment can use C++ code directly in two cases:

• When the classes are written in Managed Extensions to C++
• If they are COM components
If the application is written as COM components, then the components can be used directly in .NET. For COM components, you can use the Type Library Importer (tlbimp.exe) utility. It reads the COM type library information and converts it to an equivalent .NET assembly, a proxy that contains the necessary metadata. However, it should be noted that the component code itself is still unmanaged.

'Managed Extensions to C++' (MEC) is a set of extensions to the C++ language provided by Microsoft that can be compiled to code targeting the .NET environment. Most existing C++ code was not written for component programming, so it cannot be used directly from C#. MEC is new to the programming world, and hence legacy code is very unlikely to have been written in it.

C# provides support for low-level programming and has facilities to make use of legacy code. For example, functions exported from DLLs can be accessed by declaring them with the DllImport attribute. You have to declare such methods as extern - the keyword has a similar use in C++ for accessing functions from other languages. It can be applied only to methods implemented externally. Say you want to use your favorite MessageBox from traditional Windows programming:

    using System;
    using System.Runtime.InteropServices;

    // the first parameter is a window handle, hence IntPtr
    [DllImport("User32.dll")]
    public static extern int MessageBox(IntPtr h, string m, string c, int type);
    // now you can use it in your C# code
This feature is of great use if your code is a library or framework and not a full-fledged application. You keep the functions in DLLs, declare them in your C# code, and call them from there.

When you want to convert existing C++ code to run under the .NET platform, the following decisions need to be made. If the code is simple enough to be rewritten without much effort, then you can go for C#. In practice, C++ code may involve low-level programming such as accessing hardware features. Such functionality can be written in C# itself to some extent, thanks to its support for C-like structures and its restricted use of native pointers. At the level where full control over resources is required, you can do explicit memory management as well. Such code must be placed in 'unsafe' blocks. If the code is too complex to be handled with the facilities available in 'unsafe' code, then it can be converted directly from C++ to Managed Extensions to C++. Code written like that is accessible from C# code. All this means that tested, legacy C++ code need not be discarded: you can still use it under the .NET environment, albeit as unmanaged code.

Aiming for a one-to-one correspondence of functionality leads to poor design and fragile code. Translating C++ code line by line is not feasible, as the two languages differ considerably in their functionality and support. For instance, in C# all functions have to live inside classes, as no global functions or data are supported. C# doesn't support global variables/functions because it strictly enforces the class as the basic abstraction mechanism. So, when you move to C#, it is better to adopt the C# mindset - don't think in terms of C++. To illustrate how these ideas materialize, consider the following example of converting class hierarchies.

Converting the class hierarchies

Designing class hierarchies differs drastically in C++ and C#.
This is because C# does not support multiple class inheritance, and the only form of inheritance it supports is public inheritance. Consider the following hierarchy available in C++:

    class Base1{
        // pure abstract base class
    };
    class Base2{
        // abstract base class
    };
    class Base3{
        // concrete class
    };
    class Derived: public Base1, protected Base2, private Base3{ };
Base1 can be represented as an interface, since a C++ pure abstract base class is equivalent to an interface in C#. Base2 can be an abstract class in C#. The problem arises because multiple class inheritance is involved, and there can be only one base class in C#. If possible, try to convert Base2 into an interface. This works when Base2 implements only a few of its methods: those implementations can be provided in the concrete class instead, which makes Base2 feasible as an interface. The difficulty comes when Base2 has data members. In that case, turning it into an interface is not feasible, and moving the data members elsewhere is not advisable. In general, this can be solved by having Base3 inherit from Base2. Since Base2 is an abstract class, it serves better as the base of Base3 than the other way round.

The C++ code uses private, protected, and public inheritance. How can they be handled in C#? C# supports only public inheritance, so you are forced to use public inheritance in place of all three kinds supported in C++. Using public inheritance doesn't affect the functionality. The real difference lies in abstraction: in the C# solution, all the inherited public members are exposed. The hierarchy looks like this:

    interface IBase1{ }
    // the naming convention in C# suggests an I prefix for interface names
    abstract class Base2{ }
    class Base3 : Base2 { }
    class Derived : Base3, IBase1 { }   // the base class must precede the interfaces
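To make concrete what is lost in abstraction, the following C++ sketch checks which base-class views of Derived are visible to unrelated code under the original hierarchy. The member functions f, g, helper and h, and the checkAccess function, are hypothetical additions that only serve to make the classes pure abstract, abstract and concrete respectively.

```cpp
#include <cassert>
#include <type_traits>

// Pure abstract base: the C++ counterpart of an interface.
struct Base1 {
    virtual void f() = 0;
    virtual ~Base1() { }
};

// Abstract base with some implementation.
struct Base2 {
    virtual void g() = 0;
    void helper() { }
    virtual ~Base2() { }
};

// Concrete class.
struct Base3 {
    void h() { }
};

class Derived : public Base1, protected Base2, private Base3 {
public:
    void f() override { }
    void g() override { }
};

// Can code unrelated to these classes implicitly convert Derived* to each base?
constexpr bool viaBase1 = std::is_convertible<Derived*, Base1*>::value;  // public base
constexpr bool viaBase2 = std::is_convertible<Derived*, Base2*>::value;  // protected base
constexpr bool viaBase3 = std::is_convertible<Derived*, Base3*>::value;  // private base

// Only the public base is visible from outside the hierarchy.
void checkAccess() {
    assert(viaBase1);
    assert(!viaBase2);
    assert(!viaBase3);
}
```

In the C# version every base is effectively public, so the Base2 and Base3 views that C++ hides here become available to any caller.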
Having the exact C++ hierarchy in C# is not possible. However, this can be achieved to some extent by understanding the inheritance model supported in these languages.
Case Study Review

Migrating from C++ to C# is not as easy as it may seem. C# is strongly based on C++, but the two languages differ in their design. The syntactic similarities between the two languages can be misleading, as there are many semantic and pragmatic differences. There are many places where a C++ programmer will truly get lost when he starts programming in C#. A C++ programmer needs a good understanding of the migration process and should be clear in his/her approach to get the best results from such a transition.

The two languages differ in many fundamental ways: design approach, memory management, problem-solving approach, and the underlying translation technology are just a few of the differences. To get the best results, it is essential that the programmer has an overall view of such issues. The second section of the case study does not just look at the differences in features. Rather, it discusses how the transition from C++ to C# can be made by analyzing those features. Naturally, a clear picture emerges of what to expect and what not to expect in such a transition.

When existing code has to be converted from C++ to C#, a set of decisions needs to be made. If the code is available as COM components, it can be used directly instead of being converted manually. If the code is a library/framework available as DLLs, then no conversion is needed and it can be used directly from C#. Managed Extensions to C++ can be used to make the application available in the .NET environment with minimal changes to the code. Finally, a decision needs to be made on whether it is necessary to rewrite the whole code in C#. In that case, line-by-line conversion is not feasible, and the transition will need significant effort on the programmer's part. It will also necessitate a change in design approach and new strategies.

All rights reserved. Copyright Jan 2004.