Figure 1-3 : Loading and initializing the CLR
When the compiler/linker creates an executable assembly, the following 6-byte x86 stub function is emitted into the PE file’s .text section:
Because the _CorExeMain function is imported from Microsoft’s MSCorEE.dll dynamic-link library, MSCorEE.dll is referenced in the assembly file’s import (.idata) section. MSCorEE.dll stands for Microsoft Component Object Runtime Execution Engine. When the managed EXE file is invoked, Windows treats it just like any normal (unmanaged) EXE file: the Windows loader loads the file and examines the .idata section to see that MSCorEE.dll should be loaded into the process’s address space. Then the loader obtains the address of the _CorExeMain function inside MSCorEE.dll and fixes up the stub function’s JMP instruction
in the managed EXE file.
The process’s primary thread begins executing this x86 stub function, which immediately jumps to _CorExeMain in MSCorEE.dll. _CorExeMain initializes the CLR and then looks at the executable assembly’s CLR header to determine what managed entry point method should execute. The IL code for the method is then compiled into native CPU instructions, and the CLR jumps to the native code (using the process’s primary thread). At this point, the managed application’s code is running.
The situation is similar for a managed DLL. When building a managed DLL, the compiler/linker emits a similar 6-byte x86 stub function in the PE file’s .text section for a DLL assembly: JMP _ C o r D l l M a i n
The _CorDllMain function is also imported from the MSCorEE.dll, causing the DLL’s .idata section to reference MSCorEE.dll. When Windows loads the DLL, it will automatically load MSCorEE.dll (if it isn’t already loaded), obtain the address of the _CorDllMain function, and fix up the 6-byte x86 JMP stub in the managed DLL. The thread that called LoadLibrary to load the managed DLL now jumps to the x86 stub in the managed DLL assembly, which immediately jumps to the _CorDllMain function in MSCorEE.dll. _CorDllMain initializes the CLR (if it hasn’t already been initialized for the process) and then returns so that the application can continue executing as normal.
These 6-byte x86 stub functions are required to run managed assemblies on Windows 98, Windows 98 Standard Edition, Windows Me, Windows NT 4, and Windows 2000 because all these operating systems shipped long before the CLR became available. Note that the 6- byte stub function is specifically for x86 machines. This stub doesn’t work properly if the CLR is ported to run on other CPU architectures. Because Windows XP and the Windows .NET Server Family support both the x86 and the IA64 CPU architectures, Windows XP and the Windows .NET Server Family loader was modified to look specifically for anaged
On Windows XP and the Windows .NET Server Family, when a managed assembly is invoked (typically via CreateProcess or LoadLibrary), the OS loader detects that the file contains managed code by examining directory entry 14 in the PE file header. (See IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR in WinNT.h.) If this directory entry exists and is not 0, the loader ignores the file’s import (.idata) section and automatically loads MSCorEE.dll into the process’s address space. Once loaded, the OS loader makes the process’s thread jump directly to the correct function in MSCorEE.dll. The 6-byte x86 stub functions are ignored on machines running Windows XP and the Windows .NET Server Family.
One last note on managed PE files: they always use the 32 bit PE file format, not the 64-bit PE file format. On 64-bit Windows systems, the OS loader detects the managed 32-bit PE file and automatically knows to create a 64-bit address space.
As mentioned earlier, managed modules contain both metadata and intermediate language (IL). IL is a CPU-independent machine language created by Microsoft after consultation with several external commercial and academic language/compiler writers. IL is much higher level than most CPU machine languages. IL understands object types and has instructions that create and initialize objects, call virtual methods on objects, and manipulate array elements directly. It even has instructions that throw and catch exceptions for error handling.
You can think of IL as an object-oriented machine language. Usually, developers will program in a high-level language, such as C# or Visual Basic. The compilers for these high-level languages produce IL. However, like any other machine language, IL can be written in assembly language, and Microsoft does provide an IL Assembler, ILAsm.exe. Microsoft also provides an IL Disassembler, ILDasm.exe.
Some people are concerned that IL doesn’t offer enough intellectual property protection for their algorithms. In other words, they think you could build a managed module and someone else could use a tool, such as IL Disassembler, to easily reverse engineer exactly what your application’s code does.
Yes, it’s true that IL code is higher level than most other assembly languages and that, in general, reverse engineering IL code is relatively simple. However, when implementing an XML Web service or a Web Forms application, your managed module
resides on your server. Because no one outside your company can access the module, no one outside your company can use any tool to see the IL—your intellectual property is completely safe.
iIf you don’t feel that an obfuscator offers the kind of intellectual property protection that you desire, you can consider implementing your more sensitive algorithms in some unmanaged module that will contain native CPU instructions instead of IL and metadata.
Then you can use the CLR’s interoperability features to communicate between the managed and unmanaged portions of your application. Of course, this assumes that you’re not worried about people reverse engineering the native CPU instructions in your unmanaged code.
Keep in mind that any high-level language will most likely expose only a subset of the facilities offered by the CLR. However, using IL assembly language allows a developer access to all the CLR’s facilities. So, should your programming language of choice hide a facility the CLR offers that you really want to take advantage of, you can choose to write that portion of your code in IL assembly or perhaps another programming language that exposes the CLR feature you seek.
The only way for you to know what facilities the CLR offers is to read documentation specific to the CLR itself. In this book, I try to concentrate on CLR features and how they are exposed or not exposed by the C# language. I suspect that most other books and articles will present the CLR via a language perspective and that most developers will come to believe that the CLR offers only what the developer’s chosen language exposes. As long as your language allows you to accomplish what you’re trying to get done, this blurred perspective isn’t a bad thing.
I think that this ability to switch programming languages easily with rich integration between languages is an awesome feature of the CLR. Unfortunately, I also believe that developers will often overlook this feature. Programming languages such as C# and Visual Basic are excellent languages for doing I/O operations. APL is a great language for doing advanced engineering or financial calculations. Through the CLR, you can write the I/O portions of your application using C# and then write the engineering calculations part using APL. The CLR offers a level of integration between these languages that is unprecedented and really makes mixed-language programming worthy of consideration for many development projects.
Another important point to keep in mind about IL is that it isn’t tied to any specific CPU platform. This means that a managed module containing IL can run on any CPU platform as long as the operating system running on that CPU platform hosts a version of the CLR. Although the initial release of the CLR runs only on 32-bit Windows platforms, developing an application using managed IL sets up a developer to be more independent of the underlying CPU architecture.