Phoenix Framework | Introduction
Phoenix is a collection of components that forms the foundation for hosting compilers and tools. It includes fundamental representations, algorithms, targeting, input and output, debugging, threading, garbage collection, and exception handling. With Phoenix, we can develop new tools and compiler elements without the usual requirement to develop a new infrastructure at the same time. New tools, better-performing phases, and more versatile readers all run on the same platform. Rather than IL, the central building block of Phoenix is a multi-level intermediate representation (IR). Phoenix accepts IL input from native C/C++ compilers and from .NET, and accepts PE input as well. It can output files in COFF or PE format. Phoenix itself comes in two builds: managed and unmanaged.
Managed Mode
Sources are compiled into .NET assemblies (DLL/EXE) containing MSIL, metadata, and a manifest. At run time, the JIT compiler converts the MSIL into native code on demand, and that code executes within the CLR. The managed build of the Phoenix framework saves its results into a few DLL assemblies; the main one is phx.dll. If you are building tools using a 'managed' compiler (one that targets the .NET Framework), your tool uses the Phoenix code from these assemblies.
Unmanaged Mode
The unmanaged build saves its results into a few native libraries; the main one is phx.lib. If you are building tools using an 'unmanaged' compiler – one that can target the Windows OS directly – your tool uses the Phoenix code from these libraries.
Note: Visual C++ can build in either mode.
Managed or Unmanaged?
The Phoenix framework supports both managed and unmanaged code, but in the current release plug-ins must be written in managed code. Managed code offers developer productivity, while unmanaged code offers the potential for higher performance at the cost of 'manual' memory management (M2).
Memory Management (M2) is critical. C/C++ geeks will argue that M2 cannot be left up to the system (the memory manager), and maybe C# geeks will say it cannot be left up to developers! C/C++ developers may well spend more than 50% of their time thinking about M2. M2 includes making sure that allocated objects are freed and avoiding pointer problems (dangling pointers, etc.).
In managed code, allocation, GC, and finalization are managed by the CLR. How does the CLR manage it? That is too long to explain in one post. My advice is to get the SSCLI source code, read it, and consult the following books:
- Shared Source CLI Essentials, Uncle Geoff Shilling
- Professional .NET Framework 2.0, Uncle Joe Duffy
- CLR via C#, Uncle Jeffrey Richter
- Customizing the .NET Framework CLR, Uncle Steven Pratschner
- Essential .NET, Uncle Don Box
I have all of those books on my desk (at home). Every time I open one of them, I find something new that I hadn’t realized before.
Dynamic memory within Phoenix is managed by a Lifetime object. In managed builds, responsibility for allocating and freeing memory is handed over to the CLR memory manager and GC. In unmanaged builds, Phoenix itself provides the code and data structures to take care of dynamic memory. Memory allocations occur over a period of time, but de-allocations can be made all at once, typically when the work of a compiler phase is done. Lifetimes provide this feature. A Lifetime operates like a broker for dynamic memory: it asks the operating system for chunks of memory as required and automatically expands the pool of memory under its care. Don’t forget: when you have finished with a Lifetime object, release its memory to the OS by calling its Delete method. Each Unit is allocated a Lifetime, which has a back-pointer to its owning Unit. There are several kinds of Lifetime, specified as a parameter at creation: {Func, Alias, IR, Graph, SSA, Tmp, Static, Global, TmpString, Phase, Module, Profile, Sym}.
A Lifetime starts life by requesting a few thousand bytes of memory – typically a page – from the OS. Each call to Allocate simply bumps a ‘next-free’ pointer, by the number of bytes requested. When the page is exhausted, the Lifetime automatically requests more pages from the OS (or from a Phoenix internal free list).
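To make the 'next-free' pointer idea concrete, here is a minimal sketch of a bump-pointer arena in Python. It is only a toy model of the behaviour described above, not the real Phoenix Lifetime API: the class layout, the 4096-byte page size, and the method names `allocate` and `delete` are all assumptions made for illustration.

```python
class Lifetime:
    """Toy bump-pointer arena (hypothetical; not the real Phoenix class)."""

    PAGE_SIZE = 4096  # assumed page size in bytes

    def __init__(self, kind, unit=None):
        self.kind = kind      # e.g. "Tmp", "Phase", "Func"
        self.unit = unit      # back-pointer to the owning Unit, if any
        self.pages = []       # pages obtained from the "OS"
        self.next_free = 0    # offset of the next free byte in the last page
        self._add_page(self.PAGE_SIZE)

    def _add_page(self, size):
        # Stand-in for asking the OS (or an internal free list) for memory.
        self.pages.append(bytearray(size))
        self.next_free = 0

    def allocate(self, nbytes):
        """Reserve nbytes and return (page_index, offset) of the block."""
        if nbytes <= 0:
            raise ValueError("nbytes must be positive")
        if not self.pages:
            raise RuntimeError("Lifetime has already been deleted")
        if self.next_free + nbytes > len(self.pages[-1]):
            # Current page is exhausted: grow the pool automatically.
            self._add_page(max(self.PAGE_SIZE, nbytes))
        offset = self.next_free
        self.next_free += nbytes  # the whole cost of an allocation: one bump
        return len(self.pages) - 1, offset

    def delete(self):
        """Release every page in one go; there is no per-object free."""
        self.pages.clear()
        self.next_free = 0


lifetime = Lifetime("Tmp")
first = lifetime.allocate(100)    # fits in the first page
second = lifetime.allocate(4000)  # does not fit in what is left, so a new page is added
lifetime.delete()                 # everything is released at once
```

The point of the design is that individual objects are never freed: allocation is a pointer bump, and cleanup is a single call when the phase is done.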
Phoenix From a High-Level Perspective
What we have used so far are front-end (FE) compilers. We work with managed FEs (CSC.exe, CL.exe /clr, VBC.exe, etc.) and the native FE (CL) to build our applications. Phoenix, on the other hand, is a back-end (BE) compiler, C2.EXE. The output of a managed FE compiler is an assembly (MSIL + metadata + manifest + resources), and the output of the native FE compiler is intermediate object code (CIL, which in the Visual C++ toolchain stands for C Intermediate Language rather than .NET's Common Intermediate Language). The primary output of the C2 compiler, by contrast, is a Common Object File Format (COFF) object file, that is, native code. Like CL or CSC, C2 has compiler options. We can specify them on the command line, in a response file, or with CL syntax. Refer to the documentation for further details.
Phoenix adopts an intermediate format called IR (Intermediate Representation), which is a "linear", assembly-like language held, of course, in binary form. Phoenix can import a .NET assembly, analyze it through a pipeline of phases, and produce COFF; in other words, it can compile the IR to native code once the IR has passed through the pipeline. Can that native code run without the .NET Framework? No, because even "pure" native programs require runtime support. For example, if your program writes a file, the OS does all the hard work on your behalf of finding free blocks on the disk to store that output and remembering how to find them when you reopen the file tomorrow. It's the same with .NET programs. They can use features provided at run time by the OS, but they also use a bunch of extra features provided by the .NET Framework, for example:
a) M2 (allocation, GC and finalization)
b) CAS (Code Access Security)
c) Threadpool
d) AppDomains
e) Reflection
and many more that are sometimes hidden from you. Arguably, the most fundamental is the first, GC, but that's a different discussion.
Phoenix uses the NGen tool (also known, confusingly, as "PreJIT") to produce native code. Phoenix can provide the IL-to-native compile engine, but laying out classes, persisting runtime data structures (think of vtables and suchlike), and so on is done by NGen. Phoenix can read a .NET assembly or a native binary and convert it into Phoenix IR (plus a symbol table and type table). We can then analyze it (e.g., build the flow graphs), inject code, and view the information. Programming in today's OO languages involves just too much typing; the compiler should figure out 50% of the nonsense we have to supply as programmers. For example, why do I need to tell the compiler the type of every variable? The compiler can work out the right answer most of the time. Correct me if I'm wrong.
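As a rough illustration of what "building a flow graph" from a linear, assembly-like IR involves, here is a small Python sketch. The instruction format (label, opcode, target) and the opcodes `jmp`, `br` and `ret` are invented for this example; Phoenix's own IR and flow-graph classes look different, and in practice the framework does this work for you.

```python
def build_flow_graph(instrs):
    """Split a linear instruction list into basic blocks and connect them.

    instrs is a list of (label, opcode, target) tuples in a made-up toy IR:
    "jmp" is an unconditional jump, "br" a conditional branch, "ret" a return.
    Returns (blocks, edges), where each block is a list of instruction
    indices and each edge is a (from_block, to_block) pair.
    """
    branch_ops = ("jmp", "br")
    targets = {t for _, op, t in instrs if op in branch_ops}

    # A leader starts a new block: the first instruction, any branch target,
    # and any instruction that follows a branch or return.
    leaders = set()
    for i, (label, op, _) in enumerate(instrs):
        if i == 0 or (label is not None and label in targets):
            leaders.add(i)
        if op in branch_ops + ("ret",) and i + 1 < len(instrs):
            leaders.add(i + 1)

    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    blocks = [list(range(s, e)) for s, e in zip(starts, ends)]

    # Map each label that begins a block to that block's number.
    label_to_block = {}
    for number, indices in enumerate(blocks):
        label = instrs[indices[0]][0]
        if label is not None:
            label_to_block[label] = number

    edges = set()
    for number, indices in enumerate(blocks):
        _, op, target = instrs[indices[-1]]
        if op in branch_ops:
            edges.add((number, label_to_block[target]))  # taken branch
        if op not in ("jmp", "ret") and number + 1 < len(blocks):
            edges.add((number, number + 1))              # fall-through
    return blocks, sorted(edges)


toy_ir = [
    (None,   "mov", None),
    ("loop", "add", None),
    (None,   "br",  "loop"),  # conditional branch back to "loop"
    (None,   "ret", None),
]
blocks, edges = build_flow_graph(toy_ir)
# blocks -> [[0], [1, 2], [3]]
# edges  -> [(0, 1), (1, 1), (1, 2)]
```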
We can use Phoenix to add 'features' that the front-end language does not provide. Think of things like adding code to gather runtime code coverage by function, by block, by edge, and so on. Or all of the features tackled by "Aspect Oriented Programming". Or obfuscation, static analysis, instrumentation, dependency analysis, multi-core or multi-thread add-ons, etc. Phoenix is a framework; we decide what we want to do with it. It's another framework, and I love it.
For a real-life example of Phoenix's utility, suppose you want to analyze what fraction of your code users actually execute on a daily basis by building an instrumented version of your program that collects and records such data. Traditionally, you'd need to write a program that reads in a Portable Executable (PE) binary file, figures out what's code and what's data, identifies each function, basic block, and instruction, inserts the instrumentation code, recomputes all the addresses to adjust for the added code, and then writes out a valid PE binary. With Phoenix, by contrast, you just need to write a program that adds the instrumentation at the appropriate places in the binary; all the grunt work of reading and writing the PE file, identifying elements in the binary code, and recomputing addresses is done for you. Due to its extensible nature, Phoenix has been attracting the interest of computer scientists involved in programming language research.
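To get a feel for the bookkeeping described above, here is a toy Python model of the address-recomputation step. Nothing in it is Phoenix API: the program representation, the `__count` probe, and its 5-byte size are assumptions chosen only to show why inserting code forces every later address, and every call that refers to one, to be recomputed.

```python
PROBE_SIZE = 5  # assumed byte size of the inserted counting call


def layout(functions):
    """Map each function name to its start address, packing functions back to back."""
    addresses, addr = {}, 0
    for name, instrs in functions.items():
        addresses[name] = addr
        addr += sum(size for _, size, _ in instrs)
    return addresses


def instrument(functions):
    """Return a copy with a hypothetical counting probe at every function entry."""
    return {
        name: [("call __count", PROBE_SIZE, None)] + list(instrs)
        for name, instrs in functions.items()
    }


def resolve_calls(functions):
    """List (caller, callee, callee_address) for every call with a symbolic target."""
    addresses = layout(functions)
    return [
        (name, target, addresses[target])
        for name, instrs in functions.items()
        for _, _, target in instrs
        if target is not None
    ]


# Each instruction is (mnemonic, size_in_bytes, symbolic_call_target_or_None).
program = {
    "main":   [("push", 1, None), ("call", 5, "helper"), ("ret", 1, None)],
    "helper": [("mov", 2, None), ("ret", 1, None)],
}

before = resolve_calls(program)             # [("main", "helper", 7)]
after = resolve_calls(instrument(program))  # [("main", "helper", 12)]
```

Even in this two-function toy, adding one probe to `main` moves `helper` from address 7 to 12, so the call in `main` has to be patched. A real PE file multiplies this by every branch, relocation, and data reference, which is the grunt work the framework takes over.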