diff options
author | aiju <aiju@phicode.de> | 2011-07-18 11:01:22 +0200 |
---|---|---|
committer | aiju <aiju@phicode.de> | 2011-07-18 11:01:22 +0200 |
commit | 8c4c1f39f4e369d7c590c9d119f1150a2215e56d (patch) | |
tree | cd430740860183fc01de1bc1ddb216ceff1f7173 /sys/doc/compiler.ms | |
parent | 11bf57fb2ceb999e314cfbe27a4e123bf846d2c8 (diff) |
added /sys/doc
Diffstat (limited to 'sys/doc/compiler.ms')
-rw-r--r-- | sys/doc/compiler.ms | 1157 |
1 files changed, 1157 insertions, 0 deletions
diff --git a/sys/doc/compiler.ms b/sys/doc/compiler.ms new file mode 100644 index 000000000..205de2d71 --- /dev/null +++ b/sys/doc/compiler.ms @@ -0,0 +1,1157 @@ +.HTML "Plan 9 C Compilers +.TL +Plan 9 C Compilers +.AU +Ken Thompson +ken@plan9.bell-labs.com +.AB +.FS +Originally appeared, in a different form, in +.I +Proceedings of the Summer 1990 UKUUG Conference, +.R +pp. 41-51, +London, 1990. +.FE +This paper describes the overall structure and function of the Plan 9 C compilers. +A more detailed implementation document +for any one of the compilers +is yet to be written. +.AE +.NH +Introduction +.LP +There are many compilers in the series. +Six of the compilers (MIPS 3000, SPARC, Intel 386, Power PC, DEC Alpha, and Motorola 68020) +are considered active and are used to compile +current versions of Plan 9. +Several others (Motorola 68000, Intel 960, ARM 7500, AMD 29000) have had only limited use, such as +to program peripherals or experimental devices. +.NH +Structure +.LP +The compiler is a single program that produces an +object file. +Combined in the compiler are the traditional +roles of preprocessor, lexical analyzer, parser, code generator, +local optimizer, +and first half of the assembler. +The object files are binary forms of assembly +language, +similar to what might be passed between +the first and second passes of an assembler. +.LP +Object files and libraries +are combined by a loader +program to produce the executable binary. +The loader combines the roles of second half +of the assembler, global optimizer, and loader. +The names of the compliers, loaders, and assemblers +are as follows: +.DS +.ta 1.5i +.de Ta +\\$1 \f(CW\\$2\fP \f(CW\\$3\fP \f(CW\\$4\fP +.. +.Ta SPARC kc kl ka +.Ta Power\ PC qc ql qa +.Ta MIPS vc vl va +.Ta Motorola\ 68000 1c 1l 1a +.Ta Motorola\ 68020 2c 2l 2a +.Ta ARM\ 7500 5c 5l 5a +.Ta Intel\ 960 6c 6l 6a +.Ta DEC\ Alpha 7c 7l 7a +.Ta Intel\ 386 8c 8l 8a +.Ta AMD\ 29000 9c 9l 9a +.DE +There is a further breakdown +in the source of the compilers into +object-independent and +object-dependent +parts. +All of the object-independent parts +are combined into source files in the +directory +.CW /sys/src/cmd/cc . +The object-dependent parts are collected +in a separate directory for each compiler, +for example +.CW /sys/src/cmd/vc . +All of the code, +both object-independent and +object-dependent, +is machine-independent +and may be cross-compiled and executed on any +of the architectures. +.NH +The Language +.LP +The compiler implements ANSI C with some +restrictions and extensions +[ANSI90]. +Most of the restrictions are due to +personal preference, while +most of the extensions were to help in +the implementation of Plan 9. +There are other departures from the standard, +particularly in the libraries, +that are beyond the scope of this +paper. +.NH 2 +Register, volatile, const +.LP +The keyword +.CW register +is recognized syntactically +but is semantically ignored. +Thus taking the address of a +.CW register +variable is not diagnosed. +The keyword +.CW volatile +disables all optimizations, in particular registerization, of the corresponding variable. +The keyword +.CW const +generates warnings (if warnings are enabled by the compiler's +.CW -w +option) of non-constant use of the variable, +but does not affect the generated code. +.NH 2 +The preprocessor +.LP +The C preprocessor is probably the +biggest departure from the ANSI standard. +.LP +The preprocessor built into the Plan 9 compilers does not support +.CW #if , +although it does handle +.CW #ifdef +and +.CW #include . +If it is necessary to be more standard, +the source text can first be run through the separate ANSI C +preprocessor, +.CW cpp . +.NH 2 +Unnamed substructures +.LP +The most important and most heavily used of the +extensions is the declaration of an +unnamed substructure or subunion. +For example: +.DS +.CW +.ta .1i .6i 1.1i 1.6i + typedef + struct lock + { + int locked; + } Lock; + + typedef + struct node + { + int type; + union + { + double dval; + float fval; + long lval; + }; + Lock; + } Node; + + Lock* lock; + Node* node; +.DE +The declaration of +.CW Node +has an unnamed substructure of type +.CW Lock +and an unnamed subunion. +One use of this feature allows references to elements of the +subunit to be accessed as if they were in +the outer structure. +Thus +.CW node->dval +and +.CW node->locked +are legitimate references. +.LP +When an outer structure is used +in a context that is only legal for +an unnamed substructure, +the compiler promotes the reference to the +unnamed substructure. +This is true for references to structures and +to references to pointers to structures. +This happens in assignment statements and +in argument passing where prototypes have been +declared. +Thus, continuing with the example, +.DS +.CW +.ta .1i .6i 1.1i 1.6i + lock = node; +.DE +would assign a pointer to the unnamed +.CW Lock +in +the +.CW Node +to the variable +.CW lock . +Another example, +.DS +.CW +.ta .1i .6i 1.1i 1.6i + extern void lock(Lock*); + func(...) + { + ... + lock(node); + ... + } +.DE +will pass a pointer to the +.CW Lock +substructure. +.LP +Finally, in places where context is insufficient to identify the unnamed structure, +the type name (it must be a +.CW typedef ) +of the unnamed structure can be used as an identifier. +In our example, +.CW &node->Lock +gives the address of the anonymous +.CW Lock +structure. +.NH 2 +Structure displays +.LP +A structure cast followed by a list of expressions in braces is +an expression with the type of the structure and elements assigned from +the corresponding list. +Structures are now almost first-class citizens of the language. +It is common to see code like this: +.DS +.CW +.ta .1i + r = (Rectangle){point1, (Point){x,y+2}}; +.DE +.NH 2 +Initialization indexes +.LP +In initializers of arrays, +one may place a constant expression +in square brackets before an initializer. +This causes the next initializer to assign +the indicated element. +For example: +.DS +.CW +.ta .1i .6i 1.6i + enum errors + { + Etoobig, + Ealarm, + Egreg + }; + char* errstrings[] = + { + [Ealarm] "Alarm call", + [Egreg] "Panic: out of mbufs", + [Etoobig] "Arg list too long", + }; +.DE +In the same way, +individual structures members may +be initialized in any order by preceding the initialization with +.CW .tagname . +Both forms allow an optional +.CW = , +to be compatible with a proposed +extension to ANSI C. +.NH 2 +External register +.LP +The declaration +.CW extern +.CW register +will dedicate a register to +a variable on a global basis. +It can be used only under special circumstances. +External register variables must be identically +declared in all modules and +libraries. +The feature is not intended for efficiency, +although it can produce efficient code; +rather it represents a unique storage class that +would be hard to get any other way. +On a shared-memory multi-processor, +an external register is +one-per-processor and neither one-per-procedure (automatic) +or one-per-system (external). +It is used for two variables in the Plan 9 kernel, +.CW u +and +.CW m . +.CW U +is a pointer to the structure representing the currently running process +and +.CW m +is a pointer to the per-machine data structure. +.NH 2 +Long long +.LP +The compilers accept +.CW long +.CW long +as a basic type meaning 64-bit integer. +On all of the machines +this type is synthesized from 32-bit instructions. +.NH 2 +Pragma +.LP +The compilers accept +.CW #pragma +.CW lib +.I libname +and pass the +library name string uninterpreted +to the loader. +The loader uses the library name to +find libraries to load. +If the name contains +.CW %O , +it is replaced with +the single character object type of the compiler +(e.g., +.CW v +for the MIPS). +If the name contains +.CW %M , +it is replaced with +the architecture type for the compiler +(e.g., +.CW mips +for the MIPS). +If the name starts with +.CW / +it is an absolute pathname; +if it starts with +.CW . +then it is searched for in the loader's current directory. +Otherwise, the name is searched from +.CW /%M/lib . +Such +.CW #pragma +statements in header files guarantee that the correct +libraries are always linked with a program without the +need to specify them explicitly at link time. +.LP +They also accept +.CW #pragma +.CW hjdicks +.CW on +(or +.CW yes +or +.CW 1 ) +to cause subsequently declared data, until +.CW #pragma +.CW hjdicks +.CW off +(or +.CW no +or +.CW 0 ), +to be laid out in memory tightly packed in successive bytes, disregarding +the usual alignment rules. +Accessing such data can cause faults. +.LP +Similarly, +.CW #pragma +.CW profile +.CW off +(or +.CW no +or +.CW 0 ) +causes subsequently declared functions, until +.CW #pragma +.CW profile +.CW on +(or +.CW yes +or +.CW 1 ), +to be marked as unprofiled. +Such functions will not be profiled when +profiling is enabled for the rest of the program. +.LP +Two +.CW #pragma +statements allow type-checking of +.CW print -like +functions. +The first, of the form +.P1 +#pragma varargck argpos error 2 +.P2 +tells the compiler that the second argument to +.CW error +is a +.CW print +format string (see the manual page +.I print (2)) +that specifies how to format +.CW error 's +subsequent arguments. +The second, of the form +.P1 +#pragma varargck type "s" char* +.P2 +says that the +.CW print +format verb +.CW s +processes an argument of +type +.CW char* . +If the compiler's +.CW -F +option is enabled, the compiler will use this information +to report type violations in the arguments to +.CW print , +.CW error , +and similar routines. +.NH +Object module conventions +.LP +The overall conventions of the runtime environment +are important +to runtime efficiency. +In this section, +several of these conventions are discussed. +.NH 2 +Register saving +.LP +In the Plan 9 compilers, +the caller of a procedure saves the registers. +With caller-saves, +the leaf procedures can use all the +registers and never save them. +If you spend a lot of time at the leaves, +this seems preferable. +With callee-saves, +the saving of the registers is done +in the single point of entry and return. +If you are interested in space, +this seems preferable. +In both, +there is a degree of uncertainty +about what registers need to be saved. +Callee-saved registers make it difficult to +find variables in registers in debuggers. +Callee-saved registers also complicate +the implementation of +.CW longjmp . +The convincing argument is +that with caller-saves, +the decision to registerize a variable +can include the cost of saving the register +across calls. +For a further discussion of caller- vs. callee-saves, +see the paper by Davidson and Whalley [Dav91]. +.LP +In the Plan 9 operating system, +calls to the kernel look like normal procedure +calls, which means +the caller +has saved the registers and the system +entry does not have to. +This makes system calls considerably faster. +Since this is a potential security hole, +and can lead to non-determinism, +the system may eventually save the registers +on entry, +or more likely clear the registers on return. +.NH 2 +Calling convention +.LP +Older C compilers maintain a frame pointer, which is at a known constant +offset from the stack pointer within each function. +For machines where the stack grows towards zero, +the argument pointer is at a known constant offset +from the frame pointer. +Since the stack grows down in Plan 9, +the Plan 9 compilers +keep neither an +explicit frame pointer nor +an explicit argument pointer; +instead they generate addresses relative to the stack pointer. +.LP +On some architectures, the first argument to a subroutine is passed in a register. +.NH 2 +Functions returning structures +.LP +Structures longer than one word are awkward to implement +since they do not fit in registers and must +be passed around in memory. +Functions that return structures +are particularly clumsy. +The Plan 9 compilers pass the return address of +a structure as the first argument of a +function that has a structure return value. +Thus +.DS +.CW +.ta .1i .6i 1.1i 1.6i + x = f(...) +.DE +is rewritten as +.DS +.CW +.ta .1i .6i 1.1i 1.6i + f(&x, ...)\f1. +.DE +This saves a copy and makes the compilation +much less clumsy. +A disadvantage is that if you call this +function without an assignment, +a dummy location must be invented. +.LP +There is also a danger of calling a function +that returns a structure without declaring +it as such. +With ANSI C function prototypes, +this error need never occur. +.NH +Implementation +.LP +The compiler is divided internally into +four machine-independent passes, +four machine-dependent passes, +and an output pass. +The next nine sections describe each pass in order. +.NH 2 +Parsing +.LP +The first pass is a YACC-based parser +[Joh79]. +Declarations are interpreted immediately, +building a block structured symbol table. +Executable statements are put into a parse tree +and collected, +without interpretation. +At the end of each procedure, +the parse tree for the function is +examined by the other passes of the compiler. +.LP +The input stream of the parser is +a pushdown list of input activations. +The preprocessor +expansions of +macros +and +.CW #include +are implemented as pushdowns. +Thus there is no separate +pass for preprocessing. +.NH 2 +Typing +.LP +The next pass distributes typing information +to every node of the tree. +Implicit operations on the tree are added, +such as type promotions and taking the +address of arrays and functions. +.NH 2 +Machine-independent optimization +.LP +The next pass performs optimizations +and transformations of the tree, such as converting +.CW &*x +and +.CW *&x +into +.CW x . +Constant expressions are converted to constants in this pass. +.NH 2 +Arithmetic rewrites +.LP +This is another machine-independent optimization. +Subtrees of add, subtract, and multiply of integers are +rewritten for easier compilation. +The major transformation is factoring: +.CW 4+8*a+16*b+5 +is transformed into +.CW 9+8*(a+2*b) . +Such expressions arise from address +manipulation and array indexing. +.NH 2 +Addressability +.LP +This is the first of the machine-dependent passes. +The addressability of a processor is defined as the set of +expressions that is legal in the address field +of a machine language instruction. +The addressability of different processors varies widely. +At one end of the spectrum are the 68020 and VAX, +which allow a complex mix of incrementing, +decrementing, +indexing, and relative addressing. +At the other end is the MIPS, +which allows only registers and constant offsets from the +contents of a register. +The addressability can be different for different instructions +within the same processor. +.LP +It is important to the code generator to know when a +subtree represents an address of a particular type. +This is done with a bottom-up walk of the tree. +In this pass, the leaves are labeled with small integers. +When an internal node is encountered, +it is labeled by consulting a table indexed by the +labels on the left and right subtrees. +For example, +on the 68020 processor, +it is possible to address an +offset from a named location. +In C, this is represented by the expression +.CW *(&name+constant) . +This is marked addressable by the following table. +In the table, +a node represented by the left column is marked +with a small integer from the right column. +Marks of the form +.CW A\s-2\di\u\s0 +are addressable while +marks of the form +.CW N\s-2\di\u\s0 +are not addressable. +.DS +.B +.ta .1i 1.1i + Node Marked +.CW + name A\s-2\d1\u\s0 + const A\s-2\d2\u\s0 + &A\s-2\d1\u\s0 A\s-2\d3\u\s0 + A\s-2\d3\u\s0+A\s-2\d1\u\s0 N\s-2\d1\u\s0 \fR(note that this is not addressable)\fP + *N\s-2\d1\u\s0 A\s-2\d4\u\s0 +.DE +Here there is a distinction between +a node marked +.CW A\s-2\d1\u\s0 +and a node marked +.CW A\s-2\d4\u\s0 +because the address operator of an +.CW A\s-2\d4\u\s0 +node is not addressable. +So to extend the table: +.DS +.B +.ta .1i 1.1i + Node Marked +.CW + &A\s-2\d4\u\s0 N\s-2\d2\u\s0 + N\s-2\d2\u\s0+N\s-2\d1\u\s0 N\s-2\d1\u\s0 +.DE +The full addressability of the 68020 is expressed +in 18 rules like this, +while the addressability of the MIPS is expressed +in 11 rules. +When one ports the compiler, +this table is usually initialized +so that leaves are labeled as addressable and nothing else. +The code produced is poor, +but porting is easy. +The table can be extended later. +.LP +This pass also rewrites some complex operators +into procedure calls. +Examples include 64-bit multiply and divide. +.LP +In the same bottom-up pass of the tree, +the nodes are labeled with a Sethi-Ullman complexity +[Set70]. +This number is roughly the number of registers required +to compile the tree on an ideal machine. +An addressable node is marked 0. +A function call is marked infinite. +A unary operator is marked as the +maximum of 1 and the mark of its subtree. +A binary operator with equal marks on its subtrees is +marked with a subtree mark plus 1. +A binary operator with unequal marks on its subtrees is +marked with the maximum mark of its subtrees. +The actual values of the marks are not too important, +but the relative values are. +The goal is to compile the harder +(larger mark) +subtree first. +.NH 2 +Code generation +.LP +Code is generated by recursive +descent. +The Sethi-Ullman complexity completely guides the +order. +The addressability defines the leaves. +The only difficult part is compiling a tree +that has two infinite (function call) +subtrees. +In this case, +one subtree is compiled into the return register +(usually the most convenient place for a function call) +and then stored on the stack. +The other subtree is compiled into the return register +and then the operation is compiled with +operands from the stack and the return register. +.LP +There is a separate boolean code generator that compiles +conditional expressions. +This is fundamentally different from compiling an arithmetic expression. +The result of the boolean code generator is the +position of the program counter and not an expression. +The boolean code generator makes extensive use of De Morgan's rule. +The boolean code generator is an expanded version of that described +in chapter 8 of Aho, Sethi, and Ullman +[Aho87]. +.LP +There is a considerable amount of talk in the literature +about automating this part of a compiler with a machine +description. +Since this code generator is so small +(less than 500 lines of C) +and easy, +it hardly seems worth the effort. +.NH 2 +Registerization +.LP +Up to now, +the compiler has operated on syntax trees +that are roughly equivalent to the original source language. +The previous pass has produced machine language in an internal +format. +The next two passes operate on the internal machine language +structures. +The purpose of the next pass is to reintroduce +registers for heavily used variables. +.LP +All of the variables that can be +potentially registerized within a procedure are +placed in a table. +(Suitable variables are any automatic or external +scalars that do not have their addresses extracted. +Some constants that are hard to reference are also +considered for registerization.) +Four separate data flow equations are evaluated +over the procedure on all of these variables. +Two of the equations are the normal set-behind +and used-ahead +bits that define the life of a variable. +The two new bits tell if a variable life +crosses a function call ahead or behind. +By examining a variable over its lifetime, +it is possible to get a cost +for registerizing. +Loops are detected and the costs are multiplied +by three for every level of loop nesting. +Costs are sorted and the variables +are replaced by available registers on a greedy basis. +.LP +The 68020 has two different +types of registers. +For the 68020, +two different costs are calculated for +each variable life and the register type that +affords the better cost is used. +Ties are broken by counting the number of available +registers of each type. +.LP +Note that externals are registerized together with automatics. +This is done by evaluating the semantics of a ``call'' instruction +differently for externals and automatics. +Since a call goes outside the local procedure, +it is assumed that a call references all externals. +Similarly, +externals are assumed to be set before an ``entry'' instruction +and assumed to be referenced after a ``return'' instruction. +This makes sure that externals are in memory across calls. +.LP +The overall results are satisfactory. +It would be nice to be able to do this processing in +a machine-independent way, +but it is impossible to get all of the costs and +side effects of different choices by examining the parse tree. +.LP +Most of the code in the registerization pass is machine-independent. +The major machine-dependency is in +examining a machine instruction to ask if it sets or references +a variable. +.NH 2 +Machine code optimization +.LP +The next pass walks the machine code +for opportunistic optimizations. +For the most part, +this is highly specific to a particular +processor. +One optimization that is performed +on all of the processors is the +removal of unnecessary ``move'' +instructions. +Ironically, +most of these instructions were inserted by +the previous pass. +There are two patterns that are repetitively +matched and replaced until no more matches are +found. +The first tries to remove ``move'' instructions +by relabeling variables. +.LP +When a ``move'' instruction is encountered, +if the destination variable is set before the +source variable is referenced, +then all of the references to the destination +variable can be renamed to the source and the ``move'' +can be deleted. +This transformation uses the reverse data flow +set up in the previous pass. +.LP +An example of this pattern is depicted in the following +table. +The pattern is in the left column and the +replacement action is in the right column. +.DS +.CW +.ta .1i .6i 1.6i 2.1i 2.6i + MOVE a->b \fR(remove)\fP +.R + (sequence with no mention of \f(CWa\fP) +.CW + USE b USE a +.R + (sequence with no mention of \f(CWa\fP) +.CW + SET b SET b +.DE +.LP +Experiments have shown that it is marginally +worthwhile to rename uses of the destination variable +with uses of the source variable up to +the first use of the source variable. +.LP +The second transform will do relabeling +without deleting instructions. +When a ``move'' instruction is encountered, +if the source variable has been set prior +to the use of the destination variable +then all of the references to the source +variable are replaced by the destination and +the ``move'' is inverted. +Typically, +this transformation will alter two ``move'' +instructions and allow the first transformation +another chance to remove code. +This transformation uses the forward data flow +set up in the previous pass. +.LP +Again, +the following is a depiction of the transformation where +the pattern is in the left column and the +rewrite is in the right column. +.DS +.CW +.ta .1i .6i 1.6i 2.1i 2.6i + SET a SET b +.R + (sequence with no use of \f(CWb\fP) +.CW + USE a USE b +.R + (sequence with no use of \f(CWb\fP) +.CW + MOVE a->b MOVE b->a +.DE +Iterating these transformations +will usually get rid of all redundant ``move'' instructions. +.LP +A problem with this organization is that the costs +of registerization calculated in the previous pass +must depend on how well this pass can detect and remove +redundant instructions. +Often, +a fine candidate for registerization is rejected +because of the cost of instructions that are later +removed. +.NH 2 +Writing the object file +.LP +The last pass walks the internal assembly language +and writes the object file. +The object file is reduced in size by about a factor +of three with simple compression +techniques. +The most important aspect of the object file +format is that it is independent of the compiling machine. +All integer and floating numbers in the object +code are converted to known formats and byte +orders. +.NH +The loader +.LP +The loader is a multiple pass program that +reads object files and libraries and produces +an executable binary. +The loader also does some minimal +optimizations and code rewriting. +Many of the operations performed by the +loader are machine-dependent. +.LP +The first pass of the loader reads the +object modules into an internal data +structure that looks like binary assembly language. +As the instructions are read, +code is reordered to remove +unconditional branch instructions. +Conditional branch instructions are inverted +to prevent the insertion of unconditional branches. +The loader will also make a copy of a few instructions +to remove an unconditional branch. +.LP +The next pass allocates addresses for +all external data. +Typical of processors is the MIPS, +which can reference ±32K bytes from a +register. +The loader allocates the register +.CW R30 +as the static pointer. +The value placed in +.CW R30 +is the base of the data segment plus 32K. +It is then cheap to reference all data in the +first 64K of the data segment. +External variables are allocated to +the data segment +with the smallest variables allocated first. +If all of the data cannot fit into the first +64K of the data segment, +then usually only a few large arrays +need more expensive addressing modes. +.LP +For the MIPS processor, +the loader makes a pass over the internal +structures, +exchanging instructions to try +to fill ``delay slots'' with useful work. +If a useful instruction cannot be found +to fill a delay slot, +the loader will insert +``noop'' +instructions. +This pass is very expensive and does not +do a good job. +About 40% of all instructions are in +delay slots. +About 65% of these are useful instructions and +35% are ``noops.'' +The vendor-supplied assembler does this job +more effectively, +filling about 80% +of the delay slots with useful instructions. +.LP +On the 68020 processor, +branch instructions come in a variety of +sizes depending on the relative distance +of the branch. +Thus the size of branch instructions +can be mutually dependent. +The loader uses a multiple pass algorithm +to resolve the branch lengths +[Szy78]. +Initially, all branches are assumed minimal length. +On each subsequent pass, +the branches are reassessed +and expanded if necessary. +When no more expansions occur, +the locations of the instructions in +the text segment are known. +.LP +On the MIPS processor, +all instructions are one size. +A single pass over the instructions will +determine the locations of all addresses +in the text segment. +.LP +The last pass of the loader produces the +executable binary. +A symbol table and other tables are +produced to help the debugger to +interpret the binary symbolically. +.LP +The loader places absolute source line numbers in the symbol table. +The name and absolute line number of all +.CW #include +files is also placed in the +symbol table so that the debuggers can +associate object code to source files. +.NH +Performance +.LP +The following is a table of the source size of the MIPS +compiler. +.DS +.ta .1i .6i + lines module + \0509 machine-independent headers + 1070 machine-independent YACC source + 6090 machine-independent C source + + \0545 machine-dependent headers + 6532 machine-dependent C source + + \0298 loader headers + 5215 loader C source +.DE +.LP +The following table shows timing +of a test program +that plays checkers, running on a MIPS R4000. +The test program is 26 files totaling 12600 lines of C. +The execution time does not significantly +depend on library implementation. +Since no other compiler runs on Plan 9, +the Plan 9 tests were done with the Plan 9 operating system; +the other tests were done on the vendor's operating system. +The hardware was identical in both cases. +The optimizer in the vendor's compiler +is reputed to be extremely good. +.DS +.ta .1i .9i + \0\04.49s Plan 9 \f(CWvc\fP \f(CW-N\fP compile time (opposite of \f(CW-O\fP) + \0\01.72s Plan 9 \f(CWvc\fP \f(CW-N\fP load time + 148.69s Plan 9 \f(CWvc\fP \f(CW-N\fP run time + + \015.07s Plan 9 \f(CWvc\fP compile time (\f(CW-O\fP implicit) + \0\01.66s Plan 9 \f(CWvc\fP load time + \089.96s Plan 9 \f(CWvc\fP run time + + \014.83s vendor \f(CWcc\fP compile time + \0\00.38s vendor \f(CWcc\fP load time + 104.75s vendor \f(CWcc\fP run time + + \043.59s vendor \f(CWcc\fP \f(CW-O\fP compile time + \0\00.38s vendor \f(CWcc\fP \f(CW-O\fP load time + \076.19s vendor \f(CWcc\fP \f(CW-O\fP run time + + \0\08.19s vendor \f(CWcc\fP \f(CW-O3\fP compile time + \035.97s vendor \f(CWcc\fP \f(CW-O3\fP load time + \071.16s vendor \f(CWcc\fP \f(CW-O3\fP run time +.DE +.LP +To compare the Intel compiler, +a program that is about 40% bit manipulation and +about 60% single precision floating point was +run on the same 33 MHz 486, once under Windows +compiled with the Watcom compiler, version 10.0, +in 16-bit mode and once under +Plan 9 in 32-bit mode. +The Plan 9 execution time was 27 sec while the Windows +execution time was 31 sec. +.NH +Conclusions +.LP +The new compilers compile +quickly, +load slowly, +and produce +medium quality +object code. +The compilers are relatively +portable, +requiring but a couple of weeks' work to +produce a compiler for a different computer. +For Plan 9, +where we needed several compilers +with specialized features and +our own object formats, +this project was indispensable. +It is also necessary for us to +be able to freely distribute our compilers +with the Plan 9 distribution. +.LP +Two problems have come up in retrospect. +The first has to do with the +division of labor between compiler and loader. +Plan 9 runs on multi-processors and as such +compilations are often done in parallel. +Unfortunately, +all compilations must be complete before loading +can begin. +The load is then single-threaded. +With this model, +any shift of work from compile to load +results in a significant increase in real time. +The same is true of libraries that are compiled +infrequently and loaded often. +In the future, +we may try to put some of the loader work +back into the compiler. +.LP +The second problem comes from +the various optimizations performed over several +passes. +Often optimizations in different passes depend +on each other. +Iterating the passes could compromise efficiency, +or even loop. +We see no real solution to this problem. +.NH +References +.LP +[Aho87] A. V. Aho, R. Sethi, and J. D. Ullman, +.I +Compilers \- Principles, Techniques, and Tools, +.R +Addison Wesley, +Reading, MA, +1987. +.LP +[ANSI90] \f2American National Standard for Information Systems \- +Programming Language C\f1, American National Standards Institute, Inc., +New York, 1990. +.LP +[Dav91] J. W. Davidson and D. B. Whalley, +``Methods for Saving and Restoring Register Values across Function Calls'', +.I +Software\-Practice and Experience, +.R +Vol 21(2), pp. 149-165, February 1991. +.LP +[Joh79] S. C. Johnson, +``YACC \- Yet Another Compiler Compiler'', +.I +UNIX Programmer's Manual, Seventh Ed., Vol. 2A, +.R +AT&T Bell Laboratories, +Murray Hill, NJ, +1979. +.LP +[Set70] R. Sethi and J. D. Ullman, +``The Generation of Optimal Code for Arithmetic Expressions'', +.I +Journal of the ACM, +.R +Vol 17(4), pp. 715-728, 1970. +.LP +[Szy78] T. G. Szymanski, +``Assembling Code for Machines with Span-dependent Instructions'', +.I +Communications of the ACM, +.R +Vol 21(4), pp. 300-308, 1978. |