diff options
author | aiju <aiju@phicode.de> | 2011-07-18 11:01:22 +0200 |
---|---|---|
committer | aiju <aiju@phicode.de> | 2011-07-18 11:01:22 +0200 |
commit | 8c4c1f39f4e369d7c590c9d119f1150a2215e56d (patch) | |
tree | cd430740860183fc01de1bc1ddb216ceff1f7173 /sys/doc/acidpaper.ms | |
parent | 11bf57fb2ceb999e314cfbe27a4e123bf846d2c8 (diff) |
added /sys/doc
Diffstat (limited to 'sys/doc/acidpaper.ms')
-rw-r--r-- | sys/doc/acidpaper.ms | 1324 |
1 files changed, 1324 insertions, 0 deletions
diff --git a/sys/doc/acidpaper.ms b/sys/doc/acidpaper.ms new file mode 100644 index 000000000..3203b87a6 --- /dev/null +++ b/sys/doc/acidpaper.ms @@ -0,0 +1,1324 @@ +.HTML "Acid: A Debugger Built From A Language +.TL +Acid: A Debugger Built From A Language +.AU +Phil Winterbottom +philw@plan9.bell-labs.com +.AB +.FS +Originally appeared in +.I +Proc. of the Winter 1994 USENIX Conf., +.R +pp. 211-222, +San Francisco, CA +.FE +Acid is an unusual source-level symbolic debugger for Plan 9. It is implemented +as a language interpreter with specialized primitives that provide +debugger support. Programs written in the language manipulate +one or more target processes; variables in the language represent the +symbols, state, and resources of those processes. +This structure allows complex +interaction between the debugger and the target program and +provides a convenient method of parameterizing differences between +machine architectures. +Although some effort is required to learn +the debugging language, the richness and flexibility of the +debugging environment encourages new ways of reasoning about the way +programs run and the conditions under which they fail. +.AE +.NH +Introduction +.PP +The size and complexity +of programs have increased in proportion to processor speed and memory but +the interface between debugger and programmer has changed little. +Graphical user interfaces have eased some of the tedious +aspects of the interaction. A graphical interface is a convenient +means for navigating through source and data structures but provides +little benefit for process control. +The introduction of a new concurrent language, Alef [Win93], emphasized the +inadequacies of the existing Plan 9 [Pike90] debugger +.I db , +a distant relative of +.I adb , +and made it clear that a new debugger was required. +.PP +Current debuggers like +.I dbx , +.I sdb , +and +.I gdb +are limited to answering only the questions their authors +envisage. As a result, they supply a plethora +of specialized commands, each attempting to anticipate +a specific question a user may ask. +When a debugging situation arises that is beyond the scope +of the command set, the tool is useless. +Further, +it is often tedious or impossible to reproduce an anomalous state +of the program, especially when +the state is embedded in the program's data structures. +.PP +Acid applies some ideas found in CAD software used for +hardware test and simulation. +It is based on the notion that the state and resources of a program +are best represented and manipulated by a language. The state and resources, +such as memory, registers, variables, type information and source code +are represented by variables in the language. +Expressions provide a computation mechanism and control +statements allow repetitive or selective interpretation based +on the result of expression evaluation. +The heart of the Acid debugger is an interpreter for a small typeless +language whose operators mirror the operations +of C and Alef, which in turn correspond well to the basic operations of +the machine. The interpreter itself knows nothing of the underlying +hardware; it deals with the program state and resources +in the abstract. +Fundamental routines to control +processes, read files, and interface to the system are implemented +as builtin functions available to the interpreter. +The actual debugger functionality is coded +in Acid; commands are implemented as Acid functions. +.PP +This language-based approach has several advantages. +Most importantly, programs written in Acid, including most of the +debugger itself, are inherently portable. +Furthermore, Acid avoids the limitations other debuggers impose when +debugging parallel programs. Instead of embedding a fixed +process model in the debugger, Acid allows the +programmer to adapt the debugger to handle an +arbitrary process partitioning or program structure. +The ability to +interact dynamically with an executing process provides clear advantages +over debuggers constrained to probe a static image. +Finally, the Acid language is a powerful vehicle for expressing +assertions about logic, process state, and the contents of data structures. +When combined with dynamic interaction it allows a +limited form of automated program verification without requiring +modification or recompilation of the source code. +The language is also an +excellent vehicle for preserving a test suite for later regression testing. +.PP +The debugger may be customized by its users; standard +functions may be modified or extended to suit a particular application +or preference. +For example, the kernel developers in our group require a +command set supporting assembler-level debugging while the application +programmers prefer source-level functionality. +Although the default library is biased toward assembler-level debugging, +it is easily modified to provide a convenient source-level interface. +The debugger itself does not change; the user combines primitives +and existing Acid functions in different ways to +implement the desired interface. +.NH +Related Work +.PP +DUEL [Gol93], an extension to +.I gdb +[Stal91], proposes using a high level expression evaluator to solve +some of these problems. The evaluator provides iterators to loop over data +structures and conditionals to control evaluation of expressions. +The author shows that complex state queries can be formulated +by combining concise expressions but this only addresses part of the problem. +A program is a dynamic entity; questions asked when the program is in +a static state are meaningful only after the program has been `caught' in +that state. The framework for manipulating the program is still as +primitive as the underlying debugger. While DUEL provides a means to +probe data structures it entirely neglects the most beneficial aspect +of debugging languages: the ability to control processes. Acid is structured +around a thread of control that passes between the interpreter and the +target program. +.PP +The NeD debugger [May92] is a set of extensions to TCL [Ous90] that provide +debugging primitives. The resulting language, NeDtcl, is used to implement +a portable interface between a conventional debugger, pdb [May90], and +a server that executes NeDtcl programs operating on the target program. +Execution of the NeDtcl programs implements the debugging primitives +that pdb expects. +NeD is targeted at multi-process debugging across a network, +and proves the flexibility of a language as a means of +communication between debugging tools. Whereas NeD provides an interface +between a conventional debugger and the process it debugs, Acid is the +debugger itself. While NeD has some of the ideas +found in Acid it is targeted toward a different purpose. Acid seeks to +integrate the manipulation of a program's resources into the debugger +while NeD provides a flexible interconnect between components of +the debugging environment. The choice of TCL is appropriate for its use +in NeD but is not suitable for Acid. Acid relies on the coupling of the type +system with expression evaluation, which are the root of its design, +to provide the debugging primitives. +.PP +Dalek [Ols90] is an event based language extension to gdb. State transitions +in the target program cause events to be queued for processing by the +debugging language. +.PP +Acid has many of the advantages of same process or +.I local +.I agent +debuggers, like Parasight [Aral], without the need for dynamic linking or +shared memory. +Acid improves on the ideas of these other systems by completely integrating +all aspects of the debugging process into the language environment. Of +particular importance is the relationship between Acid variables, +program symbols, source code, registers and type information. This +integration is made possible by the design of the Acid language. +.PP +Interpreted languages such as Lisp and Smalltalk are able to provide +richer debugging environments through more complete information than +their compiled counterparts. Acid is a means to gather and represent +similar information about compiled programs through cooperation +with the compilation tools and library implementers. +.NH +Acid the Language +.PP +Acid is a small interpreted language targeted to its debugging task. +It focuses on representing program state and addressing data rather than +expressing complex computations. Program state is +.I addressable +from an Acid program. +In addition to parsing and executing expressions and providing +an architecture-independent interface to the target process, +the interpreter supplies a mark-and-scan garbage collector +to manage storage. +.PP +Every Acid session begins with the loading of the Acid libraries. +These libraries contain functions, written in Acid, that provide +a standard debugging environment including breakpoint management, +stepping by instruction or statement, stack tracing, and +access to variables, memory, and registers. +The library contains 600 lines of Acid code and provides +functionality similar to +.I dbx . +Following the loading of the system library, Acid loads +user-specified libraries; this load sequence allows the +user to augment or override the standard commands +to customize the debugging environment. When all libraries +are loaded, Acid issues an interactive prompt and begins +evaluating expressions entered by the user. The Acid `commands' +are actually invocations of builtin primitives or previously defined +Acid functions. Acid evaluates each expression as it is entered and +prints the result. +.NH +Types and Variables +.PP +Acid variables are of four basic types: +.I integer , +.I string , +.I float , +and +.I list . +The type of a variable is inferred by the type of the right-hand side of +an assignment expression. +Many of the operators can be applied to more than +one type; for these operators the action of the operator is determined +by the type of its operands. +For example, +the +.CW + +operator adds +.I integer +and +.I float +operands, and concatenates +.I string +and +.I list +operands. +Lists are the only complex type in Acid; there are no arrays, structures +or pointers. Operators provide +.CW head , +.CW tail , +.CW append +and +.CW delete +operations. +Lists can also be indexed like arrays. +.PP +Acid has two levels of scope: global and local. +Function parameters and variables declared in a function body +using the +.CW local +keyword are created at entry to the function and +exist for the lifetime of a function. +Global variables are created by assignment and need not be declared. +All variables and functions in the program +being debugged are entered in the Acid symbol table as global +variables during Acid initialization. +Conflicting variable names are resolved by prefixing enough `$' characters +to make them unique. +Syntactically, Acid variables and target program +symbols are referenced identically. +However, the variables are managed differently in the Acid +symbol table and the user must be aware of this distinction. +The value of an Acid variable is stored in the symbol +table; a reference returns the value. +The symbol table entry for a variable or function in the target +program contains the address of that symbol in the image +of the program. Thus, the value of a program variable is +accessed by indirect reference through the Acid +variable that has the same name; the value of an Acid variable is the +address of the corresponding program variable. +.NH +Control Flow +.PP +The +.CW while +and +.CW loop +statements implement looping. +The former +is similar to the same statement in C. +The latter evaluates starting and ending expressions yielding +integers and iterates while an incrementing loop index +is within the bounds of those expressions. +.P1 +acid: i = 0; loop 1,5 do print(i=i+1) +0x00000001 +0x00000002 +0x00000003 +0x00000004 +0x00000005 +acid: +.P2 +The traditional +.CW if-then-else +statement implements conditional execution. +.NH +Addressing +.PP +Two indirection operators allow Acid to access values in +the program being debugged. +The +.CW * +operator fetches a value from the memory image of an +executing process; +the +.CW @ +operator fetches a value from the text file of the process. +When either operator appears on the left side of an assignment, the value +is written rather than read. +.PP +The indirection operator must know the size of the object +referenced by a variable. +The Plan 9 compilers neglect to include this +information in the program symbol table, so Acid cannot +derive this information implicitly. +Instead Acid variables have formats. +The format is a code +letter specifying the printing style and the effect of some of the +operators on that variable. +The indirection operators look at the format code to determine the +number of bytes to read or write. +The format codes are derived from the format letters used by +.I db . +By default, symbol table variables and numeric constants +are assigned the format code +.CW 'X' +which specifies 32-bit hexadecimal. +Printing such a variable yields output of the form +.CW 0x00123456 . +An indirect reference through the variable fetches 32 bits +of data at the address indicated by the variable. +Other formats specify various data types, for example +.CW i +an instruction, +.CW D +a signed 32 bit decimal, +.CW s +a null-terminated string. +The +.CW fmt +function +allows the user to change the format code of a variable +to control the printing format and +operator side effects. +This function evaluates the expression supplied as the first +argument, attaches the format code supplied as the second +argument to the result and returns that value. +If the result is assigned to a variable, +the new format code applies to +that variable. For convenience, Acid provides the +.CW \e +operator as a shorthand infix form of +.CW fmt . +For example: +.P1 +acid: x=10 +acid: x // print x in hex +0x0000000a +acid: x = fmt(x, 'D') // make x type decimal +acid: print(x, fmt(x, 'X'), x\eX) // print x in decimal & hex +10 0x0000000a 0x0000000a +acid: x // print x in decimal +10 +acid: x\eo // print x in octal +000000000012 +.P2 +The +.CW ++ +and +.CW -- +operators increment or decrement a variable by an amount +determined by its format code. Some formats imply a non-fixed size. +For example, the +.CW i +format code disassembles an instruction into a string. +On a 68020, which has variable length instructions: +.P1 +acid: p=main\ei // p=addr(main), type INST +acid: loop 1,5 do print(p\eX, @p++) // disassemble 5 instr's +0x0000222e LEA 0xffffe948(A7),A7 +0x00002232 MOVL s+0x4(A7),A2 +0x00002236 PEA 0x2f($0) +0x0000223a MOVL A2,-(A7) +0x0000223c BSR utfrrune +acid: +.P2 +Here, +.CW main +is the address of the function of the same name in the program under test. +The loop retrieves the five instructions beginning at that address and +then prints the address and the assembly language representation of each. +Notice that the stride of the increment operator varies with the size of +the instruction: the +.CW MOVL +at +.CW 0x0000223a +is a two byte instruction while all others are four bytes long. +.PP +Registers are treated as normal program variables referenced +by their symbolic assembler language names. +When a +process stops, the register set is saved by the kernel +at a known virtual address in the process memory map. +The Acid variables associated with the registers point +to the saved values and the +.CW * +indirection operator can then be used to read and write the register set. +Since the registers are accessed via Acid variables they may +be used in arbitrary expressions. +.P1 +acid: PC // addr of saved PC +0xc0000f60 +acid: *PC +0x0000623c // contents of PC +acid: *PC\ea +main +acid: *R1=10 // modify R1 +acid: asm(*PC+4) // disassemble @ PC+4 +main+0x4 0x00006240 MOVW R31,0x0(R29) +main+0x8 0x00006244 MOVW $setR30(SB),R30 +main+0x10 0x0000624c MOVW R1,_clock(SB) +.P2 +Here, the saved +.CW PC +is stored at address +.CW 0xc0000f60 ; +its current content is +.CW 0x0000623c . +The +.CW a ' ` +format code converts this value to a string specifying +the address as an offset beyond the nearest symbol. +After setting the value of register +.CW 1 , +the example uses the +.CW asm +command to disassemble a short section of code beginning +at four bytes beyond the current value of the +.CW PC . +.NH +Process Interface +.PP +A program executing under Acid is monitored through the +.I proc +file system interface provided by Plan 9. +Textual messages written to the +.CW ctl +file control the execution of the process. +For example writing +.CW waitstop +to the control file causes the write to block until the target +process enters the kernel and is stopped. When the process is stopped +the write completes. The +.CW startstop +message starts the target process and then does a +.CW waitstop +action. +Synchronization between the debugger and the target process is determined +by the actions of the various messages. Some operate asynchronously to the +target process and always complete immediately, others block until the +action completes. The asynchronous messages allow Acid to control +several processes simultaneously. +.PP +The interpreter has builtin functions named after each of the control +messages. The functions take a process id as argument. +Any time a control message causes the program to execute instructions +the interpreter performs two actions when the control operation has completed. +The Acid variables pointing at the register set are fixed up to point +at the saved registers, and then +the user defined function +.CW stopped +is executed. +The +.CW stopped +function may print the current address, +line of source or instruction and return to interactive mode. Alternatively +it may traverse a complex data structure, gather statistics and then set +the program running again. +.PP +Several Acid variables are maintained by the debugger rather than the +programmer. +These variables allow generic Acid code to deal with the current process, +architecture specifics or the symbol table. +The variable +.CW pid +is the process id of the current process Acid is debugging. +The variable +.CW symbols +contains a list of lists where each sublist contains the symbol +name, its type and the value of the symbol. +The variable +.CW registers +contains a list of the machine-specific register names. Global symbols in the target program +can be referenced directly by name from Acid. Local variables +are referenced using the colon operator as \f(CWfunction:variable\fP. +.NH +Source Level Debugging +.PP +Acid provides several builtin functions to manipulate source code. +The +.CW file +function reads a text file, inserting each line into a list. +The +.CW pcfile +and +.CW pcline +functions each take an address as an argument. +The first +returns a string containing the name of the source file +and the second returns an integer containing the line number +of the source line containing the instruction at the address. +.P1 +acid: pcfile(main) // file containing main +main.c +acid: pcline(main) // line # of main in source +11 +acid: file(pcfile(main))[pcline(main)] // print that line +main(int argc, char *argv[]) +acid: src(*PC) // print statements nearby + 9 + 10 void +>11 main(int argc, char *argv[]) + 12 { + 13 int a; +.P2 +In this example, the three primitives are combined in an expression to print +a line of source code associated with an address. +The +.CW src +function prints a few lines of source +around the address supplied as its argument. A companion routine, +.CW Bsrc , +communicates with the external editor +.CW sam . +Given an address, it loads the corresponding source file into the editor +and highlights the line containing the address. This simple interface +is easily extended to more complex functions. +For example, the +.CW step +function can select the current file and line in the editor +each time the target program stops, giving the user a visual +trace of the execution path of the program. A more complete interface +allowing two way communication between Acid and the +.CW acme +user interface [Pike93] is under construction. A filter between the debugger +and the user interface provides interpretation of results from both +sides of the interface. This allows the programming environment to +interact with the debugger and vice-versa, a capability missing from the +.CW sam +interface. +The +.CW src +and +.CW Bsrc +functions are both written in Acid code using the file and line primitives. +Acid provides library functions to step through source level +statements and functions. Furthermore, addresses in Acid expressions can be +specified by source file and line. +Source code is manipulated in the Acid +.I list +data type. +.NH +The Acid Library +.PP +The following examples define some useful commands and +illustrate the interaction of the debugger and the interpreter. +.P1 +defn bpset(addr) // set breakpoint +{ + if match(addr, bplist) >= 0 then + print("bkpoint already set:", addr\ea, "\en"); + else { + *fmt(addr, bpfmt) = bpinst; // plant it + bplist = append bplist, addr; // add to list + } +} +.P2 +The +.CW bpset +function plants a break point in memory. The function starts by +using the +.CW match +builtin to +search the breakpoint list to determine if a breakpoint is already +set at the address. +The indirection operator, controlled by the format code returned +by the +.CW fmt +primitive, is used to plant the breakpoint in memory. +The variables +.CW bpfmt +and +.CW bpinst +are Acid global variables containing the format code specifying +the size of the breakpoint instruction and the breakpoint instruction +itself. +These +variables are set by architecture-dependent library code +when the debugger first attaches to the executing image. +Finally the address of the breakpoint is +appended to the breakpoint list, +.CW bplist . +.P1 +defn step() // single step +{ + local lst, lpl, addr, bput; + + bput = 0; // sitting on bkpoint + if match(*PC, bplist) >= 0 then { + bput = fmt(*PC, bpfmt); // save current addr + *bput = @bput; // replace it + } + + lst = follow(*PC); // get follow set + + lpl = lst; + while lpl do { // place breakpoints + *(head lpl) = bpinst; + lpl = tail lpl; + } + + startstop(pid); // do the step + + while lst do { // remove breakpoints + addr = fmt(head lst, bpfmt); + *addr = @addr; // replace instr. + lst = tail lst; + } + if bput != 0 then + *bput = bpinst; // restore breakpoint +} +.P2 +The +.CW step +function executes a single assembler instruction. +If the +.CW PC +is sitting +on a breakpoint, the address and size of +the breakpoint are saved. +The breakpoint instruction +is then removed using the +.CW @ +operator to fetch +.CW bpfmt +bytes from the text file and to place it into the memory +of the executing process using the +.CW * +operator. +The +.CW follow +function is an Acid +builtin which returns a follow-set: a list of instruction addresses which +could be executed next. +If the instruction stored at the +.CW PC +is a branch instruction, the +list contains the addresses of the next instruction and +the branch destination; otherwise, it contains only the +address of the next instruction. +The follow-set is then used to replace each possible following +instruction with a breakpoint instruction. The original +instructions need not be saved; they remain +in their unaltered state in the text file. +The +.CW startstop +builtin writes the `startstop' message to the +.I proc +control file for the process named +.CW pid . +The target process executes until some condition causes it to +enter the kernel, in this case, the execution of a breakpoint. +When the process blocks, the debugger regains control and invokes the +Acid library function +.CW stopped +which reports the address and cause of the blockage. +The +.CW startstop +function completes and returns to the +.CW step +function where +the follow-set is used to replace the breakpoints placed earlier. +Finally, if the address of the original +.CW PC +contained a breakpoint, it is replaced. +.PP +Notice that this approach to process control is inherently portable; +the Acid code is shared by the debuggers for all architectures. +Acid variables and builtin functions provide a transparent interface +to architecture-dependent values and functions. Here the breakpoint +value and format are referenced through Acid variables and the +.CW follow +primitive masks the differences in the underlying instruction set. +.PP +The +.CW next +function, similar to the +.I dbx +command of the same name, +is a simpler example. +This function steps through +a single source statement but steps over function calls. +.P1 +defn next() +{ + local sp, bound; + + sp = *SP; // save starting SP + bound = fnbound(*PC); // begin & end of fn. + stmnt(); // step 1 statement + pc = *PC; + if pc >= bound[0] && pc < bound[1] then + return {}; + + while (pc<bound[0] || pc>bound[1]) && sp>=*SP do { + step(); + pc = *PC; + } + src(*PC); +} +.P2 +The +.CW next +function +starts by saving the current stack pointer in a local variable. +It then uses the Acid library function +.CW fnbound +to return the addresses of the first and last instructions in +the current function in a list. +The +.CW stmnt +function executes a single source statement and then uses +.CW src +to print a few lines of source around the new +.CW PC . +If the new value of the +.CW PC +remains in the current function, +.CW next +returns. +When the executed statement is a function call or a return +from a function, the new value of the +.CW PC +is outside the bounds calculated by +.CW fnbound +and the test of the +.CW while +loop is evaluated. +If the statement was a return, the new value of the stack pointer +is greater than the original value and the loop completes without +execution. +Otherwise, the loop is entered and instructions are continually +executed until the value of the +.CW PC +is between the bounds calculated earlier. At that point, execution +ceases and a few lines of source in the vicinity of the +.CW PC +are printed. +.PP +Acid provides concise and elegant expression for control and +manipulation of target programs. These examples demonstrate how a +few well-chosen primitives can be combined to create a rich debugging environment. +.NH +Dealing With Multiple Architectures +.PP +A single binary of Acid may be used to debug a program running on any +of the five processor architectures supported by Plan 9. For example, +Plan 9 allows a user on a MIPS to import the +.I proc +file system from an i486-based PC and remotely debug a program executing +on that processor. +.PP +Two levels of abstraction provide this architecture independence. +On the lowest level, a Plan 9 library supplies functions to +decode the file header of the program being debugged and +select a table of system parameters +and a jump vector of architecture-dependent +functions based on the magic number. +Among these functions are byte-order-independent +access to memory and text files, stack manipulation, disassembly, +and floating point number interpretation. +The second level of abstraction is supplied by Acid. +It consists of primitives and approximately 200 lines +of architecture-dependent Acid library code that interface the +interpreter to the architecture-dependent library. +This layer performs functions such as mapping register names to +memory locations, supplying breakpoint values and sizes, +and converting processor specific data to Acid data types. +An example of the latter is the stack trace function +.CW strace , +which uses the stack traversal functions in the +architecture-dependent library to construct a list of lists describing +the context of a process. The first level of list selects +each function in the trace; subordinate lists contain the +names and values of parameters and local variables of +the functions. Acid commands and library functions that +manipulate and display process state information operate +on the list representation and are independent of the +underlying architecture. +.NH +Alef Runtime +.PP +Alef is a concurrent programming language, +designed specifically for systems programming, which supports both +shared variable and message passing paradigms. +Alef borrows the C expression syntax but implements +a substantially different type system. +The language provides a rich set of +exception handling, process management, and synchronization +primitives, which rely on a runtime system. +Alef program bugs are often deadlocks, synchronization failures, +or non-termination caused by locks being held incorrectly. +In such cases, a process stalls deep +in the runtime code and it is clearly +unreasonable to expect a programmer using the language +to understand the detailed +internal semantics of the runtime support functions. +.PP +Instead, there is an Alef support library, coded in Acid, that +allows the programmer to interpret the program state in terms of +Alef operations. Consider the example of a multi-process program +stalling because of improper synchronization. A stack trace of +the program indicates that it is waiting for an event in some +obscure Alef runtime +synchronization function. +The function itself is irrelevant to the +programmer; of greater importance is the identity of the +unfulfilled event. +Commands in the Alef support library decode +the runtime data structures and program state to report the cause +of the blockage in terms of the high-level operations available to +the Alef programmer. +Here, the Acid language acts +as a communications medium between Alef implementer and Alef user. +.NH +Parallel Debugging +.PP +The central issue in parallel debugging is how the debugger is +multiplexed between the processes comprising +the program. +Acid has no intrinsic model of process partitioning; it +only assumes that parallel programs share a symbol table, +though they need not share memory. +The +.CW setproc +primitive attaches the debugger to a running process +associated with the process ID supplied as its argument +and assigns that value to the global variable +.CW pid , +thereby allowing simple rotation among a group of processes. +Further, the stack trace primitive is driven by parameters +specifying a unique process context, so it is possible to +examine the state of cooperating processes without switching +the debugger focus from the process of interest. +Since Acid is inherently extensible and capable of +dynamic interaction with subordinate processes, the +programmer can define Acid commands to detect and control +complex interactions between processes. +In short, the programmer is free to specify how the debugger reacts +to events generated in specific threads of the program. +.PP +The support for parallel debugging in Acid depends on a crucial kernel +modification: when the text segment of a program is written (usually to +place a breakpoint), the segment is cloned to prevent other threads +from encountering the breakpoint. Although this incurs a slight performance +penalty, it is of little importance while debugging. +.NH +Communication Between Tools +.PP +The Plan 9 Alef and C compilers do not +embed detailed type information in the symbol table of an +executable file. +However, they do accept a command line option causing them to +emit descriptions of complex data types +(e.g., aggregates and abstract data types) +to an auxiliary file. +The vehicle for expressing this information is Acid source code. +When an Acid debugging session is +subsequently started, that file is loaded with the other Acid libraries. +.PP +For each complex object in the program the compiler generates +three pieces of Acid code. +The first is a table describing the size and offset of each +member of the complex data type. Following is an Acid function, +named the same as the object, that formats and prints each member. +Finally, Acid declarations associate the +Alef or C program variables of a type with the functions +to print them. +The three forms of declaration are shown in the following example: +.P1 +struct Bitmap { + Rectangle 0 r; + Rectangle 16 clipr; + 'D' 32 ldepth; + 'D' 36 id; + 'X' 40 cache; +}; +.P2 +.P1 +defn +Bitmap(addr) { + complex Bitmap addr; + print("Rectangle r {\en"); + Rectangle(addr.r); + print("}\en"); + print("Rectangle clipr {\en"); + Rectangle(addr.clipr); + print("}\en"); + print(" ldepth ", addr.ldepth, "\en"); + print(" id ", addr.id, "\en"); + print(" cache ", addr.cache, "\en"); +}; + +complex Bitmap darkgrey; +complex Bitmap Window_settag:b; +.P2 +The +.CW struct +declaration specifies decoding instructions for the complex type named +.CW Bitmap . +Although the syntax is superficially similar to a C structure declaration, +the semantics differ markedly: the C declaration specifies a layout, while +the Acid declaration tells how to decode it. +The declaration specifies a type, an offset, and name for each +member of the complex object. The type is either the name of another +complex declaration, for example, +.CW Rectangle , +or a format code. +The offset is the number of bytes from the start +of the object to the member +and the name is the member's name in the Alef or C declaration. +This type description is a close match for C and Alef, but is simple enough +to be language independent. +.PP +The +.CW Bitmap +function expects the address of a +.CW Bitmap +as its only argument. +It uses the decoding information contained in the +.CW Bitmap +structure declaration to extract, format, and print the +value of each member of the complex object pointed to by +the argument. +The Alef compiler emits code to call other Acid functions +where a member is another complex type; here, +.CW Bitmap +calls +.CW Rectangle +to print its contents. +.PP +The +.CW complex +declarations associate Alef variables with complex types. +In the example, +.CW darkgrey +is the name of a global variable of type +.CW Bitmap +in the program being debugged. +Whenever the name +.CW darkgrey +is evaluated by Acid, it automatically calls the +.CW Bitmap +function with the address of +.CW darkgrey +as the argument. +The second +.CW complex +declaration associates a local variable or parameter named +.CW b +in function +.CW Window_settag +with the +.CW Bitmap +complex data type. +.PP +Acid borrows the C operators +.CW . +and +.CW -> +to access the decoding parameters of a member of a complex type. +Although this representation is sufficiently general for describing +the decoding of both C and Alef complex data types, it may +prove too restrictive for target languages with more complicated +type systems. +Further, the assumption that the compiler can select the proper +Acid format code for each basic type in the language is somewhat +naive. For example, when a member of a complex type is a pointer, +it is assigned a hexadecimal type code; integer members are always +assigned a decimal type code. +This heuristic proves inaccurate when an integer field is a +bit mask or set of bit flags which are more appropriately displayed +in hexadecimal or octal. +.NH +Code Verification +.PP +Acid's ability to interact dynamically with +an executing program allows passive test and +verification of the target program. For example, +a common concern is leak detection in programs using +.CW malloc . +Of interest are two items: finding memory that was allocated +but never freed and detecting bad pointers passed to +.CW free . +An auxiliary Acid library contains Acid functions to +monitor the execution of a program and detect these +faults, either as they happen or in the automated +post-mortem analysis of the memory arena. +In the following example, the +.CW sort +command is run under the control of the +Acid memory leak library. +.P1 +helix% acid -l malloc /bin/sort +/bin/sort: mips plan 9 executable +/lib/acid/port +/lib/acid/mips +/lib/acid/malloc +acid: go() +now +is +the +time +<ctrl-d> +is +now +the +time +27680 : breakpoint _exits+0x4 MOVW $0x8,R1 +acid: +.P2 +The +.CW go +command creates a process and plants +breakpoints at the entry to +.CW malloc +and +.CW free . +The program is then started and continues until it +exits or stops. If the reason for stopping is anything +other than the breakpoints in +.CW malloc +and +.CW free , +Acid prints the usual status information and returns to the +interactive prompt. +.PP +When the process stops on entering +.CW malloc , +the debugger must capture and save the address that +.CW malloc +will return. +After saving a stack +trace so the calling routine can be identified, it places +a breakpoint at the return address and restarts the program. +When +.CW malloc +returns, the breakpoint stops the program, +allowing the debugger +to grab the address of the new memory block from the return register. +The address and stack trace are added to the list of outstanding +memory blocks, the breakpoint is removed from the return point, and +the process is restarted. +.PP +When the process stops at the beginning of +.CW free , +the memory address supplied as the argument is compared to the list +of outstanding memory blocks. If it is not found an error message +and a stack trace of the call is reported; otherwise, the +address is deleted from the list. +.PP +When the program exits, the list of outstanding memory blocks contains +the addresses of all blocks that were allocated but never freed. +The +.CW leak +library function traverses the list producing a report describing +the allocated blocks. +.P1 1m +acid: leak() +Lost a total of 524288 bytes from: + malloc() malloc.c:32 called from dofile+0xe8 sort.c:217 + dofile() sort.c:190 called from main+0xac sort.c:161 + main() sort.c:128 called from _main+0x20 main9.s:10 +Lost a total of 64 bytes from: + malloc() malloc.c:32 called from newline+0xfc sort.c:280 + newline() sort.c:248 called from dofile+0x110 sort.c:222 + dofile() sort.c:190 called from main+0xac sort.c:161 + main() sort.c:128 called from _main+0x20 main9.s:10 +Lost a total of 64 bytes from: + malloc() malloc.c:32 called from realloc+0x14 malloc.c:129 + realloc() malloc.c:123 called from bldkey+0x358 sort.c:1388 + buildkey() sort.c:1345 called from newline+0x150 sort.c:285 + newline() sort.c:248 called from dofile+0x110 sort.c:222 + dofile() sort.c:190 called from main+0xac sort.c:161 + main() sort.c:128 called from _main+0x20 main9.s:10 +acid: refs() +data...bss...stack... +acid: leak() +acid: +.P2 +The presence of a block in the allocation list does not imply +it is there because of a leak; for instance, it may have been +in use when the program terminated. +The +.CW refs() +library function scans the +.I data , +.I bss , +and +.I stack +segments of the process looking for pointers +into the allocated blocks. When one is found, the block is deleted from +the outstanding block list. +The +.CW leak +function is used again to report the +blocks remaining allocated and unreferenced. +This strategy proves effective in detecting +disconnected (but non-circular) data structures. +.PP +The leak detection process is entirely passive. +The program is not +specially compiled and the source code is not required. +As with the Acid support functions for the Alef runtime environment, +the author of the library routines has encapsulated the +functionality of the library interface +in Acid code. +Any programmer may then check a program's use of the +library routines without knowledge of either implementation. +The performance impact of running leak detection is great +(about 10 times slower), +but it has not prevented interactive programs like +.CW sam +and the +.CW 8½ +window system from being tested. +.NH +Code Coverage +.PP +Another common component of software test uses +.I coverage +analysis. +The purpose of the test is to determine which paths through the code have +not been executed while running the test suite. +This is usually +performed by a combination of compiler support and a reporting tool run +on the output generated by statements compiled into the program. +The compiler emits code that +logs the progress of the program as it executes basic blocks and writes the +results to a file. The file is then processed by the reporting tool +to determine which basic blocks have not been executed. +.PP +Acid can perform the same function in a language independent manner without +modifying the source, object or binary of the program. The following example +shows +.CW ls +being run under the control of the Acid coverage library. +.P1 +philw-helix% acid -l coverage /bin/ls +/bin/ls: mips plan 9 executable +/lib/acid/port +/lib/acid/mips +/lib/acid/coverage +acid: coverage() +acid +newstime +profile +tel +wintool +2: (error) msg: pid=11419 startstop: process exited +acid: analyse(ls) +ls.c:102,105 + 102: return 1; + 103: } + 104: if(db[0].qid.path&CHDIR && dflag==0){ + 105: output(); +ls.c:122,126 + 122: memmove(dirbuf+ndir, db, sizeof(Dir)); + 123: dirbuf[ndir].prefix = 0; + 124: p = utfrrune(s, '/'); + 125: if(p){ + 126: dirbuf[ndir].prefix = s; +.P2 +The +.CW coverage +function begins by looping through the text segment placing +breakpoints at the entry to each basic block. The start of each basic +block is found using the Acid builtin function +.CW follow . +If the list generated by +.CW follow +contains more than one +element, then the addresses mark the start of basic blocks. A breakpoint +is placed at each address to detect entry into the block. If the result +of +.CW follow +is a single address then no action is taken, and the next address is +considered. Acid maintains a list of +breakpoints already in place and avoids placing duplicates (an address may be +the destination of several branches). +.PP +After placing the breakpoints the program is set running. +Each time a breakpoint is encountered +Acid deletes the address from the breakpoint list, removes the breakpoint +from memory and then restarts the program. +At any instant the breakpoint list contains the addresses of basic blocks +which have not been executed. +The +.CW analyse +function reports the lines of source code bounded by basic blocks +whose addresses are have not been deleted from the breakpoint list. +These are the basic blocks which have not been executed. +Program performance is almost unaffected since each breakpoint is executed +only once and then removed. +.PP +The library contains a total of 128 lines of Acid code. +An obvious extension of this algorithm could be used to provide basic block +profiling. +.NH +Conclusion +.PP +Acid has two areas of weakness. As with +other language-based tools like +.I awk , +a programmer must learn yet another language to step beyond the normal +debugging functions and use the full power of the debugger. +Second, the command line interface supplied by the +.I yacc +parser is inordinately clumsy. +Part of the problem relates directly to the use of +.I yacc +and could be circumvented with a custom parser. +However, structural problems would remain: Acid often requires +too much typing to execute a simple +command. +A debugger should prostitute itself to its users, doing whatever +is wanted with a minimum of encouragement; commands should be +concise and obvious. The language interface is more consistent than +an ad hoc command interface but is clumsy to use. +Most of these problems are addressed by an Acme interface +which is under construction. This should provide the best of +both worlds: graphical debugging and access to the underlying acid +language when required. +.PP +The name space clash between Acid variables, keywords, program variables, +and functions is unavoidable. +Although it rarely affects a debugging session, it is annoying +when it happens and is sometimes difficult to circumvent. +The current renaming scheme +is too crude; the new names are too hard to remember. +.PP +Acid has proved to be a powerful tool whose applications +have exceeded expectations. +Of its strengths, portability, extensibility and parallel debugging support +were by design and provide the expected utility. +In retrospect, +its use as a tool for code test and verification and as +a medium for communicating type information and encapsulating +interfaces has provided unanticipated benefits and altered our +view of the debugging process. +.NH +Acknowledgments +.PP +Bob Flandrena was the first user and helped prepare the paper. +Rob Pike endured three buggy Alef compilers and a new debugger +in a single sitting. +.NH +References +.LP +[Pike90] R. Pike, D. Presotto, K. Thompson, H. Trickey, +``Plan 9 from Bell Labs'', +.I +UKUUG Proc. of the Summer 1990 Conf., +.R +London, England, +1990, +reprinted, in a different form, in this volume. +.LP +[Gol93] M. Golan, D. Hanson, +``DUEL -- A Very High-Level Debugging Language'', +.I +USENIX Proc. of the Winter 1993 Conf., +.R +San Diego, CA, +1993. +.LP +[Lin90] M. A. Linton, +``The Evolution of DBX'', +.I +USENIX Proc. of the Summer 1990 Conf., +.R +Anaheim, CA, +1990. +.LP +[Stal91] R. M. Stallman, R. H. Pesch, +``Using GDB: A guide to the GNU source level debugger'', +Technical Report, Free Software Foundation, +Cambridge, MA, +1991. +.LP +[Win93] P. Winterbottom, +``Alef reference Manual'', +this volume. +.LP +[Pike93] Rob Pike, +``Acme: A User Interface for Programmers'', +.I +USENIX Proc. of the Winter 1994 Conf., +.R +San Francisco, CA, +reprinted in this volume. +.LP +[Ols90] Ronald A. Olsson, Richard H. Crawford, and W. Wilson Ho, +``Dalek: A GNU, improved programmable debugger'', +.I +USENIX Proc. of the Summer 1990 Conf., +.R +Anaheim, CA. +.LP +[May92] Paul Maybee, +``NeD: The Network Extensible Debugger'' +.I +USENIX Proc. of the Summer 1992 Conf., +.R +San Antonio, TX. +.LP +[Aral] Ziya Aral, Ilya Gertner, and Greg Schaffer, +``Efficient debugging primitives for multiprocessors'', +.I +Proceedings of the Third International Conference on Architectural +Support for Programming Languages and Operating Systems, +.R +SIGPLAN notices Nr. 22, May 1989. |