1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
|
.HTML "Acid: A Debugger Built From A Language
.TL
Acid: A Debugger Built From A Language
.AU
Phil Winterbottom
philw@plan9.bell-labs.com
.AB
.FS
Originally appeared in
.I
Proc. of the Winter 1994 USENIX Conf.,
.R
pp. 211-222,
San Francisco, CA
.FE
Acid is an unusual source-level symbolic debugger for Plan 9. It is implemented
as a language interpreter with specialized primitives that provide
debugger support. Programs written in the language manipulate
one or more target processes; variables in the language represent the
symbols, state, and resources of those processes.
This structure allows complex
interaction between the debugger and the target program and
provides a convenient method of parameterizing differences between
machine architectures.
Although some effort is required to learn
the debugging language, the richness and flexibility of the
debugging environment encourages new ways of reasoning about the way
programs run and the conditions under which they fail.
.AE
.NH
Introduction
.PP
The size and complexity
of programs have increased in proportion to processor speed and memory but
the interface between debugger and programmer has changed little.
Graphical user interfaces have eased some of the tedious
aspects of the interaction. A graphical interface is a convenient
means for navigating through source and data structures but provides
little benefit for process control.
The introduction of a new concurrent language, Alef [Win93], emphasized the
inadequacies of the existing Plan 9 [Pike90] debugger
.I db ,
a distant relative of
.I adb ,
and made it clear that a new debugger was required.
.PP
Current debuggers like
.I dbx ,
.I sdb ,
and
.I gdb
are limited to answering only the questions their authors
envisage. As a result, they supply a plethora
of specialized commands, each attempting to anticipate
a specific question a user may ask.
When a debugging situation arises that is beyond the scope
of the command set, the tool is useless.
Further,
it is often tedious or impossible to reproduce an anomalous state
of the program, especially when
the state is embedded in the program's data structures.
.PP
Acid applies some ideas found in CAD software used for
hardware test and simulation.
It is based on the notion that the state and resources of a program
are best represented and manipulated by a language. The state and resources,
such as memory, registers, variables, type information and source code
are represented by variables in the language.
Expressions provide a computation mechanism and control
statements allow repetitive or selective interpretation based
on the result of expression evaluation.
The heart of the Acid debugger is an interpreter for a small typeless
language whose operators mirror the operations
of C and Alef, which in turn correspond well to the basic operations of
the machine. The interpreter itself knows nothing of the underlying
hardware; it deals with the program state and resources
in the abstract.
Fundamental routines to control
processes, read files, and interface to the system are implemented
as builtin functions available to the interpreter.
The actual debugger functionality is coded
in Acid; commands are implemented as Acid functions.
.PP
This language-based approach has several advantages.
Most importantly, programs written in Acid, including most of the
debugger itself, are inherently portable.
Furthermore, Acid avoids the limitations other debuggers impose when
debugging parallel programs. Instead of embedding a fixed
process model in the debugger, Acid allows the
programmer to adapt the debugger to handle an
arbitrary process partitioning or program structure.
The ability to
interact dynamically with an executing process provides clear advantages
over debuggers constrained to probe a static image.
Finally, the Acid language is a powerful vehicle for expressing
assertions about logic, process state, and the contents of data structures.
When combined with dynamic interaction it allows a
limited form of automated program verification without requiring
modification or recompilation of the source code.
The language is also an
excellent vehicle for preserving a test suite for later regression testing.
.PP
The debugger may be customized by its users; standard
functions may be modified or extended to suit a particular application
or preference.
For example, the kernel developers in our group require a
command set supporting assembler-level debugging while the application
programmers prefer source-level functionality.
Although the default library is biased toward assembler-level debugging,
it is easily modified to provide a convenient source-level interface.
The debugger itself does not change; the user combines primitives
and existing Acid functions in different ways to
implement the desired interface.
.NH
Related Work
.PP
DUEL [Gol93], an extension to
.I gdb
[Stal91], proposes using a high level expression evaluator to solve
some of these problems. The evaluator provides iterators to loop over data
structures and conditionals to control evaluation of expressions.
The author shows that complex state queries can be formulated
by combining concise expressions but this only addresses part of the problem.
A program is a dynamic entity; questions asked when the program is in
a static state are meaningful only after the program has been `caught' in
that state. The framework for manipulating the program is still as
primitive as the underlying debugger. While DUEL provides a means to
probe data structures it entirely neglects the most beneficial aspect
of debugging languages: the ability to control processes. Acid is structured
around a thread of control that passes between the interpreter and the
target program.
.PP
The NeD debugger [May92] is a set of extensions to TCL [Ous90] that provide
debugging primitives. The resulting language, NeDtcl, is used to implement
a portable interface between a conventional debugger, pdb [May90], and
a server that executes NeDtcl programs operating on the target program.
Execution of the NeDtcl programs implements the debugging primitives
that pdb expects.
NeD is targeted at multi-process debugging across a network,
and proves the flexibility of a language as a means of
communication between debugging tools. Whereas NeD provides an interface
between a conventional debugger and the process it debugs, Acid is the
debugger itself. While NeD has some of the ideas
found in Acid it is targeted toward a different purpose. Acid seeks to
integrate the manipulation of a program's resources into the debugger
while NeD provides a flexible interconnect between components of
the debugging environment. The choice of TCL is appropriate for its use
in NeD but is not suitable for Acid. Acid relies on the coupling of the type
system with expression evaluation, which are the root of its design,
to provide the debugging primitives.
.PP
Dalek [Ols90] is an event based language extension to gdb. State transitions
in the target program cause events to be queued for processing by the
debugging language.
.PP
Acid has many of the advantages of same process or
.I local
.I agent
debuggers, like Parasight [Aral], without the need for dynamic linking or
shared memory.
Acid improves on the ideas of these other systems by completely integrating
all aspects of the debugging process into the language environment. Of
particular importance is the relationship between Acid variables,
program symbols, source code, registers and type information. This
integration is made possible by the design of the Acid language.
.PP
Interpreted languages such as Lisp and Smalltalk are able to provide
richer debugging environments through more complete information than
their compiled counterparts. Acid is a means to gather and represent
similar information about compiled programs through cooperation
with the compilation tools and library implementers.
.NH
Acid the Language
.PP
Acid is a small interpreted language targeted to its debugging task.
It focuses on representing program state and addressing data rather than
expressing complex computations. Program state is
.I addressable
from an Acid program.
In addition to parsing and executing expressions and providing
an architecture-independent interface to the target process,
the interpreter supplies a mark-and-scan garbage collector
to manage storage.
.PP
Every Acid session begins with the loading of the Acid libraries.
These libraries contain functions, written in Acid, that provide
a standard debugging environment including breakpoint management,
stepping by instruction or statement, stack tracing, and
access to variables, memory, and registers.
The library contains 600 lines of Acid code and provides
functionality similar to
.I dbx .
Following the loading of the system library, Acid loads
user-specified libraries; this load sequence allows the
user to augment or override the standard commands
to customize the debugging environment. When all libraries
are loaded, Acid issues an interactive prompt and begins
evaluating expressions entered by the user. The Acid `commands'
are actually invocations of builtin primitives or previously defined
Acid functions. Acid evaluates each expression as it is entered and
prints the result.
.NH
Types and Variables
.PP
Acid variables are of four basic types:
.I integer ,
.I string ,
.I float ,
and
.I list .
The type of a variable is inferred by the type of the right-hand side of
an assignment expression.
Many of the operators can be applied to more than
one type; for these operators the action of the operator is determined
by the type of its operands.
For example,
the
.CW +
operator adds
.I integer
and
.I float
operands, and concatenates
.I string
and
.I list
operands.
Lists are the only complex type in Acid; there are no arrays, structures
or pointers. Operators provide
.CW head ,
.CW tail ,
.CW append
and
.CW delete
operations.
Lists can also be indexed like arrays.
.PP
Acid has two levels of scope: global and local.
Function parameters and variables declared in a function body
using the
.CW local
keyword are created at entry to the function and
exist for the lifetime of a function.
Global variables are created by assignment and need not be declared.
All variables and functions in the program
being debugged are entered in the Acid symbol table as global
variables during Acid initialization.
Conflicting variable names are resolved by prefixing enough `$' characters
to make them unique.
Syntactically, Acid variables and target program
symbols are referenced identically.
However, the variables are managed differently in the Acid
symbol table and the user must be aware of this distinction.
The value of an Acid variable is stored in the symbol
table; a reference returns the value.
The symbol table entry for a variable or function in the target
program contains the address of that symbol in the image
of the program. Thus, the value of a program variable is
accessed by indirect reference through the Acid
variable that has the same name; the value of an Acid variable is the
address of the corresponding program variable.
.NH
Control Flow
.PP
The
.CW while
and
.CW loop
statements implement looping.
The former
is similar to the same statement in C.
The latter evaluates starting and ending expressions yielding
integers and iterates while an incrementing loop index
is within the bounds of those expressions.
.P1
acid: i = 0; loop 1,5 do print(i=i+1)
0x00000001
0x00000002
0x00000003
0x00000004
0x00000005
acid:
.P2
The traditional
.CW if-then-else
statement implements conditional execution.
.NH
Addressing
.PP
Two indirection operators allow Acid to access values in
the program being debugged.
The
.CW *
operator fetches a value from the memory image of an
executing process;
the
.CW @
operator fetches a value from the text file of the process.
When either operator appears on the left side of an assignment, the value
is written rather than read.
.PP
The indirection operator must know the size of the object
referenced by a variable.
The Plan 9 compilers neglect to include this
information in the program symbol table, so Acid cannot
derive this information implicitly.
Instead Acid variables have formats.
The format is a code
letter specifying the printing style and the effect of some of the
operators on that variable.
The indirection operators look at the format code to determine the
number of bytes to read or write.
The format codes are derived from the format letters used by
.I db .
By default, symbol table variables and numeric constants
are assigned the format code
.CW 'X'
which specifies 32-bit hexadecimal.
Printing such a variable yields output of the form
.CW 0x00123456 .
An indirect reference through the variable fetches 32 bits
of data at the address indicated by the variable.
Other formats specify various data types, for example
.CW i
an instruction,
.CW D
a signed 32 bit decimal,
.CW s
a null-terminated string.
The
.CW fmt
function
allows the user to change the format code of a variable
to control the printing format and
operator side effects.
This function evaluates the expression supplied as the first
argument, attaches the format code supplied as the second
argument to the result and returns that value.
If the result is assigned to a variable,
the new format code applies to
that variable. For convenience, Acid provides the
.CW \e
operator as a shorthand infix form of
.CW fmt .
For example:
.P1
acid: x=10
acid: x // print x in hex
0x0000000a
acid: x = fmt(x, 'D') // make x type decimal
acid: print(x, fmt(x, 'X'), x\eX) // print x in decimal & hex
10 0x0000000a 0x0000000a
acid: x // print x in decimal
10
acid: x\eo // print x in octal
000000000012
.P2
The
.CW ++
and
.CW --
operators increment or decrement a variable by an amount
determined by its format code. Some formats imply a non-fixed size.
For example, the
.CW i
format code disassembles an instruction into a string.
On a 68020, which has variable length instructions:
.P1
acid: p=main\ei // p=addr(main), type INST
acid: loop 1,5 do print(p\eX, @p++) // disassemble 5 instr's
0x0000222e LEA 0xffffe948(A7),A7
0x00002232 MOVL s+0x4(A7),A2
0x00002236 PEA 0x2f($0)
0x0000223a MOVL A2,-(A7)
0x0000223c BSR utfrrune
acid:
.P2
Here,
.CW main
is the address of the function of the same name in the program under test.
The loop retrieves the five instructions beginning at that address and
then prints the address and the assembly language representation of each.
Notice that the stride of the increment operator varies with the size of
the instruction: the
.CW MOVL
at
.CW 0x0000223a
is a two byte instruction while all others are four bytes long.
.PP
Registers are treated as normal program variables referenced
by their symbolic assembler language names.
When a
process stops, the register set is saved by the kernel
at a known virtual address in the process memory map.
The Acid variables associated with the registers point
to the saved values and the
.CW *
indirection operator can then be used to read and write the register set.
Since the registers are accessed via Acid variables they may
be used in arbitrary expressions.
.P1
acid: PC // addr of saved PC
0xc0000f60
acid: *PC
0x0000623c // contents of PC
acid: *PC\ea
main
acid: *R1=10 // modify R1
acid: asm(*PC+4) // disassemble @ PC+4
main+0x4 0x00006240 MOVW R31,0x0(R29)
main+0x8 0x00006244 MOVW $setR30(SB),R30
main+0x10 0x0000624c MOVW R1,_clock(SB)
.P2
Here, the saved
.CW PC
is stored at address
.CW 0xc0000f60 ;
its current content is
.CW 0x0000623c .
The
.CW a ' `
format code converts this value to a string specifying
the address as an offset beyond the nearest symbol.
After setting the value of register
.CW 1 ,
the example uses the
.CW asm
command to disassemble a short section of code beginning
at four bytes beyond the current value of the
.CW PC .
.NH
Process Interface
.PP
A program executing under Acid is monitored through the
.I proc
file system interface provided by Plan 9.
Textual messages written to the
.CW ctl
file control the execution of the process.
For example writing
.CW waitstop
to the control file causes the write to block until the target
process enters the kernel and is stopped. When the process is stopped
the write completes. The
.CW startstop
message starts the target process and then does a
.CW waitstop
action.
Synchronization between the debugger and the target process is determined
by the actions of the various messages. Some operate asynchronously to the
target process and always complete immediately, others block until the
action completes. The asynchronous messages allow Acid to control
several processes simultaneously.
.PP
The interpreter has builtin functions named after each of the control
messages. The functions take a process id as argument.
Any time a control message causes the program to execute instructions
the interpreter performs two actions when the control operation has completed.
The Acid variables pointing at the register set are fixed up to point
at the saved registers, and then
the user defined function
.CW stopped
is executed.
The
.CW stopped
function may print the current address,
line of source or instruction and return to interactive mode. Alternatively
it may traverse a complex data structure, gather statistics and then set
the program running again.
.PP
Several Acid variables are maintained by the debugger rather than the
programmer.
These variables allow generic Acid code to deal with the current process,
architecture specifics or the symbol table.
The variable
.CW pid
is the process id of the current process Acid is debugging.
The variable
.CW symbols
contains a list of lists where each sublist contains the symbol
name, its type and the value of the symbol.
The variable
.CW registers
contains a list of the machine-specific register names. Global symbols in the target program
can be referenced directly by name from Acid. Local variables
are referenced using the colon operator as \f(CWfunction:variable\fP.
.NH
Source Level Debugging
.PP
Acid provides several builtin functions to manipulate source code.
The
.CW file
function reads a text file, inserting each line into a list.
The
.CW pcfile
and
.CW pcline
functions each take an address as an argument.
The first
returns a string containing the name of the source file
and the second returns an integer containing the line number
of the source line containing the instruction at the address.
.P1
acid: pcfile(main) // file containing main
main.c
acid: pcline(main) // line # of main in source
11
acid: file(pcfile(main))[pcline(main)] // print that line
main(int argc, char *argv[])
acid: src(*PC) // print statements nearby
9
10 void
>11 main(int argc, char *argv[])
12 {
13 int a;
.P2
In this example, the three primitives are combined in an expression to print
a line of source code associated with an address.
The
.CW src
function prints a few lines of source
around the address supplied as its argument. A companion routine,
.CW Bsrc ,
communicates with the external editor
.CW sam .
Given an address, it loads the corresponding source file into the editor
and highlights the line containing the address. This simple interface
is easily extended to more complex functions.
For example, the
.CW step
function can select the current file and line in the editor
each time the target program stops, giving the user a visual
trace of the execution path of the program. A more complete interface
allowing two way communication between Acid and the
.CW acme
user interface [Pike93] is under construction. A filter between the debugger
and the user interface provides interpretation of results from both
sides of the interface. This allows the programming environment to
interact with the debugger and vice-versa, a capability missing from the
.CW sam
interface.
The
.CW src
and
.CW Bsrc
functions are both written in Acid code using the file and line primitives.
Acid provides library functions to step through source level
statements and functions. Furthermore, addresses in Acid expressions can be
specified by source file and line.
Source code is manipulated in the Acid
.I list
data type.
.NH
The Acid Library
.PP
The following examples define some useful commands and
illustrate the interaction of the debugger and the interpreter.
.P1
defn bpset(addr) // set breakpoint
{
if match(addr, bplist) >= 0 then
print("bkpoint already set:", addr\ea, "\en");
else {
*fmt(addr, bpfmt) = bpinst; // plant it
bplist = append bplist, addr; // add to list
}
}
.P2
The
.CW bpset
function plants a break point in memory. The function starts by
using the
.CW match
builtin to
search the breakpoint list to determine if a breakpoint is already
set at the address.
The indirection operator, controlled by the format code returned
by the
.CW fmt
primitive, is used to plant the breakpoint in memory.
The variables
.CW bpfmt
and
.CW bpinst
are Acid global variables containing the format code specifying
the size of the breakpoint instruction and the breakpoint instruction
itself.
These
variables are set by architecture-dependent library code
when the debugger first attaches to the executing image.
Finally the address of the breakpoint is
appended to the breakpoint list,
.CW bplist .
.P1
defn step() // single step
{
local lst, lpl, addr, bput;
bput = 0; // sitting on bkpoint
if match(*PC, bplist) >= 0 then {
bput = fmt(*PC, bpfmt); // save current addr
*bput = @bput; // replace it
}
lst = follow(*PC); // get follow set
lpl = lst;
while lpl do { // place breakpoints
*(head lpl) = bpinst;
lpl = tail lpl;
}
startstop(pid); // do the step
while lst do { // remove breakpoints
addr = fmt(head lst, bpfmt);
*addr = @addr; // replace instr.
lst = tail lst;
}
if bput != 0 then
*bput = bpinst; // restore breakpoint
}
.P2
The
.CW step
function executes a single assembler instruction.
If the
.CW PC
is sitting
on a breakpoint, the address and size of
the breakpoint are saved.
The breakpoint instruction
is then removed using the
.CW @
operator to fetch
.CW bpfmt
bytes from the text file and to place it into the memory
of the executing process using the
.CW *
operator.
The
.CW follow
function is an Acid
builtin which returns a follow-set: a list of instruction addresses which
could be executed next.
If the instruction stored at the
.CW PC
is a branch instruction, the
list contains the addresses of the next instruction and
the branch destination; otherwise, it contains only the
address of the next instruction.
The follow-set is then used to replace each possible following
instruction with a breakpoint instruction. The original
instructions need not be saved; they remain
in their unaltered state in the text file.
The
.CW startstop
builtin writes the `startstop' message to the
.I proc
control file for the process named
.CW pid .
The target process executes until some condition causes it to
enter the kernel, in this case, the execution of a breakpoint.
When the process blocks, the debugger regains control and invokes the
Acid library function
.CW stopped
which reports the address and cause of the blockage.
The
.CW startstop
function completes and returns to the
.CW step
function where
the follow-set is used to replace the breakpoints placed earlier.
Finally, if the address of the original
.CW PC
contained a breakpoint, it is replaced.
.PP
Notice that this approach to process control is inherently portable;
the Acid code is shared by the debuggers for all architectures.
Acid variables and builtin functions provide a transparent interface
to architecture-dependent values and functions. Here the breakpoint
value and format are referenced through Acid variables and the
.CW follow
primitive masks the differences in the underlying instruction set.
.PP
The
.CW next
function, similar to the
.I dbx
command of the same name,
is a simpler example.
This function steps through
a single source statement but steps over function calls.
.P1
defn next()
{
local sp, bound;
sp = *SP; // save starting SP
bound = fnbound(*PC); // begin & end of fn.
stmnt(); // step 1 statement
pc = *PC;
if pc >= bound[0] && pc < bound[1] then
return {};
while (pc<bound[0] || pc>bound[1]) && sp>=*SP do {
step();
pc = *PC;
}
src(*PC);
}
.P2
The
.CW next
function
starts by saving the current stack pointer in a local variable.
It then uses the Acid library function
.CW fnbound
to return the addresses of the first and last instructions in
the current function in a list.
The
.CW stmnt
function executes a single source statement and then uses
.CW src
to print a few lines of source around the new
.CW PC .
If the new value of the
.CW PC
remains in the current function,
.CW next
returns.
When the executed statement is a function call or a return
from a function, the new value of the
.CW PC
is outside the bounds calculated by
.CW fnbound
and the test of the
.CW while
loop is evaluated.
If the statement was a return, the new value of the stack pointer
is greater than the original value and the loop completes without
execution.
Otherwise, the loop is entered and instructions are continually
executed until the value of the
.CW PC
is between the bounds calculated earlier. At that point, execution
ceases and a few lines of source in the vicinity of the
.CW PC
are printed.
.PP
Acid provides concise and elegant expression for control and
manipulation of target programs. These examples demonstrate how a
few well-chosen primitives can be combined to create a rich debugging environment.
.NH
Dealing With Multiple Architectures
.PP
A single binary of Acid may be used to debug a program running on any
of the five processor architectures supported by Plan 9. For example,
Plan 9 allows a user on a MIPS to import the
.I proc
file system from an i486-based PC and remotely debug a program executing
on that processor.
.PP
Two levels of abstraction provide this architecture independence.
On the lowest level, a Plan 9 library supplies functions to
decode the file header of the program being debugged and
select a table of system parameters
and a jump vector of architecture-dependent
functions based on the magic number.
Among these functions are byte-order-independent
access to memory and text files, stack manipulation, disassembly,
and floating point number interpretation.
The second level of abstraction is supplied by Acid.
It consists of primitives and approximately 200 lines
of architecture-dependent Acid library code that interface the
interpreter to the architecture-dependent library.
This layer performs functions such as mapping register names to
memory locations, supplying breakpoint values and sizes,
and converting processor specific data to Acid data types.
An example of the latter is the stack trace function
.CW strace ,
which uses the stack traversal functions in the
architecture-dependent library to construct a list of lists describing
the context of a process. The first level of list selects
each function in the trace; subordinate lists contain the
names and values of parameters and local variables of
the functions. Acid commands and library functions that
manipulate and display process state information operate
on the list representation and are independent of the
underlying architecture.
.NH
Alef Runtime
.PP
Alef is a concurrent programming language,
designed specifically for systems programming, which supports both
shared variable and message passing paradigms.
Alef borrows the C expression syntax but implements
a substantially different type system.
The language provides a rich set of
exception handling, process management, and synchronization
primitives, which rely on a runtime system.
Alef program bugs are often deadlocks, synchronization failures,
or non-termination caused by locks being held incorrectly.
In such cases, a process stalls deep
in the runtime code and it is clearly
unreasonable to expect a programmer using the language
to understand the detailed
internal semantics of the runtime support functions.
.PP
Instead, there is an Alef support library, coded in Acid, that
allows the programmer to interpret the program state in terms of
Alef operations. Consider the example of a multi-process program
stalling because of improper synchronization. A stack trace of
the program indicates that it is waiting for an event in some
obscure Alef runtime
synchronization function.
The function itself is irrelevant to the
programmer; of greater importance is the identity of the
unfulfilled event.
Commands in the Alef support library decode
the runtime data structures and program state to report the cause
of the blockage in terms of the high-level operations available to
the Alef programmer.
Here, the Acid language acts
as a communications medium between Alef implementer and Alef user.
.NH
Parallel Debugging
.PP
The central issue in parallel debugging is how the debugger is
multiplexed between the processes comprising
the program.
Acid has no intrinsic model of process partitioning; it
only assumes that parallel programs share a symbol table,
though they need not share memory.
The
.CW setproc
primitive attaches the debugger to a running process
associated with the process ID supplied as its argument
and assigns that value to the global variable
.CW pid ,
thereby allowing simple rotation among a group of processes.
Further, the stack trace primitive is driven by parameters
specifying a unique process context, so it is possible to
examine the state of cooperating processes without switching
the debugger focus from the process of interest.
Since Acid is inherently extensible and capable of
dynamic interaction with subordinate processes, the
programmer can define Acid commands to detect and control
complex interactions between processes.
In short, the programmer is free to specify how the debugger reacts
to events generated in specific threads of the program.
.PP
The support for parallel debugging in Acid depends on a crucial kernel
modification: when the text segment of a program is written (usually to
place a breakpoint), the segment is cloned to prevent other threads
from encountering the breakpoint. Although this incurs a slight performance
penalty, it is of little importance while debugging.
.NH
Communication Between Tools
.PP
The Plan 9 Alef and C compilers do not
embed detailed type information in the symbol table of an
executable file.
However, they do accept a command line option causing them to
emit descriptions of complex data types
(e.g., aggregates and abstract data types)
to an auxiliary file.
The vehicle for expressing this information is Acid source code.
When an Acid debugging session is
subsequently started, that file is loaded with the other Acid libraries.
.PP
For each complex object in the program the compiler generates
three pieces of Acid code.
The first is a table describing the size and offset of each
member of the complex data type. Following is an Acid function,
named the same as the object, that formats and prints each member.
Finally, Acid declarations associate the
Alef or C program variables of a type with the functions
to print them.
The three forms of declaration are shown in the following example:
.P1
struct Bitmap {
Rectangle 0 r;
Rectangle 16 clipr;
'D' 32 ldepth;
'D' 36 id;
'X' 40 cache;
};
.P2
.P1
defn
Bitmap(addr) {
complex Bitmap addr;
print("Rectangle r {\en");
Rectangle(addr.r);
print("}\en");
print("Rectangle clipr {\en");
Rectangle(addr.clipr);
print("}\en");
print(" ldepth ", addr.ldepth, "\en");
print(" id ", addr.id, "\en");
print(" cache ", addr.cache, "\en");
};
complex Bitmap darkgrey;
complex Bitmap Window_settag:b;
.P2
The
.CW struct
declaration specifies decoding instructions for the complex type named
.CW Bitmap .
Although the syntax is superficially similar to a C structure declaration,
the semantics differ markedly: the C declaration specifies a layout, while
the Acid declaration tells how to decode it.
The declaration specifies a type, an offset, and name for each
member of the complex object. The type is either the name of another
complex declaration, for example,
.CW Rectangle ,
or a format code.
The offset is the number of bytes from the start
of the object to the member
and the name is the member's name in the Alef or C declaration.
This type description is a close match for C and Alef, but is simple enough
to be language independent.
.PP
The
.CW Bitmap
function expects the address of a
.CW Bitmap
as its only argument.
It uses the decoding information contained in the
.CW Bitmap
structure declaration to extract, format, and print the
value of each member of the complex object pointed to by
the argument.
The Alef compiler emits code to call other Acid functions
where a member is another complex type; here,
.CW Bitmap
calls
.CW Rectangle
to print its contents.
.PP
The
.CW complex
declarations associate Alef variables with complex types.
In the example,
.CW darkgrey
is the name of a global variable of type
.CW Bitmap
in the program being debugged.
Whenever the name
.CW darkgrey
is evaluated by Acid, it automatically calls the
.CW Bitmap
function with the address of
.CW darkgrey
as the argument.
The second
.CW complex
declaration associates a local variable or parameter named
.CW b
in function
.CW Window_settag
with the
.CW Bitmap
complex data type.
.PP
Acid borrows the C operators
.CW .
and
.CW ->
to access the decoding parameters of a member of a complex type.
Although this representation is sufficiently general for describing
the decoding of both C and Alef complex data types, it may
prove too restrictive for target languages with more complicated
type systems.
Further, the assumption that the compiler can select the proper
Acid format code for each basic type in the language is somewhat
naive. For example, when a member of a complex type is a pointer,
it is assigned a hexadecimal type code; integer members are always
assigned a decimal type code.
This heuristic proves inaccurate when an integer field is a
bit mask or set of bit flags which are more appropriately displayed
in hexadecimal or octal.
.NH
Code Verification
.PP
Acid's ability to interact dynamically with
an executing program allows passive test and
verification of the target program. For example,
a common concern is leak detection in programs using
.CW malloc .
Of interest are two items: finding memory that was allocated
but never freed and detecting bad pointers passed to
.CW free .
An auxiliary Acid library contains Acid functions to
monitor the execution of a program and detect these
faults, either as they happen or in the automated
post-mortem analysis of the memory arena.
In the following example, the
.CW sort
command is run under the control of the
Acid memory leak library.
.P1
helix% acid -l malloc /bin/sort
/bin/sort: mips plan 9 executable
/lib/acid/port
/lib/acid/mips
/lib/acid/malloc
acid: go()
now
is
the
time
<ctrl-d>
is
now
the
time
27680 : breakpoint _exits+0x4 MOVW $0x8,R1
acid:
.P2
The
.CW go
command creates a process and plants
breakpoints at the entry to
.CW malloc
and
.CW free .
The program is then started and continues until it
exits or stops. If the reason for stopping is anything
other than the breakpoints in
.CW malloc
and
.CW free ,
Acid prints the usual status information and returns to the
interactive prompt.
.PP
When the process stops on entering
.CW malloc ,
the debugger must capture and save the address that
.CW malloc
will return.
After saving a stack
trace so the calling routine can be identified, it places
a breakpoint at the return address and restarts the program.
When
.CW malloc
returns, the breakpoint stops the program,
allowing the debugger
to grab the address of the new memory block from the return register.
The address and stack trace are added to the list of outstanding
memory blocks, the breakpoint is removed from the return point, and
the process is restarted.
.PP
When the process stops at the beginning of
.CW free ,
the memory address supplied as the argument is compared to the list
of outstanding memory blocks. If it is not found an error message
and a stack trace of the call is reported; otherwise, the
address is deleted from the list.
.PP
When the program exits, the list of outstanding memory blocks contains
the addresses of all blocks that were allocated but never freed.
The
.CW leak
library function traverses the list producing a report describing
the allocated blocks.
.P1 1m
acid: leak()
Lost a total of 524288 bytes from:
malloc() malloc.c:32 called from dofile+0xe8 sort.c:217
dofile() sort.c:190 called from main+0xac sort.c:161
main() sort.c:128 called from _main+0x20 main9.s:10
Lost a total of 64 bytes from:
malloc() malloc.c:32 called from newline+0xfc sort.c:280
newline() sort.c:248 called from dofile+0x110 sort.c:222
dofile() sort.c:190 called from main+0xac sort.c:161
main() sort.c:128 called from _main+0x20 main9.s:10
Lost a total of 64 bytes from:
malloc() malloc.c:32 called from realloc+0x14 malloc.c:129
realloc() malloc.c:123 called from bldkey+0x358 sort.c:1388
buildkey() sort.c:1345 called from newline+0x150 sort.c:285
newline() sort.c:248 called from dofile+0x110 sort.c:222
dofile() sort.c:190 called from main+0xac sort.c:161
main() sort.c:128 called from _main+0x20 main9.s:10
acid: refs()
data...bss...stack...
acid: leak()
acid:
.P2
The presence of a block in the allocation list does not imply
it is there because of a leak; for instance, it may have been
in use when the program terminated.
The
.CW refs()
library function scans the
.I data ,
.I bss ,
and
.I stack
segments of the process looking for pointers
into the allocated blocks. When one is found, the block is deleted from
the outstanding block list.
The
.CW leak
function is used again to report the
blocks remaining allocated and unreferenced.
This strategy proves effective in detecting
disconnected (but non-circular) data structures.
.PP
The leak detection process is entirely passive.
The program is not
specially compiled and the source code is not required.
As with the Acid support functions for the Alef runtime environment,
the author of the library routines has encapsulated the
functionality of the library interface
in Acid code.
Any programmer may then check a program's use of the
library routines without knowledge of either implementation.
The performance impact of running leak detection is great
(about 10 times slower),
but it has not prevented interactive programs like
.CW sam
and the
.CW 8½
window system from being tested.
.NH
Code Coverage
.PP
Another common component of software test uses
.I coverage
analysis.
The purpose of the test is to determine which paths through the code have
not been executed while running the test suite.
This is usually
performed by a combination of compiler support and a reporting tool run
on the output generated by statements compiled into the program.
The compiler emits code that
logs the progress of the program as it executes basic blocks and writes the
results to a file. The file is then processed by the reporting tool
to determine which basic blocks have not been executed.
.PP
Acid can perform the same function in a language independent manner without
modifying the source, object or binary of the program. The following example
shows
.CW ls
being run under the control of the Acid coverage library.
.P1
philw-helix% acid -l coverage /bin/ls
/bin/ls: mips plan 9 executable
/lib/acid/port
/lib/acid/mips
/lib/acid/coverage
acid: coverage()
acid
newstime
profile
tel
wintool
2: (error) msg: pid=11419 startstop: process exited
acid: analyse(ls)
ls.c:102,105
102: return 1;
103: }
104: if(db[0].qid.path&CHDIR && dflag==0){
105: output();
ls.c:122,126
122: memmove(dirbuf+ndir, db, sizeof(Dir));
123: dirbuf[ndir].prefix = 0;
124: p = utfrrune(s, '/');
125: if(p){
126: dirbuf[ndir].prefix = s;
.P2
The
.CW coverage
function begins by looping through the text segment placing
breakpoints at the entry to each basic block. The start of each basic
block is found using the Acid builtin function
.CW follow .
If the list generated by
.CW follow
contains more than one
element, then the addresses mark the start of basic blocks. A breakpoint
is placed at each address to detect entry into the block. If the result
of
.CW follow
is a single address then no action is taken, and the next address is
considered. Acid maintains a list of
breakpoints already in place and avoids placing duplicates (an address may be
the destination of several branches).
.PP
After placing the breakpoints the program is set running.
Each time a breakpoint is encountered
Acid deletes the address from the breakpoint list, removes the breakpoint
from memory and then restarts the program.
At any instant the breakpoint list contains the addresses of basic blocks
which have not been executed.
The
.CW analyse
function reports the lines of source code bounded by basic blocks
whose addresses are have not been deleted from the breakpoint list.
These are the basic blocks which have not been executed.
Program performance is almost unaffected since each breakpoint is executed
only once and then removed.
.PP
The library contains a total of 128 lines of Acid code.
An obvious extension of this algorithm could be used to provide basic block
profiling.
.NH
Conclusion
.PP
Acid has two areas of weakness. As with
other language-based tools like
.I awk ,
a programmer must learn yet another language to step beyond the normal
debugging functions and use the full power of the debugger.
Second, the command line interface supplied by the
.I yacc
parser is inordinately clumsy.
Part of the problem relates directly to the use of
.I yacc
and could be circumvented with a custom parser.
However, structural problems would remain: Acid often requires
too much typing to execute a simple
command.
A debugger should prostitute itself to its users, doing whatever
is wanted with a minimum of encouragement; commands should be
concise and obvious. The language interface is more consistent than
an ad hoc command interface but is clumsy to use.
Most of these problems are addressed by an Acme interface
which is under construction. This should provide the best of
both worlds: graphical debugging and access to the underlying acid
language when required.
.PP
The name space clash between Acid variables, keywords, program variables,
and functions is unavoidable.
Although it rarely affects a debugging session, it is annoying
when it happens and is sometimes difficult to circumvent.
The current renaming scheme
is too crude; the new names are too hard to remember.
.PP
Acid has proved to be a powerful tool whose applications
have exceeded expectations.
Of its strengths, portability, extensibility and parallel debugging support
were by design and provide the expected utility.
In retrospect,
its use as a tool for code test and verification and as
a medium for communicating type information and encapsulating
interfaces has provided unanticipated benefits and altered our
view of the debugging process.
.NH
Acknowledgments
.PP
Bob Flandrena was the first user and helped prepare the paper.
Rob Pike endured three buggy Alef compilers and a new debugger
in a single sitting.
.NH
References
.LP
[Pike90] R. Pike, D. Presotto, K. Thompson, H. Trickey,
``Plan 9 from Bell Labs'',
.I
UKUUG Proc. of the Summer 1990 Conf.,
.R
London, England,
1990,
reprinted, in a different form, in this volume.
.LP
[Gol93] M. Golan, D. Hanson,
``DUEL -- A Very High-Level Debugging Language'',
.I
USENIX Proc. of the Winter 1993 Conf.,
.R
San Diego, CA,
1993.
.LP
[Lin90] M. A. Linton,
``The Evolution of DBX'',
.I
USENIX Proc. of the Summer 1990 Conf.,
.R
Anaheim, CA,
1990.
.LP
[Stal91] R. M. Stallman, R. H. Pesch,
``Using GDB: A guide to the GNU source level debugger'',
Technical Report, Free Software Foundation,
Cambridge, MA,
1991.
.LP
[Win93] P. Winterbottom,
``Alef reference Manual'',
this volume.
.LP
[Pike93] Rob Pike,
``Acme: A User Interface for Programmers'',
.I
USENIX Proc. of the Winter 1994 Conf.,
.R
San Francisco, CA,
reprinted in this volume.
.LP
[Ols90] Ronald A. Olsson, Richard H. Crawford, and W. Wilson Ho,
``Dalek: A GNU, improved programmable debugger'',
.I
USENIX Proc. of the Summer 1990 Conf.,
.R
Anaheim, CA.
.LP
[May92] Paul Maybee,
``NeD: The Network Extensible Debugger''
.I
USENIX Proc. of the Summer 1992 Conf.,
.R
San Antonio, TX.
.LP
[Aral] Ziya Aral, Ilya Gertner, and Greg Schaffer,
``Efficient debugging primitives for multiprocessors'',
.I
Proceedings of the Third International Conference on Architectural
Support for Programming Languages and Operating Systems,
.R
SIGPLAN notices Nr. 22, May 1989.
|