.HTML "How to Use the Plan 9 C Compiler
.TL
How to Use the Plan 9 C Compiler
.AU
Rob Pike
rob@plan9.bell-labs.com
.SH
Introduction
.PP
The C compiler on Plan 9 is a wholly new program; in fact
it was the first piece of software written for what would
eventually become Plan 9 from Bell Labs.
Programmers familiar with existing C compilers will find
a number of differences in both the language the Plan 9 compiler
accepts and in how the compiler is used.
.PP
The compiler is really a set of compilers, one for each
architecture \(em MIPS, SPARC, Motorola 68020, Intel 386, etc. \(em
that accept a dialect of ANSI C and efficiently produce
fairly good code for the target machine.
There is a packaging of the compiler that accepts strict ANSI C for
a POSIX environment, but this document focuses on the
native Plan 9 environment, that in which all the system source and
almost all the utilities are written.
.SH
Source
.PP
The language accepted by the compilers is the core ANSI C language
with some modest extensions,
a greatly simplified preprocessor,
a smaller library that includes system calls and related facilities,
and a completely different structure for include files.
.PP
Official ANSI C accepts the old (K&R) style of declarations for
functions; the Plan 9 compilers
are more demanding.
Without an explicit command-line flag
.CW -B ) (
whose use is discouraged, the compilers insist
on new-style function declarations, that is, prototypes for
function arguments.
The function declarations in the libraries' include files are
all in the new style so the interfaces are checked at compile time.
For C programmers who have not yet switched to function prototypes
the clumsy syntax may seem repellent but the payoff in stronger typing
is substantial.
Those who wish to import existing software to Plan 9 are urged
to use the opportunity to update their code.
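.PP
As a rough illustration of the difference (the function
.CW max
is a made-up example, not taken from the Plan 9 source),
the comment below shows the old style and the definition after it
shows the prototype form the compilers expect:
.P1
/*
 * Old (K&R) style, accepted only with -B:
 *
 *	int max(a, b)
 *		int a, b;
 *	{ ... }
 */
int
max(int a, int b)
{
	return a > b ? a : b;
}
.P2
With the prototype visible, a call such as
.CW max(1)
is diagnosed at compile time rather than misbehaving at run time.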
.PP
The compilers include an integrated preprocessor that accepts the familiar
.CW #include ,
.CW #define
for macros both with and without arguments,
.CW #undef ,
.CW #line ,
.CW #ifdef ,
.CW #ifndef ,
and
.CW #endif .
It
supports neither
.CW #if
nor
.CW ## ,
although it does
honor a few
.CW #pragmas .
The
.CW #if
directive was omitted because it greatly complicates the
preprocessor, is never necessary, and is usually abused.
Conditional compilation in general makes code hard to understand;
the Plan 9 source uses it sparingly.
Also, because the compilers remove dead code, regular
.CW if
statements with constant conditions are more readable equivalents to many
.CW #ifs .
To compile imported code ineluctably fouled by
.CW #if
there is a separate command,
.CW /bin/cpp ,
that implements the complete ANSI C preprocessor specification.
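.PP
A minimal sketch of the idea, using a hypothetical constant
.CW Paranoid
and function
.CW scale
invented for this example:
the check is written as an ordinary
.CW if ,
and when the constant is zero the compiler can discard the branch as dead code.
.P1
enum
{
	Paranoid = 0,	/* hypothetical switch; set to 1 to enable the check */
};

/* Double x; when Paranoid is set, refuse negative input. */
int
scale(int x)
{
	if(Paranoid && x < 0)	/* constant false: dead code when Paranoid is 0 */
		return 0;
	return 2*x;
}
.P2
Unlike text hidden by
.CW #ifdef ,
the disabled branch is still parsed and type-checked on every compilation.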
.PP
Include files fall into two groups: machine-dependent and machine-independent.
The machine-independent files occupy the directory
.CW /sys/include ;
the others are placed in a directory appropriate to the machine, such as
.CW /mips/include .
The compiler searches for include files
first in the machine-dependent directory and then
in the machine-independent directory.
At the time of writing there are thirty-one machine-independent include
files and two (per machine) machine-dependent ones:
.CW <ureg.h>
and
.CW <u.h> .
The first describes the layout of registers on the system stack,
for use by the debugger.
The second defines some
architecture-dependent types such as
.CW jmp_buf
for
.CW setjmp
and the
.CW va_list
type and
.CW va_arg
macro for handling arguments to variadic functions,
as well as a set of
.CW typedef
abbreviations for
.CW unsigned
.CW short
and so on.
.PP
Here is an excerpt from
.CW /68020/include/u.h :
.P1
#define nil ((void*)0)
typedef unsigned short ushort;
typedef unsigned char uchar;
typedef unsigned long ulong;
typedef unsigned int uint;
typedef signed char schar;
typedef long long vlong;
typedef long jmp_buf[2];
#define JMPBUFSP 0
#define JMPBUFPC 1
#define JMPBUFDPC 0
.P2
Plan 9 programs use
.CW nil
for the name of the zero-valued pointer.
The type
.CW vlong
is the largest integer type available; on most architectures it
is a 64-bit value.
A couple of other types in
.CW <u.h>
are
.CW u32int ,
which is guaranteed to have exactly 32 bits (a possibility on all the supported architectures) and
.CW mpdigit ,
which is used by the multiprecision math package
.CW <mp.h> .
The
.CW #define
constants permit an architecture-independent (but compiler-dependent)
implementation of stack-switching using
.CW setjmp
and
.CW longjmp .
.PP
Every Plan 9 C program begins
.P1
#include <u.h>
.P2
because all the other installed header files use the
.CW typedefs
declared in
.CW <u.h> .
.PP
In strict ANSI C, include files are grouped to collect related functions
in a single file: one for string functions, one for memory functions,
one for I/O, and none for system calls.
Each include file is protected by an
.CW #ifdef
to guarantee its contents are seen by the compiler only once.
Plan 9 takes a different approach. Other than a few include
files that define external formats such as archives, the files in
.CW /sys/include
correspond to
.I libraries.
If a program is using a library, it includes the corresponding header.
The default C library comprises string functions, memory functions, and
so on, largely as in ANSI C, some formatted I/O routines,
plus all the system calls and related functions.
To use these functions, one must
.CW #include
the file
.CW <libc.h> ,
which in turn must follow
.CW <u.h> ,
to define their prototypes for the compiler.
Here is the complete source to the traditional first C program:
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	print("hello world\en");
	exits(0);
}
.P2
The
.CW print
routine and its relatives
.CW fprint
and
.CW sprint
resemble the similarly-named functions in Standard I/O but are not
attached to a specific I/O library.
In Plan 9
.CW main
is not integer-valued; it should call
.CW exits ,
which takes a string argument (or null; here ANSI C promotes the 0 to a
.CW char* ).
All these functions are, of course, documented in the Programmer's Manual.
.PP
To use
.CW printf ,
.CW <stdio.h>
must be included to define the function prototype for
.CW printf :
.P1
#include <u.h>
#include <libc.h>
#include <stdio.h>

void
main(int argc, char *argv[])
{
	printf("%s: hello world; argc = %d\en", argv[0], argc);
	exits(0);
}
.P2
In practice, Standard I/O is not used much in Plan 9. I/O libraries are
discussed in a later section of this document.
.PP
There are libraries for handling regular expressions, raster graphics,
windows, and so on, and each has an associated include file.
The manual for each library states which include files are needed.
The files are not protected against multiple inclusion and themselves
contain no nested
.CW #includes .
Instead the
programmer is expected to sort out the requirements
and to
.CW #include
the necessary files once at the top of each source file. In practice this is
trivial: this way of handling include files is so straightforward
that it is rare for a source file to contain more than half a dozen
.CW #includes .
.PP
The compilers do their own register allocation so the
.CW register
keyword is ignored.
For different reasons,
.CW volatile
and
.CW const
are also ignored.
.PP
To make it easier to share code with other systems, Plan 9 has a version
of the compiler,
.CW pcc ,
that provides the standard ANSI C preprocessor, headers, and libraries
with POSIX extensions.
.CW Pcc
is recommended only
when broad external portability is mandated. It compiles slower,
produces slower code (it takes extra work to simulate POSIX on Plan 9),
eliminates those parts of the Plan 9 interface
not related to POSIX, and illustrates the clumsiness of an environment
designed by committee.
.CW Pcc
is described in more detail in
.I
APE\(emThe ANSI/POSIX Environment,
.R
by Howard Trickey.
.SH
Process
.PP
Each CPU architecture supported by Plan 9 is identified by a single,
arbitrary, alphanumeric character:
.CW k
for SPARC,
.CW q
for Motorola Power PC 630 and 640,
.CW v
for MIPS,
.CW 0
for little-endian MIPS,
.CW 1
for Motorola 68000,
.CW 2
for Motorola 68020 and 68040,
.CW 5
for Acorn ARM 7500,
.CW 6
for AMD 64,
.CW 7
for DEC Alpha,
.CW 8
for Intel 386, and
.CW 9
for AMD 29000.
The character labels the support tools and files for that architecture.
For instance, for the 68020 the compiler is
.CW 2c ,
the assembler is
.CW 2a ,
the link editor/loader is
.CW 2l ,
the object files are suffixed
.CW \&.2 ,
and the default name for an executable file is
.CW 2.out .
Before we can use the compiler we therefore need to know which
machine we are compiling for.
The next section explains how this decision is made; for the moment
assume we are building 68020 binaries and make the mental substitution for
.CW 2
appropriate to the machine you are actually using.
.PP
Converting source to an executable binary is a two-step process.
First run the compiler,
.CW 2c ,
on the source, say
.CW file.c ,
to generate an object file
.CW file.2 .
Then run the loader,
.CW 2l ,
to generate an executable
.CW 2.out
that may be run (on a 680X0 machine):
.P1
2c file.c
2l file.2
2.out
.P2
The loader automatically links with whatever libraries the program
needs, usually including the standard C library as defined by
.CW <libc.h> .
Of course the compiler and loader have lots of options, both familiar and new;
see the manual for details.
The compiler does not generate an executable automatically;
the output of the compiler must be given to the loader.
Since most compilation is done under the control of
.CW mk
(see below), this is rarely an inconvenience.
.PP
The distribution of work between the compiler and loader is unusual.
The compiler integrates preprocessing, parsing, register allocation,
code generation and some assembly.
Combining these tasks in a single program is part of the reason for
the compiler's efficiency.
The loader performs instruction selection, branch folding,
and instruction scheduling,
and writes the final executable.
There is no separate C preprocessor and no assembler in the usual pipeline.
Instead the intermediate object file
(here a
.CW \&.2
file) is a type of binary assembly language.
The instructions in the intermediate format are not exactly those in
the machine. For example, on the 68020 the object file may specify
a MOVE instruction but the loader will decide just which variant of
the MOVE instruction \(em MOVE immediate, MOVE quick, MOVE address,
etc. \(em is most efficient.
.PP
The assembler,
.CW 2a ,
is just a translator between the textual and binary
representations of the object file format.
It is not an assembler in the traditional sense. It has limited
macro capabilities (the same as the integral C preprocessor in the compiler),
clumsy syntax, and minimal error checking. For instance, the assembler
will accept an instruction (such as memory-to-memory MOVE on the MIPS) that the
machine does not actually support; only when the output of the assembler
is passed to the loader will the error be discovered.
The assembler is intended only for writing things that need access to instructions
invisible from C,
such as the machine-dependent
part of an operating system;
very little code in Plan 9 is in assembly language.
.PP
The compilers take an option
.CW -S
that causes them to print on their standard output the generated code
in a format acceptable as input to the assemblers.
This is of course merely a formatting of the
data in the object file; therefore the assembler is just
an
ASCII-to-binary converter for this format.
Other than the specific instructions, the input to the assemblers
is largely architecture-independent; see
``A Manual for the Plan 9 Assembler'',
by Rob Pike,
for more information.
.PP
The loader is an integral part of the compilation process.
Each library header file contains a
.CW #pragma
that tells the loader the name of the associated archive; it is
not necessary to tell the loader which libraries a program uses.
The C run-time startup is found, by default, in the C library.
The loader starts with an undefined
symbol,
.CW _main ,
that is resolved by pulling in the run-time startup code from the library.
(The loader undefines
.CW _mainp
when profiling is enabled, to force loading of the profiling start-up
instead.)
.PP
Unlike its counterpart on other systems, the Plan 9 loader rearranges
data to optimize access. This means the order of variables in the
loaded program is unrelated to their order in the source.
Most programs don't care, but some assume that, for example, the
variables declared by
.P1
int a;
int b;
.P2
will appear at adjacent addresses in memory. On Plan 9, they won't.
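.PP
Code that truly needs adjacent storage can say so explicitly.
The following sketch (the names
.CW Pair ,
.CW pair ,
.CW vec ,
and
.CW adjacent
are illustrative only) relies on the C guarantees that array elements
are contiguous and that structure members are laid out in declaration order,
which hold within a single object wherever that object is placed:
.P1
typedef struct Pair Pair;
struct Pair
{
	int	a;
	int	b;	/* always follows a within the structure */
};

Pair	pair;
int	vec[2];	/* vec[1] always follows vec[0] */

/* Return 1 if the two elements of vec are adjacent in memory. */
int
adjacent(void)
{
	return &vec[1] == &vec[0] + 1;
}
.P2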
.SH
Heterogeneity
.PP
When the system starts or a user logs in, the environment is configured
so the appropriate binaries are available in
.CW /bin .
The configuration process is controlled by an environment variable,
.CW $cputype ,
with a value such as
.CW mips ,
.CW 68020 ,
.CW 386 ,
or
.CW sparc .
For each architecture there is a directory in the root,
with the appropriate name,
that holds the binary and library files for that architecture.
Thus
.CW /mips/lib
contains the object code libraries for MIPS programs,
.CW /mips/include
holds MIPS-specific include files, and
.CW /mips/bin
has the MIPS binaries.
These binaries are attached to
.CW /bin
at boot time by binding
.CW /$cputype/bin
to
.CW /bin ,
so
.CW /bin
always contains the correct files.
.PP
The MIPS compiler,
.CW vc ,
by definition
produces object files for the MIPS architecture,
regardless of the architecture of the machine on which the compiler is running.
There is a version of
.CW vc
compiled for each architecture:
.CW /mips/bin/vc ,
.CW /68020/bin/vc ,
.CW /sparc/bin/vc ,
and so on,
each capable of producing MIPS object files regardless of the native
instruction set.
If one is running on a SPARC,
.CW /sparc/bin/vc
will compile programs for the MIPS;
if one is running on machine
.CW $cputype ,
.CW /$cputype/bin/vc
will compile programs for the MIPS.
.PP
Because of the bindings that assemble
.CW /bin ,
the shell always looks for a command, say
.CW date ,
in
.CW /bin
and automatically finds the file
.CW /$cputype/bin/date .
Therefore the MIPS compiler is known as just
.CW vc ;
the shell will invoke
.CW /bin/vc
and that is guaranteed to be the version of the MIPS compiler
appropriate for the machine running the command.
Regardless of the architecture of the compiling machine,
.CW /bin/vc
is
.I always
the MIPS compiler.
.PP
Also, the output of
.CW vc
and
.CW vl
is completely independent of the machine type on which they are executed:
.CW \&.v
files compiled (with
.CW vc )
on a SPARC may be linked (with
.CW vl )
on a 386.
(The resulting
.CW v.out
will run, of course, only on a MIPS.)
Similarly, the MIPS libraries in
.CW /mips/lib
are suitable for loading with
.CW vl
on any machine; there is only one set of MIPS libraries, not one
set for each architecture that supports the MIPS compiler.
.SH
Heterogeneity and \f(CWmk\fP
.PP
Most software on Plan 9 is compiled under the control of
.CW mk ,
a descendant of
.CW make
that is documented in the Programmer's Manual.
A convention used throughout the
.CW mkfiles
makes it easy to compile the source into binary suitable for any architecture.
.PP
The variable
.CW $cputype
is advisory: it reports the architecture of the current environment, and should
not be modified. A second variable,
.CW $objtype ,
is used to set which architecture is being
.I compiled
for.
The value of
.CW $objtype
can be used by a
.CW mkfile
to configure the compilation environment.
.PP
In each machine's root directory there is a short
.CW mkfile
that defines a set of macros for the compiler, loader, etc.
Here is
.CW /mips/mkfile :
.P1
</sys/src/mkfile.proto
CC=vc
LD=vl
O=v
AS=va
.P2
The line
.P1
</sys/src/mkfile.proto
.P2
causes
.CW mk
to include the file
.CW /sys/src/mkfile.proto ,
which contains general definitions:
.P1
#
# common mkfile parameters shared by all architectures
#
OS=v486xq7
CPUS=mips 386 power alpha
CFLAGS=-FVw
LEX=lex
YACC=yacc
MK=/bin/mk
.P2
.CW CC
is obviously the compiler,
.CW AS
the assembler, and
.CW LD
the loader.
.CW O
is the suffix for the object files and
.CW CPUS
and
.CW OS
are used in special rules described below.
.PP
Here is a
.CW mkfile
to build the installed source for
.CW sam :
.P1
</$objtype/mkfile
OBJ=sam.$O address.$O buffer.$O cmd.$O disc.$O error.$O \e
	file.$O io.$O list.$O mesg.$O moveto.$O multi.$O \e
	plan9.$O rasp.$O regexp.$O string.$O sys.$O xec.$O

$O.out:	$OBJ
	$LD $OBJ

install:	$O.out
	cp $O.out /$objtype/bin/sam

installall:
	for(objtype in $CPUS) mk install

%.$O:	%.c
	$CC $CFLAGS $stem.c

$OBJ:	sam.h errors.h mesg.h

address.$O cmd.$O parse.$O xec.$O unix.$O:	parse.h

clean:V:
	rm -f [$OS].out *.[$OS] y.tab.?
.P2
(The actual
.CW mkfile
imports most of its rules from other secondary files, but
this example works and is not misleading.)
The first line causes
.CW mk
to include the contents of
.CW /$objtype/mkfile
in the current
.CW mkfile .
If
.CW $objtype
is
.CW mips ,
this inserts the MIPS macro definitions into the
.CW mkfile .
In this case the rule for
.CW $O.out
uses the MIPS tools to build
.CW v.out .
The
.CW %.$O
rule in the file uses
.CW mk 's
pattern matching facilities to convert the source files to the object
files through the compiler.
(The text of the rules is passed directly to the shell,
.CW rc ,
without further translation.
See the
.CW mk
manual if any of this is unfamiliar.)
Because the default rule builds
.CW $O.out
rather than
.CW sam ,
it is possible to maintain binaries for multiple machines in the
same source directory without conflict.
This is also, of course, why the output files from the various
compilers and loaders
have distinct names.
.PP
The rest of the
.CW mkfile
should be easy to follow; notice how the rules for
.CW clean
and
.CW installall
(that is, install versions for all architectures) use other macros
defined in
.CW /$objtype/mkfile .
In Plan 9,
.CW mkfiles
for commands conventionally contain rules to
.CW install
(compile and install the version for
.CW $objtype ),
.CW installall
(compile and install for all
.CW $objtypes ),
and
.CW clean
(remove all object files, binaries, etc.).
.PP
The
.CW mkfile
is easy to use. To build a MIPS binary,
.CW v.out :
.P1
% objtype=mips
% mk
.P2
To build and install a MIPS binary:
.P1
% objtype=mips
% mk install
.P2
To build and install all versions:
.P1
% mk installall
.P2
These conventions make cross-compilation as easy to manage
as traditional native compilation.
Plan 9 programs compile and run without change on machines from
large multiprocessors to laptops. For more information about this process, see
``Plan 9 Mkfiles'',
by Bob Flandrena.
.SH
Portability
.PP
Within Plan 9, it is painless to write portable programs, programs whose
source is independent of the machine on which they execute.
The operating system is fixed and the compiler, headers and libraries
are constant so most of the stumbling blocks to portability are removed.
Attention to a few details can avoid those that remain.
.PP
Plan 9 is a heterogeneous environment, so programs must
.I expect
that external files will be written by programs on machines of different
architectures.
The compilers, for instance, must handle without confusion
object files written by other machines.
The traditional approach to this problem is to pepper the source with
.CW #ifdefs
to turn byte-swapping on and off.
Plan 9 takes a different approach: of the handful of machine-dependent
.CW #ifdefs
in all the source, almost all are deep in the libraries.
Instead programs read and write files in a defined format,
either (for low volume applications) as formatted text, or
(for high volume applications) as binary in a known byte order.
If the external data were written with the most significant
byte first, the following code reads a 4-byte integer correctly
regardless of the architecture of the executing machine (assuming
an unsigned long holds 4 bytes):
.P1
ulong
getlong(void)
{
	ulong l;

	l = (getchar()&0xFF)<<24;
	l |= (getchar()&0xFF)<<16;
	l |= (getchar()&0xFF)<<8;
	l |= (getchar()&0xFF)<<0;
	return l;
}
.P2
Note that this code does not `swap' the bytes; instead it just reads
them in the correct order.
Variations of this code will handle any binary format
and also avoid problems
involving how structures are padded, how words are aligned,
and other impediments to portability.
Be aware, though, that extra care is needed to handle floating point data.
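.PP
The writing side follows the same pattern.
As a sketch under the same assumptions (most significant byte first,
values that fit in 32 bits), the hypothetical helpers
.CW putlong
and
.CW getlongbuf
below store a value into a byte buffer and recover it;
neither name comes from the Plan 9 libraries.
.P1
/* Store the low 32 bits of v in buf, most significant byte first. */
void
putlong(unsigned char *buf, unsigned long v)
{
	buf[0] = v>>24;
	buf[1] = v>>16;
	buf[2] = v>>8;
	buf[3] = v;
}

/* Inverse of putlong: rebuild the value from 4 bytes in buf. */
unsigned long
getlongbuf(unsigned char *buf)
{
	unsigned long l;

	l = (unsigned long)buf[0]<<24;
	l |= (unsigned long)buf[1]<<16;
	l |= (unsigned long)buf[2]<<8;
	l |= (unsigned long)buf[3];
	return l;
}
.P2
A buffer filled by
.CW putlong
holds the same four bytes on every architecture, so it can be written out
and read back by a program running on any machine.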
.PP
Efficiency hounds will argue that this method is unnecessarily slow and clumsy
when the executing machine has the same byte order (and padding and alignment)
as the data.
The CPU cost of I/O processing
is rarely the bottleneck for an application, however,
and the gain in simplicity of porting and maintaining the code greatly outweighs
the minor speed loss from handling data in this general way.
This method is how the Plan 9 compilers, the window system, and even the file
servers transmit data between programs.
.PP
To port programs beyond Plan 9, where the system interface is more variable,
it is probably necessary to use
.CW pcc
and hope that the target machine supports ANSI C and POSIX.
.SH
I/O
.PP
The default C library, defined by the include file
.CW <libc.h> ,
contains no buffered I/O package.
It does have several entry points for printing formatted text:
.CW print
outputs text to the standard output,
.CW fprint
outputs text to a specified integer file descriptor, and
.CW sprint
places text in a character array.
To access library routines for buffered I/O, a program must
explicitly include the header file associated with an appropriate library.
.PP
The recommended I/O library, used by most Plan 9 utilities, is
.CW bio
(buffered I/O), defined by
.CW <bio.h> .
There also exists an implementation of ANSI Standard I/O,
.CW stdio .
.PP
.CW Bio
is small and efficient, particularly for buffer-at-a-time or
line-at-a-time I/O.
Even for character-at-a-time I/O, however, it is significantly faster than
the Standard I/O library,
.CW stdio .
Its interface is compact and regular, although it lacks a few conveniences.
The most noticeable is that one must explicitly define buffers for standard
input and output;
.CW bio
does not predefine them. Here is a program to copy input to output a byte
at a time using
.CW bio :
.P1
#include <u.h>
#include <libc.h>
#include <bio.h>

Biobuf	bin;
Biobuf	bout;

void
main(void)
{
	int c;

	Binit(&bin, 0, OREAD);
	Binit(&bout, 1, OWRITE);
	while((c=Bgetc(&bin)) != Beof)
		Bputc(&bout, c);
	exits(0);
}
.P2
For peak performance, we could replace
.CW Bgetc
and
.CW Bputc
by their equivalent in-line macros
.CW BGETC
and
.CW BPUTC
but
the performance gain would be modest.
For more information on
.CW bio ,
see the Programmer's Manual.
.PP
Perhaps the most dramatic difference between the I/O interface of Plan 9
and that of other systems is that text is not ASCII.
The format for
text in Plan 9 is a byte-stream encoding of 16-bit characters.
The character set is based on the Unicode Standard and is backward compatible with
ASCII:
characters with value 0 through 127 are the same in both sets.
The 16-bit characters, called
.I runes
in Plan 9, are encoded using a representation called
UTF,
an encoding that is becoming accepted as a standard.
(ISO calls it UTF-8;
throughout Plan 9 it's just called
UTF.)
UTF
defines multibyte sequences to
represent character values from 0 to 65535.
In
UTF,
character values up to 127 decimal, 7F hexadecimal, represent themselves,
so straight
ASCII
files are also valid
UTF.
Also,
UTF
guarantees that bytes with values 0 to 127 (NUL to DEL, inclusive)
will appear only when they represent themselves, so programs that read bytes
looking for plain ASCII characters will continue to work.
Any program that expects a one-to-one correspondence between bytes and
characters will, however, need to be modified.
An example is parsing file names.
File names, like all text, are in
UTF,
so it is incorrect to search for a character in a string by
.CW strchr(filename,
.CW c)
because the character might have a multi-byte encoding.
The correct method is to call
.CW utfrune(filename,
.CW c) ,
defined in
.I rune (2),
which interprets the file name as a sequence of encoded characters
rather than bytes.
In fact, even when you know the character is a single byte
that can represent only itself,
it is safer to use
.CW utfrune
because that assumes nothing about the character set
and its representation.
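.PP
To see why bytes and characters must be distinguished,
here is a sketch of counting the characters in a UTF string.
It assumes the standard UTF-8 property that every byte after the first
in a multi-byte sequence has the bit pattern 10xxxxxx; the function name
.CW nrunes
is invented for this example, and real programs should prefer the library routines in
.I rune (2).
.P1
/* Count characters, not bytes, in a NUL-terminated UTF string. */
int
nrunes(char *s)
{
	int n;

	n = 0;
	for(; *s; s++)
		if((*s & 0xC0) != 0x80)	/* not a continuation byte */
			n++;
	return n;
}
.P2
For the four-character string
.CW abcÿ
this returns 4, whereas
.CW strlen
returns 5 because
.CW ÿ
occupies two bytes.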
.PP
The library defines several symbols relevant to the representation of characters.
Any byte with unsigned value less than
.CW Runesync
will not appear in any multi-byte encoding of a character.
.CW Utfrune
compares the character being searched against
.CW Runesync
to see if it is sufficient to call
.CW strchr
or if the byte stream must be interpreted.
Any character with value less than
.CW Runeself
is represented by a single byte with the same value.
Finally, when errors are encountered converting
to runes from a byte stream, the library returns the rune value
.CW Runeerror
and advances a single byte. This permits programs to find runes
embedded in binary data.
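.PP
The error rule can be sketched as follows.
This is not the library's implementation: the names
.CW decode ,
.CW Self ,
and
.CW Bad
are stand-ins, the value given to
.CW Bad
is an assumption rather than the actual
.CW Runeerror ,
and only the one-, two-, and three-byte forms needed for 16-bit values are handled.
.P1
enum
{
	Self = 0x80,	/* stand-in for Runeself: smaller bytes are themselves */
	Bad = 0xFFFD,	/* stand-in for Runeerror; the value is an assumption */
};

/*
 * Decode one character from the NUL-terminated string s into *r
 * and return the number of bytes consumed.  On a malformed
 * sequence, store Bad and consume a single byte.
 */
int
decode(unsigned char *s, unsigned int *r)
{
	if(s[0] < Self){
		*r = s[0];
		return 1;
	}
	if((s[0]&0xE0) == 0xC0 && (s[1]&0xC0) == 0x80){
		*r = ((s[0]&0x1F)<<6) | (s[1]&0x3F);
		if(*r >= 0x80)	/* reject overlong forms */
			return 2;
	}else if((s[0]&0xF0) == 0xE0 && (s[1]&0xC0) == 0x80
	    && (s[2]&0xC0) == 0x80){
		*r = ((s[0]&0x0F)<<12) | ((s[1]&0x3F)<<6) | (s[2]&0x3F);
		if(*r >= 0x800)
			return 3;
	}
	*r = Bad;
	return 1;
}
.P2
Because a failure consumes exactly one byte, a caller that loops over
arbitrary data resynchronizes at the next well-formed sequence.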
.PP
.CW Bio
includes routines
.CW Bgetrune
and
.CW Bputrune
to transform the external byte stream
UTF
format to and from
internal 16-bit runes.
Also, the
.CW %s
format to
.CW print
accepts
UTF;
.CW %c
prints a character after narrowing it to 8 bits.
The
.CW %S
format prints a null-terminated sequence of runes;
.CW %C
prints a character after narrowing it to 16 bits.
For more information, see the Programmer's Manual, in particular
.I utf (6)
and
.I rune (2),
and the paper,
``Hello world, or
Καλημέρα κόσμε, or\
\f(Jpこんにちは 世界\f1'',
by Rob Pike and
Ken Thompson;
there is not room for the full story here.
.PP
These issues affect the compiler in several ways.
First, the C source is in
UTF.
ANSI says C variables are formed from
ASCII
alphanumerics, but comments and literal strings may contain any characters
encoded in the native encoding, here
UTF.
The declaration
.P1
char *cp = "abcÿ";
.P2
initializes the variable
.CW cp
to point to an array of bytes holding the
UTF
representation of the characters
.CW abcÿ .
The type
.CW Rune
is defined in
.CW <u.h>
to be
.CW ushort ,
which is also the `wide character' type in the compiler.
Therefore the declaration
.P1
Rune *rp = L"abcÿ";
.P2
initializes the variable
.CW rp
to point to an array of unsigned short integers holding the 16-bit
values of the characters
.CW abcÿ .
Note that in both these declarations the characters in the source
that represent
.CW "abcÿ"
are the same; what changes is how those characters are represented
in memory in the program.
The following two lines:
.P1
print("%s\en", "abcÿ");
print("%S\en", L"abcÿ");
.P2
produce the same
UTF
string on their output, the first by copying the bytes, the second
by converting from runes to bytes.
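.PP
The conversion from runes to bytes can be pictured with the short sketch below.
It is an illustration only: the functions
.CW encode
and
.CW runestoutf
are hypothetical names, not the routines of
.I rune (2),
and the code assumes 16-bit runes, so no encoding is longer than three bytes.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short Rune;	/* 16-bit rune, as in the text */

/* Encode one rune at dst and return the number of bytes written (1 to 3). */
static int
encode(char *dst, Rune r)
{
	assert(dst != NULL);
	if(r < 0x80){
		dst[0] = (char)r;
		return 1;
	}
	if(r < 0x800){
		dst[0] = (char)(0xC0 | (r >> 6));
		dst[1] = (char)(0x80 | (r & 0x3F));
		return 2;
	}
	dst[0] = (char)(0xE0 | (r >> 12));
	dst[1] = (char)(0x80 | ((r >> 6) & 0x3F));
	dst[2] = (char)(0x80 | (r & 0x3F));
	return 3;
}

/* Convert a zero-terminated rune string to a NUL-terminated UTF string.
 * dst must have room for three bytes per rune plus the terminator.
 * Returns the number of bytes written, not counting the terminator. */
static size_t
runestoutf(char *dst, const Rune *src)
{
	size_t n = 0;

	for(; *src != 0; src++)
		n += (size_t)encode(dst + n, *src);
	dst[n] = '\0';
	return n;
}
```

Applied to the four runes of
.CW abcÿ ,
the sketch writes five bytes: the three ASCII letters unchanged, followed by the two-byte encoding of ÿ.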
.PP
In C, character constants are integers but narrowed through the
.CW char
type.
The Unicode character
.CW ÿ
has value 255, so if the
.CW char
type is signed,
the constant
.CW 'ÿ'
has value \-1 (which is equal to EOF).
On the other hand,
.CW L'ÿ'
narrows through the wide character type,
.CW ushort ,
and therefore has value 255.
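.PP
The arithmetic can be checked with a small portable sketch.
The helper names
.CW narrow8
and
.CW narrow16
are hypothetical, and the result of narrowing 255 to a signed 8-bit type is strictly implementation-defined in ANSI C; the value \-1 assumes the usual two's-complement behavior.

```c
#include <assert.h>

typedef unsigned short Rune;	/* the 16-bit wide-character type of the text */

/* Narrow a code point through a signed 8-bit type, as a signed char would.
 * Assumes two's-complement conversion, which ANSI C does not guarantee. */
static int
narrow8(int c)
{
	assert(c >= 0 && c <= 0xFF);
	return (signed char)c;
}

/* Narrow a code point through the unsigned 16-bit wide-character type. */
static int
narrow16(int c)
{
	assert(c >= 0 && c <= 0xFFFF);
	return (Rune)c;
}
```

Under that assumption
.CW narrow8(255)
yields \-1 while
.CW narrow16(255)
yields 255, matching the two constants discussed above.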
.PP
Finally, although it's not ANSI C, the Plan 9 C compilers
assume any character with value above
.CW Runeself
is an alphanumeric,
so α is a legal, if non-portable, variable name.
.SH
Arguments
.PP
Some macros are defined
in
.CW <libc.h>
for parsing the arguments to
.CW main() .
They are described in
.I ARG (2)
but are fairly self-explanatory.
There are four macros:
.CW ARGBEGIN
and
.CW ARGEND
are used to bracket a hidden
.CW switch
statement within which
.CW ARGC
returns the current option character (rune) being processed and
.CW ARGF
returns the argument to the option, as in the loader option
.CW -o
.CW file .
Here, for example, is the code at the beginning of
.CW main()
in
.CW ramfs.c
(see
.I ramfs (1))
that cracks its arguments:
.P1
void
main(int argc, char *argv[])
{
	char *defmnt;
	int p[2];
	int mfd[2];
	int stdio = 0;
	defmnt = "/tmp";
	ARGBEGIN{
	case 'i':
		defmnt = 0;
		stdio = 1;
		mfd[0] = 0;
		mfd[1] = 1;
		break;
	case 's':
		defmnt = 0;
		break;
	case 'm':
		defmnt = ARGF();
		break;
	default:
		usage();
	}ARGEND
.P2
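.PP
To make the hidden control flow concrete, the loop below spells out by hand roughly what the bracketing macros arrange.
It is a simplified sketch under stated assumptions, not the definitions from
.CW <libc.h> :
the
.CW Opts
structure and the function
.CW parseopts
are hypothetical, option characters are treated as single bytes rather than runes, and errors are recorded in a flag instead of calling
.CW usage() .

```c
#include <assert.h>
#include <stddef.h>

typedef struct Opts Opts;
struct Opts
{
	const char *defmnt;	/* mount point, or NULL when none is wanted */
	int stdio;		/* set by -i */
	int bad;		/* set on an unknown option or a missing argument */
};

/* Parse -i, -s, and -m file in the style of the ramfs example.
 * Returns the index of the first argument that is not an option. */
static int
parseopts(int argc, char *argv[], Opts *o)
{
	int i;
	const char *p;

	assert(argc >= 1 && argv != NULL && o != NULL);
	o->defmnt = "/tmp";
	o->stdio = 0;
	o->bad = 0;
	for(i = 1; i < argc && argv[i][0] == '-' && argv[i][1] != '\0'; i++){
		if(argv[i][1] == '-' && argv[i][2] == '\0'){	/* "--" ends the options */
			i++;
			break;
		}
		p = &argv[i][1];
		while(*p != '\0'){
			switch(*p++){		/* the role played by ARGC */
			case 'i':
				o->defmnt = NULL;
				o->stdio = 1;
				break;
			case 's':
				o->defmnt = NULL;
				break;
			case 'm':		/* the role played by ARGF */
				if(*p != '\0')
					o->defmnt = p;		/* rest of this word */
				else if(i + 1 < argc)
					o->defmnt = argv[++i];	/* the next word */
				else
					o->bad = 1;
				p = "";		/* nothing left to scan in this word */
				break;
			default:
				o->bad = 1;
				break;
			}
		}
	}
	return i;
}
```

With the arguments
.CW "-s -m /n/tmp file" ,
the sketch leaves
.CW defmnt
pointing at
.CW /n/tmp
and returns the index of
.CW file .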
.SH
Extensions
.PP
The compiler has several extensions to ANSI C, all of which are used
extensively in the system source.
First,
.I structure
.I displays
permit
.CW struct
expressions to be formed dynamically.
Given these declarations:
.P1
typedef struct Point Point;
typedef struct Rectangle Rectangle;
struct Point
{
	int x, y;
};
struct Rectangle
{
	Point min, max;
};
Point p, q, add(Point, Point);
Rectangle r;
int x, y;
.P2
this assignment may appear anywhere an assignment is legal:
.P1
r = (Rectangle){add(p, q), (Point){x, y+3}};
.P2
The syntax is the same as for initializing a structure but with
a leading cast.
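.PP
Standard C later adopted a very similar construct, the compound literal of C99, so the idea can be tried with other compilers.
The sketch below assumes a C99 compiler; the body of
.CW add
and the function
.CW example
are inventions for the illustration, not code from the system source.

```c
#include <assert.h>

typedef struct Point Point;
typedef struct Rectangle Rectangle;

struct Point
{
	int x, y;
};

struct Rectangle
{
	Point min, max;
};

/* Component-wise sum of two points (assumed meaning of add). */
static Point
add(Point a, Point b)
{
	return (Point){a.x + b.x, a.y + b.y};
}

/* The display may appear anywhere an assignment is legal. */
static void
example(void)
{
	Point p = {1, 2}, q = {10, 20};
	Rectangle r;
	int x = 5, y = 6;

	r = (Rectangle){add(p, q), (Point){x, y+3}};
	assert(r.min.x == 11 && r.min.y == 22);
	assert(r.max.x == 5 && r.max.y == 9);
}
```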
.PP
If an
.I anonymous
.I structure
or
.I union
is declared within another structure or union, the members of the internal
structure or union are addressable without prefix in the outer structure.
This feature eliminates the clumsy naming of nested structures and,
particularly, unions.
For example, after these declarations,
.P1
struct Lock
{
	int locked;
};
struct Node
{
	int type;
	union{
		double dval;
		double fval;
		long lval;
	};		/* anonymous union */
	struct Lock;	/* anonymous structure */
} *node;
void lock(struct Lock*);
.P2
one may refer to
.CW node->type ,
.CW node->dval ,
.CW node->fval ,
.CW node->lval ,
and
.CW node->locked .
Moreover, the address of a
.CW struct
.CW Node
may be used without a cast anywhere that the address of a
.CW struct
.CW Lock
is used, such as in argument lists.
The compiler automatically promotes the type and adjusts the address.
Thus one may invoke
.CW lock(node) .
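.PP
For comparison, here is roughly how the same data might be laid out for a compiler without the Plan 9 extension.
The sketch assumes a C11 compiler, which accepts untagged anonymous unions but not the tagged form
.CW "struct Lock;"
nor the automatic promotion of the pointer; the member name
.CW lk
and the body of
.CW lock
are hypothetical.

```c
#include <assert.h>

struct Lock
{
	int locked;
};

struct Node
{
	int type;
	union{
		double dval;
		long lval;
	};			/* anonymous union: accepted by C11 */
	struct Lock lk;		/* must be named: no tagged anonymous form in C11 */
};

/* Mark the lock as held (an assumed body, for illustration). */
static void
lock(struct Lock *l)
{
	assert(l != 0);
	l->locked = 1;
}
```

Here the caller must write
.CW lock(&node->lk)
and
.CW node->lk.locked ,
which is exactly the naming the extension removes.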
.PP
Anonymous structures and unions may be accessed by type name
if (and only if) they are declared using a
.CW typedef
name.
For example, using the above declaration for
.CW Point ,
one may declare
.P1
struct
{
	int type;
	Point;
} p;
.P2
and refer to
.CW p.Point .
.PP
In the initialization of arrays, a number in square brackets before an
element sets the index for the initialization. For example, to initialize
some elements in
a table of function pointers indexed by
ASCII
character,
.P1
void percent(void), slash(void);
void (*func[128])(void) =
{
	['%'] percent,
	['/'] slash,
};
.P2
.LP
A similar syntax allows one to initialize structure elements:
.P1
Point p =
{
	.y 100,
	.x 200
};
.P2
These initialization syntaxes were later added to ANSI C, with the addition of an
equals sign between the index or tag and the value.
The Plan 9 compiler accepts either form.
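.PP
In the form with the equals sign, the two examples look like the sketch below, which should compile with any C99 compiler.
The handler bodies, the counter
.CW calls ,
the variable name
.CW pt ,
and the function
.CW dispatch
are assumptions added so that the table can be exercised.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Point Point;
struct Point
{
	int x, y;
};

static int calls;	/* records which handlers ran, for the demonstration only */

static void percent(void) { calls += 1; }
static void slash(void) { calls += 10; }

/* Table indexed by ASCII character; entries not named are null pointers. */
static void (*func[128])(void) =
{
	['%'] = percent,
	['/'] = slash,
};

/* Members may be named in any order. */
static Point pt =
{
	.y = 100,
	.x = 200
};

/* Call the handler for c, if there is one; return 1 when one was found. */
static int
dispatch(int c)
{
	assert(c >= 0 && c < 128);
	if(func[c] == NULL)
		return 0;
	func[c]();
	return 1;
}
```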
.PP
Finally, the declaration
.P1
extern register reg;
.P2
.I this "" (
appearance of the register keyword is not ignored)
allocates a global register to hold the variable
.CW reg .
External registers must be used carefully: they need to be declared in
.I all
source files and libraries in the program to guarantee the register
is not allocated temporarily for other purposes.
Especially on machines with few registers, such as the i386,
it is easy to link accidentally with code that has already usurped
the global registers and there is no diagnostic when this happens.
Used wisely, though, external registers are powerful.
The Plan 9 operating system uses them to access per-process and
per-machine data structures on a multiprocessor. The storage class they provide
is hard to create in other ways.
.SH
The compile-time environment
.PP
The code generated by the compilers is `optimized' by default:
variables are placed in registers and peephole optimizations are
performed.
The compiler flag
.CW -N
disables these optimizations.
Registerization is done locally rather than throughout a function:
whether a variable occupies a register or
the memory location identified in the symbol
table depends on the activity of the variable and may change
throughout the life of the variable.
The
.CW -N
flag is rarely needed;
its main use is to simplify debugging.
There is no information in the symbol table to identify the
registerization of a variable, so
.CW -N
guarantees the variable is always where the symbol table says it is.
.PP
Another flag,
.CW -w ,
turns
.I on
warnings about portability and problems detected in flow analysis.
Most code in Plan 9 is compiled with warnings enabled;
these warnings plus the type checking offered by function prototypes
provide most of the support of the Unix tool
.CW lint
more accurately and with less chatter.
Two of the warnings,
`used and not set' and `set and not used', are almost always accurate but
may be triggered spuriously by code with invisible control flow,
such as in routines that call
.CW longjmp .
The compiler statements
.P1
SET(v1);
USED(v2);
.P2
decorate the flow graph to silence the compiler.
Either statement accepts a comma-separated list of variables.
Use them carefully: they may silence real errors.
For the common case of unused parameters to a function,
leaving the name off the declaration silences the warnings.
That is, listing the type of a parameter but giving it no
associated variable name does the trick.
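.PP
Code that must also compile elsewhere needs something in place of these statements.
One possibility, offered only as a sketch, is a macro that evaluates its argument and discards it; unlike the compiler's
.CW USED ,
this hypothetical stand-in takes a single variable, and the functions
.CW twice
and
.CW apply
are invented for the example.

```c
#include <assert.h>

/* Hypothetical stand-in for compilers that do not know USED:
 * it mentions the variable so that it counts as used. */
#define USED(x)	((void)(x))

/* A callback whose signature is fixed by its caller but which ignores ctx. */
static int
twice(int value, void *ctx)
{
	USED(ctx);
	return 2 * value;
}

/* Apply fn to value; every callback receives the context pointer. */
static int
apply(int (*fn)(int, void *), int value, void *ctx)
{
	assert(fn != 0);
	return fn(value, ctx);
}
```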
.SH
Debugging
.PP
There are two debuggers available on Plan 9.
The first, and older, is
.CW db ,
a revision of Unix
.CW adb .
The other,
.CW acid ,
is a source-level debugger whose commands are statements in
a true programming language.
.CW Acid
is the preferred debugger, but since it
borrows some elements of
.CW db ,
notably the formats for displaying values, it is worth knowing a little bit about
.CW db .
.PP
Both debuggers support multiple architectures in a single program; that is,
the programs are
.CW db
and
.CW acid ,
not for example
.CW vdb
and
.CW vacid .
They also support cross-architecture debugging comfortably:
one may debug a 68020 binary on a MIPS.
.PP
Imagine a program has crashed mysteriously:
.P1
% X11/X
Fatal server bug!
failed to create default stipple
X 106: suicide: sys: trap: fault read addr=0x0 pc=0x00105fb8
%
.P2
When a process dies on Plan 9 it hangs in the `broken' state
for debugging.
Attach a debugger to the process by naming its process id:
.P1
% acid 106
/proc/106/text:mips plan 9 executable
/sys/lib/acid/port
/sys/lib/acid/mips
acid:
.P2
The
.CW acid
function
.CW stk()
reports the stack traceback:
.P1
acid: stk()
At pc:0x105fb8:abort+0x24 /sys/src/ape/lib/ap/stdio/abort.c:6
abort() /sys/src/ape/lib/ap/stdio/abort.c:4
called from FatalError+#4e
/sys/src/X/mit/server/dix/misc.c:421
FatalError(s9=#e02, s8=#4901d200, s7=#2, s6=#72701, s5=#1,
s4=#7270d, s3=#6, s2=#12, s1=#ff37f1c, s0=#6, f=#7270f)
/sys/src/X/mit/server/dix/misc.c:416
called from gnotscreeninit+#4ce
/sys/src/X/mit/server/ddx/gnot/gnot.c:792
gnotscreeninit(snum=#0, sc=#80db0)
/sys/src/X/mit/server/ddx/gnot/gnot.c:766
called from AddScreen+#16e
/n/bootes/sys/src/X/mit/server/dix/main.c:610
AddScreen(pfnInit=0x0000129c,argc=0x00000001,argv=0x7fffffe4)
/sys/src/X/mit/server/dix/main.c:530
called from InitOutput+0x80
/sys/src/X/mit/server/ddx/brazil/brddx.c:522
InitOutput(argc=0x00000001,argv=0x7fffffe4)
/sys/src/X/mit/server/ddx/brazil/brddx.c:511
called from main+0x294
/sys/src/X/mit/server/dix/main.c:225
main(argc=0x00000001,argv=0x7fffffe4)
/sys/src/X/mit/server/dix/main.c:136
called from _main+0x24
/sys/src/ape/lib/ap/mips/main9.s:8
.P2
The function
.CW lstk()
is similar but
also reports the values of local variables.
Note that the traceback includes full file names; this is a boon to debugging,
although it makes the output much noisier.
.PP
To use
.CW acid
well you will need to learn its input language; see the
``Acid Manual'',
by Phil Winterbottom,
for details. For simple debugging, however, the information in the manual page is
sufficient. In particular, it describes the most useful functions
for examining a process.
.PP
The compiler does not place
information describing the types of variables in the executable,
but a compile-time flag provides crude support for symbolic debugging.
The
.CW -a
flag to the compiler suppresses code generation
and instead emits source text in the
.CW acid
language to format and display data structure types defined in the program.
The easiest way to use this feature is to put a rule in the
.CW mkfile :
.P1
syms: main.$O
	$CC -a main.c > syms
.P2
Then, from within
.CW acid ,
run
.P1
acid: include("sourcedirectory/syms")
.P2
to read in the relevant definitions.
(For multi-file source, you need to be a little fancier;
see
.I 2c (1)).
This text includes, for each defined compound
type, a function with that name that may be called with the address of a structure
of that type to display its contents.
For example, if
.CW rect
is a global variable of type
.CW Rectangle ,
one may execute
.P1
Rectangle(*rect)
.P2
to display it.
The
.CW *
(indirection) operator is necessary because
of the way
.CW acid
works: each global symbol in the program is defined as a variable by
.CW acid ,
with value equal to the
.I address
of the symbol.
.PP
Another common technique is to write by hand special
.CW acid
code to define functions to aid debugging, initialize the debugger, and so on.
Conventionally, this is placed in a file called
.CW acid
in the source directory; it has a line
.P1
include("sourcedirectory/syms");
.P2
to load the compiler-produced symbols. One may edit the compiler output directly but
it is wiser to keep the hand-generated
.CW acid
separate from the machine-generated.
.PP
To make things simple, the default rules in the system
.CW mkfiles
include entries to make
.CW foo.acid
from
.CW foo.c ,
so one may use
.CW mk
to automate the production of
.CW acid
definitions for a given C source file.
.PP
There is much more to say here. See the
.CW acid
manual page, the reference manual, or the paper
``Acid: A Debugger Built From A Language'',
also by Phil Winterbottom.