diff options
author | cinap_lenrek <cinap_lenrek@felloff.net> | 2017-03-12 17:15:03 +0100 |
---|---|---|
committer | cinap_lenrek <cinap_lenrek@felloff.net> | 2017-03-12 17:15:03 +0100 |
commit | 963cfc9a6f6e721f52aa949e6d1af0c3e8dc2ecc (patch) | |
tree | 749b74875dbc49bcf6ed0776648b8f0ef9417407 /sys/doc | |
parent | 8177d20fb2709ba9290dfd41308b8e5bee4e00f8 (diff) |
merging erik quanstros nupas
Diffstat (limited to 'sys/doc')
-rw-r--r-- | sys/doc/mkfile | 1 | ||||
-rw-r--r-- | sys/doc/nupas/macros.ms | 92 | ||||
-rw-r--r-- | sys/doc/nupas/mkfile | 20 | ||||
-rw-r--r-- | sys/doc/nupas/nupas.ms | 354 |
4 files changed, 467 insertions, 0 deletions
diff --git a/sys/doc/mkfile b/sys/doc/mkfile index 0226f35c0..0775878bc 100644 --- a/sys/doc/mkfile +++ b/sys/doc/mkfile @@ -40,6 +40,7 @@ ALL=\ spin\ port\ colophon\ + nupas/nupas\ ALLPS=${ALL:%=%.ps} HTML=${ALL:%=%.html} release3.html release4.html diff --git a/sys/doc/nupas/macros.ms b/sys/doc/nupas/macros.ms new file mode 100644 index 000000000..330626da3 --- /dev/null +++ b/sys/doc/nupas/macros.ms @@ -0,0 +1,92 @@ +.de F1 +.nr OI \\n(.iu +.nr PW 1v +.KF +.sp 0.3v +.. +.de T1 +.F1 +.. +.de F2 +.ds Fp Figure\ \\n(Fi +.ds Fn Figure\ \\n+(Fi +.ds Fq \\*(Fp +.F0 +.. +.de T2 +.ds Tp Table\ \\n(Ti +.ds Tn Table\ \\n+(Ti +.ds Tq \\*(Tp +.T0 +.. +.de F0 +.nr BD 1 +.if t .ps \\n(PS-1 +.ie \\n(VS>=41 .vs \\n(VSu-1p +.el .vs \\n(VSp-1p +.ft 1 +.di DD +.ll \\n(.lu*3u/4u +.in 0 +.fi +.ad b +.sp 0.5v +\f3\\*(Fq\f1\ \ \c +.. +.de T0 +.nr BD 1 +.if t .ps \\n(PS-1 +.ie \\n(VS>=41 .vs \\n(VSu-1p +.el .vs \\n(VSp-1p +.ft 1 +.di DD +.ll \\n(.lu*3u/4u +.in 0 +.fi +.ad b +.sp 0.5v +\f3\\*(Tq\f1\ \ \c +.. +.de F3 +.sp 0.5v +.di +.br +.ll \\n(.lu*4u/3u +.if \\n(dl>\\n(BD .nr BD \\n(dl +.if \\n(BD<\\n(.l .in (\\n(.lu-\\n(BDu)/2u +.nf +.DD +.in \\n(OIu +.nr BD 0 +.fi +.KE +.ie \\n(VS>=41 .vs \\n(VSu +.el .vs \\n(VSp +.. +.de T3 +.F3 +.. +.de EX +.P1 +\s-4 +.. +.de EE +\s+4 +.P2 +.. +.nr Fi 1 +1 +.nr Ti 1 +1 +.ds Fn Figure\ \\n(Fi +.ds Tn Table\ \\n(Ti +.nr XP 2 \" delta point size for program +.nr XV 2p \" delta vertical for programs +.nr XT 4 \" delta tab stop for programs +.nr DV .5v \" space before start of program +.FP lucidasans +.nr PS 11 +.nr VS 13 +.nr LL 6.6i +.nr PI 0 \" paragraph indent +.nr PD 4p \" extra space between paragraphs +.pl 11i +.rm CH diff --git a/sys/doc/nupas/mkfile b/sys/doc/nupas/mkfile new file mode 100644 index 000000000..85f629785 --- /dev/null +++ b/sys/doc/nupas/mkfile @@ -0,0 +1,20 @@ +TARG=nupas.ps nupas.pdf + +all:V: $TARG + +%.ps:DQ: %.ms + eval `{doctype macros.ms $stem.ms} | \ + lp -m.9 -dstdout >$target + +%.pdf:DQ: %.ps + cat ../docfonts $stem.ps >_$stem.ps + ps2pdf _$stem.ps $stem.pdf && rm -f _$stem.ps + +%.show:VQ: %.ps + page -w $stem.ps + +#install:VQ: fs4.man fs.man fsrecover.man fsconfig.man +# cp fs4.man /sys/man/4/fs +# cp fs.man /sys/man/8/fs +# cp fsconfig.man /sys/man/8/fsconfig +# cp fsrecover.man /sys/man/8/fsrecover diff --git a/sys/doc/nupas/nupas.ms b/sys/doc/nupas/nupas.ms new file mode 100644 index 000000000..a00b60eca --- /dev/null +++ b/sys/doc/nupas/nupas.ms @@ -0,0 +1,354 @@ +.EQ +delim $$ +.EN +.TL +Scaling Upas +.AU +Erik Quanstrom +quanstro@coraid.com +.AB +The Plan 9 email system, Upas, uses traditional methods of delivery to +.UX +mail boxes while using a user-level file system, Upas/fs, to +translate mail boxes of various formats into a single, convenient format for access. +Unfortunately, it does not do so efficiently. Upas/fs +reads entire folders into core. When deleting email from mail boxes, +the entire mail box is rewritten. I describe how Upas/fs has been +adapted to use caching, indexing and a new mail box format (mdir) to +limit I/O, reduce core size and eliminate the need to rewrite mail +boxes. +.AE +.NH +Introduction +.LP +.DS I +Chained at his root two scion demons dwell +.br + – Erasmus Darwin, The Botanic Garden +.DE +.LP +At Coraid, email is the largest resource user in the system by orders +of magnitude. As of July, 2007, rewriting mail boxes was using +300MB/day on the WORM and several users required more than 400MB of +core. As of July, 2008, rewriting mail boxes was using 800MB/day on +the WORM and several users required more than 1.2GB of core to read +email. Clearly these are difficult to sustain levels of growth, even +without growth of the company. We needed to limit the amount of disk +space used and, more urgently, reduce Upas/fs' core size. +.LP +The techniques employed are simple. Mail is now stored in a directory +with one message per file. This eliminates the need to rewrite mail +boxes. Upas/fs now maintains an index which allows it to present +complete message summaries without reading indexed messages. +Combining the two techniques allows Upas/fs to read only new or +referenced messages. Finally, caching limits both the total number of +in-core messages and their total size. +.NH +Mdir Format +.LP +In addition to meeting our urgent operational requirements of reducing +memory and disk footprint, to meet the expectations of our users we +require a solution that is able to handle folders up to ten thousand +messages, open folders quickly, list the contents of folders quickly +and support the current set of mail readers. +.LP +There are several potential styles of mail boxes. The maildir[1] format +has some attractive properties. Mail can be delivered to or deleted +from a mail box without locking. New mail or deleted mail may be +detected with a directory scan. When used with WORM storage, the +amount of storage required is no more than the size of new mail +received. Mbox format can require that a new copy of the inbox be +stored every day. Even with storage that coalesces duplicate blocks +such as Venti, deleting a message will generally require new storage +since messages are not disk-block aligned. Maildir does not reduce +the cost of the common task of a summary listing of mail such as +generated by acme Mail. +.LP +The mails[2] format proposes a directory per mail. A copy of +the mail as delivered is stored and each mime part is decoded +in such a way that a mail reader could display the file directly. +Command line tools in the style of MH[3] are used to display and +process mail. Upas/fs is not necessary for reading local mail. +Mails has the potential to reduce memory footprint below that +offered by mdirs for native email reading. However all of the +largest mail boxes at our site are served exclusively through IMAP. +The preformatting by mails would be unnecessary for such accounts. +.LP +Other mail servers such as Lotus Notes[4] store email in a custom +database format which allows for fielded and full-text searching +of mail folders. Such a format provides very quick mail +listings and good search capabilities. Such a solution would not +lend itself well to a tool-based environment, nor would it be simple. +.LP +Maildir format seemed the best basic format but its particulars are +tied to the +.UX +environment; mdir is a descendant. A mdir folder +is a directory with the name of the folder. Messages in the mdir +folder are stored in a file named +.I "utime.seq" . +.I Utime +is defined as the decimal +.UX +seconds when the message was added to +the folder. For the inbox, this time will correspond to the +.UX +“From ” line. +.I Seq +is a two-digit sequence number starting with +.CW "00." +The lowest available sequence number is used to store the message. +Thus, the first email possible would be named +.CW "0.00." +To prevent accidents, message files are stored with +the append-only and exclusive-access bits turned on. +The message is stored in the same format it would be in mbox +format; each message is a valid mbox folder with a single message. +.NH +Indexing +.LP +When upas/fs finds an unindexed message, it is added to the index. +The index is a file named +.I "foldername" .idx +and consists a signature and one line per MIME part. Each line +contains the SHA1 checksum of the message (or a place holder for +subparts), one field per entry in the +.I "messageid/info" +file, flags and the number of subparts. The flags are currently a +superset of the standard IMAP flags. They provide the similar +functionality to maildir's modified file names. Thus the `S' +(answered) flag remains set between invocations of mail readers. +Other mutable information about a message may be stored in a similar +way. +.LP +Since the +.I info +file is read by all the mail readers to produce mail listings, +mail boxes may be listed without opening any mail files when no new +mail has arrived. Similarly, opening a new mail box requires reading +the index and checking new mail. Index files are typically between +0.5% and 5% the size of the full mail box. Each time the index is +generated, it is fully rewritten. +.NH +Caching +.LP +Upas/fs stores each message in a +.CW "Message" +structure. To enable caching, this structure was split +into four parts: The +.CW "Idx" +(or index), message subparts, information on the cache state of the +message and a set of pointers into the processed header and body. +Only the pointers to the processed header and body are subject to +caching. The available cache states are +.CW "Cidx" , +.CW "Cheader" +and +.CW "Cbody" . +.LP +When the header and body are not present, the average message with +subparts takes roughly 3KB of memory. Thus a 10,000 message mail box +would require roughly 30MB of core in addition to any cached +messages. Reads of the +.CW "info" +or +.CW "subject" +files can be satisfied from the information in the +.CW "Idx" +structure. +.LP +Since there are a fair number of very large messages, requests that +can be satisfied by reading the message headers do not result in the +full message being read. Reads of the +.CW "header" +or +.CW "rawheader" +files of top-level messages are satisfied in this way. Reading the +same files for subparts, however, results in the entire message being +read. Caching the header results in the +.CW "Cheader" +cache state. +.LP +Similarly, reading the +.CW "body" +requires the body to be read, processed and results in +the +.CW "Cbody" +cache state. Reading from MIME subparts also results +in the +.CW "Cbody" +cache state. +.LP +The cache has a simple LRU replacement policy. Each time a cached +member of a message is accessed, it is moved to the head of the list. +The cache contains a maximum number of messages and a maximum size. +While the maximum number of messages may not be exceeded, the maximum +cache size may be exceeded if the sum of all the currently referenced +messages is greater than the size of the cache. In this case all +unreferenced messages will be freed. When removing a message +from the cache all of the cacheable information is freed. +.NH +Collateral damage +.LP +.DS I +Each new user of a new system uncovers a new class of bugs. +.br + — Brian Kernighan +.DE +.LP +In addition to upas/fs, programs that have assumptions about how +mail boxes are structured needed to be modified. Programs which +deliver mail to mail boxes (deliver, marshal, ml, smtp) and append messages to +folders were given a common (nedmail) function to call. Since this +was done by modifying functions in the Upas common library, this +presented a problem for programs not traditionally part of Upas +such as acme Mail and imap4d. Rather than fold these programs +into Upas, a new program, mbappend, was added to Upas. +.LP +Imap4d also requires the ability to rename and remove folders. +While an external program would work for this as well, that +approach has some drawbacks. Most importantly, IMAP folders +can't be moved or renamed in the same way without reimplementing +functionality that is already in upas/fs. It also emphasises the +asymmetry between reading and deleting email and other folder +actions. Folder renaming and removal were added to upas/fs. +It is intended that mbappend will be removed soon +and replaced with equivalent upas/fs functionality — +at least for non-delivery programs. +.LP +Mdirs also expose an oddity about file permissions. An append-only +file that is mode +.CW 0622 +may be appended to by anyone, but is readable only by the owner. +With a directory, such a setup is not directly possible as write permission +to a directory implies permission to remove. There are a number of +solutions to this problem. Delivery could be made asymmetrical—incoming +files could be written to a mbox. Or, following the example of the outbound +mail queue, each user could deliver to a directory owned by that user. +In many BSD-derived +.UX +systems, the “sticky bit” on directories is used to modify +the meaning of the +.CW w +bit for users matching only the other bits. For them, the +.CW w +bit gives permission to create but not to remove. +.LP +While this is somewhat of a one-off situation, I chose to implement +a version of the “sticky bit” using the existing append-only bit on our +file server. This was implemented as an extra permission check when +removing files. Fewer than 10 lines of code were required. +.NH +Performance +.LP +A representative local mail box was used to generate some rough +performance numbers. The mail box is 110MB and contains 868 messages. +These figures are shown in table 1. In the worse case—an unindexed +mail box—the new upas/fs uses 18% of the memory of the original while +using 13% more cpu. In the best case, it uses only 5% of the memory +while using only 13% of the cpu. Clearly, a larger mail box will make +these ratios more attractive. In the two months since the snapshot was +taken, that same mail box has grown to 220MB and contains 1814 +messages. +.ps -2 +.DS C +.TS +box, tab(:); +c s s s s +c | c | c | c | c +a | n | n | n | n. +Table 1 – Performance +_ +action:user:system:real:core size: +:s:s:s:MB: +_ +old fs read:1.69:0.84:6.07:135 +_ +initial read:1.65:0.90:6.90:25 +_ +indexed read:0.64:0.03:0.77:6.5 +.TE +.DE +.NL +.NH +Future Work +.LP +While Upas' memory usage has been drastically reduced, +it is still a work-in-progress. Caching and indexing are +adequate but primitive. Upas/fs is still inconsistently +bypassed for appending messages to mail boxes. There +are also some features which remain incomplete. Finally, +the small increase in scale brings some new questions about +the organization of email. +.LP +It may be useful for mail boxes with very large numbers +of messages to divide the index into fixed-size chunks. +Then messages could be read into a fixed-sized pool of +structures as needed. However it is currently hard to +see how clients could easily interface a mail box large +enough for this technique to be useful. Currently, all +clients assume that it is reasonable to allocate an +in-core data structure for each message in a mail box. +To take advantage of a chunked index, clients (or the +server) would need a way of limiting the number of +messages considered at a time. Also, for such large +mail boxes, it would be important to separate the +incoming messages from older messages to limit the work +required to scan for new messages. +.LP +Caching is particularly unsatisfactory. Files should +be read in fixed-sized buffers so maximum memory usage +does not depend on the size of the largest file in the +mail box. Unfortunately, current data structures do not readily +support this. In practice, this limitation has not yet +been noticeable. +.LP +There are also a few features that need to be completed. +Tracking of references has been added to marshal and +upas/fs. In addition, the index provides a place to store +mutable information about a message. These capabilities +should be built upon to provide general threading and +tagging capabilities. +.NH +Speculation +.LP +Freed from the limitation that all messages in a +mail box must be read and stored in memory before a +single message may be accessed, it is interesting to +speculate on a few further possibilites. +.LP +For example, it may be +useful to replace separate mail boxes with a single +collection of messages assigned to one or more virtual +mail boxes. The association between a message and a +mail box would be a “tag.” A message could be added to +or removed from one or more mail boxes without modifying +the mdir file. If threads were implemented by tagging +each message with its references, it would be possible +to follow threads across mail boxes, even to messages +removed from all mail boxes, provided the underlying +file were not also removed. If a facility for adding +arbitrary, automatic tags were enabled, it would be +possible to tag messages with the email address in +the SMTP From line. +.NH +References +.IP [1] +D. Bernstein, “Using maildir format”, +published online at +.br +http://cr.yp.to/proto/maildir.html +.IP [2] +F. Ballesteros +.IR mails (1), +published online at +http://lsub.org/magic/man2html/1/mails +.IP [3] +MH Wikipedia entry, +http://en.wikipedia.org/wiki/MH_Message_Handling_System +.IP [4] +Lotus Notes Wikipedia entry, +http://en.wikipedia.org/wiki/Lotus_Notes +.IP [5] +D. Presotto, “Upas—a Simpler Approach to Network Mail”, +Proceedings of the 10th Usenix conference, 1985. |