summaryrefslogtreecommitdiff
path: root/sys/doc
diff options
context:
space:
mode:
authorcinap_lenrek <cinap_lenrek@felloff.net>2017-03-12 17:15:03 +0100
committercinap_lenrek <cinap_lenrek@felloff.net>2017-03-12 17:15:03 +0100
commit963cfc9a6f6e721f52aa949e6d1af0c3e8dc2ecc (patch)
tree749b74875dbc49bcf6ed0776648b8f0ef9417407 /sys/doc
parent8177d20fb2709ba9290dfd41308b8e5bee4e00f8 (diff)
merging erik quanstros nupas
Diffstat (limited to 'sys/doc')
-rw-r--r--sys/doc/mkfile1
-rw-r--r--sys/doc/nupas/macros.ms92
-rw-r--r--sys/doc/nupas/mkfile20
-rw-r--r--sys/doc/nupas/nupas.ms354
4 files changed, 467 insertions, 0 deletions
diff --git a/sys/doc/mkfile b/sys/doc/mkfile
index 0226f35c0..0775878bc 100644
--- a/sys/doc/mkfile
+++ b/sys/doc/mkfile
@@ -40,6 +40,7 @@ ALL=\
spin\
port\
colophon\
+ nupas/nupas\
ALLPS=${ALL:%=%.ps}
HTML=${ALL:%=%.html} release3.html release4.html
diff --git a/sys/doc/nupas/macros.ms b/sys/doc/nupas/macros.ms
new file mode 100644
index 000000000..330626da3
--- /dev/null
+++ b/sys/doc/nupas/macros.ms
@@ -0,0 +1,92 @@
+.de F1
+.nr OI \\n(.iu
+.nr PW 1v
+.KF
+.sp 0.3v
+..
+.de T1
+.F1
+..
+.de F2
+.ds Fp Figure\ \\n(Fi
+.ds Fn Figure\ \\n+(Fi
+.ds Fq \\*(Fp
+.F0
+..
+.de T2
+.ds Tp Table\ \\n(Ti
+.ds Tn Table\ \\n+(Ti
+.ds Tq \\*(Tp
+.T0
+..
+.de F0
+.nr BD 1
+.if t .ps \\n(PS-1
+.ie \\n(VS>=41 .vs \\n(VSu-1p
+.el .vs \\n(VSp-1p
+.ft 1
+.di DD
+.ll \\n(.lu*3u/4u
+.in 0
+.fi
+.ad b
+.sp 0.5v
+\f3\\*(Fq\f1\ \ \c
+..
+.de T0
+.nr BD 1
+.if t .ps \\n(PS-1
+.ie \\n(VS>=41 .vs \\n(VSu-1p
+.el .vs \\n(VSp-1p
+.ft 1
+.di DD
+.ll \\n(.lu*3u/4u
+.in 0
+.fi
+.ad b
+.sp 0.5v
+\f3\\*(Tq\f1\ \ \c
+..
+.de F3
+.sp 0.5v
+.di
+.br
+.ll \\n(.lu*4u/3u
+.if \\n(dl>\\n(BD .nr BD \\n(dl
+.if \\n(BD<\\n(.l .in (\\n(.lu-\\n(BDu)/2u
+.nf
+.DD
+.in \\n(OIu
+.nr BD 0
+.fi
+.KE
+.ie \\n(VS>=41 .vs \\n(VSu
+.el .vs \\n(VSp
+..
+.de T3
+.F3
+..
+.de EX
+.P1
+\s-4
+..
+.de EE
+\s+4
+.P2
+..
+.nr Fi 1 +1
+.nr Ti 1 +1
+.ds Fn Figure\ \\n(Fi
+.ds Tn Table\ \\n(Ti
+.nr XP 2 \" delta point size for program
+.nr XV 2p \" delta vertical for programs
+.nr XT 4 \" delta tab stop for programs
+.nr DV .5v \" space before start of program
+.FP lucidasans
+.nr PS 11
+.nr VS 13
+.nr LL 6.6i
+.nr PI 0 \" paragraph indent
+.nr PD 4p \" extra space between paragraphs
+.pl 11i
+.rm CH
diff --git a/sys/doc/nupas/mkfile b/sys/doc/nupas/mkfile
new file mode 100644
index 000000000..85f629785
--- /dev/null
+++ b/sys/doc/nupas/mkfile
@@ -0,0 +1,20 @@
+TARG=nupas.ps nupas.pdf
+
+all:V: $TARG
+
+%.ps:DQ: %.ms
+ eval `{doctype macros.ms $stem.ms} | \
+ lp -m.9 -dstdout >$target
+
+%.pdf:DQ: %.ps
+ cat ../docfonts $stem.ps >_$stem.ps
+ ps2pdf _$stem.ps $stem.pdf && rm -f _$stem.ps
+
+%.show:VQ: %.ps
+ page -w $stem.ps
+
+#install:VQ: fs4.man fs.man fsrecover.man fsconfig.man
+# cp fs4.man /sys/man/4/fs
+# cp fs.man /sys/man/8/fs
+# cp fsconfig.man /sys/man/8/fsconfig
+# cp fsrecover.man /sys/man/8/fsrecover
diff --git a/sys/doc/nupas/nupas.ms b/sys/doc/nupas/nupas.ms
new file mode 100644
index 000000000..a00b60eca
--- /dev/null
+++ b/sys/doc/nupas/nupas.ms
@@ -0,0 +1,354 @@
+.EQ
+delim $$
+.EN
+.TL
+Scaling Upas
+.AU
+Erik Quanstrom
+quanstro@coraid.com
+.AB
+The Plan 9 email system, Upas, uses traditional methods of delivery to
+.UX
+mail boxes while using a user-level file system, Upas/fs, to
+translate mail boxes of various formats into a single, convenient format for access.
+Unfortunately, it does not do so efficiently. Upas/fs
+reads entire folders into core. When deleting email from mail boxes,
+the entire mail box is rewritten. I describe how Upas/fs has been
+adapted to use caching, indexing and a new mail box format (mdir) to
+limit I/O, reduce core size and eliminate the need to rewrite mail
+boxes.
+.AE
+.NH
+Introduction
+.LP
+.DS I
+Chained at his root two scion demons dwell
+.br
+ – Erasmus Darwin, The Botanic Garden
+.DE
+.LP
+At Coraid, email is the largest resource user in the system by orders
+of magnitude. As of July, 2007, rewriting mail boxes was using
+300MB/day on the WORM and several users required more than 400MB of
+core. As of July, 2008, rewriting mail boxes was using 800MB/day on
+the WORM and several users required more than 1.2GB of core to read
+email. Clearly these are difficult to sustain levels of growth, even
+without growth of the company. We needed to limit the amount of disk
+space used and, more urgently, reduce Upas/fs' core size.
+.LP
+The techniques employed are simple. Mail is now stored in a directory
+with one message per file. This eliminates the need to rewrite mail
+boxes. Upas/fs now maintains an index which allows it to present
+complete message summaries without reading indexed messages.
+Combining the two techniques allows Upas/fs to read only new or
+referenced messages. Finally, caching limits both the total number of
+in-core messages and their total size.
+.NH
+Mdir Format
+.LP
+In addition to meeting our urgent operational requirements of reducing
+memory and disk footprint, to meet the expectations of our users we
+require a solution that is able to handle folders up to ten thousand
+messages, open folders quickly, list the contents of folders quickly
+and support the current set of mail readers.
+.LP
+There are several potential styles of mail boxes. The maildir[1] format
+has some attractive properties. Mail can be delivered to or deleted
+from a mail box without locking. New mail or deleted mail may be
+detected with a directory scan. When used with WORM storage, the
+amount of storage required is no more than the size of new mail
+received. Mbox format can require that a new copy of the inbox be
+stored every day. Even with storage that coalesces duplicate blocks
+such as Venti, deleting a message will generally require new storage
+since messages are not disk-block aligned. Maildir does not reduce
+the cost of the common task of a summary listing of mail such as
+generated by acme Mail.
+.LP
+The mails[2] format proposes a directory per mail. A copy of
+the mail as delivered is stored and each mime part is decoded
+in such a way that a mail reader could display the file directly.
+Command line tools in the style of MH[3] are used to display and
+process mail. Upas/fs is not necessary for reading local mail.
+Mails has the potential to reduce memory footprint below that
+offered by mdirs for native email reading. However all of the
+largest mail boxes at our site are served exclusively through IMAP.
+The preformatting by mails would be unnecessary for such accounts.
+.LP
+Other mail servers such as Lotus Notes[4] store email in a custom
+database format which allows for fielded and full-text searching
+of mail folders. Such a format provides very quick mail
+listings and good search capabilities. Such a solution would not
+lend itself well to a tool-based environment, nor would it be simple.
+.LP
+Maildir format seemed the best basic format but its particulars are
+tied to the
+.UX
+environment; mdir is a descendant. A mdir folder
+is a directory with the name of the folder. Messages in the mdir
+folder are stored in a file named
+.I "utime.seq" .
+.I Utime
+is defined as the decimal
+.UX
+seconds when the message was added to
+the folder. For the inbox, this time will correspond to the
+.UX
+“From ” line.
+.I Seq
+is a two-digit sequence number starting with
+.CW "00."
+The lowest available sequence number is used to store the message.
+Thus, the first email possible would be named
+.CW "0.00."
+To prevent accidents, message files are stored with
+the append-only and exclusive-access bits turned on.
+The message is stored in the same format it would be in mbox
+format; each message is a valid mbox folder with a single message.
+.NH
+Indexing
+.LP
+When upas/fs finds an unindexed message, it is added to the index.
+The index is a file named
+.I "foldername" .idx
+and consists a signature and one line per MIME part. Each line
+contains the SHA1 checksum of the message (or a place holder for
+subparts), one field per entry in the
+.I "messageid/info"
+file, flags and the number of subparts. The flags are currently a
+superset of the standard IMAP flags. They provide the similar
+functionality to maildir's modified file names. Thus the `S'
+(answered) flag remains set between invocations of mail readers.
+Other mutable information about a message may be stored in a similar
+way.
+.LP
+Since the
+.I info
+file is read by all the mail readers to produce mail listings,
+mail boxes may be listed without opening any mail files when no new
+mail has arrived. Similarly, opening a new mail box requires reading
+the index and checking new mail. Index files are typically between
+0.5% and 5% the size of the full mail box. Each time the index is
+generated, it is fully rewritten.
+.NH
+Caching
+.LP
+Upas/fs stores each message in a
+.CW "Message"
+structure. To enable caching, this structure was split
+into four parts: The
+.CW "Idx"
+(or index), message subparts, information on the cache state of the
+message and a set of pointers into the processed header and body.
+Only the pointers to the processed header and body are subject to
+caching. The available cache states are
+.CW "Cidx" ,
+.CW "Cheader"
+and
+.CW "Cbody" .
+.LP
+When the header and body are not present, the average message with
+subparts takes roughly 3KB of memory. Thus a 10,000 message mail box
+would require roughly 30MB of core in addition to any cached
+messages. Reads of the
+.CW "info"
+or
+.CW "subject"
+files can be satisfied from the information in the
+.CW "Idx"
+structure.
+.LP
+Since there are a fair number of very large messages, requests that
+can be satisfied by reading the message headers do not result in the
+full message being read. Reads of the
+.CW "header"
+or
+.CW "rawheader"
+files of top-level messages are satisfied in this way. Reading the
+same files for subparts, however, results in the entire message being
+read. Caching the header results in the
+.CW "Cheader"
+cache state.
+.LP
+Similarly, reading the
+.CW "body"
+requires the body to be read, processed and results in
+the
+.CW "Cbody"
+cache state. Reading from MIME subparts also results
+in the
+.CW "Cbody"
+cache state.
+.LP
+The cache has a simple LRU replacement policy. Each time a cached
+member of a message is accessed, it is moved to the head of the list.
+The cache contains a maximum number of messages and a maximum size.
+While the maximum number of messages may not be exceeded, the maximum
+cache size may be exceeded if the sum of all the currently referenced
+messages is greater than the size of the cache. In this case all
+unreferenced messages will be freed. When removing a message
+from the cache all of the cacheable information is freed.
+.NH
+Collateral damage
+.LP
+.DS I
+Each new user of a new system uncovers a new class of bugs.
+.br
+ — Brian Kernighan
+.DE
+.LP
+In addition to upas/fs, programs that have assumptions about how
+mail boxes are structured needed to be modified. Programs which
+deliver mail to mail boxes (deliver, marshal, ml, smtp) and append messages to
+folders were given a common (nedmail) function to call. Since this
+was done by modifying functions in the Upas common library, this
+presented a problem for programs not traditionally part of Upas
+such as acme Mail and imap4d. Rather than fold these programs
+into Upas, a new program, mbappend, was added to Upas.
+.LP
+Imap4d also requires the ability to rename and remove folders.
+While an external program would work for this as well, that
+approach has some drawbacks. Most importantly, IMAP folders
+can't be moved or renamed in the same way without reimplementing
+functionality that is already in upas/fs. It also emphasises the
+asymmetry between reading and deleting email and other folder
+actions. Folder renaming and removal were added to upas/fs.
+It is intended that mbappend will be removed soon
+and replaced with equivalent upas/fs functionality —
+at least for non-delivery programs.
+.LP
+Mdirs also expose an oddity about file permissions. An append-only
+file that is mode
+.CW 0622
+may be appended to by anyone, but is readable only by the owner.
+With a directory, such a setup is not directly possible as write permission
+to a directory implies permission to remove. There are a number of
+solutions to this problem. Delivery could be made asymmetrical—incoming
+files could be written to a mbox. Or, following the example of the outbound
+mail queue, each user could deliver to a directory owned by that user.
+In many BSD-derived
+.UX
+systems, the “sticky bit” on directories is used to modify
+the meaning of the
+.CW w
+bit for users matching only the other bits. For them, the
+.CW w
+bit gives permission to create but not to remove.
+.LP
+While this is somewhat of a one-off situation, I chose to implement
+a version of the “sticky bit” using the existing append-only bit on our
+file server. This was implemented as an extra permission check when
+removing files. Fewer than 10 lines of code were required.
+.NH
+Performance
+.LP
+A representative local mail box was used to generate some rough
+performance numbers. The mail box is 110MB and contains 868 messages.
+These figures are shown in table 1. In the worse case—an unindexed
+mail box—the new upas/fs uses 18% of the memory of the original while
+using 13% more cpu. In the best case, it uses only 5% of the memory
+while using only 13% of the cpu. Clearly, a larger mail box will make
+these ratios more attractive. In the two months since the snapshot was
+taken, that same mail box has grown to 220MB and contains 1814
+messages.
+.ps -2
+.DS C
+.TS
+box, tab(:);
+c s s s s
+c | c | c | c | c
+a | n | n | n | n.
+Table 1 – Performance
+_
+action:user:system:real:core size:
+:s:s:s:MB:
+_
+old fs read:1.69:0.84:6.07:135
+_
+initial read:1.65:0.90:6.90:25
+_
+indexed read:0.64:0.03:0.77:6.5
+.TE
+.DE
+.NL
+.NH
+Future Work
+.LP
+While Upas' memory usage has been drastically reduced,
+it is still a work-in-progress. Caching and indexing are
+adequate but primitive. Upas/fs is still inconsistently
+bypassed for appending messages to mail boxes. There
+are also some features which remain incomplete. Finally,
+the small increase in scale brings some new questions about
+the organization of email.
+.LP
+It may be useful for mail boxes with very large numbers
+of messages to divide the index into fixed-size chunks.
+Then messages could be read into a fixed-sized pool of
+structures as needed. However it is currently hard to
+see how clients could easily interface a mail box large
+enough for this technique to be useful. Currently, all
+clients assume that it is reasonable to allocate an
+in-core data structure for each message in a mail box.
+To take advantage of a chunked index, clients (or the
+server) would need a way of limiting the number of
+messages considered at a time. Also, for such large
+mail boxes, it would be important to separate the
+incoming messages from older messages to limit the work
+required to scan for new messages.
+.LP
+Caching is particularly unsatisfactory. Files should
+be read in fixed-sized buffers so maximum memory usage
+does not depend on the size of the largest file in the
+mail box. Unfortunately, current data structures do not readily
+support this. In practice, this limitation has not yet
+been noticeable.
+.LP
+There are also a few features that need to be completed.
+Tracking of references has been added to marshal and
+upas/fs. In addition, the index provides a place to store
+mutable information about a message. These capabilities
+should be built upon to provide general threading and
+tagging capabilities.
+.NH
+Speculation
+.LP
+Freed from the limitation that all messages in a
+mail box must be read and stored in memory before a
+single message may be accessed, it is interesting to
+speculate on a few further possibilites.
+.LP
+For example, it may be
+useful to replace separate mail boxes with a single
+collection of messages assigned to one or more virtual
+mail boxes. The association between a message and a
+mail box would be a “tag.” A message could be added to
+or removed from one or more mail boxes without modifying
+the mdir file. If threads were implemented by tagging
+each message with its references, it would be possible
+to follow threads across mail boxes, even to messages
+removed from all mail boxes, provided the underlying
+file were not also removed. If a facility for adding
+arbitrary, automatic tags were enabled, it would be
+possible to tag messages with the email address in
+the SMTP From line.
+.NH
+References
+.IP [1]
+D. Bernstein, “Using maildir format”,
+published online at
+.br
+http://cr.yp.to/proto/maildir.html
+.IP [2]
+F. Ballesteros
+.IR mails (1),
+published online at
+http://lsub.org/magic/man2html/1/mails
+.IP [3]
+MH Wikipedia entry,
+http://en.wikipedia.org/wiki/MH_Message_Handling_System
+.IP [4]
+Lotus Notes Wikipedia entry,
+http://en.wikipedia.org/wiki/Lotus_Notes
+.IP [5]
+D. Presotto, “Upas—a Simpler Approach to Network Mail”,
+Proceedings of the 10th Usenix conference, 1985.