author cinap_lenrek <cinap_lenrek@centraldogma> 2011-07-19 05:12:01 +0200
committer cinap_lenrek <cinap_lenrek@centraldogma> 2011-07-19 05:12:01 +0200
commit b6eee91029e9b7ed76d872d18aa88dc4d85a7e56 (patch)
tree b187989a64eedab41bc32ade5400325389bcecba /sys/doc/fs/p6
parent 3b8c921bfa982bcdf287bb34f7a6f1b96c4b5ec8 (diff)
parent 8c4c1f39f4e369d7c590c9d119f1150a2215e56d (diff)
merge
Diffstat (limited to 'sys/doc/fs/p6')
-rw-r--r-- sys/doc/fs/p6 255
1 files changed, 255 insertions, 0 deletions
diff --git a/sys/doc/fs/p6 b/sys/doc/fs/p6
new file mode 100644
index 000000000..0e767ff7a
--- /dev/null
+++ b/sys/doc/fs/p6
@@ -0,0 +1,255 @@
+.SH
+Cache/WORM Driver
+.PP
+The cache/WORM (cw) driver is by far the
+largest and most complicated device driver in the file server.
+There are four devices involved in the cw driver.
+It implements a read/write pseudo-device (the cw-device)
+and a read-only pseudo-device (the dump device)
+by performing operations on its two constituent devices:
+the read/write c-device and the write-once-read-many
+w-device.
+The block numbers on the four devices are distinct,
+although the
+.I cw
+addresses,
+dump addresses,
+and the
+.I w
+addresses are
+highly correlated.
+.PP
+The cw-driver uses the w-device as the
+stable storage of the file system at the time of the
+last dump.
+All newly written blocks, together with exact copies of a large number
+of recently used w-device blocks, are kept on the c-device.
+The c-device is much smaller than the w-device, and
+so the subset of w-blocks kept on the c-device is
+mapped through a hash table kept on a partition of the c-device.
+.PP
+The map portion of the c-device consists of blocks of buckets of entries.
+The declarations follow.
+.Ex
+	enum
+	{
+		BKPERBLK = 10,
+		CEPERBK	= (BUFSIZE - BKPERBLK*sizeof(Off)) /
+				(sizeof(Centry)*BKPERBLK),
+	};
+.Ee
+.Ex
+	typedef
+	struct
+	{
+		ushort	age;
+		short	state;
+		Off	waddr;
+	} Centry;
+.Ee
+.Ex
+	typedef
+	struct
+	{
+		long	agegen;
+		Centry	entry[CEPERBK];
+	} Bucket;
+.Ee
+.Ex
+ Bucket bucket[BKPERBLK];
+.Ee
+There is exactly one entry structure for each block in the
+data partition of the c-device.
+A bucket contains all of the w-addresses that have
+the same hash code.
+There are as many buckets as will fit
+in a block and enough blocks to have the required
+number of entries.
+The entries in the bucket are maintained
+in FIFO order with an age variable and an incrementing age generator.
+When the age generator is about to overflow,
+all of the ages in the bucket are rescaled
+from zero.
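The aging scheme described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the file server's code: `NENT` is a hypothetical small stand-in for `CEPERBK`, `long long` stands in for `Off`, and the helper names `stamp`, `rescale`, and `oldest` are invented here. The rescaling strategy shown (renumbering the ages from zero while preserving their order) is one plausible reading of "rescaled from zero".

```c
#include <assert.h>
#include <limits.h>

enum { NENT = 4 };		/* hypothetical; stands in for CEPERBK */

typedef struct {
	unsigned short age;
	short state;
	long long waddr;	/* stands in for Off */
} Centry;

typedef struct {
	long agegen;
	Centry entry[NENT];
} Bucket;

/* Renumber the ages 0..NENT-1, preserving their relative order. */
static void
rescale(Bucket *b)
{
	unsigned short rank[NENT];
	int i, j;

	for(i = 0; i < NENT; i++){
		rank[i] = 0;
		for(j = 0; j < NENT; j++)
			if(b->entry[j].age < b->entry[i].age
			|| (b->entry[j].age == b->entry[i].age && j < i))
				rank[i]++;
	}
	for(i = 0; i < NENT; i++)
		b->entry[i].age = rank[i];
	b->agegen = NENT - 1;
}

/* Mark entry i as the newest in its bucket. */
static void
stamp(Bucket *b, int i)
{
	assert(i >= 0 && i < NENT);
	if(b->agegen >= USHRT_MAX)	/* generator about to overflow the age field */
		rescale(b);
	b->entry[i].age = ++b->agegen;
}

/* Index of the entry with the smallest age: the candidate for reuse. */
static int
oldest(const Bucket *b)
{
	int i, v = 0;

	for(i = 1; i < NENT; i++)
		if(b->entry[i].age < b->entry[v].age)
			v = i;
	return v;
}
```

Because every comparison is made within one bucket, rescaling a bucket never has to look at any other bucket.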
+.PP
+The following steps go into converting a w-address into a c-address.
+The bucket is found by
+.Ex
+ bucket_number = w-address % total_buckets;
+ getbuf(c-device, bucket_offset + bucket_number/BKPERBLK);
+.Ee
+After the desired bucket is found,
+the desired entry is found by a linear search within the bucket for the
+entry with the desired
+.CW waddr .
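The lookup just described can be written out as a small in-memory sketch. The names `mapblock`, `bucketslot`, and `findentry` are hypothetical, `NENT` again stands in for `CEPERBK`, and `long long` stands in for `Off`. Taking the bucket's position inside its map block as the remainder modulo `BKPERBLK`, and skipping entries in state `Cnone`, are assumptions that the text does not spell out.

```c
#include <assert.h>

enum { BKPERBLK = 10, NENT = 4 };	/* NENT is a hypothetical stand-in for CEPERBK */
enum { Cnone = 0, Cdirty, Cdump, Cread, Cwrite, Cdump1 };

typedef long long Off;

typedef struct { unsigned short age; short state; Off waddr; } Centry;
typedef struct { long agegen; Centry entry[NENT]; } Bucket;

/* Map block, relative to the start of the map, that holds waddr's bucket. */
static Off
mapblock(Off waddr, Off total_buckets)
{
	assert(total_buckets > 0 && waddr >= 0);
	return (waddr % total_buckets) / BKPERBLK;
}

/* Position of that bucket inside its map block (assumed layout). */
static int
bucketslot(Off waddr, Off total_buckets)
{
	assert(total_buckets > 0 && waddr >= 0);
	return (int)((waddr % total_buckets) % BKPERBLK);
}

/* Linear search of one bucket; -1 means the block is not cached. */
static int
findentry(const Bucket *b, Off waddr)
{
	int i;

	for(i = 0; i < NENT; i++)
		if(b->entry[i].state != Cnone && b->entry[i].waddr == waddr)
			return i;
	return -1;
}
```

For example, with 100 buckets in total, w-address 1234 hashes to bucket 34, which lives in map block 3 at slot 4.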
+.PP
+The state variable in the entry is
+one of the following.
+.Ex
+	enum
+	{
+		Cnone = 0,
+		Cdirty,
+		Cdump,
+		Cread,
+		Cwrite,
+		Cdump1,
+	};
+.Ee
+Every w-address has a state.
+Blocks that are not in the
+c-device have the implied
+state
+.CW Cnone .
+The
+.CW Cread
+state is for blocks that have the
+same data as the corresponding block in
+the w-device.
+Since the c-device is much faster than the
+w-device,
+.CW Cread
+blocks are kept as long as possible and
+used in preference to reading the w-device.
+.CW Cread
+blocks may be discarded from the c-device
+when the space is needed for newer data.
+The
+.CW Cwrite
+state marks a block whose copy on the c-device is newer
+than the corresponding block on the w-device.
+This happens when a
+.CW Cnone ,
+.CW Cread ,
+or
+.CW Cwrite
+block is written.
+The
+.CW Cdirty
+state marks a block for which the c-device contains
+new data and the corresponding block
+on the w-device has never been written.
+This happens when a new block has been
+allocated from the free space on the w-device.
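The transitions described so far fit in a small table-like function. This is a hedged sketch: `afterwrite` and `afteralloc` are hypothetical names, and keeping a `Cdirty` block in `Cdirty` when it is written again is an assumption, since the text only says how a block first becomes `Cdirty`. The `Cdump` and `Cdump1` states are deliberately left out of this sketch.

```c
enum { Cnone = 0, Cdirty, Cdump, Cread, Cwrite, Cdump1 };

/* State of a block after it is written through the cw-device. */
static int
afterwrite(int state)
{
	switch(state){
	case Cnone:
	case Cread:
	case Cwrite:
		return Cwrite;	/* c-device now newer than the w-device */
	case Cdirty:
		return Cdirty;	/* assumption: still never written to the w-device */
	default:
		return -1;	/* Cdump, Cdump1: not covered by this sketch */
	}
}

/* State of a block freshly allocated from w-device free space. */
static int
afteralloc(void)
{
	return Cdirty;
}
```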
+.PP
+The
+.CW Cwrite
+and
+.CW Cdirty
+blocks are created and never removed.
+Unless something is done to
+convert these blocks,
+the c-device will gradually
+fill up and stop functioning.
+Once a day,
+or by command,
+a
+.I dump
+of the cw-device
+is taken.
+The purpose of
+a dump is to queue the writes that
+have been shunted to the c-device
+to be written to the w-device.
+Since the w-device is a WORM,
+blocks cannot be rewritten.
+Blocks that have already been written to the WORM must be
+relocated to the unused portion of the w-device.
+These are precisely the
+blocks with
+.CW Cwrite
+state.
+.PP
+The dump algorithm is as follows:
+.IP a)
+The tree on the cw-device is walked
+as long as the blocks visited have been
+modified since the last dump.
+These are the blocks with state
+.CW Cwrite
+and
+.CW Cdirty .
+It is possible to restrict the search
+to within these blocks
+since the directory containing a modified
+file must have been accessed to modify the
+file and accessing a directory will set its
+modified time thus causing the block containing it
+to be written.
+The directory containing that directory must be
+modified for the same reason.
+The tree walk is thus drastically restricted and
+does not take much time.
+.IP b)
+All
+.CW Cwrite
+blocks found in the tree search
+are relocated to new blank blocks on the w-device
+and converted to
+.CW Cdump
+state.
+All
+.CW Cdirty
+blocks are converted to
+.CW Cdump
+state without relocation.
+At this point,
+all modified blocks in the cw-device
+have w-addresses that point to unwritten
+WORM blocks.
+These blocks are marked for later
+writing to the w-device
+with the state
+.CW Cdump .
+.IP c)
+All open files that were pointing to modified
+blocks are reopened to point at the corresponding
+reallocated blocks.
+This causes the directories leading to the
+open files to be modified.
+Thus the invariant discussed in a) is maintained.
+.IP d)
+The background dumping process will slowly
+go through the map of the c-device and write out
+all blocks with
+.CW Cdump
+state.
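Steps a) and b) can be illustrated with a pruned recursive walk over a toy in-memory tree. Everything here is a sketch under assumptions: `Node`, its fan-out of two, `dumpwalk`, and the `nextfree` counter that hands out unused w-addresses are all hypothetical. In the real file system a parent block holds the addresses of its children, so relocating a child also means updating the parent; the toy tree uses memory pointers and leaves that out.

```c
#include <assert.h>
#include <stddef.h>

enum { Cnone = 0, Cdirty, Cdump, Cread, Cwrite, Cdump1 };

typedef long long Off;

typedef struct Node Node;
struct Node {
	int state;
	Off waddr;
	Node *child[2];		/* hypothetical fan-out */
};

/*
 * Walk the tree, descending only through modified blocks.
 * Returns the number of blocks marked Cdump.
 */
static int
dumpwalk(Node *n, Off *nextfree)
{
	int i, marked;

	assert(nextfree != NULL);
	if(n == NULL)
		return 0;
	if(n->state != Cwrite && n->state != Cdirty)
		return 0;	/* unmodified: by the invariant, so is everything below */
	if(n->state == Cwrite)
		n->waddr = (*nextfree)++;	/* already on the WORM: relocate */
	n->state = Cdump;	/* Cdirty keeps its address */
	marked = 1;
	for(i = 0; i < 2; i++)
		marked += dumpwalk(n->child[i], nextfree);
	return marked;
}
```

The early return on an unmodified block is what keeps the walk short: whole subtrees that have not changed since the last dump are never visited.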
+.PP
+The dump takes a few minutes to walk the tree
+and mark the blocks.
+It can take hours to write the marked blocks
+to the WORM.
+If a marked block is about to be rewritten before its old
+copy has been written to the WORM,
+the old copy must first be forced to the WORM.
+There is no problem if another dump is taken before the first one
+is finished.
+The newly marked blocks are just added to the marked blocks
+left from the first dump.
+.PP
+If there is an error writing a marked block
+to the WORM
+then the
+.CW Cdump
+state is converted to
+.CW Cdump1
+and manual intervention is needed.
+(See the
+.CW cwcmd
+.CW mvstate
+command in
+.I fs (8)).
+These blocks can be disposed of by converting
+their state back to
+.CW Cdump
+so that they will be written again.
+They can also be converted to
+.CW Cwrite
+state so that they will be allocated new
+addresses at the next dump.
+In most other respects,
+a
+.CW Cdump1
+block behaves like a
+.CW Cwrite
+block.
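The background write pass and the recovery from write errors might be sketched as follows. This is not the file server's implementation and not the actual `cwcmd mvstate` code: `dumppass`, this `mvstate` function, and the `wormwrite` callback are hypothetical, and `failodd` is a stand-in writer that exists only for demonstration. Moving a successfully written block to `Cread` is an assumption, chosen because the c-device copy then matches the w-device.

```c
#include <assert.h>
#include <stddef.h>

enum { Cnone = 0, Cdirty, Cdump, Cread, Cwrite, Cdump1 };

typedef long long Off;

typedef struct { unsigned short age; short state; Off waddr; } Centry;

/* Demonstration writer: pretends that odd w-addresses fail. */
static int
failodd(Off waddr)
{
	return waddr % 2 != 0;
}

/*
 * Write out every Cdump entry; wormwrite returns 0 on success.
 * Returns the number of entries left in Cdump1.
 */
static int
dumppass(Centry *e, int n, int (*wormwrite)(Off))
{
	int i, failed = 0;

	assert(n >= 0 && wormwrite != NULL);
	for(i = 0; i < n; i++){
		if(e[i].state != Cdump)
			continue;
		if(wormwrite(e[i].waddr) == 0)
			e[i].state = Cread;	/* assumption: now identical to the WORM */
		else{
			e[i].state = Cdump1;	/* needs manual intervention */
			failed++;
		}
	}
	return failed;
}

/* Convert every entry in state from to state to; returns the count. */
static int
mvstate(Centry *e, int n, int from, int to)
{
	int i, moved = 0;

	assert(n >= 0);
	for(i = 0; i < n; i++)
		if(e[i].state == from){
			e[i].state = to;
			moved++;
		}
	return moved;
}
```

Converting from `Cdump1` to `Cdump` retries the write at the same address on the next pass, while converting to `Cwrite` leaves the block to be given a new address at the next dump.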