author | cinap_lenrek <cinap_lenrek@centraldogma> | 2011-07-19 05:12:01 +0200 |
---|---|---|
committer | cinap_lenrek <cinap_lenrek@centraldogma> | 2011-07-19 05:12:01 +0200 |
commit | b6eee91029e9b7ed76d872d18aa88dc4d85a7e56 (patch) | |
tree | b187989a64eedab41bc32ade5400325389bcecba /sys/doc/fs/p6 | |
parent | 3b8c921bfa982bcdf287bb34f7a6f1b96c4b5ec8 (diff) | |
parent | 8c4c1f39f4e369d7c590c9d119f1150a2215e56d (diff) |
merge
Diffstat (limited to 'sys/doc/fs/p6')
-rw-r--r-- | sys/doc/fs/p6 | 255 |
1 files changed, 255 insertions, 0 deletions
diff --git a/sys/doc/fs/p6 b/sys/doc/fs/p6
new file mode 100644
index 000000000..0e767ff7a
--- /dev/null
+++ b/sys/doc/fs/p6
@@ -0,0 +1,255 @@
+.SH
+Cache/WORM Driver
+.PP
+The cache/WORM (cw) driver is by far the
+largest and most complicated device driver in the file server.
+There are four devices involved in the cw driver.
+It implements a read/write pseudo-device (the cw-device)
+and a read-only pseudo-device (the dump device)
+by performing operations on its two constituent devices:
+the read-write c-device and the write-once-read-many
+w-device.
+The block numbers on the four devices are distinct,
+although the
+.I cw
+addresses,
+dump addresses,
+and the
+.I w
+addresses are
+highly correlated.
+.PP
+The cw-driver uses the w-device as the
+stable storage of the file system at the time of the
+last dump.
+All newly written and a large number of recently used
+exact copies of blocks of the w-device are kept on the c-device.
+The c-device is much smaller than the w-device and
+so the subset of w-blocks that are kept on the c-device is
+mapped through a hash table kept on a partition of the c-device.
+.PP
+The map portion of the c-device consists of blocks of buckets of entries.
+The declarations follow.
+.Ex
+	enum
+	{
+		BKPERBLK = 10,
+		CEPERBK	= (BUFSIZE - BKPERBLK*sizeof(Off)) /
+				(sizeof(Centry)*BKPERBLK),
+	};
+.Ee
+.Ex
+	typedef
+	struct
+	{
+		ushort	age;
+		short	state;
+		Off	waddr;
+	} Centry;
+.Ee
+.Ex
+	typedef
+	struct
+	{
+		long	agegen;
+		Centry	entry[CEPERBK];
+	} Bucket;
+.Ee
+.Ex
+	Bucket	bucket[BKPERBLK];
+.Ee
+There is exactly one entry structure for each block in the
+data partition of the c-device.
+A bucket contains all of the w-addresses that have
+the same hash code.
+There are as many buckets as will fit
+in a block and enough blocks to have the required
+number of entries.
+The entries in the bucket are maintained
+in FIFO order with an age variable and an incrementing age generator.
+When the age generator is about to overflow,
+all of the ages in the bucket are rescaled
+from zero.
+.PP
+The following steps go into converting a w-address into a c-address.
+The bucket is found by
+.Ex
+	bucket_number = w-address % total_buckets;
+	getbuf(c-device, bucket_offset + bucket_number/BKPERBLK);
+.Ee
+After the desired bucket is found,
+the desired entry is found by a linear search within the bucket for the
+entry with the desired
+.CW waddr .
+.PP
+The state variable in the entry is
+one of the following.
+.Ex
+	enum
+	{
+		Cnone = 0,
+		Cdirty,
+		Cdump,
+		Cread,
+		Cwrite,
+		Cdump1,
+	};
+.Ee
+Every w-address has a state.
+Blocks that are not in the
+c-device have the implied
+state
+.CW Cnone .
+The
+.CW Cread
+state is for blocks that have the
+same data as the corresponding block in
+the w-device.
+Since the c-device is much faster than the
+w-device,
+.CW Cread
+blocks are kept as long as possible and
+used in preference to reading the w-device.
+.CW Cread
+blocks may be discarded from the c-device
+when the space is needed for newer data.
+The
+.CW Cwrite
+state is when the c-device contains newer data
+than the corresponding block on the w-device.
+This happens when a
+.CW Cnone ,
+.CW Cread ,
+or
+.CW Cwrite
+block is written.
+The
+.CW Cdirty
+state
+is when the c-device contains
+new data and the corresponding block
+on the w-device has never been written.
+This happens when a new block has been
+allocated from the free space on the w-device.
+.PP
+The
+.CW Cwrite
+and
+.CW Cdirty
+blocks are created and never removed.
+Unless something is done to
+convert these blocks,
+the c-device will gradually
+fill up and stop functioning.
+Once a day,
+or by command,
+a
+.I dump
+of the cw-device
+is taken.
+The purpose of
+a dump is to queue the writes that
+have been shunted to the c-device
+to be written to the w-device.
+Since the w-device is a WORM,
+blocks cannot be rewritten.
+Blocks that have already been written to the WORM must be
+relocated to the unused portion of the w-device.
+These are precisely the
+blocks with
+.CW Cwrite
+state.
+.PP
+The dump algorithm is as follows:
+.IP a)
+The tree on the cw-device is walked
+as long as the blocks visited have been
+modified since the last dump.
+These are the blocks with state
+.CW Cwrite
+and
+.CW Cdirty .
+It is possible to restrict the search
+to within these blocks
+since the directory containing a modified
+file must have been accessed to modify the
+file and accessing a directory will set its
+modified time thus causing the block containing it
+to be written.
+The directory containing that directory must be
+modified for the same reason.
+The tree walk is thus drastically restrained and the
+tree walk does not take much time.
+.IP b)
+All
+.CW Cwrite
+blocks found in the tree search
+are relocated to new blank blocks on the w-device
+and converted to
+.CW Cdump
+state.
+All
+.CW Cdirty
+blocks are converted to
+.CW Cdump
+state without relocation.
+At this point,
+all modified blocks in the cw-device
+have w-addresses that point to unwritten
+WORM blocks.
+These blocks are marked for later
+writing to the w-device
+with the state
+.CW Cdump .
+.IP c)
+All open files that were pointing to modified
+blocks are reopened to point at the corresponding
+reallocated blocks.
+This causes the directories leading to the
+open files to be modified.
+Thus the invariant discussed in a) is maintained.
+.IP d)
+The background dumping process will slowly
+go through the map of the c-device and write out
+all blocks with
+.CW Cdump
+state.
+.PP
+The dump takes a few minutes to walk the tree
+and mark the blocks.
+It can take hours to write the marked blocks
+to the WORM.
+If a marked block is rewritten before the old
+copy has been written to the WORM,
+it must be forced to the WORM before it is rewritten.
+There is no problem if another dump is taken before the first one
+is finished.
+The newly marked blocks are just added to the marked blocks
+left from the first dump.
+.PP
+If there is an error writing a marked block
+to the WORM
+then the
+.CW Cdump
+state is converted to
+.CW Cdump1
+and manual intervention is needed.
+(See the
+.CW cwcmd
+.CW mvstate
+command in
+.I fs (8)).
+These blocks can be disposed of by converting
+their state back to
+.CW Cdump
+so that they will be written again.
+They can also be converted to
+.CW Cwrite
+state so that they will be allocated new
+addresses at the next dump.
+In most other respects,
+a
+.CW Cdump1
+block behaves like a
+.CW Cwrite
+block.
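The w-address lookup described in the added text (hash to a bucket, then search the bucket linearly for the matching waddr) can be sketched in ordinary C. This is only an illustration under stated assumptions, not the file server's code: the map lives in a small in-memory array instead of on the c-device, NBUCKET and the CEPERBK value of 4 are made-up sizes, Off is taken to be a 64-bit integer, and the helpers lookup, install, and statenow are hypothetical names. Treating a Cnone slot as empty during the search is also an assumption.

```c
#include <assert.h>
#include <stdio.h>

typedef long long Off;		/* assumption: 64-bit block addresses */

/* Made-up sizes for illustration; the real CEPERBK derives from BUFSIZE. */
enum { CEPERBK = 4, NBUCKET = 8 };

enum { Cnone = 0, Cdirty, Cdump, Cread, Cwrite, Cdump1 };

typedef struct {
	unsigned short age;
	short state;
	Off waddr;
} Centry;

typedef struct {
	long agegen;
	Centry entry[CEPERBK];
} Bucket;

/* In-memory stand-in for the map partition of the c-device. */
static Bucket map[NBUCKET];

/* Hash waddr to its bucket, then search linearly for the entry.
 * Returns NULL when the block is not cached (implied state Cnone). */
static Centry*
lookup(Off waddr)
{
	Bucket *b;
	int i;

	assert(waddr >= 0);
	b = &map[waddr % NBUCKET];
	for(i = 0; i < CEPERBK; i++)
		if(b->entry[i].state != Cnone && b->entry[i].waddr == waddr)
			return &b->entry[i];
	return NULL;
}

/* Hypothetical helper: claim the first empty slot in the bucket and
 * stamp it from the bucket's age generator. Returns NULL if full. */
static Centry*
install(Off waddr, int state)
{
	Bucket *b;
	int i;

	assert(waddr >= 0);
	b = &map[waddr % NBUCKET];
	for(i = 0; i < CEPERBK; i++){
		Centry *e = &b->entry[i];
		if(e->state == Cnone){
			e->waddr = waddr;
			e->state = (short)state;
			e->age = (unsigned short)++b->agegen;
			return e;
		}
	}
	return NULL;
}

/* Every w-address has a state; uncached blocks report Cnone. */
static int
statenow(Off waddr)
{
	Centry *e = lookup(waddr);

	return e != NULL ? e->state : Cnone;
}
```

For example, after install(21, Cread) and install(13, Cdirty), both addresses land in the same bucket (21 and 13 are congruent modulo 8), the linear search tells them apart, and statenow(5) reports Cnone because no entry in that bucket carries w-address 5. Eviction of old Cread entries and the age rescaling on overflow are left out of the sketch.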