Age | Commit message (Collapse) | Author |
|
The idea is that when we reboot, we zero out
memory written by processes that have the private
flag set (such as factotum and keyfs), and also
clear the secrmem pool, which contains TLS keys
and the state of the random number generator.
This is so the newly booted kernel or firmware
will not find these secret keys in memory.
|
|
investigating
|
|
when many processes go to sleep, our old timer would
slow to a crawl; this new implementation does not.
|
|
postnotepg()
Half NERR stack to 32.
When posing a note to a large group, avoid allocating Notes
for each individual process, but post the reference instread.
factor out process interruption into procinterrupt().
Avoid allocation of notes in alarmkproc, just posting the
same note to everyone.
|
|
de-bloat the proc structure by allocating notes
with on the heap instead of embedding them in
the proc structure.
This saves around 640 bytes per process.
|
|
Handlin notes is common for all architectures
except how the note has to be pushed on the user
stack.
This change adds a popnote() function that returns
only the note string or nil if the process should
not be notified (no notes or user notes hold off).
Popnote() also handles common errors like notify
during note handling or missing note handler and
will suicide the process in that case.
|
|
When probing a timer, we were running in our own kproc,
and not in an interrupt context, which meant that we didn't
have any access to anything worth sampling, so we didn't
give any data back.
This moves the probe to the hzclock interrupt, and returns
the pc in the probe.
|
|
|
|
In a few places, we where using a fixed buffer of sizeof(Dir)+100
size for stat. This is not correct and fails if the name returned
in stat is long.
This results in being unable to seek to the end of file with a
long filename.
The kernel should do the same thing as dirfstat() from libc;
handling the conversion and buffer allocation and returning a
freeable Dir* pointer.
For this, a new dirchanstat() function was added.
The fstat syscall was not rewriting the name to the last path
element; fix it.
In addition, gracefully handle the mountfix case, reallocating
the buffer to accomidate the required stat length plus
size of the new name so dirsetname() does not fail.
|
|
|
|
The Mhead structures have two sources of references to them:
- from Pgrp.mnthash hash-table
- from a channels Chan.umh pointer as returned by namec() for a union directory
Unless one holds the Mhead.lock RWLock, the Mhead.mount chain
can be mutated by eigther cmount(), cunmount() or closepgrp().
Readers, skipping acquiering the lock where:
mountfix(): responsible for rewriting directory entries for
union directory reads; was walking the Mhead.mount chain to
detect if the passed channel itself appears in the mount list.
cmount(): had a check and copy when "new" chan was a union itself
and if the MCREATE flag is set and would copy the mount table.
All this needs to be done with Mhead read-locked while copying
the mount entries.
devproc(): in the handler for reading /proc/n/ns file.
namec(): while checking if the Chan->umh should be initialized.
In addition to this, cmount() is changed to do the mountfree()
of the original mount chain when MREPL is done after releasing
the locks.
Also, some cosmetic changes...
|
|
behaviour in a.out(6)
For 64-bit architectures, the a.out header has the HDR_MAGIC flag set
in the magic and is expanded by 8 bytes containing the 64-bit virtual
address of the programs entry point. While Exec.entry contains physical
address for kernel images.
Our sysexec() would always use Exec.entry, even for 64-bit a.out binaries,
which worked because PADDR(entry) == entry for userspace pointers.
This change fixes it, having the kernel use the 64-bit entry point
and document the behaviour in the manpage.
|
|
We can take advantage of the fact that xinit() allocates
kernel memory from conf.mem[] banks always at the beginning
of a bank, so the separate palloc.mem[] array can be eleminated
as we can calculate the amount of non-kernel memory like:
upages = cm->npage - (PGROUND(cm->klimit - cm->kbase)/BY2PG)
for the number of reserved kernel pages,
we provide the new function: ulong nkpages(Confmem*)
This eleminates the error case of running out of slots in
the array and avoids wasting memory in ports that have simple
memory configurations (compared to pc/pc64).
|
|
Previously, mmurelease() was always called with
palloc spinlock held.
This is unneccesary for some mmurelease()
implementations as they wont release pages
to the palloc pool.
This change removes pagechainhead() and
pagechaindone() and replaces them with just
freepages() call, which aquires the palloc
lock internally as needed.
freepages() avoids holding the palloc lock
while walking the linked list of pages,
avoding some lock contention.
|
|
we might as well handle the per process cycle
counter in the portable part instead of duplicating the code
in every arch and have inconsistent implementations.
we now have a portable kenter() and kexit() function,
that is ment to be used in trap/syscall from user,
which updates the counters.
some kernels missed initializing Mach.cyclefreq.
|
|
|
|
opening /fd, /srv and /shr
The OCEXEC flag used to be maintained per channel,
making it shared between all the file desciptors.
This has a unexpected side effects with regard to
channel passing drivers such as devdup (/fd),
devsrv (/srv) and devshr (/shr).
For example, opening a /srv file with OCEXEC
makes it impossible to be remounted by exportfs
as it internally does a exec() to mount and
re-export it. There is no way to reset the flag.
This change makes the OCEXEC flag per file descriptor,
so a open with the OCEXEC flag only affects the fd
group of the calling process, and not the channel
itself.
On rfork(RFFDG), the per file descriptor flags get
copied.
On dup(), the per file descriptor flags are reset.
The second modification is that /fd, /srv and /shr
should reject the ORCLOSE flag, as the files that
are returned have already been opend.
|
|
port/iomap.c
With some newer UEFI firmware, not all pci bars get
programmed and we have to assign them ourselfs.
This was already done for memory bars. This change
adds the same for i/o port space, by providing a
ioreservewin() function which can be used to allocate
port space within the parent pci-pci bridge window.
Also, the pci code now allocates the pci config
space i/o ports 0xCF8/0xCFC so userspace needs to
use devpnp to access pci config space now. (see
latest realemu change).
Also, this moves the ioalloc()/iofree() code out
of devarch into port/iomap.c as it can be shared
with the ppc mtx kernel.
|
|
when reclaiming pages from an image, always reclaim all
the hash chains equally. that way, we avoid being biased
towards the chains at the start of the Image.pghash[] array.
images can be in two states: active or inactive. inactive
images are the ones which are not used by program while
active ones aare.
when reclaiming pages, we should try to reclaim pages
from inactive images first and only if that set becomes
exhausted attempt to release text pages and attempt to
reclaim pages from active images.
when we run out of Image structures, it makes only sense
to reclaim pages from inactive images, as reclaiming pages
from active ones will never free any Image structures.
change putimage() to require a image already locked and
make it unlock the image. this avoids many pointless
unlock()/lock() sequences as all callers of putimage()
already had the image locked.
|
|
|
|
This is a generic memory map for physical addresses. Entries
can be added with memmapadd() giving a range and a type.
Ranges can be allocated and freed from the map. The code
automatically resolves overlapping ranges by type priority.
|
|
devproc assumes that when we hold the Proc.debug qlock,
the process will be prevented from exiting. but there is
another race where the process has already exited and
the Proc* slot gets reused. to solve this, on process
creation we also have to acquire the debug qlock while
initializing the fields of the process. this also means
newproc() should only initialize fields *not* protected
by the debug qlock.
always acquire the Proc.debug qlock when changing strings
in the proc structure to avoid doublefree on concurrent
update. for changing the user string, we add a procsetuser()
function that does this for auth.c and devcap.
remove pgrpnote() from pgrp.c and replace by static
postnotepg() in devproc.
avoid the assumption that the Proc* entries returned by
proctab() are continuous.
fixed devproc permission issues:
- make sure only eve can access /proc/trace
- none should only be allowed to read its own /proc/n/text
- move Proc.kp checks into procopen()
pid reuse was not handled correctly, as we where only
checking if a pid had a living process, but there still
could be processes expecting a particular parentpid or
noteid.
this is now addressed with reference counted Pid
structures which are organized in a hash table.
read access to the hash table does not require locks
which will be usefull for dtracy later.
|
|
replace machine specific userinit() by a portable
implemntation that uses kproc() to create the first
process. the initcode text is mapped using kmap(),
so there is no need for machine specific tmpmap()
functions.
initcode stack preparation should be done in init0()
where the stack is mapped and can be accessed directly.
replacing the machine specific userinit() allows some
big simplifications as sysrfork() and kproc() are now
the only callers of newproc() and we can avoid initializing
fields that we know are being initialized by these
callers.
rename autogenerated init.h and reboot.h headers.
the initcode[] and rebootcode[] blobs are now in *.i
files and hex generation was moved to portmkfile. the
machine specific mkfile only needs to specify how to
build rebootcode.out and initcode.out.
|
|
comparing m with MACHP() is wrong as m is a constant on 386.
add procflushothers(), which flushes all processes except up
using common procflushmmu() routine.
|
|
keeps handling of devproc's note and notepg files similar and in the
same place and reduces stack usage.
|
|
fault() now has an additional pc argument that is
used to detect fault on a non-executable segment.
that is, we check on read fault if the segment
has the SG_NOEXEC attribute and the program counter
is within faulting page.
|
|
was only implemented by the pc kernel. does not
account pages used by the mount cache.
|
|
|
|
always start the pager kproc in swapinit(), simplifying kickpager().
allow zero conf.nswap and conf.nswppo. avoid allocating the reference
map and iolist arrays in that case.
use ulong for ioptr and iolist indices.
don't panic when writing pages out to the swapfile fails. just
requeue the page in the io transaction list so we will try
again next time executeio() is run or just free the page when
the swap reference was dropped.
remove unused pagersummary() function.
|
|
this driver makes regions of physical memory accessible as a disk.
to use it, ramdiskinit() has to be called before confinit(), so
that conf.mem[] banks can be reserved. currently, only pc and pc64
kernel use it, but otherwise the implementation is portable.
ramdisks are not zeroed when allocated, so that the contents are
preserved across warm reboots.
to not waste memory, physical segments do not allocate Page structures
or populate the segment pte's anymore. theres also a new SG_CHACHED
attribute.
|
|
reclaimable pages are user pages that are used for
caches like the image cache, mount cache and swap cache.
|
|
|
|
|
|
timerdel() did not make sure that the timer function
is not active (on another cpu). just acquiering the
Timer lock in the timer function only blocks the caller
of timerdel()/timeradd() but not the other way arround
(on a multiprocessor).
this changes the timer code to track activity of
the timer function, having timerdel() wait until
the timer has finished executing.
|
|
change the qid.vers on wstat
introducing new ctrunc() function that invalidates any caches
for the passed in chan, invoked when handling wstat with a
specified file length or on file creation/truncation.
test program to reproduce the problem:
#include <u.h>
#include <libc.h>
#include <libsec.h>
void
main(int argc, char *argv[])
{
int fd;
Dir *d, nd;
fd = create("xxx", ORDWR, 0666);
write(fd, "1234", 4);
d = dirstat("xxx");
assert(d->length == 4);
nulldir(&nd);
nd.length = 0;
dirwstat("xxx", &nd);
d = dirstat("xxx");
assert(d->length == 0);
fd = open("xxx", OREAD);
assert(read(fd, (void*)&d, 4) == 0);
}
|
|
|
|
remove bl2mem(), it is broken. a fault while copying to memory
yields a partially freed block list. it can be simply replaced
by readblist() and freeblist(), which we also use for qcopy()
now.
remove mem2bl(), and handle putting back remainer from a short
read internally (splitblock()) avoiding the releasing and re-
acquiering of the ilock.
always attempt to free blocks outside of the ilock.
have qaddlist() return the number of bytes enqueued, which
avoids walking the block list twice.
|
|
to avoid copying in padblock() when adding cryptographics macs to a block
in devtls/devssl/esp we reserve 16 extra bytes to the allocation.
remove qio ixsummary() function and add acid function qiostats() to
/sys/lib/acid/kernel
simplify iallocb(), remove iallocsummary() statitics.
|
|
the kernels custom rand() and nrand() functions where not working
as specified in rand(2). now we just use libc's rand() and nrand()
functions but provide a custom lrand() impelmenting the xoroshiro128+
algorithm as proposed by aiju.
|
|
|
|
The kernel needs to keep cryptographic keys and cipher states
confidential. secalloc() allocates memory from the secret pool
which is protected from debuggers reading the memory thru devproc.
secfree() releases the memory, overriding the data with garbage.
|
|
- return distinct error message when attempting to create Globalseg with physseg name
- copy directory name to up->genbuf so it stays valid after we unlock(&glogalseglock)
- cleanup wstat() handling, allow changing uid
- make sure global segment size is below SEGMAXSIZE
- move isoverlap() check from globalsegattach() into segattach()
- remove Proc* argument from globalsegattach(), segattach() and isoverlap()
- make Physseg.attr and segattach attr parameter an int for consistency
|
|
access to the axi segment hangs the machine when the fpga
is not programmed yet. to prevent access, we introduce a
new SG_FAULT flag, that when set on the Segment.type or
Physseg.attr, causes the fault handler to immidiately
return with an error (as if the segment would not be mapped).
during programming, we temporarily set the SG_FAULT flag
on the axi physseg, flush all processes tlb's that have
the segment mapped and when programming is done, we clear
the flag again.
|
|
|
|
|
|
introduce cpushutdown() function that does the common
operation of initiating shutdown, returning once all
cpu's got the message and are about to shutdown. this
avoids duplicated code which isnt really machine specific.
automatic reboot on panic only when *debug= is not set
and the machine is a cpu server or has no display,
otherwise just hang.
|
|
instead of ordering the source mount list, order the new destination
list which has the advantage that we do not need to wlock the source
namespace, so copying can be done in parallel and we do not need the
copy forward pointer in the Mount structure.
the Mhead back pointer in the Mount strcture was unused, removed.
|
|
|
|
special case in namec()
we already export mntauth() and mntversion(), so why not stop
being sneaky and just export mntattach() so bindmount() and
devshr can just call it directly with proper arguments being
checked.
we can also avoid handling #M attach specially in namec()
by having the devmnt's attach function do error(Enoattach).
|
|
this changes devmnt adding mntrahread() function and some helpers
for it to do pipelined sequential read ahead for the mount cache.
basically, cread() calls mntrahread() with Mntrah structure and it
figures out if we where reading sequentially and if thats the case
issues reads of c->iounit size in advance.
the read ahead state (Mntrah) is kept in the mount cache so we can
handle (read ahead) cache invalidation in the presence of writes.
|