Age | Commit message (Collapse) | Author |
|
postnotepg()
Half NERR stack to 32.
When posing a note to a large group, avoid allocating Notes
for each individual process, but post the reference instread.
factor out process interruption into procinterrupt().
Avoid allocation of notes in alarmkproc, just posting the
same note to everyone.
|
|
de-bloat the proc structure by allocating notes
with on the heap instead of embedding them in
the proc structure.
This saves around 640 bytes per process.
|
|
Treallocate the small data structures around procs eagerly,
but use malloc to allocate the large proc data structures
when we need them, which allows us to scale to many more procs.
There are still many scalability bottlenecks, so we only crank
up the nproc limit by a little bit this time around, and crank
it up more as we optimize more.
|
|
The Mhead structures have two sources of references to them:
- from Pgrp.mnthash hash-table
- from a channels Chan.umh pointer as returned by namec() for a union directory
Unless one holds the Mhead.lock RWLock, the Mhead.mount chain
can be mutated by eigther cmount(), cunmount() or closepgrp().
Readers, skipping acquiering the lock where:
mountfix(): responsible for rewriting directory entries for
union directory reads; was walking the Mhead.mount chain to
detect if the passed channel itself appears in the mount list.
cmount(): had a check and copy when "new" chan was a union itself
and if the MCREATE flag is set and would copy the mount table.
All this needs to be done with Mhead read-locked while copying
the mount entries.
devproc(): in the handler for reading /proc/n/ns file.
namec(): while checking if the Chan->umh should be initialized.
In addition to this, cmount() is changed to do the mountfree()
of the original mount chain when MREPL is done after releasing
the locks.
Also, some cosmetic changes...
|
|
|
|
allow reading the control file of a process and return
its pid number. if the process has exited, return an error.
this can be usefull as a way to test if a process is
still alive. and also makes it behave similar to
network protocol directories.
another side effect is that processes who erroneously
open the ctl file ORDWR would be allowed todo so as
along as they have write permission and the process is
not a kernel process.
|
|
|
|
|
|
the user buffer could be changed while we parse it resulting
in a different number of watchpoints than initially calculated.
so add a check to the parse loop so we wont overflow the
watchpoint array.
|
|
writes to /proc/n/notepg and /proc/n/note should be able to write
at ERRMAX-1 bytes, not ERRMAX-2.
simplify write to /proc/n/args by just copying to local buf first
and then doing a kstrdup(). the value of Proc.nargs does not matter
when Proc.setargs is 1.
|
|
devproc assumes that when we hold the Proc.debug qlock,
the process will be prevented from exiting. but there is
another race where the process has already exited and
the Proc* slot gets reused. to solve this, on process
creation we also have to acquire the debug qlock while
initializing the fields of the process. this also means
newproc() should only initialize fields *not* protected
by the debug qlock.
always acquire the Proc.debug qlock when changing strings
in the proc structure to avoid doublefree on concurrent
update. for changing the user string, we add a procsetuser()
function that does this for auth.c and devcap.
remove pgrpnote() from pgrp.c and replace by static
postnotepg() in devproc.
avoid the assumption that the Proc* entries returned by
proctab() are continuous.
fixed devproc permission issues:
- make sure only eve can access /proc/trace
- none should only be allowed to read its own /proc/n/text
- move Proc.kp checks into procopen()
pid reuse was not handled correctly, as we where only
checking if a pid had a living process, but there still
could be processes expecting a particular parentpid or
noteid.
this is now addressed with reference counted Pid
structures which are organized in a hash table.
read access to the hash table does not require locks
which will be usefull for dtracy later.
|
|
the locking in proctext() is wrong. we have to acquire Proc.seglock
when reading segments from Proc.seg[] as segments do not
have a private freelist and can therefore be reused for other
data structures.
once we have Proc.seglock acquired, check that the process pid
is still valid so we wont accidentally read some other processes
segments. (for both proctext() and procctlmemio()). this also
should give better error message to distinguish the case when
the process did segdetach() the segment in question before we
could acquire Proc.seglock.
declare private functions as static.
|
|
|
|
keeps handling of devproc's note and notepg files similar and in the
same place and reduces stack usage.
|
|
|
|
segclock() has to be called from hzclock(), otherwise
only processes running on cpu0 would catche the interrupt
and the time delta would be wrong.
lock the segment when allocating Seg->profile as
profile ctl might be issued from multiple processes.
Proc->debug qlock is not sufficient.
Seg->profile can never be freed or reallocated once
set as the timer interrupt accesses it without any
locking.
|
|
devdir internally replicates the qid in ther perm stat field
already and the practice of explicitely passing just causing
confusion when done inconsistently.
|
|
does support quoting
|
|
specific fpu handling
introducing the PFPU structue which allows the machine specific
code some flexibility on how to handle the FPU process state.
for example, in the pc and pc64 kernel, the FPsave structure is
arround 512 bytes. with avx512, it could grow up to 2K. instead
of embedding that into the Proc strucutre, it is more effective
to allocate it on first use of the fpu, as most processes do not
use simd or floating point in the first place. also, the FPsave
structure has special 16 byte alignment constraint, which further
favours dynamic allocation.
this gets rid of the memmoves in pc/pc64 kernels for the aligment.
there is also devproc, which is now checking if the fpsave area
is actually valid before reading it, avoiding debuggers to see
garbage data.
the Notsave structure is gone now, as it was not used on any
machine.
|
|
|
|
up->debug
|
|
|
|
the problem is that segio doesnt check segment attributes
and it can't really in case of SG_FAULT which can be
inherited from pseg and toggle at any time.
so instead of returning -1 from fault into the fault$cputype
handler which then panics when fault happend kernel mode,
we jump into segio's waserror() block just like in the
demand load i/o error case (faulterror()).
|
|
|
|
this code isnt time critical and process TReal delta can become
very long, so use tk2ms() which is less prone to overflow.
|
|
|
|
The kernel needs to keep cryptographic keys and cipher states
confidential. secalloc() allocates memory from the secret pool
which is protected from debuggers reading the memory thru devproc.
secfree() releases the memory, overriding the data with garbage.
|
|
|
|
|
|
devproc's procctlmemio() did not handle physical segment
types correctly, as it assumed it can just kmap() the page
in question and write to it. physical segments however
need to be mapped uncached but kmap() will always map
cached as it assumes normal memory. on some machines with
aliasing memory with different cache attributes
leads to undefined behaviour!
we borrow the code from devsegment and provide a generic
segio() function to read and write user segments which
handles all the cases without using kmap by just spawning
a kproc that attaches the segment that needs to be read
from or written to. fault() will setup the right mmu
attributes for us. it will also properly flush pages for
segments that maintain instruction cache when written.
however, tlb's have to be flushed separately.
segio() is used for devsegment and devproc now, which
also allows for simplification of fixfault() as there is no
special error handling case anymore as fixfault() is now
called from faulting process *only*.
reads from /proc/$pid/mem can now span multiple pages.
|
|
fixed segments are continuous in physical memory but
allocated in user pages. unlike shared segments, they
are not allocated on demand but the pages are allocated
on creation time (devsegment). fixed segments are
never swapped out, segfreed or resized and can only be
destroyed as a whole.
the physical base address can be discovered by userspace
reading the ctl file in devsegment.
|
|
there are no kernels currently that do page coloring,
so the only use of cachectl[] is flushing the icache
(on arm and ppc).
on pc64, cachectl consumes 32 bytes in each page resulting
in over 200 megabytes of overhead for 32gb of ram with 4K
pages.
this change removes cachectl[] and adds txtflush ulong
that is set to ~0 by pio() to instruct putmmu() to flush
the icache.
|
|
to allow bytewise access to /proc/#/fd, the contents of the file where
recreated on each call. if fd's had been closed or reassigned between
the reads, the offset would be inconsistent and a read could start off
in the middle of a line. this happens when you cat /proc/#/fd file of
a busy process that mutates its filedescriptor table.
to fix this, we now return one line record at a time. if the line
fits in the read size, then this means the next read will always start
at the beginning of the next line record. we remember the consumed
byte count in Chan.mrock and the current record in Chan.nrock. (these
fields are free to usefor non-directory files)
if a read comes in and the offset is the same as c->mrock, we do not
need to regenerate the file and just render the next c->nrock's record.
for reads smaller than the line count, we have to regenerate the content
up to the offset and the race is still possible, but this should not
be the common case.
the same algorithm is now used for /proc/#/ns file, allowing a simpler
reimplementation and getting rid of Mntwalk state strcture.
|
|
theres a race where procstopwait() is interrupted by a note,
setting p->pdbg to nil *before* acquiering the lock and
and pexit() and procctl() accessing it assuming it doesnt
change under them while they are holding the lock.
|
|
noswap flag on exec
|
|
|
|
|
|
|
|
dont kill the calling process when demand load fails if fixfault()
is called from devproc. this happens when you delete the binary
of a running process and try to debug the process accessing uncached
pages thru /proc/$pid/mem file.
fixes to procctlmemio():
- fix missed unlock as txt2data() can error
- make sure the segment isnt freed by taking a reference (under p->seglock)
- access the page with segment locked (see comment)
- get rid of the segment stealer lock
other stuff:
- move txt2data() and data2txt() to segment.c
- add procpagecount() function
- make return type mcounseg() to ulong
|
|
Lock
make the Page stucture less than half its original size by getting rid of
the Lock and the lru.
The Lock was required to coordinate the unchaining of pages that where
both cached and on the lru freelist.
now pages have a single next pointer that is used for palloc.head
freelist xor for page cache hash chains in Image.pghash[].
cached pages are not on the freelist anymore, but will be reclaimed
from images by the pager when the freelist runs out of pages.
each Image has its own 512 hash chains for cached page lookup. That is
2MB worth of pages and there should be no collisions for most text images.
page reclaiming can be done without holding palloc.lock as the Image is
the owner of the page hash chains protected by the Image's lock.
reclaiming Image structures can be done quickly by only reclaiming pages from
inactive images, that is images which are not currently in use by segments.
the Ref structure has no Lock anymore. Only a single long that is atomically
incremented or decremnted using cmpswap().
there are various other changes as a consequence code. and lots of pikeshedding,
sorry.
|
|
procwrite() did truncate the offset to 32bit ulong.
introduce off2addr() function that does the sign
extension hack and use it conststently for Qmem
reads and writes.
|
|
for the CMclose procctl, the fd number was not
bounds checked before indexing in the Fgrp.fd
array.
for the CMclosefiles, we looped fd from 0..maxfd-1,
but need to loop from 0..maxfd as maxfd is inclusive.
|
|
the original format for addresses was %8lux which was changed
to %p for amd64. this broke linuxemu which assumes fixed format
in the segment file. as a compromize we change it to %8p and
amd64 port of linuxemu will hopefully use a more robust parser :)
|
|
file offset is 64 bit signed integer, negative offsets
are invalid and rejected by the kernel. to still access
kernel memory on amd64, we unconditionally clear the sign
bit of the 64 bit offset in libmach and devproc sign
extends the offset back to a 64 bit address.
|
|
this change is in preparation for amd64. the systab calling
convention was also changed to return uintptr (as segattach
returns a pointer) and the arguments are now passed as
va_list which handles amd64 arguments properly (all arguments
are passed in 64bit quantities on the stack, tho the upper
part will not be initialized when the element is smaller
than 8 bytes).
this is partial. xalloc needs to be converted in the future.
|
|
make sure noteid is valid (>0).
prohibit changing note group of kernel processes. this is also
checked for in pgrpnote().
prevent "none" user from changing its note group to another "none"
sessions. this would allow him to send notes other none processes
other than its own.
|
|
pprint() might block or even (maliciously) call into
devproc write which will corrupt the qlock chain on attempt
to qlock up->debug again.
|
|
theres a race when we wait for a process children and that
process exits before we sleep().
|
|
p->dot can be nil when process exits (see pexit())
set closeprocs dot to up->slash so it will show up
right in devproc.
|
|
|