path: root/sys/src/9/port/sysproc.c
Age  Commit message  Author
2023-03-07  sys/src/9/port/sysproc.c: add spim magic  adventuresin9
2022-12-04  kernel: free exec temporary stack segment under seglock  cinap_lenrek
On error, we must free the temporary stack segment while holding up->seglock.
2022-09-03  kernel: half NERR, refcount Note's to avoid excessive allocations for postnotepg()  cinap_lenrek
Halve the NERR stack to 32. When posting a note to a large group, avoid allocating Notes for each individual process, but post the reference instead. Factor out process interruption into procinterrupt(). Avoid allocation of notes in alarmkproc, just posting the same note to everyone.
2022-08-17  kernel: allocate notes in heap  cinap_lenrek
de-bloat the proc structure by allocating notes on the heap instead of embedding them in the proc structure. This saves around 640 bytes per process.
2022-07-25  sysproc: raise limit on #! lines, and allow quoted args  Ori Bernstein
Our #! line length is very short, and the naïve quoting makes it difficult to pass more complicated arguments to the programs being run. This is fine for simple interpreters, but it's often useful to pass arguments to more complicated interpreters like auth/box or awk. This change raises the limit, but also switches to tokenizing via tokenize(2), rather than hand rolled whitespace splitting. The limits chosen are arbitrary, but they leave approximately 3 KiB of stack space on 386, and 13k on amd64. This is a lot of stack used, but it should leave enough for fairly deep devtab chan stacks.
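As a rough illustration of the quoting behaviour described above, here is a minimal sketch of rc-style tokenizing in portable C. The function name qtokenize is hypothetical; it is a stand-in for, not a copy of, Plan 9's tokenize(2), and it assumes blanks as separators, single quotes for quoting, and a doubled quote inside quotes for a literal quote.

```c
/*
 * Hypothetical stand-in for tokenize(2): split s in place into at most
 * maxargs fields. Text inside '...' may contain blanks; '' inside a
 * quoted section yields one literal quote. Returns the field count.
 */
static int
qtokenize(char *s, char **args, int maxargs)
{
	int n = 0;

	while(n < maxargs){
		char *w;
		int quoted = 0;

		/* skip leading separators */
		while(*s == ' ' || *s == '\t' || *s == '\r' || *s == '\n')
			s++;
		if(*s == '\0')
			break;
		args[n++] = w = s;
		for(; *s != '\0'; s++){
			if(*s == '\''){
				if(quoted && s[1] == '\''){
					s++;		/* doubled quote: keep one */
					*w++ = '\'';
				}else
					quoted = !quoted;
				continue;
			}
			if(!quoted && (*s == ' ' || *s == '\t' || *s == '\r' || *s == '\n'))
				break;
			*w++ = *s;
		}
		if(*s != '\0')
			s++;			/* step over the separator */
		*w = '\0';			/* terminate the field in place */
	}
	return n;
}
```

With this sketch, a line such as `/bin/awk -f 'my script'` splits into three fields, the last being `my script` without the quotes.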
2022-05-28  kernel: add chdev command to devcons  Jacob Moody
2022-05-02  kernel: fix noteid change race condition from devproc while forking (thanks joe7)  cinap_lenrek
devproc allows changing the noteid of another process, which opens a race condition in sysrfork(): when deciding to inherit the noteid of "up" to the child and calling pidalloc() later to take the reference, the noteid could have been changed and the child's noteid could have been freed already in the process. this bug can only happen when one writes the /proc/n/noteid file of a process other than your own that is in the process of forking. the noteid changing functionality of devproc seems questionable and seems to be only used by ape's setpgrid() implementation.
2021-10-12  kernel: return error from sysrfork instead of waiting and retrying  cinap_lenrek
The old strategy of wait and retry doesn't seem to work very well, as it keeps all the forking parents stuck waiting in the kernel, worsening the situation. The idea with this change is to have rfork() return an error quickly and without whining, as most callers would just react with a sysfatal(), which might be better for surviving this.
2021-05-29  kernel: use 64-bit virtual entry point for expanded header, document behaviour in a.out(6)  cinap_lenrek
For 64-bit architectures, the a.out header has the HDR_MAGIC flag set in the magic and is expanded by 8 bytes containing the 64-bit virtual address of the program's entry point, while Exec.entry contains the physical address for kernel images. Our sysexec() would always use Exec.entry, even for 64-bit a.out binaries, which worked because PADDR(entry) == entry for userspace pointers. This change fixes it, having the kernel use the 64-bit entry point and documenting the behaviour in the manpage.
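A minimal sketch of the entry point selection described above, not the kernel's code. It assumes the usual a.out layout of eight big-endian 32-bit fields (with entry as the sixth), an 8-byte extension directly after them, and the value 0x8000 for HDR_MAGIC; the helper names be32, be64 and aoutentry are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { HDR_MAGIC = 0x00008000 };	/* assumed flag value, see a.out(6) */

/* header fields are stored big-endian */
static uint32_t
be32(const unsigned char *p)
{
	return (uint32_t)p[0]<<24 | (uint32_t)p[1]<<16 | (uint32_t)p[2]<<8 | p[3];
}

static uint64_t
be64(const unsigned char *p)
{
	return (uint64_t)be32(p)<<32 | be32(p+4);
}

/*
 * Pick the virtual entry point: the 64-bit value from the expanded
 * header when HDR_MAGIC is set, otherwise the 32-bit entry field.
 * Returns 0 on success, -1 if the buffer is too short.
 */
static int
aoutentry(const unsigned char *hdr, size_t n, uint64_t *entry)
{
	if(n < 32)
		return -1;
	if(be32(hdr) & HDR_MAGIC){
		if(n < 40)
			return -1;
		*entry = be64(hdr + 32);	/* 8 bytes following the base header */
	}else
		*entry = be32(hdr + 20);	/* sixth 32-bit field: entry */
	return 0;
}
```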
2020-12-20  kernel: handle tos and per process pcycle counters in port/  cinap_lenrek
we might as well handle the per process cycle counter in the portable part instead of duplicating the code in every arch and having inconsistent implementations. we now have portable kenter() and kexit() functions, meant to be used in trap/syscall from user, which update the counters. some kernels missed initializing Mach.cyclefreq.
2020-02-29  kernel: simplify exec()  cinap_lenrek
progarg[0] can be assigned to elem directly as it is a copy in kernel memory, so the char proelem[64] buffer is not necessary. do the close-on-exit outside of the segment lock. there is no reason to keep the segment table locked.
2020-02-28  kernel: make sure we wont run into the tos when copying exec() arguments  cinap_lenrek
in case the calling process changes its arguments under us, it could happen that the final argument string lengths become bigger than initially calculated. this is fine as we still make sure we won't overflow the stack segment, but we could overrun into the tos structure at the end of the stack. so change the limit to the base of the tos, not the end of the stack segment.
2020-02-23  kernel: fix multiple devproc bugs and pid reuse issues  cinap_lenrek
devproc assumes that when we hold the Proc.debug qlock, the process will be prevented from exiting. but there is another race where the process has already exited and the Proc* slot gets reused. to solve this, on process creation we also have to acquire the debug qlock while initializing the fields of the process. this also means newproc() should only initialize fields *not* protected by the debug qlock.

always acquire the Proc.debug qlock when changing strings in the proc structure to avoid a double free on concurrent update. for changing the user string, we add a procsetuser() function that does this for auth.c and devcap.

remove pgrpnote() from pgrp.c and replace by static postnotepg() in devproc.

avoid the assumption that the Proc* entries returned by proctab() are continuous.

fixed devproc permission issues:
- make sure only eve can access /proc/trace
- none should only be allowed to read its own /proc/n/text
- move Proc.kp checks into procopen()

pid reuse was not handled correctly, as we were only checking if a pid had a living process, but there still could be processes expecting a particular parentpid or noteid. this is now addressed with reference counted Pid structures which are organized in a hash table. read access to the hash table does not require locks, which will be useful for dtracy later.
2020-01-27  kernel: fix mistake from previous commit (noteid not being inherited by default)  cinap_lenrek
2020-01-26  kernel: implement portable userinit() and simplify process creation  cinap_lenrek
replace machine specific userinit() by a portable implementation that uses kproc() to create the first process. the initcode text is mapped using kmap(), so there is no need for machine specific tmpmap() functions. initcode stack preparation should be done in init0() where the stack is mapped and can be accessed directly.

replacing the machine specific userinit() allows some big simplifications as sysrfork() and kproc() are now the only callers of newproc() and we can avoid initializing fields that we know are being initialized by these callers.

rename autogenerated init.h and reboot.h headers. the initcode[] and rebootcode[] blobs are now in *.i files and hex generation was moved to portmkfile. the machine specific mkfile only needs to specify how to build rebootcode.out and initcode.out.
2019-12-01  kernel: improve diagnostics by reversing the roles of Proc.parentpid and Proc.parent  cinap_lenrek
for better system diagnostics, we *ALWAYS* want to record the parent pid of a user process, regardless of whether the child will post a wait record on exit or not. for that, we reverse the roles of Proc.parent and Proc.parentpid so Proc.parentpid will always be set on rfork() and the Proc.parent pointer will point to the parent's Proc structure or is set to nil when no wait record should be posted on exit (RFNOWAIT flag).

this means that we can get the pid of the original parent process from /proc, regardless of the child having rforked with the RFNOWAIT flag. this improves the output of pstree(1) somewhat if the parent is still alive. note that there's no guarantee that the parent pid is still valid.

the conditions are unchanged:

a user process that will post wait record has:
up->kp == 0 && up->parent != nil && up->parent->pid == up->parentpid

the boot process is:
up->kp == 0 && up->parent == nil && up->parentpid == 0

and kproc's have:
up->kp != 0 && up->parent == nil && up->parentpid == 0
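The three conditions can be written as predicates. This is only a sketch: the Proc structure below is a hypothetical, trimmed-down version holding just the fields the message names, and the function names are made up for illustration.

```c
#include <stddef.h>

/* hypothetical Proc with only the fields named in the message */
typedef struct Proc Proc;
struct Proc {
	int	kp;		/* nonzero for kernel processes */
	int	pid;
	int	parentpid;	/* always recorded on rfork() */
	Proc	*parent;	/* NULL when no wait record is posted */
};

/* user process that will post a wait record on exit */
static int
postswait(const Proc *p)
{
	return p->kp == 0 && p->parent != NULL && p->parent->pid == p->parentpid;
}

/* the boot process */
static int
isbootproc(const Proc *p)
{
	return p->kp == 0 && p->parent == NULL && p->parentpid == 0;
}

/* kernel processes */
static int
iskproc(const Proc *p)
{
	return p->kp != 0 && p->parent == NULL && p->parentpid == 0;
}
```

A user process forked with RFNOWAIT matches none of the three: its parent pointer is nil, but its parentpid is still set, which is the case this commit makes visible in /proc.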
2019-09-04  kernel: make exec clear errstr, stop side-channels and truncate on utf8 boundary  cinap_lenrek
make exec() clear the per process error string to avoid spurious errors and confusion. the errstr() syscall used to always swap the maximum buffer size with memmove(), which is problematic as this gives access to the garbage beyond the NUL byte. worse, newproc(), werrstr() and rerrstr() only clear the first byte of the input buffer. so random stack rubble could be leaked across processes. we change the errstr() syscall to not copy beyond the NUL byte. the manpage also documents that errstr() should truncate on a utf8 boundary so we use utfecpy() to ensure proper NUL termination.
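The copy rule described above (stop at the NUL byte, never split a UTF-8 sequence, always terminate) might look roughly like the following. This is a sketch in portable C, not the kernel's utfecpy(); the name utfcopy is hypothetical and the source is assumed to be valid UTF-8.

```c
#include <stddef.h>

/*
 * Copy src into dst[0..n), stopping at the NUL byte so nothing beyond
 * it leaks. If the string has to be truncated, back up so that no
 * partial UTF-8 sequence is left at the end. dst is always NUL
 * terminated when n > 0. Returns the number of bytes copied.
 */
static size_t
utfcopy(char *dst, size_t n, const char *src)
{
	size_t i;

	if(n == 0)
		return 0;
	for(i = 0; i < n-1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	if(src[i] != '\0'){
		/* truncated: if the next byte is a continuation byte (10xxxxxx),
		 * we stopped mid-sequence, so drop back to its lead byte */
		while(i > 0 && ((unsigned char)src[i] & 0xC0) == 0x80)
			i--;
	}
	dst[i] = '\0';
	return i;
}
```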
2019-08-27  kernel: prohibit changing cache attributes (SG_CACHED|SG_DEVICE) in segattach(), set SG_RONLY in data2txt()  cinap_lenrek
the user should not be able to change the cache attributes for a segment in segattach() as this can cause the same memory to be mapped with conflicting attributes in the cache. SG_TEXT should always be mapped with SG_RONLY attribute. so fix data2txt() to follow the rules.
2019-08-27  kernel: make user stack segment non-executable  cinap_lenrek
2019-05-03  kernel: exec support for arm64 binaries  cinap_lenrek
2019-05-01  kernel: get rid of checkpagerefs() debugging  cinap_lenrek
was only implemented by the pc kernel. does not account pages used by the mount cache.
2018-06-10  kernel: don't cap the minimum sleep time to TK2MS(1) for syssleep()  cinap_lenrek
on HZ 100 systems like pc and pc64, the minimum sleep time was 10ms, which is quite high. the cap isn't really needed as arch specific timerset() enforces its own limit, but on a higher resolution.

background: from Charles Forsyth:

I haven't really got an opinion on it. The 10ms interval was first used on machines that were much slower. I thought someone did set HZ to a bigger value, partly to support better in-kernel timing. I haven't done it because I never had a need for it. If I were doing (say) protocol implementation in user mode, I'd certainly reconsider. Sleep itself forces at best ms granularity, and for some applications that's too big.

initial mail from qwx raising the issue:

> Hello,
>
> I found out recently that sleep(2)'s resolution on 386 and 9front's amd64
> kernel is 10 ms rather than 1 ms. The reason is that on those kernels,
> HZ is set to 100 rather than say 1000. In syssleep, we get 1 tick every
> 10 ms.
>
> What is unclear is why.
>
> To paraphrase cinap_lenrek's answer to my question:
>
> In syssleep:
> 	if(ms < TK2MS(1))
> 		ms = TK2MS(1);
> 	tsleep(&up->sleep, return0, 0, ms);
>
> "TK2MS(1)" can be replaced with just "1", and the arch specific
> timerset() routine would do its own capping of the period if it's too
> small for the timer resolution, and make better decisions based on what
> the minimum timer period should be given the latency overhead of the
> given arch's interrupt handling and performance characteristics.
>
> Alternatively, HZ could be raised to 500 or 1000.
>
> It seems it's just trying to prevent excessive context switches and
> interrupts, but it seems somewhat arbitrary. A ton of syscalls can be
> done in 1 ms, and it's the lowest we can go without changing the unit.
>
>
> What do you think?
>
> Thanks in advance,
>
> qwx
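To make the arithmetic concrete, here is a small sketch comparing the old cap with the replacement suggested in the quoted mail. The TK2MS definition and both helper names are assumptions for illustration; the commit itself may simply drop the cap rather than clamp to 1.

```c
enum { HZ = 100 };			/* clock ticks per second on pc and pc64 */
#define TK2MS(t)	((t) * (1000 / HZ))	/* assumed definition: ticks to ms */

/* old behaviour: anything shorter than one tick becomes one tick (10 ms) */
static long
oldsleepms(long ms)
{
	if(ms < TK2MS(1))
		ms = TK2MS(1);
	return ms;
}

/* suggestion from the mail: clamp to 1 ms and let timerset() decide */
static long
newsleepms(long ms)
{
	if(ms < 1)
		ms = 1;
	return ms;
}
```

With HZ at 100, a request for 1 ms turns into 10 ms under the old rule and stays at 1 ms under the suggested one.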
2017-06-20  kernel: add support for sticky segments (cached, preallocated, never paged)  cinap_lenrek
2016-09-07  kernel: tsemacquire() use MACHP(0)->ticks for time delta  cinap_lenrek
we might wake up on a different cpu after the sleep so delta from machX->ticks - machY->ticks can become negative giving spurious timeouts. to avoid this always use the same mach 0 tick counter for the delta.
2016-03-30  devsegment: cleanups  cinap_lenrek
- return distinct error message when attempting to create Globalseg with physseg name
- copy directory name to up->genbuf so it stays valid after we unlock(&glogalseglock)
- cleanup wstat() handling, allow changing uid
- make sure global segment size is below SEGMAXSIZE
- move isoverlap() check from globalsegattach() into segattach()
- remove Proc* argument from globalsegattach(), segattach() and isoverlap()
- make Physseg.attr and segattach attr parameter an int for consistency
2015-12-21  kernel: missing changes for ibrk() prototype  cinap_lenrek
2015-08-06  kernel: have to validate argv[] again when copying to the new stack  cinap_lenrek
we have to validaddr() and vmemchr() all argv[] elements a second time when we copy to the new stack to deal with the fact that another process can come in and modify the memory of the process doing the exec. so the argv[] strings could have changed and increased in length. we just make sure the data being copied will fit into the new stack and error when we would overflow. also make sure to free the ESEG in case the copy pass errors.
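The second pass described above amounts to re-measuring each string at copy time against the room that is left. A minimal user-space sketch follows; copyargs is a hypothetical name, memchr stands in for the kernel's vmemchr(), there is no validaddr() equivalent here, and the forced terminator is an extra precaution of this sketch rather than something the message describes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy argc strings into buf[0..size), measuring each string again
 * while copying instead of trusting an earlier length. Returns the
 * number of bytes used, or -1 if a string no longer fits.
 */
static long
copyargs(char *buf, size_t size, char **argv, int argc)
{
	size_t used = 0;

	for(int i = 0; i < argc; i++){
		size_t room = size - used;
		/* look for the terminator only within the remaining room */
		const char *end = memchr(argv[i], '\0', room);
		if(end == NULL)
			return -1;	/* would overflow the new stack */
		size_t n = (size_t)(end - argv[i]) + 1;
		memcpy(buf + used, argv[i], n);
		buf[used + n - 1] = '\0';	/* terminate even if the source changed */
		used += n;
	}
	return (long)used;
}
```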
2015-08-06  kernel: limit argv[] strings to the USTKSIZE to avoid overflow  cinap_lenrek
argv[] strings get copied to the new process's stack segment, which has a maximum size of USTKSIZE, so limit the size of the strings to that and check early for overflow.
2015-08-06  kernel: validnamedup() the name argument for segattach()  cinap_lenrek
this moves the name validation out of segattach() to syssegattach() to make sure the segment name cannot be changed by the user while segattach looks at it.
2015-08-06  kernel: change vmemchr() length argument to ulong and simplify  cinap_lenrek
2015-08-06  kernel: make shargs() function static in sysproc.c  cinap_lenrek
2015-08-06  kernel: reject empty argv (argv[0] == nil) in sysexec()  cinap_lenrek
when executing a script, we did advance argp0 unconditionally to replace argv[0] with the script name. this fails when argv[] is empty, then we'd advance argp0 past the nil terminator.

the alternative would be to *not* advance if *argp0 == nil, but that would require another validaddr() check for a case that is unlikely to have been anticipated in most programs being invoked as libc's ARGBEGIN macro assumes argv[0] being non-nil as it also unconditionally advances the argv pointer.

to keep us sane, we now reject an empty argv[]. on entry, we verify that argv[] is valid for at least two elements:
- the program name argv[0], has to be non-nil
- the first potential nil terminator in argv[1]

when argv[0] == nil, we throw Ebadarg "bad arg in system call"
2015-07-10  kernel: use HDR_MAGIC constant to handle Exec header extension, make rebootcmd() handle AOUT_MAGIC macro  cinap_lenrek
2015-07-09  sysexec(): need () arround AOUT_MAGIC comparsion to handle #define hack on mips  cinap_lenrek
2015-07-09  sysexec(): make the mips compiler happy  cinap_lenrek
2015-07-09  kernel: reject bogus two byte "#!" shell scripts in sysexec()  cinap_lenrek
- reject files smaller or equal to two bytes, they are bogus
- fix out of bounds access in shargs() when n <= 2
- only copy the bytes read into line buffer
- use nil for pointers instead of 0
2015-04-12  kernel: fixed segment support (for fpga experiments)  cinap_lenrek
fixed segments are continuous in physical memory but allocated in user pages. unlike shared segments, they are not allocated on demand but the pages are allocated on creation time (devsegment). fixed segments are never swapped out, segfreed or resized and can only be destroyed as a whole. the physical base address can be discovered by userspace reading the ctl file in devsegment.
2015-03-22  vl, libmach, kernel: mips has 16K alignment for segments (for bigpages)  cinap_lenrek
2015-03-07  kernel: catch address overflow in syssegfree()  cinap_lenrek
the "to" address can overflow in syssegfree(), causing the wrong number of pages to be passed to mfreeseg(). with the current implementation of mfreeseg() however, this doesn't cause any data corruption but was just freeing an unexpected number of pages. this change checks for this condition in syssegfree() and errors out instead. also mfreeseg() was changed to take a ulong argument for the number of pages instead of int to keep it consistent with other routines that work with page counts.
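The overflow condition can be detected with unsigned wraparound, which is well defined in C. The following is a sketch of that check only; segrange is a hypothetical helper, and the real syssegfree() also rounds addresses to page boundaries, which is left out here.

```c
#include <stdint.h>

/*
 * Compute the end address of [from, from+len). If the addition wraps
 * around the address space, report an error instead of returning a
 * bogus range. Returns 0 on success, -1 on overflow.
 */
static int
segrange(uintptr_t from, uintptr_t len, uintptr_t *to)
{
	uintptr_t end = from + len;	/* unsigned: wraps on overflow */

	if(end < from)
		return -1;
	*to = end;
	return 0;
}
```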
2015-03-03  kernel: fix physical segment handling  cinap_lenrek
ignore physical segments in mcountseg() and mfreeseg(). physical segments are not backed by user pages, and doing putpage() on physical segment pages in mfreeseg() is an error.

do not allow physical segments to be resized. the segment size is only checked in segattach() to be within the physical segment!

ignore physical segments in portcountpagerefs() as pagenumber() does not work on the malloced page structures of a physical segment.

get rid of Physseg.pgalloc() and Physseg.pgfree() indirection as this was never used and if there's a need to do more efficient allocation, it should be done in a portable way.
2014-09-20  pc64: put return value of nsec syscall in register on amd64  cinap_lenrek
WHAT WHERE THEY *THINKING*??!?! unlike seek, the (new) nsec syscall (not used in 9front libc) returns the time value in register (from nix), so do the same for compatibility.
2014-08-17  kernel: make noswap flag exclude processes from killbig() if not eve, reset noswap flag on exec  cinap_lenrek
2014-06-22  kernel: new pagecache, remove Lock from page, use cmpswap for Ref instead of Lock  cinap_lenrek
make the Page structure less than half its original size by getting rid of the Lock and the lru.

The Lock was required to coordinate the unchaining of pages that were both cached and on the lru freelist. now pages have a single next pointer that is used for palloc.head freelist xor for page cache hash chains in Image.pghash[]. cached pages are not on the freelist anymore, but will be reclaimed from images by the pager when the freelist runs out of pages.

each Image has its own 512 hash chains for cached page lookup. That is 2MB worth of pages and there should be no collisions for most text images. page reclaiming can be done without holding palloc.lock as the Image is the owner of the page hash chains protected by the Image's lock.

reclaiming Image structures can be done quickly by only reclaiming pages from inactive images, that is images which are not currently in use by segments.

the Ref structure has no Lock anymore. Only a single long that is atomically incremented or decremented using cmpswap().

there are various other changes as a consequence. and lots of pikeshedding, sorry.
2014-05-20  add _nsec() syscall 53 for binary compatibility with labs distribution  cinap_lenrek
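A lock-free reference count built on compare-and-swap can be sketched as below. This is not the kernel's code: it substitutes C11 atomic_compare_exchange_weak for the kernel's cmpswap(), and the Ref layout shown is an assumption based on the description (a single long).

```c
#include <stdatomic.h>

/* assumed layout: just the counter, no Lock */
typedef struct Ref Ref;
struct Ref {
	atomic_long ref;
};

/* add delta with a compare-and-swap retry loop; returns the new value */
static long
refadd(Ref *r, long delta)
{
	long old = atomic_load(&r->ref);

	/* on failure, old is reloaded with the current value and we retry */
	while(!atomic_compare_exchange_weak(&r->ref, &old, old + delta))
		;
	return old + delta;
}

static long
incref(Ref *r)
{
	return refadd(r, 1);
}

static long
decref(Ref *r)
{
	return refadd(r, -1);
}
```

A caller would typically free the object when decref() returns 0, since no other holder can remain at that point.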
the new syscall is added under the symbol _nsec() for binary compatibility. nsec() is still a library function reading /dev/bintime.
2014-02-05  pc64: dont 4 byte align stack pointer for amd64 in sysexec()  cinap_lenrek
2014-02-02  kernel: fix bogus free in sysexec.  cinap_lenrek
we free the wrong pointer in the waserror() block.
2014-02-01  kernel: handle amd64 40 byte headers in exec()  cinap_lenrek
2014-01-20  kernel: various cleanups  cinap_lenrek
2014-01-20  kernel: apply uintptr for ulong when a pointer is stored  cinap_lenrek
this change is in preparation for amd64. the systab calling convention was also changed to return uintptr (as segattach returns a pointer) and the arguments are now passed as va_list which handles amd64 arguments properly (all arguments are passed in 64bit quantities on the stack, tho the upper part will not be initialized when the element is smaller than 8 bytes). this is partial. xalloc needs to be converted in the future.
2013-12-29  kernel: make sure user text, data and bss wont overlap the stack segment in sysexec()  cinap_lenrek