Age | Commit message (Collapse) | Author |
|
The idea is that when we reboot, we zero out
memory written by processes that have the private
flag set (such as factotum and keyfs), and also
clear the secrmem pool, which contains TLS keys
and the state of the random number generator.
This is so the newly booted kernel or firmware
will not find these secret keys in memory.
|
|
procs come from the dynamic pools, so we don't need
to remove the memory used by possible procs from the
total available.
|
|
Treallocate the small data structures around procs eagerly,
but use malloc to allocate the large proc data structures
when we need them, which allows us to scale to many more procs.
There are still many scalability bottlenecks, so we only crank
up the nproc limit by a little bit this time around, and crank
it up more as we optimize more.
|
|
This adds the new function pointer PCArch.clockinit(),
which is a timer dependent initialization routine.
It also takes over the job of guesscpuhz(). This way, the
architecture ident code can switch between different
timers (i8253, HPET and XEN timer).
|
|
|
|
|
|
we might as well handle the per process cycle
counter in the portable part instead of duplicating the code
in every arch and have inconsistent implementations.
we now have a portable kenter() and kexit() function,
that is ment to be used in trap/syscall from user,
which updates the counters.
some kernels missed initializing Mach.cyclefreq.
|
|
|
|
|
|
It appears that our IDT overlaps with the data structures
passed from grub in multiboot load.
So defer setup of the interrupt table after the multiboot
parameters have been processed.
|
|
Make sure all pci busmaster activity is disabled,
including MSI/MSI-X interrupts, before switching
control to the new kernel.
|
|
loading the interrupt vector table early allows
us to handle traps during bootup before mmuinit()
which gives better diagnostics for debugging.
we also can handle general protection fault on
rdmsr() and wrmsr() which helps during
cpuidentify() and archinit() when probing for
cpu features.
|
|
The new pci code is moved to port/pci.[hc] and shared by
all ports.
Each port has its own PCI controller implementation,
providing the pcicfgrw*() functions for low level pci
config space access. The locking for pcicfgrw*() is now
done by the caller (only port/pci.c).
Device drivers now need to include "../port/pci.h" in
addition to "io.h".
The new code now checks bridge windows and membars,
while enumerating the bus, giving the pc driver a chance
to re-assign them. This is needed because some UEFI
implementations fail to assign the bars for some devices,
so we need to do it outselfs. (See pcireservemem()).
While working on this, it was discovered that the pci
code assimed the smallest I/O bar size is 16 (pcibarsize()),
which is wrong. I/O bars can be as small as 4 bytes.
Bit 1 in an I/O bar is also reserved and should be masked off,
making the port mask: port = bar & ~3;
|
|
This replaces the memory map code for both pc and pc64
kernels with a unified implementation using the new
portable memory map code.
The main motivation is to be robust against broken
e820 memory maps by the bios and delay the Conf.mem[]
allocation after archinit(), so mp and acpi tables
can be reserved and excluded from user memory.
There are a few changes:
new memreserve() function has been added for archinit()
to reserve bios and acpi tables.
upareserve() has been replaced by upaalloc(), which now
has an address argument.
umbrwmalloc() and umbmalloc() have been replaced by
umballoc().
both upaalloc() and umballoc() return physical addresses
or -1 on error. the physical address -1 is now used as
a sentinel value instead of 0 when dealing with physical
addresses.
archmp and archacpi now always use vmap() to access
the bios tables and reserve the ranges. more overflow
checks have been added.
ramscan() has been rewritten using vmap().
to handle the population of kernel memory, pc and pc64
now have pmap() and punmap() functions to do permanent
mappings.
|
|
replace machine specific userinit() by a portable
implemntation that uses kproc() to create the first
process. the initcode text is mapped using kmap(),
so there is no need for machine specific tmpmap()
functions.
initcode stack preparation should be done in init0()
where the stack is mapped and can be accessed directly.
replacing the machine specific userinit() allows some
big simplifications as sysrfork() and kproc() are now
the only callers of newproc() and we can avoid initializing
fields that we know are being initialized by these
callers.
rename autogenerated init.h and reboot.h headers.
the initcode[] and rebootcode[] blobs are now in *.i
files and hex generation was moved to portmkfile. the
machine specific mkfile only needs to specify how to
build rebootcode.out and initcode.out.
|
|
when a process does an exec syscall, procsetup() is called and
we have to disable the debug watchpoint registers. just clearing
p->dr is not enougth as we are not going thru a procsave() and
procrestore() cycle which would disable and reload the saved
debug registers.
instead of clearing debug registers in procfork(), we should
clear the saved debug registers before a process goes to die
(pexit() calls sched() with up->state = Moribund) as the Proc
structure can get reused for kernel processes (kproc) which
never call procfork() and would therefore have debug registers
loaded.
|
|
|
|
instead of having application processors spin in mpshutdown()
with mmu on, and be subject to reboot() overriding kernel text
and modifying page tables, park the application processors in
rebootcode idle loop with the mmu off.
|
|
|
|
this driver makes regions of physical memory accessible as a disk.
to use it, ramdiskinit() has to be called before confinit(), so
that conf.mem[] banks can be reserved. currently, only pc and pc64
kernel use it, but otherwise the implementation is portable.
ramdisks are not zeroed when allocated, so that the contents are
preserved across warm reboots.
to not waste memory, physical segments do not allocate Page structures
or populate the segment pte's anymore. theres also a new SG_CHACHED
attribute.
|
|
|
|
|
|
specific fpu handling
introducing the PFPU structue which allows the machine specific
code some flexibility on how to handle the FPU process state.
for example, in the pc and pc64 kernel, the FPsave structure is
arround 512 bytes. with avx512, it could grow up to 2K. instead
of embedding that into the Proc strucutre, it is more effective
to allocate it on first use of the fpu, as most processes do not
use simd or floating point in the first place. also, the FPsave
structure has special 16 byte alignment constraint, which further
favours dynamic allocation.
this gets rid of the memmoves in pc/pc64 kernels for the aligment.
there is also devproc, which is now checking if the fpsave area
is actually valid before reading it, avoiding debuggers to see
garbage data.
the Notsave structure is gone now, as it was not used on any
machine.
|
|
|
|
|
|
|
|
mechanism to pass arguments to /boot/boot
|
|
|
|
|
|
need to reset DR7 in procsave(); remove superfluous reset of DR7 in mmurelease()
|
|
|
|
|
|
|
|
|
|
|
|
cpus on pc64
|
|
introduce cpushutdown() function that does the common
operation of initiating shutdown, returning once all
cpu's got the message and are about to shutdown. this
avoids duplicated code which isnt really machine specific.
automatic reboot on panic only when *debug= is not set
and the machine is a cpu server or has no display,
otherwise just hang.
|
|
remove kbdenable()/kbdinit()
on vmware, loading a new kernel sometimes reboots when
wiggling the mouse. disabling keyboard and mouse on
shutdown fixes the issue.
make sure ps2 mouse is disabled on init, will get re-enabled
in i8042auxenable().
keyboard isnt special anymore, we can just use the devreset
entry point in the device to do the keyboard initialization,
so kbdinit()/kbdenable() are not needed anymore.
|
|
there are no kernels currently that do page coloring,
so the only use of cachectl[] is flushing the icache
(on arm and ppc).
on pc64, cachectl consumes 32 bytes in each page resulting
in over 200 megabytes of overhead for 32gb of ram with 4K
pages.
this change removes cachectl[] and adds txtflush ulong
that is set to ~0 by pio() to instruct putmmu() to flush
the icache.
|
|
|
|
all mmuinit0() does is initialize m->gdt, but this isnt neccesary
as this is done by mmuinit() anyway before loading the gdt.
|
|
|
|
BOOTLINE to /boot/boot as argv[]
this change allows command line passing to /boot/boot from qemu like:
qemu -kernel 9pcf -append "-u glenda tcp"
|
|
|
|
bootscreeninit(), fix rampage() usage
rampage() cannot be used after meminit(), so test for
conf.mem[0].npage != 0 and use xalloc()/mallocalign()
instead. this allows us to use vmap() early before
mmuinit() which is needed for bootscreeninit() and
acpi.
to get memory for page tables, pc64 needs a lowraminit().
with EFI, the RSDT pointer is passed in *acpi= parameter
from the efi loader. as the RSDT is ususally at the end of
the physical address space (and not to be found in
bios areas), we cannot KMAP() it so we need to vmap().
|
|
EFI system has no cga or vesa anymore, so it becomes neccesary to
pass GOP framebuffer info to the kernel to get some output on the
screen.
|
|
as we do system reset and reboot only from boot processor cpu0 now,
theres no need for active.rebooting conditional variable.
mpshutdown() will unconditionally park application processors and
and cpu0 boots the new kernel or calls mpshutdown() causing system
reset.
|
|
|
|
|
|
|