sugen() calls cgen64() speculatively so that when cgen64() returns
zero, it falls back and compiles a 64-bit copy.
the bug was that cgen64() compiled the left-hand side and then recursively
called cgen64() again, which didn't handle the memory copy, so it returned
zero and sugen() compiled the left-hand side again, resulting in two
function calls being emitted.
some code that reproduces the issue:
	#include <u.h>
	#include <libc.h>
	typedef struct
	{
		char x[10];
		vlong a;
	} X;
	X a;
	X *f(void) { return &a; }
	void
	main(int argc, char *argv[])
	{
		f()->a = a.a;	/* f() must be called exactly once */
	}
which produced:
	TEXT f+0(SB),0,$0
	MOVL $a+0(SB),AX
	RET ,
	RET ,
	TEXT main+0(SB),0,$0
	CALL ,f+0(SB)
	CALL ,f+0(SB)	<- bug: second call to f()
	MOVL AX,CX
	LEAL a+12(SB),DX
	MOVL (DX),AX
	MOVL AX,12(CX)
	MOVL 4(DX),AX
	MOVL AX,16(CX)
	RET ,
	GLOBL a+0(SB),$20
	END ,
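To make the failure mode concrete, here is a minimal sketch in portable C (not Plan 9 C, and not the actual 8c code). The names emit(), try64_buggy(), try64_fixed(), gen() and ncalls() are hypothetical: a speculative routine that emits the left-hand side before deciding it cannot handle the node leaves that code behind, and the fallback then emits it a second time. The fixed variant decides first and emits nothing when it gives up.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: emitted "instructions" are appended to a buffer. */
static char out[512];

static void
emit(const char *ins)
{
	strcat(out, ins);
	strcat(out, "\n");
}

/* Buggy: evaluates the lhs, then finds it cannot do a memory copy. */
static int
try64_buggy(int iscopy)
{
	emit("CALL f");		/* side effect already emitted */
	if(iscopy)
		return 0;	/* gives up, but the CALL stays behind */
	emit("MOVL AX,12(CX)");
	return 1;
}

/* Fixed: decides whether it can handle the node before emitting anything. */
static int
try64_fixed(int iscopy)
{
	if(iscopy)
		return 0;	/* nothing emitted, so falling back is safe */
	emit("CALL f");
	emit("MOVL AX,12(CX)");
	return 1;
}

/* Caller: speculative attempt first, then the generic copy path. */
static void
gen(int (*try64)(int), int iscopy)
{
	out[0] = '\0';
	if(try64(iscopy))
		return;
	emit("CALL f");		/* the generic path evaluates the lhs itself */
	emit("copy 8 bytes");
}

/* Counts how many times the call to f() was emitted. */
static int
ncalls(void)
{
	int n = 0;
	const char *p = out;

	while((p = strstr(p, "CALL f")) != NULL){
		n++;
		p += strlen("CALL f");
	}
	return n;
}
```

With the buggy variant a memory copy yields two CALL f lines, with the fixed one a single line. The message above does not say whether 8c's fix took exactly this form; the sketch only shows the general rule that a speculative generator must not emit side-effecting code before it commits.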
optimizers
introduce the rolor() function to replace (a << c) | (a >> (bits(a) - c))
with (a <<< c), where <<< is cyclic rotation and c is a constant.
this almost doubles the speed of chacha encryption on 386 and amd64.
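The idiom rolor() targets shows up in the ChaCha quarter round, where every rotation count is a constant. A minimal sketch in portable C, assuming a hypothetical ROTL32 macro written in the shift/or form above; it is a macro rather than a function so that c is a literal constant at each use even without inlining. The quarter round follows RFC 8439, section 2.1. Note that c must be in 1..31: with c = 0 the right shift by 32 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Shift/or form of a 32-bit left rotation; c must be a constant in 1..31. */
#define ROTL32(a, c)	(((a) << (c)) | ((a) >> (32 - (c))))

/* ChaCha quarter round (RFC 8439, 2.1): all four rotation counts are constants. */
static void
quarterround(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
	*a += *b; *d ^= *a; *d = ROTL32(*d, 16);
	*c += *d; *b ^= *c; *b = ROTL32(*b, 12);
	*a += *b; *d ^= *a; *d = ROTL32(*d, 8);
	*c += *d; *b ^= *c; *b = ROTL32(*b, 7);
}
```

Each ROTL32 use is exactly the (a << c) | (a >> (bits(a) - c)) pattern, so a compiler that recognizes it can emit one rotate instead of two shifts, an OR and a temporary register.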
when attempting to eliminate moves by register substitution, the peephole
optimizer used to stop when it hit a shift or rotate (ROL) instruction.
but it does not have to, as long as the shift count operand is not CX
(which cannot be substituted) and CX is not itself a candidate for
substitution.
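The rule can be sketched as a predicate. Everything here is hypothetical (the register enum, the Shift type and shiftok()), not the actual peephole code. It assumes, as on x86, that a variable shift count must live in CL, so the count register can never be renamed.

```c
#include <assert.h>

/* Hypothetical register numbering. */
enum { AX, BX, CX, DX, SI, DI };

/* Hypothetical shift/rotate description: constant count or count register. */
typedef struct {
	int	constcount;	/* 1 if the count is an immediate */
	int	countreg;	/* register holding the count otherwise */
} Shift;

/*
 * Returns 1 if a pass replacing register "from" with register "to"
 * may continue past this shift instead of stopping.
 */
static int
shiftok(const Shift *s, int from, int to)
{
	if(!s->constcount && s->countreg == CX)
		return 0;	/* count is in CX, which cannot be substituted */
	if(from == CX || to == CX)
		return 0;	/* CX itself is being substituted */
	return 1;
}
```

A constant-count shift between AX and BX no longer blocks the substitution, while anything touching CX still does.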
- cover more cases that have no side effects
- ensure function calls have complexity FNX
- pull operators out of the OFUNC level
- rewrite the OSTRUCT lhs to avoid all side effects; use regalloc() instead of regret()
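A rough sketch of two of the ideas in this list, in portable C: a side-effect test over an expression tree, and Sethi-Ullman style complexity numbers where a call is pinned to FNX so it is evaluated early. The operator subset, the Node layout and the value FNX = 100 are assumptions for illustration, not the cc definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operator subset, named after cc's O* codes. */
enum { ONAME, OCONST, OADD, OIND, OFUNC, OAS, OPOSTINC };

enum { FNX = 100 };	/* assumed complexity assigned to calls */

typedef struct Node Node;
struct Node {
	int	op;
	Node	*left;
	Node	*right;
	int	complex;
};

/* 1 if evaluating n could change state visible elsewhere. */
static int
side(const Node *n)
{
	if(n == NULL)
		return 0;
	switch(n->op){
	case OFUNC:
	case OAS:
	case OPOSTINC:
		return 1;
	}
	return side(n->left) || side(n->right);
}

/* Register-need estimate; calls get FNX regardless of their operands. */
static int
complexity(Node *n)
{
	int l, r;

	if(n == NULL)
		return 0;
	l = complexity(n->left);
	r = complexity(n->right);
	if(n->op == OFUNC)
		n->complex = FNX;
	else if(n->left == NULL && n->right == NULL)
		n->complex = 1;
	else if(l == r)
		n->complex = l + 1;
	else
		n->complex = l > r ? l : r;
	return n->complex;
}
```

For the f()->a example, the lhs tree contains a call, so side() reports a side effect and its complexity is FNX; a plain global such as a.a has neither.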
conversions
final assignment won't trash the registers
peephole pass
the shift instructions do not change the zero flag
when the shift count is 0, so we cannot remove the
compare instruction in this case.
this fixes oggdec on 386.
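A sketch of the check this implies, in portable C with a hypothetical helper shiftsetszf(). It relies on documented x86 behaviour: the shift count is masked to 5 bits for 32-bit operands (6 bits for 64-bit ones), and a masked count of 0 leaves the flags untouched, so a constant count of 32 on a 32-bit shift is as unsafe as 0.

```c
#include <assert.h>

/*
 * Hypothetical: returns 1 only if the shift is guaranteed to set ZF
 * from its result, i.e. the count is a constant whose masked value
 * is nonzero.
 */
static int
shiftsetszf(int constcount, long count, int width64)
{
	long mask = width64 ? 63 : 31;

	if(!constcount)
		return 0;	/* count in CL is unknown at compile time; could be 0 */
	return (count & mask) != 0;
}
```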
when the previous instruction already sets the zero flag,
we can remove the CMPL/CMPQ instruction.
this only removes compares for zero/nonzero tests, and
it only looks at the previous non-NOP instruction
to see whether it sets the register being compared.
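The pass can be sketched over a toy instruction list. The Prog layout, the opcode subset and elimcmp() are hypothetical; only ADDL and ANDL are treated as setting ZF here (MOVL does not touch the flags on x86), and a CMPL is turned into a NOP only when it compares against $0 and the next instruction is JEQ or JNE.

```c
#include <assert.h>

/* Hypothetical opcode subset. */
enum { ANOP, AMOVL, AADDL, AANDL, ACMPL, AJEQ, AJNE };

typedef struct {
	int	as;	/* opcode */
	int	reg;	/* destination register, or compared register for CMPL */
	int	imm;	/* immediate; for CMPL, the value compared against */
} Prog;

/* 1 if the opcode sets ZF from its result. */
static int
setszf(int as)
{
	return as == AADDL || as == AANDL;
}

/* Replaces redundant "CMPL reg,$0" before JEQ/JNE with NOP; returns the count. */
static int
elimcmp(Prog *p, int n)
{
	int i, j, removed;

	removed = 0;
	for(i = 0; i + 1 < n; i++){
		if(p[i].as != ACMPL || p[i].imm != 0)
			continue;
		if(p[i+1].as != AJEQ && p[i+1].as != AJNE)
			continue;	/* only zero/nonzero tests */
		for(j = i - 1; j >= 0 && p[j].as == ANOP; j--)
			;		/* skip back over NOPs */
		if(j < 0 || !setszf(p[j].as) || p[j].reg != p[i].reg)
			continue;
		p[i].as = ANOP;
		removed++;
	}
	return removed;
}
```

Restricting the rewrite to JEQ/JNE matters because an arithmetic instruction leaves CF and OF in a different state than CMPL $0 would, so signed or unsigned ordering branches could change meaning.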
If a 64-bit multiply has to save both AX and DX, it could load the wrong value
into DX; also, biggen() shouldn't allocate either AX or DX as a temporary
when using the template for MUL.
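Why AX and DX are special: on 386 the widening MUL writes its 64-bit product to DX:AX, so any value living in AX or DX has to be saved around it, and neither register can serve as a temporary. A sketch in portable C of a 64x64->64 multiply built from 32-bit pieces, roughly the shape such a template follows; mul64() is a hypothetical name and the real template's instruction sequence may differ.

```c
#include <assert.h>
#include <stdint.h>

/* 64x64->64 multiply from 32-bit halves. */
static uint64_t
mul64(uint64_t a, uint64_t b)
{
	uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
	uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);
	uint64_t lo = (uint64_t)al * bl;	/* widening MUL: result in DX:AX */
	uint32_t hi = (uint32_t)(lo >> 32);	/* the DX half */

	hi += al * bh;	/* cross products only affect the high word, mod 2^32 */
	hi += ah * bl;
	return ((uint64_t)hi << 32) | (uint32_t)lo;
}
```

The ah * bh product is dropped entirely because it only contributes above bit 63.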