Age | Commit message (Collapse) | Author |
|
|
|
icmpdontfrag() was not working properly, need to pass the
gating source interface. in fact, we now always pass the source
interface to all icmp*() functions, which is used to
determine source ip address of the icmp reply.
also dont generate a icmp response for packets going to
non-unicast addresses (such as broadcast).
increase the amount of icmp response payload, but keep
icmp responses below the minimum ipv4 mtu (68 bytes).
regularize icmpv6 function names.
move icmp unreachable codes to icmpv6.c.
provide the mtu value for icmppkttoobig6().
dont advise announced udp connections.
avoid code duplication in icmp.c and icmpv6.c,
by having single send function with type, code and arg
parameters.
maintain statistics for sent ipv4 icmp types.
avoid route lookup in ipout*() by passing Routehint* to
icmpnohost*().
iladvise()... more like ill advice.
|
|
We used to only clamp to the MTU of the destination interface,
but this is wrong. We have to clamp to the minimum of both
source and destination.
For this, we change the gating argument type of ipoput4()
and ipoput6() from int to Ipifc* to pass the source interface.
|
|
|
|
when forwarding packets (gating), unconditionally
check tcp-syn packets for the mss-size option and
reduce it to fit the mtu of the outgoing interface.
this is done by exporting a new tcpmssclamp() function
from ip/tcp.c that takes an ip packet and its buffer size
and the effective mtu of the interface and adjusts
the mss value of tcp syn options.
this function is now also used by devbridge, enforcing
a tcp mss below the tunnel mtu.
|
|
This adds a new route "t"-flag that enables network address translation,
replacing the source address (and local port) of a forwarded packet to
one of the outgoing interface.
The state for a translation is kept in a new Translation structure,
which contains two Iphash entries, so it can be inserted into the
per protocol 4-tuple hash table, requiering no extra lookups.
Translations have a low overhead (~200 bytes on amd64),
so we can have many of them. They get reused after 5 minutes
of inactivity or when the per protocol limit of 1000 entries
is reached (then the one with longest inactivity is reused).
The protocol needs to export a "forward" function that is responsible
for modifying the forwarded packet, and then handle translations in
its input function for iphash hits with Iphash.trans != 0.
This patch also fixes a few minor things found during development:
- Include the Iphash in the Conv structure, avoiding estra malloc
- Fix ttl exceeded check (ttl < 1 -> ttl <= 1)
- Router should not reply with ttl exceeded for multicast flows
- Extra checks for icmp advice to avoid protocol confusions.
|
|
The ipoput4() and ipoput6() functions can raise an error(),
which means before calling sndrst() or limbo() (from tcpiput()),
we have to get rid of our blist by calling freeblist(bp).
Makse sure to set the Block pointer to nil after freeing in
ipiput() to avoid accidents.
Fix wrong panic string in sndsynack, and make any sending
functions like sndrst(), sndsynack() and tcpsendka()
return the value of ipoput*(), so we can distinguish
"no route" error.
Add a Enoroute[] string constant.
Both htontcp4() and htontcp6() can never return nil,
as they will allocate new or resize the existing block.
Remove the misleading error handling code that assumes
that it can fail.
Unlock proto on error in limborexmit() which can
be raised from sndsynack() -> ipoput*() -> error().
Make sndsynack() pass a Routehint pointer to ipoput*()
as it already did the route lookup, so we dont have todo
it twice.
|
|
(thanks joe9)
if the server responds without a window scale option in
its syn-ack, disable window scaling alltogether as both
sides need to understand the option.
|
|
Rhoden)
Under the normal close sequence, when we receive a FIN|ACK, we enter
TIME-WAIT and respond to that LAST-ACK with an ACK. Our TCP stack would
send an ACK in response to *any* ACK, which included FIN|ACK but also
included regular ACKs. (Or PSH|ACKs, which is what we were actually
getting/sending).
That was more ACKs than is necessary and results in an endless ACK storm
if we were under the simultaneous close sequence. In that scenario,
both sides of a connection are in TIME-WAIT. Both sides receive
FIN|ACK, and both respond with an ACK. Then both sides receive *those*
ACKs, and respond again. This continues until the TIME-WAIT wait period
elapses and each side's TCP timers (in the Plan 9 / Akaros case) shut
down.
The fix for this is to only respond to a FIN|ACK when we are in TIME-WAIT.
|
|
joe9)
|
|
|
|
|
|
connection without a tcp reset (used by go)
|
|
v6 mandates minimum mtu of 1280, tho someone *could* setup
an interface with a lower mtu or set it lower for testing.
|
|
we currently do not implement path mtu discovery so for
destinations that are not directly reachable assume the
minimum mtu of 1280 bytes.
|
|
|
|
|
|
|
|
|
|
for incoming connection, we used s->laddr to lookup the interface
for the incoming call, but this does not work when the announce
address is tcp!*!123, then s->laddr is all zeros "::". instead,
use the incoming destination address for interface mtu lookup.
thanks mycroftix for troubleshooting!
|
|
(thanks David du Colombier)
David du Colombier wrote:
> The slowness issue only appears on the loopback, because
> it provides a 16384 MTU.
>
> There is an old bug in the Plan 9 TCP stack, were the TCP
> MSS doesn't take account the MTU for incoming connections.
>
> I originally fixed this issue in January 2015 for the Plan 9
> port on Google Compute Engine. On GCE, there is an unusual
> 1460 MTU.
>
> The Plan 9 TCP stack defines a default 1460 MSS corresponding
> to a 1500 MTU. Then, the MSS is fixed according to the MTU
> for outgoing connections, but not incoming connections.
>
> On GCE, this issue leads to IP fragmentation, but GCE didn't
> handle IP fragmentation properly, so the connections
> were dropped.
>
> On the loopback medium, I suppose this is the opposite issue.
> Since the TCP stack didn't fix the MSS in the incoming
> connection, the programs sent multiple small 1500 bytes
> IP packets instead of large 16384 IP packets, but I don't
> know why it leads to such a slowdown.
|
|
|
|
this patch consists of two bits of work submitted as one
patch.
the first bit fixed a "pacing" problem, where a tcp connection
rate-limited by the reading process would experience 10%
of the expected throughput, and could even get into live
lock. it was noticed at the time of this initial work that
the stack often sent tiny grams. some good bits from nix'
original tcp were merged in. the test program
/n/sources/contrib/quanstro/tcptest.c
will verify that under most conditions, a reader-paced connection
now gets the expected throughput. expected arguments
would be
tcptest -s1 -n 5000 -l
the second bit is a first step in preparing tcp to handle
modest (1-2MB) bandwidth-delay products. the strategy
was to completely implement NewReno. the testing network
was a 7/35/70ms by 100Mbit wan emulator with 0/.05/.1% loss.
here are the performance comparisons from the changes after
the first round "old" to the submitted patch "new". the
smallest improvement was 80%, the largest was 11x.
loss% rtt old new
0.10 7 4.40 7.85
0.10 35 0.88 1.79
0.10 70 0.47 0.84
0.05 7 4.80 9.38
0.05 35 1.00 2.02
0.05 70 0.52 1.77
0.01 7 5.33 11.87
0.01 35 1.14 10.97
0.01 70 0.54 4.75
0.00 7 4.49 11.92
0.00 35 1.04 11.35
0.00 70 0.58 10.56
since the diff is not very easy to read, i wrote a small
paper detailing the changes
http://www.quanstro.net/plan9/tcp/tcp.pdf
- erik
|
|
|
|
|
|
|