path: root/sys/src/9/ip/tcp.c
Age  Commit message  Author
2022-12-30  tcp: only create new translation when SYN packet  cinap_lenrek
2022-12-18  devip: fix icmp bugs  cinap_lenrek
icmpdontfrag() was not working properly; it needs to be passed the gating source interface. in fact, we now always pass the source interface to all icmp*() functions, where it is used to determine the source ip address of the icmp reply. also don't generate an icmp response for packets going to non-unicast addresses (such as broadcast). increase the amount of icmp response payload, but keep icmp responses below the minimum ipv4 mtu (68 bytes). regularize icmpv6 function names. move icmp unreachable codes to icmpv6.c. provide the mtu value for icmppkttoobig6(). don't advise announced udp connections. avoid code duplication in icmp.c and icmpv6.c by having a single send function with type, code and arg parameters. maintain statistics for sent ipv4 icmp types. avoid route lookup in ipout*() by passing Routehint* to icmpnohost*(). iladvise()... more like ill advice.
2022-12-13  devip: tcpmssclamp() to minimum of source and destination interface MTU  cinap_lenrek
We used to only clamp to the MTU of the destination interface, but this is wrong. We have to clamp to the minimum of both source and destination. For this, we change the gating argument type of ipoput4() and ipoput6() from int to Ipifc* to pass the source interface.
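The clamping rule described here can be sketched in portable C. This is only an illustration under stated assumptions: clampmss(), IP4HDR and TCPHDR are hypothetical names, not the kernel's, and the sketch assumes IPv4 and TCP headers without options (20 bytes each).

```c
#include <assert.h>
#include <stdint.h>

enum {
	IP4HDR = 20,	/* assumption: IPv4 header without options */
	TCPHDR = 20,	/* assumption: TCP header without options */
};

/*
 * Hypothetical helper, not the kernel's tcpmssclamp():
 * lower mss so that a full segment fits the smaller of the
 * source and destination interface MTUs.
 */
static uint16_t
clampmss(uint16_t mss, unsigned srcmtu, unsigned dstmtu)
{
	unsigned mtu, max;

	mtu = srcmtu < dstmtu ? srcmtu : dstmtu;
	assert(mtu > IP4HDR + TCPHDR);
	max = mtu - IP4HDR - TCPHDR;
	if(max > 0xFFFF)
		max = 0xFFFF;
	return mss > max ? (uint16_t)max : mss;
}
```

With a 1500 byte interface on one side and a 1400 byte interface on the other, an advertised MSS of 1460 is lowered to 1360 regardless of which side is the source.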
2022-09-17  tcpmssclamp: only check the first ipv4 fragment for tcp header  cinap_lenrek
2022-09-17  devip: do tcp mss clamping when forwarding packets  cinap_lenrek
when forwarding packets (gating), unconditionally check tcp-syn packets for the mss-size option and reduce it to fit the mtu of the outgoing interface. this is done by exporting a new tcpmssclamp() function from ip/tcp.c that takes an ip packet and its buffer size and the effective mtu of the interface and adjusts the mss value of tcp syn options. this function is now also used by devbridge, enforcing a tcp mss below the tunnel mtu.
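A rough sketch of the option rewrite, assuming the caller has already located the TCP header inside the packet. synmssclamp() and the enum names are hypothetical and this is not the code in ip/tcp.c. The sketch leaves the TCP checksum untouched, so a real caller would have to fix it up after a successful rewrite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
	TCPHDR = 20,	/* fixed part of the TCP header */
	SYN = 0x02,	/* SYN bit in the flags byte */
	OPT_EOL = 0,	/* end of option list */
	OPT_NOP = 1,	/* one byte padding */
	OPT_MSS = 2,	/* maximum segment size, length 4 */
};

/*
 * Hypothetical helper: tcp points at a TCP header with n valid bytes.
 * Returns 1 if an MSS option above maxmss was lowered, 0 otherwise.
 * The checksum is NOT updated here (assumption: the caller does it).
 */
static int
synmssclamp(uint8_t *tcp, size_t n, uint16_t maxmss)
{
	size_t hdrlen, i, optlen;
	uint16_t oldmss;

	assert(tcp != NULL);
	if(n < TCPHDR || (tcp[13] & SYN) == 0)
		return 0;
	hdrlen = (size_t)(tcp[12] >> 4) * 4;	/* data offset in bytes */
	if(hdrlen < TCPHDR || hdrlen > n)
		return 0;
	for(i = TCPHDR; i < hdrlen; i += optlen){
		if(tcp[i] == OPT_EOL)
			break;
		if(tcp[i] == OPT_NOP){
			optlen = 1;
			continue;
		}
		if(i + 2 > hdrlen)
			break;
		optlen = tcp[i+1];
		if(optlen < 2 || i + optlen > hdrlen)
			break;	/* malformed option, give up */
		if(tcp[i] != OPT_MSS || optlen != 4)
			continue;
		oldmss = (uint16_t)(tcp[i+2] << 8 | tcp[i+3]);
		if(oldmss <= maxmss)
			return 0;
		tcp[i+2] = (uint8_t)(maxmss >> 8);
		tcp[i+3] = (uint8_t)(maxmss & 0xFF);
		return 1;
	}
	return 0;
}
```

Only segments with the SYN bit set are touched, since the MSS option is only meaningful during the handshake.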
2022-03-12  devip: implement network address translation routes  cinap_lenrek
This adds a new route "t"-flag that enables network address translation, replacing the source address (and local port) of a forwarded packet with one of the outgoing interface. The state for a translation is kept in a new Translation structure, which contains two Iphash entries, so it can be inserted into the per protocol 4-tuple hash table, requiring no extra lookups. Translations have a low overhead (~200 bytes on amd64), so we can have many of them. They get reused after 5 minutes of inactivity or when the per protocol limit of 1000 entries is reached (then the one with longest inactivity is reused). The protocol needs to export a "forward" function that is responsible for modifying the forwarded packet, and then handle translations in its input function for iphash hits with Iphash.trans != 0.
This patch also fixes a few minor things found during development:
- Include the Iphash in the Conv structure, avoiding extra malloc
- Fix ttl exceeded check (ttl < 1 -> ttl <= 1)
- Router should not reply with ttl exceeded for multicast flows
- Extra checks for icmp advice to avoid protocol confusions.
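The reuse policy for translations can be illustrated with a small sketch. Trans, picktrans() and the flat array are assumptions made for the example (the commit keeps Translation structures in the protocol's hash table); only the 5 minute idle time and the limit of 1000 entries come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
	MAXTRANS = 1000,	/* per protocol limit from the commit message */
	IDLESECS = 5*60,	/* reuse after 5 minutes of inactivity */
};

/* Hypothetical stand-in for the Translation structure. */
typedef struct Trans Trans;
struct Trans {
	uint32_t lastused;	/* time of last use, in seconds */
};

/*
 * Hypothetical helper: returns the index of the entry to reuse,
 * or -1 when a fresh entry should be allocated instead.
 * The longest idle entry is reused if it has been idle long
 * enough or if the table has reached its limit.
 */
static int
picktrans(const Trans *t, int n, uint32_t now)
{
	int i, oldest;

	assert(t != NULL || n == 0);
	oldest = -1;
	for(i = 0; i < n; i++)
		if(oldest < 0 || now - t[i].lastused > now - t[oldest].lastused)
			oldest = i;
	if(oldest < 0)
		return -1;
	if(now - t[oldest].lastused >= IDLESECS || n >= MAXTRANS)
		return oldest;
	return -1;
}
```

A linear scan is used here only to keep the sketch short; it says nothing about how the kernel finds the longest idle entry.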
2021-10-11  devip: improve tcp error handling for ipoput  cinap_lenrek
The ipoput4() and ipoput6() functions can raise an error(), which means before calling sndrst() or limbo() (from tcpiput()), we have to get rid of our blist by calling freeblist(bp). Make sure to set the Block pointer to nil after freeing in ipiput() to avoid accidents. Fix wrong panic string in sndsynack, and make any sending functions like sndrst(), sndsynack() and tcpsendka() return the value of ipoput*(), so we can distinguish "no route" error. Add an Enoroute[] string constant. Both htontcp4() and htontcp6() can never return nil, as they will allocate new or resize the existing block. Remove the misleading error handling code that assumes that it can fail. Unlock proto on error in limborexmit() which can be raised from sndsynack() -> ipoput*() -> error(). Make sndsynack() pass a Routehint pointer to ipoput*() as it already did the route lookup, so we don't have to do it twice.
2019-05-22  devip: if the server does not support TCP ws option, disable window scaling (thanks joe9)  cinap_lenrek
if the server responds without a window scale option in its syn-ack, disable window scaling altogether as both sides need to understand the option.
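As a sketch of the rule, assuming the shift counts are stored per connection: Scale, synackscale() and MAXSHIFT are hypothetical names for the example. The cap of 14 on the shift count comes from the TCP window scale specification (RFC 7323), not from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
	MAXSHIFT = 14,	/* largest shift count allowed by RFC 7323 */
};

/* Hypothetical per connection window scale state. */
typedef struct Scale Scale;
struct Scale {
	uint8_t snd;	/* shift applied to windows the peer advertises */
	uint8_t rcv;	/* shift we asked for in our syn */
};

/*
 * Hypothetical helper, called when the syn-ack arrives.
 * If the peer sent no window scale option, scaling is turned off
 * in both directions, because both sides must understand it.
 */
static void
synackscale(Scale *s, int peerhasws, uint8_t peerws)
{
	assert(s != NULL);
	if(!peerhasws){
		s->snd = 0;
		s->rcv = 0;
		return;
	}
	s->snd = peerws > MAXSHIFT ? MAXSHIFT : peerws;
}
```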
2019-01-27  devip: tcp: Don't respond to FIN-less ACKs during TIME-WAIT (thanks Barret Rhoden)  cinap_lenrek
Under the normal close sequence, when we receive a FIN|ACK, we enter TIME-WAIT and respond to that LAST-ACK with an ACK. Our TCP stack would send an ACK in response to *any* ACK, which included FIN|ACK but also included regular ACKs. (Or PSH|ACKs, which is what we were actually getting/sending). That was more ACKs than is necessary and results in an endless ACK storm if we were under the simultaneous close sequence. In that scenario, both sides of a connection are in TIME-WAIT. Both sides receive FIN|ACK, and both respond with an ACK. Then both sides receive *those* ACKs, and respond again. This continues until the TIME-WAIT wait period elapses and each side's TCP timers (in the Plan 9 / Akaros case) shut down. The fix for this is to only respond to a FIN|ACK when we are in TIME-WAIT.
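The decision described above fits in a few lines. This is a sketch, not the kernel code: timewaitack() is a hypothetical name and it reads the flags byte at offset 13 of a raw TCP header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
	FIN = 0x01,
	PSH = 0x08,
	ACK = 0x10,
};

/*
 * Hypothetical helper: for a connection in TIME-WAIT, decide
 * whether an incoming segment deserves an ACK in reply.
 * Only segments carrying FIN are answered, so two peers that
 * are both in TIME-WAIT cannot keep acking each other's ACKs.
 */
static int
timewaitack(const uint8_t *tcp)
{
	assert(tcp != NULL);
	return (tcp[13] & FIN) != 0;
}
```

A plain ACK or PSH|ACK is silently dropped, while a retransmitted FIN|ACK still gets its ACK.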
2018-11-18  devip: fix swapped tcp snd.scale and recv.scale in tcpstate() format (thanks joe9)  cinap_lenrek
2018-04-22  devip: cleanup tcp.c  cinap_lenrek
2018-04-08  devip: implement source specific routing  cinap_lenrek
2017-01-12  kernel: add "close" ctl message for tcp connection to gracefully hang up a connection without a tcp reset (used by go)  cinap_lenrek
2016-11-16  ip/tcp: never raise the mss over the link mtu < 1280 for v6  cinap_lenrek
v6 mandates a minimum mtu of 1280, though someone *could* set up an interface with a lower mtu or set it lower for testing.
2016-11-15  ip/tcp: only calculate mss from interface mtu when directly reachable for v6  cinap_lenrek
we currently do not implement path mtu discovery, so for destinations that are not directly reachable assume the minimum mtu of 1280 bytes.
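Taken together, the two v6 commits describe a rule that might be sketched as follows. mss6() and the enum names are hypothetical, and the header sizes assume a plain 40 byte IPv6 header and a 20 byte TCP header with no extension headers or options.

```c
#include <assert.h>

enum {
	IP6HDR = 40,	/* assumption: IPv6 header without extension headers */
	TCPHDR = 20,	/* assumption: TCP header without options */
	MINMTU6 = 1280,	/* minimum mtu mandated by v6 */
};

/*
 * Hypothetical helper: pick the mss for a v6 connection.
 * Directly reachable destinations use the interface mtu.
 * Others assume the v6 minimum of 1280, but never more than
 * the interface mtu in case the link is configured lower.
 */
static unsigned
mss6(unsigned ifcmtu, int direct)
{
	unsigned mtu;

	assert(ifcmtu > IP6HDR + TCPHDR);
	mtu = ifcmtu;
	if(!direct && mtu > MINMTU6)
		mtu = MINMTU6;
	return mtu - IP6HDR - TCPHDR;
}
```

For a 1500 byte interface this gives 1440 for an on-link peer and 1220 for a routed one.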
2016-11-08  kernel/ip: remove nil checks for allocb() and padblock()  cinap_lenrek
2016-11-07  ip/tcp: remove useless nil checks for padblock() and allocb() return value  cinap_lenrek
2016-10-23  ip: simplify code as packblock() and concatblock() will never error  cinap_lenrek
2016-03-12  devip: handle ignoreadvice flag for all protocols  cinap_lenrek
2015-09-02  tcp: fix mtu on server sockets again (thanks mycroftix)  cinap_lenrek
for incoming connections, we used s->laddr to look up the interface for the incoming call, but this does not work when the announce address is tcp!*!123, because then s->laddr is all zeros ("::"). instead, use the incoming destination address for interface mtu lookup. thanks mycroftix for troubleshooting!
2015-05-14  tcp: fix loopback slowness issue / set tcb->mss for incoming connections (thanks David du Colombier)  cinap_lenrek
David du Colombier wrote:
> The slowness issue only appears on the loopback, because
> it provides a 16384 MTU.
>
> There is an old bug in the Plan 9 TCP stack, were the TCP
> MSS doesn't take account the MTU for incoming connections.
>
> I originally fixed this issue in January 2015 for the Plan 9
> port on Google Compute Engine. On GCE, there is an unusual
> 1460 MTU.
>
> The Plan 9 TCP stack defines a default 1460 MSS corresponding
> to a 1500 MTU. Then, the MSS is fixed according to the MTU
> for outgoing connections, but not incoming connections.
>
> On GCE, this issue leads to IP fragmentation, but GCE didn't
> handle IP fragmentation properly, so the connections
> were dropped.
>
> On the loopback medium, I suppose this is the opposite issue.
> Since the TCP stack didn't fix the MSS in the incoming
> connection, the programs sent multiple small 1500 bytes
> IP packets instead of large 16384 IP packets, but I don't
> know why it leads to such a slowdown.
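The numbers in the message follow from simple arithmetic, sketched here for IPv4. mssfrommtu() and nsegs() are hypothetical helpers, and the 40 bytes of overhead assume IPv4 and TCP headers without options.

```c
#include <assert.h>

enum {
	HDRS = 40,	/* assumption: 20 byte IPv4 + 20 byte TCP header */
};

/* Hypothetical helper: largest tcp payload that fits one packet. */
static unsigned
mssfrommtu(unsigned mtu)
{
	assert(mtu > HDRS);
	return mtu - HDRS;
}

/* Hypothetical helper: segments needed to carry len bytes. */
static unsigned
nsegs(unsigned len, unsigned mss)
{
	assert(mss > 0);
	return (len + mss - 1) / mss;
}
```

A 1500 byte MTU yields the default 1460 byte MSS, the 1460 byte MTU on GCE yields 1420, and the 16384 byte loopback MTU yields 16344. Sending 65536 bytes takes 45 segments at an MSS of 1460 but only 5 at 16344.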
2013-11-22  kernel: more kproc pexit() and sleep error handling  cinap_lenrek
2013-07-21  apply erik quanstrom's tcp-bdp patch (from sources)  cinap_lenrek
this patch consists of two bits of work submitted as one patch.
the first bit fixed a "pacing" problem, where a tcp connection rate-limited by the reading process would experience 10% of the expected throughput, and could even get into live lock. it was noticed at the time of this initial work that the stack often sent tiny grams. some good bits from nix' original tcp were merged in. the test program /n/sources/contrib/quanstro/tcptest.c will verify that under most conditions, a reader-paced connection now gets the expected throughput. expected arguments would be tcptest -s1 -n 5000 -l
the second bit is a first step in preparing tcp to handle modest (1-2MB) bandwidth-delay products. the strategy was to completely implement NewReno. the testing network was a 7/35/70ms by 100Mbit wan emulator with 0/.05/.1% loss. here are the performance comparisons from the changes after the first round "old" to the submitted patch "new". the smallest improvement was 80%, the largest was 11x.
loss%  rtt   old    new
0.10     7  4.40   7.85
0.10    35  0.88   1.79
0.10    70  0.47   0.84
0.05     7  4.80   9.38
0.05    35  1.00   2.02
0.05    70  0.52   1.77
0.01     7  5.33  11.87
0.01    35  1.14  10.97
0.01    70  0.54   4.75
0.00     7  4.49  11.92
0.00    35  1.04  11.35
0.00    70  0.58  10.56
since the diff is not very easy to read, i wrote a small paper detailing the changes http://www.quanstro.net/plan9/tcp/tcp.pdf
- erik
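For orientation, the bandwidth-delay product of the test network can be worked out with a small helper. bdpbytes() is a hypothetical name, and the sketch assumes "100Mbit" means 100,000,000 bits per second.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bandwidth-delay product in bytes for a
 * link of bps bits per second and a round trip of rttms
 * milliseconds. This is the amount of data that must be in
 * flight to keep the link full.
 */
static uint64_t
bdpbytes(uint64_t bps, uint64_t rttms)
{
	/* guard against overflow of the intermediate product */
	assert(rttms == 0 || bps <= UINT64_MAX / rttms);
	return bps * rttms / 8000;	/* 8 bits per byte, 1000 ms per s */
}
```

At 100Mbit this gives 87,500 bytes for a 7ms round trip, 437,500 bytes for 35ms and 875,000 bytes for 70ms, which puts the 70ms case close to the low end of the 1-2MB range the patch is preparing for.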
2012-07-09  tcp: memset paranoia, synced from sources  cinap_lenrek
2011-03-30  Import sources from 2011-03-30 iso image - lib  Taru Karttunen
2011-03-30  Import sources from 2011-03-30 iso image  Taru Karttunen