path: root/sys/src/9/ip/ip.c
Age  Commit message  Author
2022-09-18  tcpmssclamp: pass correct size to tcpmssclamp()  cinap_lenrek
The len variable refers to the total length of a Block list in ipoputX(), but we need to pass the size of the buffer we hand to tcpmssclamp() instead, which can be smaller if there are multiple blocks.
2022-09-17  devip: do tcp mss clamping when forwarding packets  cinap_lenrek
when forwarding packets (gating), unconditionally check tcp-syn packets for the mss-size option and reduce it to fit the mtu of the outgoing interface. this is done by exporting a new tcpmssclamp() function from ip/tcp.c that takes an ip packet and its buffer size and the effective mtu of the interface and adjusts the mss value of tcp syn options. this function is now also used by devbridge, enforcing a tcp mss below the tunnel mtu.
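The clamping described above can be sketched roughly as follows. This is not the tcpmssclamp() from ip/tcp.c; clampmss() and its constants are hypothetical names, the buffer is assumed to start at the TCP header, and the TCP checksum is deliberately left untouched (a real implementation has to adjust it). Note that n is the size of the buffer in hand, not the length of the whole packet.

```c
#include <stddef.h>
#include <stdint.h>

enum {
	TCPHDRMIN = 20,		/* TCP header without options */
	FLAG_SYN = 0x02,	/* SYN bit in the flags byte */
	OPT_EOL = 0,		/* end of option list */
	OPT_NOP = 1,		/* one byte padding option */
	OPT_MSS = 2,		/* maximum segment size option */
	OPT_MSSLEN = 4,		/* kind + length + 16 bit value */
};

/*
 * Hypothetical helper: tcp points at a TCP header, n is the number
 * of bytes available in this buffer, maxmss is the largest MSS that
 * fits the outgoing mtu. Returns 1 if the MSS option was lowered,
 * 0 otherwise. The checksum is NOT updated in this sketch.
 */
int
clampmss(uint8_t *tcp, size_t n, uint16_t maxmss)
{
	size_t hlen, i, olen = 0;
	uint16_t mss;

	if(n < TCPHDRMIN)
		return 0;
	if((tcp[13] & FLAG_SYN) == 0)
		return 0;			/* only SYN segments carry an MSS */
	hlen = (size_t)(tcp[12] >> 4) * 4;	/* data offset in bytes */
	if(hlen < TCPHDRMIN || hlen > n)
		return 0;			/* options not fully in this buffer */
	for(i = TCPHDRMIN; i < hlen; i += olen){
		if(tcp[i] == OPT_EOL)
			break;
		if(tcp[i] == OPT_NOP){
			olen = 1;
			continue;
		}
		if(i + 1 >= hlen)
			break;
		olen = tcp[i+1];
		if(olen < 2 || i + olen > hlen)
			break;			/* malformed option */
		if(tcp[i] == OPT_MSS && olen == OPT_MSSLEN){
			mss = (uint16_t)(tcp[i+2] << 8 | tcp[i+3]);
			if(mss <= maxmss)
				return 0;
			tcp[i+2] = maxmss >> 8;
			tcp[i+3] = maxmss & 0xff;
			return 1;
		}
	}
	return 0;
}
```

A caller would derive maxmss from the mtu of the outgoing interface minus the IP and TCP header sizes.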
2022-03-12  devip: implement network address translation routes  cinap_lenrek
This adds a new route "t"-flag that enables network address translation, replacing the source address (and local port) of a forwarded packet with one of the outgoing interface. The state for a translation is kept in a new Translation structure, which contains two Iphash entries, so it can be inserted into the per protocol 4-tuple hash table, requiring no extra lookups. Translations have a low overhead (~200 bytes on amd64), so we can have many of them. They get reused after 5 minutes of inactivity or when the per protocol limit of 1000 entries is reached (then the one with longest inactivity is reused). The protocol needs to export a "forward" function that is responsible for modifying the forwarded packet, and then handle translations in its input function for iphash hits with Iphash.trans != 0.
This patch also fixes a few minor things found during development:
- Include the Iphash in the Conv structure, avoiding extra malloc
- Fix ttl exceeded check (ttl < 1 -> ttl <= 1)
- Router should not reply with ttl exceeded for multicast flows
- Extra checks for icmp advice to avoid protocol confusions.
2021-10-09  devip: cache arp entry in Routehint  cinap_lenrek
Instead of having to do an arp hash table lookup for each outgoing ip packet, forward the Routehint pointer to the medium's bwrite() function and let it cache the arp entry pointer. This avoids route and arp hash table lookups for tcp, il and connection oriented udp. It also allows us to avoid multiple route and arp table lookups for the retransmits once an arp/neighbour solicitation response arrives.
2020-05-10  devip: fix ifc recursive rlock() deadlock  cinap_lenrek
ipiput4() and ipiput6() are called with the incoming interface rlocked while ipoput4() and ipoput6() also rlock() the outgoing interface once a route has been found. it is common that the incoming and outgoing interfaces are the same, resulting in recursive rlock()ing. the deadlock happens when a reader holds the rlock for the incoming interface, then ip/ipconfig tries to add a new address, trying to wlock the interface. as there are still active readers on the ifc, the ip/ipconfig process gets queued on the interface RWlock. now the reader finds the outgoing route which has the same interface as the incoming packet and tries to rlock the ifc again. but now there's a writer queued, so we also go to sleep waiting for ourselves to release the lock. the solution is to never wait for the outgoing interface rlock, but instead use non-queueing canrlock() and if it cannot be acquired, discard the packet.
2020-01-05  devip: fix packet loss when interface is wlocked  cinap_lenrek
to prevent deadlock on media unbind (which is called with the interface wlock()'ed), the medium's reader processes that unbind was waiting for used to discard packets when the interface could not be rlocked. this has the unfortunate side effect that packets get lost when we change addresses on an interface. this is problematic for the processing of ipv6 router advertisements when multiple RAs are received in quick succession. this change removes that packet dropping behaviour and instead changes the unbind process to avoid the deadlock by wunlock()ing the interface temporarily while waiting for the reader processes to finish. the interface media is also changed to the nullmedium before unlocking (see the comment).
2019-03-07  devip: ignore the evil bit in fragment info field  cinap_lenrek
using the ~IP_DF mask to select the offset and "more fragments" bits includes the evil bit 15. so instead define a constant IP_FO for the fragment offset bits and use (IP_MF|IP_FO). that way the evil bit gets ignored and doesn't cause any useless calls to ipreassemble().
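The difference between the two masks is easy to see on the 16-bit fragment info field. The sketch below assumes the standard IPv4 layout (bit 15 reserved, bit 14 DF, bit 13 MF, low 13 bits fragment offset). IP_DF, IP_MF and IP_FO are the names used in the commit message; IP_EVIL and the two predicate functions are hypothetical and only there for illustration.

```c
#include <stdint.h>

enum {
	IP_EVIL = 0x8000,	/* reserved bit 15, the "evil bit" (hypothetical name) */
	IP_DF = 0x4000,		/* don't fragment */
	IP_MF = 0x2000,		/* more fragments */
	IP_FO = 0x1fff,		/* fragment offset bits */
};

/* old test: everything except DF, which still includes bit 15 */
int
isfrag_old(uint16_t frag)
{
	return (frag & ~IP_DF) != 0;
}

/* new test: only "more fragments" and the fragment offset count */
int
isfrag_new(uint16_t frag)
{
	return (frag & (IP_MF|IP_FO)) != 0;
}
```

With only the evil bit set, isfrag_old() reports a fragment and isfrag_new() does not; for real fragments (MF set or a non-zero offset) both agree.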
2019-03-04  devip: zero fragment offset after reassembly, remove tos magic, cleanup  cinap_lenrek
2019-03-03  devip: simplify ip reassembly functions, getting rid of Ipfrag.hlen  cinap_lenrek
given that we now keep the block size consistent with the ip packet size, the variable header part of the ip packet is just: BLEN(bp) - fp->flen == fp->hlen. fix bug in ip6reassemble() in the non-fragmented case: reload ih after ip header was moved before writing ih->ploadlen. use concatblock() instead of pullupblock().
2019-03-03  devip: fix ip fragmentation handling issues with header options  cinap_lenrek
some protocols assume that Ip4hdr.length[] and Ip6hdr.ploadlen[] are valid and not out of range within the block but this has not been verified. also, the ipv4 and ipv6 headers can have variable length options, which was not considered in the fragmentation and reassembly code. to make this sane, ipiput4() and ipiput6() now verify that everything is in range and trim the block to the expected size before doing any further processing. now blocklen() and Ip4hdr.length[] are consistent. ipoput4() and ipoput6() are simpler now, as they can rely on blocklen() only, not having a special routing case. ip fragmentation reassembly has to consider that fragments could arrive with different ip header options, so we store the header+option size in the new Ipfrag.hlen field. unfraglen() has to make sure not to run past the buffer, and handle the case when it encounters multiple fragment headers.
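As an illustration of the kind of range check described above, here is a minimal sketch for the IPv4 case. It is not the code from ipiput4(); ip4checklen() is a hypothetical helper that works on a flat byte buffer instead of a Block, and it returns the length the caller should trim to.

```c
#include <stddef.h>
#include <stdint.h>

enum { IP4HDRMIN = 20 };	/* IPv4 header without options */

/*
 * Hypothetical helper: pkt points at an IPv4 header, blen is the
 * number of bytes actually received. Returns the total length
 * from the header if header length, total length and buffer size
 * are consistent, or -1 if the packet should be dropped.
 */
long
ip4checklen(const uint8_t *pkt, size_t blen)
{
	size_t hlen, len;

	if(blen < IP4HDRMIN)
		return -1;
	if((pkt[0] >> 4) != 4)
		return -1;			/* not IPv4 */
	hlen = (size_t)(pkt[0] & 0x0f) * 4;	/* header + options */
	len = (size_t)pkt[2] << 8 | pkt[3];	/* total length field */
	if(hlen < IP4HDRMIN || hlen > len || len > blen)
		return -1;
	return (long)len;			/* trim the buffer to this size */
}
```

Once the buffer has been trimmed to the returned value, later stages can rely on the buffer length alone, which is the simplification the commit describes for the output path.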
2018-09-23  devip: fix default parameter calculation for router life-time  cinap_lenrek
router life time is in seconds, while max ra interval is in milliseconds!
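The unit mismatch amounts to a missing division by 1000. A minimal sketch, assuming the default router lifetime is derived as three times the maximum RA interval (the default suggested by RFC 4861); the function and parameter names are hypothetical and not taken from the kernel source:

```c
enum { MSPERSEC = 1000 };

/*
 * Hypothetical helper: maxraint_ms is the maximum router
 * advertisement interval in milliseconds; the result is the
 * default router lifetime in seconds.
 */
unsigned long
defaultrouterlt(unsigned long maxraint_ms)
{
	return 3 * maxraint_ms / MSPERSEC;
}
```

For a maximum RA interval of 600000 ms this yields a lifetime of 1800 seconds; without the conversion the value would be off by a factor of 1000.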
2018-05-10  ip: add some primitive rate limiting knobs to counteract bufferbloat  cinap_lenrek
2018-04-22  devip: fix ipv6 icmp unreachable handling, fix retransmit, fix ifc locking, remove tentative check  cinap_lenrek
2018-04-19  devip: add "reflect" ctl message, fix memory leaks in icmpv6, fix source address for icmpttlexceeded, cleanup  cinap_lenrek
2018-04-08  devip: implement source specific routing  cinap_lenrek
2018-03-18  devip: more v6 improvements  cinap_lenrek
ipv4local() and ipv6local() now take a remote address argument, returning the closest local address to the source. this implements the standardized source address selection rules instead of just returning the first local v4 or v6 address.
the source address selection was broken for esp, rudp and udp, blindly assuming ifc->lifc->local being a valid v4 address. use ipv6local() instead.
the v6 routing code used to look up the source address route to decide to drop the packet instead of checking the interface on the destination route.
factor out the route hint from Conv and put it in the Routehint structure, avoiding stack bloat in v4 routing. implement the same trick for v6, avoiding a second route lookup in ipoput6.
fix memory leak in icmpv6 router solicitation handling.
remove old unfinished handling of multiple v6 routers. should implement source specific routes instead.
avoid duplication, use common convipvers() function.
use isv4() instead of memcmp v4prefix.
2016-11-08  kernel/ip: fix typo (rfc -> ifc)  cinap_lenrek
2016-11-07  ip: always pass a single block to Medium.bwrite(), avoid concatblock() calls in Dev.bwrite()  cinap_lenrek
the convention for Dev.bwrite() is that it accepts a *single* block, and not a block chain. so we never have concatblock here. to keep stuff consistent, we also guarantee that Medium.bwrite() will get a *single* block passed as well, as the callers are few in number.
2014-12-21  ip: exclude "don't fragment" bit from ipv4 reassembly test  cinap_lenrek
other operating systems always set the "don't fragment" bit in their outgoing ipv4 packets, causing us to unnecessarily call ip4reassemble() looking for a fragment reassembly queue. the change excludes the "don't fragment" bit from the test so we now call ip4reassemble() only when the "more fragments" bit is set or a fragment offset other than zero is given. this optimization was discovered from akaros.
2012-08-02  ip: fix assert panic on fragmented icmp echo request (see eriks icmp-frag patch)  cinap_lenrek
2011-03-30  Import sources from 2011-03-30 iso image - lib  Taru Karttunen
2011-03-30  Import sources from 2011-03-30 iso image  Taru Karttunen