Re: This is strictly a violation of the TCP specification

by Christopher Williams

2026-03-02

This is in response to Cloudflare’s article titled “This is strictly a violation of the TCP specification”[1]. Yes, I realize it’s nearly ten years old. But recently I’ve been researching TCP and its RFCs and trying to better understand how it’s supposed to work. And the article discussion on Hacker News has already closed, leaving me with few outlets to discuss it.

To give a brief overview, the server software in question was not always closing its side of connections (this was identified as a bug in that software), leaving those connections in the CLOSE_WAIT state; this happens when the client has closed its side (they’ve ended up in the FIN_WAIT_2 state) and the server OS is waiting for the server application to close its side. Many clients (Linux, Windows, and possibly FreeBSD or some other systems) automatically forcibly close their side of the connection as a DoS security measure, moving from FIN_WAIT_2 to CLOSED after a minute or so. This leaves one side completely closed and the other side in CLOSE_WAIT.

The problem, observed as clients sometimes being unable to connect to the server, occurs when one of these clients uses the same local port to make a new connection to the server. The server, seeing that a connection already exists (in the CLOSE_WAIT state), ignores the new connection request. The client keeps retrying to establish the new connection, retransmitting SYN segments for up to a few minutes before timing out.

The article concludes by pointing the finger at the client behavior (i.e., automatically closing FIN_WAIT_2 connections) as the root issue, as it’s “strictly a violation of the TCP specification”.

But is it really a violation? I argue that it might not be. The TCP spec (RFC 9293[2], previously RFC 793[3]) defines a “fictional” user/TCP interface, but it leaves a lot of the details to an actual implementation.
For example, Berkeley sockets (used by BSD, Linux, Windows, you name it) defines an interface that in some ways bears only a slight resemblance to the one defined in the spec, but it provides the required functionality.

Most systems, as mentioned, will close a socket that’s in the FIN_WAIT_2 state after a while without waiting for a FIN from the remote side. Note that Linux (I haven’t checked other systems) does this only if the application on the local side has expressed its intent to close the socket for reading (e.g., by calling `shutdown(fd, SHUT_RD)` or by closing the file descriptor, which does basically the same thing as `shutdown(fd, SHUT_RDWR)`); in other words, the connection is not automatically closed if the application has a handle (file descriptor) and can still read from the socket. You can look at this intent as a hint that the system is allowed to abort the connection if the remote side hasn’t closed its side after a period of time. After all, if the remote side eventually _does_ send more data, that data would have to be discarded anyway as the application can no longer receive it.

Now, one could argue that Linux should send a reset (RST) in that case, but that shouldn’t matter for this particular issue if it were not for another problem; even if Linux _did_ send a reset, the other problem would still occur if that reset were lost.

So what’s the other problem? Here’s the relevant text in both RFC 793 and RFC 9293:

> If an incoming segment is not acceptable, an acknowledgment should be sent in reply (unless the RST bit is set, if so drop the segment and return):
>
> `<SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>`
>
> After sending the acknowledgment, drop the unacceptable segment and return.

“Not acceptable” means the sequence number is outside the receive window, which is small relative to the 32-bit sequence space. That is likely the case with a SYN segment for a new connection, as the initial sequence number for each new connection is essentially random and highly unlikely to be inside the window of a previous connection.
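
The acceptability test quoted above can be sketched in a few lines. This is a minimal illustration, assuming C++17 and the four-case table from RFC 793/RFC 9293; the names `in_window`, `segment_acceptable`, `Action`, and `on_segment` are made up for this example and are not taken from any particular stack.

```cpp
#include <cstdint>

// Sequence numbers wrap around at 2^32, so the comparison is done on the
// unsigned difference rather than on the raw values.
// True when RCV.NXT <= seq < RCV.NXT + RCV.WND (mod 2^32).
constexpr bool in_window(std::uint32_t seq, std::uint32_t rcv_nxt,
                         std::uint32_t rcv_wnd) {
    return static_cast<std::uint32_t>(seq - rcv_nxt) < rcv_wnd;
}

// Segment acceptability test (hypothetical helper name).
// seg_len counts data octets plus one each for SYN and FIN,
// so a bare SYN has seg_len == 1.
constexpr bool segment_acceptable(std::uint32_t seg_seq, std::uint32_t seg_len,
                                  std::uint32_t rcv_nxt, std::uint32_t rcv_wnd) {
    if (seg_len == 0) {
        // Empty segment: must sit exactly at RCV.NXT if the window is zero.
        return rcv_wnd == 0 ? seg_seq == rcv_nxt
                            : in_window(seg_seq, rcv_nxt, rcv_wnd);
    }
    if (rcv_wnd == 0) {
        return false;  // Non-empty segment, zero window: never acceptable.
    }
    // Either the first or the last octet must fall inside the window.
    return in_window(seg_seq, rcv_nxt, rcv_wnd) ||
           in_window(seg_seq + seg_len - 1, rcv_nxt, rcv_wnd);
}

// What the quoted rule asks for once acceptability is known.
enum class Action { Process, SendAckAndDrop, Drop };

constexpr Action on_segment(bool acceptable, bool rst_bit) {
    if (acceptable) return Action::Process;
    return rst_bit ? Action::Drop : Action::SendAckAndDrop;
}
```

For instance, with a stale connection whose RCV.NXT is 1000 and whose window is 4096 octets, a new SYN carrying an initial sequence number of 0x90000000 fails `segment_acceptable`, so `on_segment` yields `Action::SendAckAndDrop`: an ACK goes back and the SYN is dropped, rather than the SYN being silently ignored.
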
In the unlikely event that the sequence number falls within the window, RFC 793 requires sending a RST and closing the connection, and RFC 9293 (via RFC 5961[4]) recommends sending a “challenge ACK”. A “challenge ACK” is simply an ACK segment with the current sequence and acknowledgment numbers, which both mitigates blind attacks and re-synchronizes hosts that have somehow gotten out of sync. This is effectively the same response as with a not-acceptable segment above.

What that all means is that the server should have responded to the SYN segment, either with a challenge ACK or a RST. In the most likely scenario, if the client received a challenge ACK it would have closed its side (as a refused connection) and sent a reset to the server with the sequence number set to the acknowledgment number from the server; the server receiving the reset segment would then reset its side of the connection. In the far less likely scenario where the sequence number is in the window and RFC 9293/RFC 5961’s recommendation is not followed, the client would receive a reset and inform the user that its connection was reset, and the server also would have closed its connection. In either case, the connection would be reset nearly instantly, unless an ACK segment or a RST segment was lost, which would resolve itself after a short time because the client would retransmit its SYN after a second or so.

------------------------------------------------------------
My TCP/IP stack
------------------------------------------------------------

Why do I care about this? The simple answer is that I’m developing a simple TCP/IP stack along the lines of uIP[5] by Adam Dunkels. But why? Why not?

----------------------------------------------------

While developing this TCP/IP stack, I found (and reported) a couple of minor errors in RFC 9293. You can see my reported errata[6] if you care enough to look. I found a few more possible errata that I plan to submit soonish.
I’m also making a private version of RFC 9293 with prose and other verbosity stripped out and a few aspects of it simplified as well (e.g., my implementation doesn’t support sending data in a SYN segment and supports only one unacknowledged segment in-flight, so checking the ACK is much simpler in both cases). My goodness, the RFC is wordy! But that’s understandable, considering TCP (as well as the “catenet”, or the “Internet” as it’s known today) was a relatively new technology in its day.

----------------------------------------------------

One reason I’m developing a TCP/IP stack is that I want to make a few design changes to uIP, but its code resembles spaghetti (a lot of `goto`s). One design decision I’d like to change is in how it supports half-closed connections; simply put, it doesn’t. Once a peer closes its side, uIP automatically closes its side. And once the local application closes its side, the application can no longer receive data from the peer. That certainly avoids leaving connections in the CLOSE_WAIT state, and it also avoids leaving connections in the FIN_WAIT_2 state because a connection in that state is forcibly closed if a FIN hasn’t been received for a couple of minutes. The downside is that this limits the kinds of applications that can be written for uIP. For example, how does an application talk to a service that requires an “EOF” (FIN) from the client before fulfilling a request? It can’t.

Another reason is that I simply enjoy creating my own software to better my understanding.

I’m also currently writing my stack in C++ to give it a lightweight object-oriented interface (C++ compilers are very good at optimizing these days, so I’m not too worried about size and performance).
I plan to write a thin uIP compatibility layer to ease converting an application from uIP to my stack, though an application might need minor changes due to fundamental incompatibilities (for one, uIP gives an application direct access to members of the `uip_conn` structure which likely won’t exist in a compatible way in my version). I still have a lot more to research and develop so don’t expect anything any time soon.

------------------------------------------------------------
References
------------------------------------------------------------

[1] https://blog.cloudflare.com/this-is-strictly-a-violation-of-the-tcp-specification/
[2] https://www.rfc-editor.org/rfc/rfc9293.html
[3] https://www.rfc-editor.org/rfc/rfc793.html
[4] https://www.rfc-editor.org/rfc/rfc5961.html
[5] https://github.com/adamdunkels/uip
[6] https://www.rfc-editor.org/errata/rfc9293