Re: This is strictly a violation of the TCP specification
                  by Christopher Williams
                         2026-03-02


This is in response to Cloudflare’s article titled “This is
strictly a violation of the TCP specification”[1]. Yes, I
realize it’s nearly ten years old. But recently I’ve been
researching TCP and its RFCs and trying to better understand
how it’s supposed to work. And the article discussion on
Hacker News has already closed, leaving me with few outlets
to discuss it.

To give a brief overview, the server software in question
was not always closing its side of connections (this was
identified as a bug in that software), leaving those
connections in the CLOSE_WAIT state; this happens when
the client has closed its side (ending up in the
FIN_WAIT_2 state) and the server OS is waiting for the
server application to close its side. Many clients (Linux,
Windows, and possibly FreeBSD or some other systems)
automatically forcibly close their side of the connection
as a DoS security measure, moving from FIN_WAIT_2 to CLOSED
after a minute or so. This leaves one side completely closed
and the other side in CLOSE_WAIT. The problem, observed as
clients sometimes being unable to connect to the server,
occurs when one of these clients uses the same local port
to make a new connection to the server. The server, seeing
that a connection already exists (in the CLOSE_WAIT state),
ignores the new connection request. The client keeps
retrying to establish the new connection, retransmitting SYN
segments for up to a few minutes before timing out.

The article concludes by pointing the finger at the
client behavior (i.e., automatically closing FIN_WAIT_2
connections) as the root issue, as it’s “strictly a
violation of the TCP specification”. But is it really a
violation? I argue that it might not be.

The TCP spec (RFC 9293[2], previously RFC 793[3]) defines
a “fictional” user/TCP interface, but it leaves a lot of
the details to an actual implementation. For example,
Berkeley sockets (used by BSD, Linux, Windows, you name
it) defines an interface that in some ways bears only a
slight resemblance to the one defined in the spec, but
it provides the required functionality. Most systems, as
mentioned, will close a socket that’s in the FIN_WAIT_2
state after a while without waiting for a FIN from the
remote side. Note that Linux (I haven’t checked other
systems) does this only if the application on the local side
has expressed its intent to close the socket for reading
(e.g., by calling `shutdown(fd, SHUT_RD)` or by closing the
file descriptor, which does basically the same thing as
`shutdown(fd, SHUT_RDWR)`); in other words, the connection
is not automatically closed if the application has a handle
(file descriptor) and can still read from the socket. You
can look at this intent as a hint that the system is allowed
to abort the connection if the remote side hasn’t closed its
side after a period of time—after all, if the remote side
eventually _does_ send more data, that data would have to be
discarded anyway as the application can no longer receive
it.

Now, one could argue that Linux should send a reset (RST)
in that case, but the missing reset would not matter for
this particular issue were it not for another problem. And
even if Linux _did_ send a reset, that other problem would
still occur whenever the reset was lost.

So what’s the other problem? Here’s the relevant text in
both RFC 793 and RFC 9293:

   If an incoming segment is not acceptable, an
   acknowledgment should be sent in reply (unless the RST
   bit is set, if so drop the segment and return):

   `<SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>`

   After sending the acknowledgment, drop the
   unacceptable segment and return.

“Not acceptable” means the sequence number is outside a
relatively small window, which is likely the case with a
SYN segment for a new connection as the sequence number
for each new connection is essentially random and highly
unlikely to be inside the window of a previous connection.
In the unlikely event that the sequence number falls
within the window, RFC 793 requires sending a RST and
closing the connection, and RFC 9293 (via RFC 5961[4])
recommends sending a “challenge ACK”. A “challenge ACK”
is simply an ACK segment with the current sequence and
acknowledgment numbers, which both mitigates blind attacks
and re-synchronizes hosts that have somehow gotten out
of sync. This is effectively the same response as for the
not-acceptable segment above.
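
To make the “acceptable” test concrete, here is a minimal
sketch of the segment acceptability check from RFC 793 and
RFC 9293. The function and parameter names are my own
(hypothetical), and the segment length is assumed to count
one sequence number for a SYN or FIN in addition to any data
octets.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd).
   Unsigned subtraction handles 32-bit sequence wraparound. */
static bool in_window(uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    return (uint32_t)(seq - rcv_nxt) < rcv_wnd;
}

/* Segment acceptability test (the four cases in the RFC's table).
   seg_len = data octets, plus one each for SYN and FIN. */
static bool segment_acceptable(uint32_t seg_seq, uint32_t seg_len,
                               uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    if (seg_len == 0) {
        if (rcv_wnd == 0)
            return seg_seq == rcv_nxt;   /* only an exact match */
        return in_window(seg_seq, rcv_nxt, rcv_wnd);
    }
    if (rcv_wnd == 0)
        return false;                    /* no room for any octets */
    /* Either the first or the last octet must fall in the window. */
    return in_window(seg_seq, rcv_nxt, rcv_wnd) ||
           in_window(seg_seq + seg_len - 1, rcv_nxt, rcv_wnd);
}
```

A bare SYN has a length of 1, so a SYN whose randomly chosen
sequence number misses a window of, say, 64 KiB out of a
4 GiB sequence space takes the “not acceptable” branch almost
every time.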

What that all means is that the server should have responded
to the SYN segment, either with a challenge ACK or a RST. In
the most likely scenario, the client, still in the SYN_SENT
state, would have received a challenge ACK that does not
acknowledge its SYN and answered with a reset whose sequence
number is set to the acknowledgment number from the server;
the server receiving the reset segment would then reset its
side of the stale connection, and the client’s next SYN
retransmission would be handled as an ordinary new
connection request. In the far less likely scenario where
the sequence number is in the window and RFC 9293/RFC 5961’s
recommendation is not followed, the client would receive a
reset and inform the user that its connection was reset, and
the server also would have closed its connection.

In either case, the stale connection would be reset nearly
instantly unless an ACK segment or a RST segment was lost,
and even that loss would resolve itself after a short time
because the client would retransmit its SYN after a second
or so.
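
The most likely exchange can be sketched as a toy model. This
is not real networking code: the types and function names
are hypothetical, only the fields needed for the argument are
modeled, and the client is assumed to follow the SYN-SENT
processing in RFC 9293 (an ACK that does not acknowledge its
SYN is answered with a RST, and the client remains in
SYN-SENT).

```c
#include <stdbool.h>
#include <stdint.h>

enum state { CLOSED, SYN_SENT, CLOSE_WAIT };

struct seg    { bool ack, rst; uint32_t seq, ackno; };
struct server { enum state st; uint32_t snd_nxt, rcv_nxt; };
struct client { enum state st; uint32_t iss, snd_nxt; };

/* Server: a SYN arrives for a stale CLOSE_WAIT connection.
   Reply <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>. */
static struct seg server_on_syn(const struct server *s)
{
    struct seg r = { .ack = true, .seq = s->snd_nxt, .ackno = s->rcv_nxt };
    return r;
}

/* Client in SYN-SENT: if the ACK is outside (ISS, SND.NXT],
   build <SEQ=SEG.ACK><CTL=RST>. The client state is unchanged. */
static bool client_on_ack(const struct client *c, struct seg in,
                          struct seg *rst)
{
    uint32_t lo = c->iss + 1;
    bool acks_our_syn =
        (uint32_t)(in.ackno - lo) <= (uint32_t)(c->snd_nxt - lo);
    if (!in.ack || acks_our_syn)
        return false;            /* no reset needed in this sketch */
    *rst = (struct seg){ .rst = true, .seq = in.ackno };
    return true;
}

/* Server: a RST whose sequence number is exactly RCV.NXT
   resets the connection (the RFC 5961 rule). */
static void server_on_rst(struct server *s, struct seg in)
{
    if (in.rst && in.seq == s->rcv_nxt)
        s->st = CLOSED;
}

/* Runs SYN -> ACK -> RST; true if the stale connection is gone. */
static bool resolve_stale_connection(struct server *s,
                                     const struct client *c)
{
    struct seg challenge = server_on_syn(s);
    struct seg rst;
    if (!client_on_ack(c, challenge, &rst))
        return false;
    server_on_rst(s, rst);
    return s->st == CLOSED;
}
```

Because the reset carries the server’s own RCV.NXT as its
sequence number, it passes even the strict exact-match check,
so a single round trip is enough to clear the CLOSE_WAIT
entry in this model.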

------------------------------------------------------------
                      My TCP/IP stack
------------------------------------------------------------

Why do I care about this? The simple answer is that I’m
developing a simple TCP/IP stack along the lines of uIP[5]
by Adam Dunkels. But why? Why not?

        ----------------------------------------------------
        While developing this TCP/IP stack, I found (and
        reported) a couple of minor errors in RFC 9293. You
        can see my reported errata[6] if you care enough to
        look. I found a few more possible errata that I plan
        to submit soonish.

        I’m also making a private version of RFC 9293
        with prose and other verbosity stripped out and
        a few aspects of it simplified as well (e.g., my
        implementation doesn’t support sending data in a SYN
        segment and supports only one unacknowledged segment
        in-flight, so checking the ACK is much simpler in
        both cases). My goodness it’s wordy! But that’s
        understandable, considering TCP (as well as the
        “catenet”, or the “Internet” as it’s known today)
        was a relatively new technology in its day.
        ----------------------------------------------------

One reason I’m developing a TCP/IP stack is that I want to
make a few design changes to uIP, but its code resembles
spaghetti (a lot of `goto`s). One design decision I’d like
to change is in how it supports half-closed connections;
simply put, it doesn’t. Once a peer closes its side,
uIP automatically closes its side. And once the local
application closes its side, the application can no longer
receive data from the peer. That certainly avoids leaving
connections in the CLOSE_WAIT state, and it also avoids
leaving connections in the FIN_WAIT_2 state because a
connection in that state is forcibly closed if a FIN hasn’t
been received for a couple minutes. The downside is that
this limits some applications that can be written for uIP.
For example, how does an application talk to a service that
requires an “EOF” (FIN) from the client before fulfilling a
request? It can’t.
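
For comparison, this is roughly what that pattern looks like
with Berkeley sockets. It is a minimal sketch, not uIP code:
the “service” is a hypothetical function running in the same
process, and a Unix-domain `socketpair()` stands in for a TCP
connection so the example is self-contained. On a TCP socket
the same `shutdown(fd, SHUT_WR)` call is what sends the FIN
while leaving the descriptor readable.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical service: consumes the request until EOF, and only
   then replies with the number of bytes it received. */
static void serve_after_eof(int fd)
{
    char buf[64], out[64];
    long total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;                      /* loop ends at EOF (n == 0) */

    int len = snprintf(out, sizeof out, "got %ld bytes", total);
    if (len > 0 && write(fd, out, (size_t)len) < 0) {
        /* error ignored in this sketch */
    }
    close(fd);
}

/* Client side: send the request, half-close, then read the reply.
   Returns the reply length, or -1 on error. */
static ssize_t half_close_request(const char *req, char *reply, size_t cap)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Close only the sending direction; reading stays possible. */
    if (write(sv[0], req, strlen(req)) < 0 ||
        shutdown(sv[0], SHUT_WR) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    serve_after_eof(sv[1]);              /* peer sees EOF, then answers */

    ssize_t n = read(sv[0], reply, cap); /* still readable after SHUT_WR */
    close(sv[0]);
    return n;
}
```

The request is kept small so it fits in the socket buffer and
the single-threaded sketch cannot block; a real client and
server would of course run in separate processes or hosts.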

Another reason is that I simply enjoy creating my own
software to better my understanding.

I’m also currently writing my stack in C++ to give it a
lightweight object-oriented interface (C++ compilers are
very good at optimizing these days, so I’m not too worried
about size and performance). I plan to write a thin uIP
compatibility layer to ease converting an application
from uIP to my stack, though an application might need
minor changes due to fundamental incompatibilities (for
one, uIP gives an application direct access to members of
the `uip_conn` structure which likely won’t exist in a
compatible way in my version).

I still have a lot more to research and develop so don’t
expect anything any time soon.

------------------------------------------------------------
                         References
------------------------------------------------------------

[1] https://blog.cloudflare.com/this-is-strictly-a-violation-of-the-tcp-specification/
[2] https://www.rfc-editor.org/rfc/rfc9293.html
[3] https://www.rfc-editor.org/rfc/rfc793.html
[4] https://www.rfc-editor.org/rfc/rfc5961.html
[5] https://github.com/adamdunkels/uip
[6] https://www.rfc-editor.org/errata/rfc9293