commented:

Go is a notable exception in that it avoids the C runtime on most
platforms, but Apple requires a C runtime to access syscalls.

Apple uses libSystem.dylib, not raw syscalls, as the ABI stability
boundary, and NT-lineage Windows likewise has ntdll.dll as its ABI
stability boundary. The BSDs use libc as that boundary. On OpenBSD, I believe Go
sets some kind of "opt out of NX-bit enforcement"-esque metadata flag
to opt out of having the kernel kill it for attempting to syscall from
a location outside of the read-only libc mapping the loader set up.
EDIT: To clarify, libSystem.dylib contains the functionality which
would normally be libc.so plus other things so, in that respect, it's
the same BSD-verse "libc is the stability boundary" dance.
EDIT: As of Go 1.16, Go now uses libc on OpenBSD to comply with their
syscalling policy.
Linux is uncommon in having stable syscall numbers
instead of a "piece of the kernel that gets loaded into process
address space as a dynamic library and shares an unstable enum
definition of syscalls with the kernel-mode code" because Linux and
glibc aren't developed together in the same repo the way everyone else
does it.

There’s an entire ecosystem of processing that happens before the
function you declared as main starts up. C uses this to configure
allocation, file access, thread-local storage and other C runtime
services. Rust uses this time to configure parts of its own language
and runtime. Specifically, Rust has infrastructure to handle panics
and unwinding. Rust also needs to translate the C-style program
arguments into its own std::env::args interface.

On Windows, the C runtime is also responsible for parsing the
CP/M-style command string that MS-DOS copied (and Windows's subprocess
spawning APIs continued) into a POSIX-style argv array. That's why
Python's subprocess module documentation has a section named
"Converting an argument sequence to a string on Windows" about how it
will convert your argv array to a string following the quoting rules
baked into the MS C runtime, which the invoked subprocess's own parser
can deviate from if it so chooses.
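
A rough sketch of those quoting rules in Rust, in case it helps make the argv-to-string direction concrete. The function names (quote_arg, join_command_line) are my own for illustration, not any library's API, and it only covers the MS C runtime conventions, not cmd.exe metacharacters:

```rust
/// Append one argument to `out`, quoted per the MS C runtime conventions.
/// Hypothetical helper for illustration only.
fn quote_arg(arg: &str, out: &mut String) {
    // Empty arguments and arguments containing whitespace must be quoted.
    let needs_quotes = arg.is_empty() || arg.contains(' ') || arg.contains('\t');
    if needs_quotes {
        out.push('"');
    }
    let mut backslashes = 0usize; // pending run of backslashes
    for c in arg.chars() {
        match c {
            '\\' => backslashes += 1,
            '"' => {
                // Backslashes before a quote are doubled, then the quote
                // itself is escaped with one more backslash.
                out.extend(std::iter::repeat('\\').take(backslashes * 2 + 1));
                out.push('"');
                backslashes = 0;
            }
            _ => {
                // Backslashes not followed by a quote are literal.
                out.extend(std::iter::repeat('\\').take(backslashes));
                backslashes = 0;
                out.push(c);
            }
        }
    }
    if needs_quotes {
        // Trailing backslashes are doubled so they do not escape the
        // closing quote.
        out.extend(std::iter::repeat('\\').take(backslashes * 2));
        out.push('"');
    } else {
        out.extend(std::iter::repeat('\\').take(backslashes));
    }
}

/// Join an argv-style slice into a single Windows command string.
pub fn join_command_line(args: &[&str]) -> String {
    let mut out = String::new();
    for (i, arg) in args.iter().enumerate() {
        if i > 0 {
            out.push(' ');
        }
        quote_arg(arg, &mut out);
    }
    out
}
```

Calling join_command_line(&["prog", "a b", "c"]) yields prog "a b" c. A program that parses its command string differently will see something else, which is the caveat above.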

On Linux, this hook is usually named _start and the linker
automatically adds whatever symbol has that name to the binary.

Not quite. If an ELF-format binary is an executable rather than just a
library, the e_entry field in its header (offset 0x18) contains the
address for the loader to jump to after setting it up in memory.
_start is GCC's convention (which things like NASM copy, IIRC) for how
you specify what e_entry should point to when you opt out of libc
providing it for you.
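
To make the e_entry lookup concrete, here is a minimal Rust sketch that pulls the field out of raw ELF bytes. The function name elf_entry_point is made up for this example, and it only checks the magic, class and byte-order bytes; it is not a real ELF validator:

```rust
/// Read e_entry from the start of an ELF image.
/// Returns None if the bytes are too short or do not look like ELF.
/// Hypothetical helper for illustration only.
pub fn elf_entry_point(image: &[u8]) -> Option<u64> {
    // e_entry sits right after e_ident (16 bytes), e_type (2),
    // e_machine (2) and e_version (4) in both ELF32 and ELF64.
    const E_ENTRY: usize = 0x18;

    if !image.starts_with(b"\x7fELF") {
        return None;
    }
    // e_ident[4] is the class: 1 = 32-bit, 2 = 64-bit.
    let is_64 = match *image.get(4)? {
        1 => false,
        2 => true,
        _ => return None,
    };
    // e_ident[5] is the byte order: 1 = little-endian, 2 = big-endian.
    let little = match *image.get(5)? {
        1 => true,
        2 => false,
        _ => return None,
    };

    if is_64 {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(image.get(E_ENTRY..E_ENTRY + 8)?);
        Some(if little {
            u64::from_le_bytes(raw)
        } else {
            u64::from_be_bytes(raw)
        })
    } else {
        let mut raw = [0u8; 4];
        raw.copy_from_slice(image.get(E_ENTRY..E_ENTRY + 4)?);
        let entry = if little {
            u32::from_le_bytes(raw)
        } else {
            u32::from_be_bytes(raw)
        };
        Some(u64::from(entry))
    }
}
```

Feeding it the bytes of an executable (for example, the result of std::fs::read on a binary) should give the same address that readelf -h reports as the entry point.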

A similar hook exists on Windows, and boots the executable in a
function named _WinMainCRTStartup. At this point the C runtime has a
chance to configure itself, and the way that all runtimes do this is
via initialization functions.

Which the loader finds via AddressOfEntryPoint in the PE header, at
offset 0x0028 from the start of the PE header, which comes after the
MZ (DOS EXE) header and DOS stub.
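
The same kind of lookup for PE, as a hedged Rust sketch: follow e_lfanew (the 4-byte field at 0x3C in the MZ header) to the PE signature, then read the 4 bytes at 0x28 past it. The names pe_entry_point_rva and read_u32_le are mine, and the value returned is an RVA (relative to the image base), not a file offset:

```rust
/// Read a little-endian u32 at `offset`, or None if out of bounds.
/// Hypothetical helper for illustration only.
fn read_u32_le(image: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(image.get(offset..end)?);
    Some(u32::from_le_bytes(raw))
}

/// Return AddressOfEntryPoint (an RVA) from a PE image, or None if the
/// bytes do not look like a PE file.
pub fn pe_entry_point_rva(image: &[u8]) -> Option<u32> {
    // Every PE file starts with the MZ (DOS EXE) header.
    if !image.starts_with(b"MZ") {
        return None;
    }
    // e_lfanew at 0x3C holds the file offset of the PE signature,
    // skipping over the DOS stub.
    let pe = read_u32_le(image, 0x3C)? as usize;
    let sig_end = pe.checked_add(4)?;
    if image.get(pe..sig_end)? != &b"PE\0\0"[..] {
        return None;
    }
    // Signature (4 bytes) + COFF header (20 bytes) + 16 bytes into the
    // optional header = 0x28 from the start of the PE header.
    read_u32_le(image, pe.checked_add(0x28)?)
}
```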
EDIT 1:
"Making the smallest Windows application" and then "Tiny PE" are a good
way to learn more about the ins and outs of PE headers through the
vehicle of their authors figuring out how they can make smaller
executables. (Tiny PE violates the PE spec in accepted-by-Windows ways
such as overlapping stuff where it knows the OS won't read one of the
things being overlapped and stuffing code into unused header fields...
but if you go this far, the smallest file Windows will accept is
dependent on which Windows version you run it on.)
See also "A Whirlwind Tutorial on Creating Really Teensy ELF
Executables for Linux".
EDIT 2: OK, done.

  commented:
Thanks - that's a very useful clarification. I had thought it was
officially libc on macOS. I'll digest and integrate these
clarifications if that's alright with you.

  commented:
To be honest, I'm still reading. I started responding in-situ since
any clarification which comes later in the post would likely be missed
by someone skim-reading for want of a cross-reference, so I considered
it a reasonable time to respond.
I'll add an EDIT: boundary for stuff added after [I noticed that] you
replied.
EDIT: Oh, and yes. Feel free to integrate what I wrote.

  commented:
Re _start, on a.out systems the entry point from the kernel to an
executable was traditionally called start as declared in csu/crt0, e.g.
7th edition, VAX BSD. In that era the C compiler stuck a _ on the
front of its global symbols so you can see V7 declares _main and BSD
declares the asm name for C start() as unadorned start. In that era a
program started at the beginning and cc’s linker invocation arranged
for crt0 to come first. (csu = (lib) C startup, crt0 = zeroth C
runtime support object)
It’s harder to find out exactly how things worked in System V where
ELF came from, but start or _start continued to be the program entry
point declared in csu/crt0. I have never bothered to properly
understand how ELF changed _ prefixing: I think they added another
layer of it for funsies or something? Which caused start to become
_start for some reason?
I think it was ELF that added the obvious counterpart _end which
corresponds to the top of the BSS, i.e., what sbrk(0) would return
before malloc() creates its heap.

  commented:

The BSDs use libc as that boundary.

FreeBSD and NetBSD syscalls have ABI stability, as do their system
libraries.

  commented:
Can you provide citations for this? I wrote about the subject
years ago. While researching the subject, I found a lot of conflicting
information when it came to the various BSDs. I've seen people claim
they have stable system calls but back then I found forum posts and
mailing lists describing the opposite.
Linux documents this in the repository itself. Linus himself is on the
record saying it. Are there equally authoritative promises of system
call binary interface stability at the instruction set level from the
BSDs?

  commented:
https://cgit.freebsd.org/src/tree/sys/conf/NOTES#n330
https://cgit.freebsd.org/src/tree/sys/amd64/conf/GENERIC#n62
https://www.netbsd.org/gallery/presentations/joerg/asiabsdcon2016/asiabsdcon2016.pdf

  commented:
I don't see how any of those links contain an authoritative claim that
it's the intention of the FreeBSD or NetBSD projects to keep the
syscall numbers and ABI stable?
The PDF you link says that NetBSD has kept stability, but that's an
observation about history, not an authoritative statement of intent.

  commented:
Huh. It appears you're right for FreeBSD... at least if they follow
this wiki page well, and NetBSD does appear to follow the same
philosophy... but both were buried under a flood of contradictory
information. With that and hazy memories of Rust issues like #92466,
no wonder I was mistaken.
Correction memorized.

  commented:

It’s a highly-ordered, highly-controllable environment that lets you
more confidently do a lot of work without locks, atomics and other
synchronization primitives

The body of main is an even more ordered, even more controllable
environment that lets you confidently do a lot of work without locks,
atomics and other synchronization primitives. Well, it is, at least as
long as nobody starts undermining those properties by putting code
elsewhere that runs before it.

One advantage we have in doing work before main is that it is
well-behaved. No threads are running unless we start them.

This is a perfect example: no threads are running in main until we
start them either. Except, of course, if we start breaking the
no-life-before-main guarantees, now there can be, and unlike threads
started from main, they’re not obvious in the code.

Runtimes make use of this pre-main phase because it guarantees (1)
running before user code, and (2) a single-threaded, highly-consistent
and predictably-ordered environment, which allow for reliable and
deterministic initialization

This puts the emphasis in a misleading place, IMO. The runtime isn’t
making “use of” the pre-main phase; it is the pre-main phase (among
other things). It is what calls main.

  commented:
I've been interested in life-before-main in Rust for a while and
thought it would be useful to put it all together into a post that
explains what it is and why it's useful. I've got some thoughts on
future posts along these lines, like how you can build faster
collections that make use of linker aggregation, but I'd love to hear
feedback on this first "intro-focused" topic.

  commented:
I’ve been doing a lot of embedded (thus no_std and sometimes even
no-alloc) Rust, where main is just another function, and
initialization is largely up to the developer. There’s quite a bit of
hand-rolled boilerplate in the codebase for similar use cases, so I’m
curious how these crates relate to that environment.

  commented:
Assuming you are using LLVM's or GCC's linker, all of these crates
should work identically, though you'll likely need to manually
configure your linker script to set up .init_array and
.init_array.NNNNN properly and add a function to iterate over them.
The orphan section start/stop symbols are platform-independent magic
with those toolchains (including platformless embedded!).
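
For reference, a minimal sketch of what an .init_array entry looks like on a hosted ELF target (Linux), where the C runtime already walks the array for you. The names before_main, INIT_ENTRY and ran_before_main are made up for this example, and the #[unsafe(...)] attribute spelling assumes Rust 1.82 or newer (older toolchains write #[link_section = "..."]). On bare metal, your linker script would need to keep the section and your startup code would call each entry itself:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Flag flipped by the constructor so we can observe that it ran.
static RAN_BEFORE_MAIN: AtomicBool = AtomicBool::new(false);

// The function we want called before main. Hypothetical name.
extern "C" fn before_main() {
    RAN_BEFORE_MAIN.store(true, Ordering::SeqCst);
}

// Place a pointer to the function in .init_array. On Linux the C runtime
// calls every pointer in this section before main. #[used] stops the
// compiler from discarding the otherwise-unreferenced static.
#[cfg(target_os = "linux")]
#[used]
#[unsafe(link_section = ".init_array")]
static INIT_ENTRY: extern "C" fn() = before_main;

/// Report whether the constructor ran.
pub fn ran_before_main() -> bool {
    RAN_BEFORE_MAIN.load(Ordering::SeqCst)
}
```

On Linux, ran_before_main() should already be true at the first line of main, assuming the toolchain's default startup code is in use.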
Each of the crates in linktime should support both no_std and no_alloc
out of the box (although I probably need to test to ensure
scattered-collect compiles - it does not require either std or alloc,
but it's just "untested").
I find link-time aggregation of data to be extremely useful and it's
nice to avoid all the boilerplate that the post shows for a few more
complex cases. I'd be happy to discuss the particular use cases you
have in mind and see if there's a good way to apply it.