commented:

But there’s one really big problem: supply chain attacks.

No signed contract with an offer and consideration; not a supply
chain!
Separately, from the point of view of ensuring your dependencies don't
change out from underneath you, a lock file with hashes (or the Golang
minimal versioning scheme) is identical to vendoring your
dependencies. I hear you on the friction argument where vendoring
truly is different. But consider that in the limit, it pushes you to
either write your own implementations of things, or—even worse—vibe
code your dependencies, and I'd rather use well-tested software
written by domain experts.
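
To make the "lock file with hashes" half of that concrete, here is a minimal sketch of the check a package manager would run before using a download. The `LOCKFILE` contents, the archive name, and the choice of SHA-256 are all assumptions for illustration, not any particular tool's format.

```python
import hashlib

# Hypothetical lock file: archive name -> pinned SHA-256 of its bytes.
LOCKFILE = {
    "examplelib-1.2.3.tar.gz": hashlib.sha256(b"original release bytes").hexdigest(),
}


def verify(name: str, payload: bytes, lockfile: dict) -> bool:
    """Return True only if the payload matches the hash pinned in the lock file."""
    expected = lockfile.get(name)
    if expected is None:
        # Unknown dependency: refuse it rather than trust it.
        return False
    return hashlib.sha256(payload).hexdigest() == expected
```

If the upstream bytes change in any way, `verify` returns `False`, which is the same "nothing changes underneath you" guarantee a vendored copy gives. What it does not give you is availability when upstream disappears.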
edit: wanted to add:

There’s a ceiling to the complexity this will tolerate, though
companies like Google and Facebook(?) with giant monorepos demonstrate
that this ceiling is probably a lot higher than you think.

I worked at Facebook on this stuff, and I wouldn't wish its
third-party dependency story on anybody. (There can be at most two
semver-incompatible versions of a particular direct dependency to a
Rust crate across all of fbsource at any given time. If you want to
update a dependency, you have to take on the burden of updating all of
fbsource.) I think what Facebook does is okay for Facebook, but it
isn't particularly great or sustainable.

  commented:
Out of curiosity, why "at most two"? For being able to migrate
incrementally from an older version to a newer version?
Re: "isn't particularly great or sustainable", I suspect this is
really a function of scale and not the policy. Allowing different
versions creates its own problems in other ways because most modern
languages with the exception of TypeScript predominantly or
exclusively use nominal typing, preventing type reuse across versions
unless you're using the "semver trick" whenever there's a breaking
change...
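
A rough illustration of the nominal-typing problem, with the caveat that Python is duck-typed, so the `isinstance` check below only stands in for what a nominally typed compiler would enforce. `make_point_class` is a hypothetical stand-in for importing `Point` from two releases of an imaginary `geometry` package.

```python
def make_point_class(version: str) -> type:
    """Stand-in for importing `Point` from one release of a hypothetical package."""

    class Point:
        def __init__(self, x: float, y: float) -> None:
            self.x = x
            self.y = y

    Point.version = version
    return Point


PointV1 = make_point_class("1.0.0")
PointV2 = make_point_class("2.0.0")


def norm_squared(p: object, expected: type) -> float:
    """Accept only instances of `expected`, as a nominal type system would."""
    if not isinstance(p, expected):
        raise TypeError("same name and fields, but a different type")
    return p.x ** 2 + p.y ** 2


# The "semver trick": a final 1.x release re-exports the 2.x type, so code
# still depending on 1.x now shares one type with code depending on 2.x.
PointV1Compat = PointV2
```

Here `norm_squared(PointV1(3, 4), PointV2)` raises `TypeError` even though the two classes are textually identical, while `norm_squared(PointV1Compat(3, 4), PointV2)` works.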
As one anecdote, when Log4Shell came out, I distinctly remember that
companies which had lots of different versions, and versions scattered
in more places, had a harder time upgrading compared to companies with
few versions/pinning.

  commented:

Out of curiosity, why "at most two"? For being able to migrate
incrementally from an older version to a newer version?

Yeah -- so you must finish the upgrade from (for example) rand 0.8 to
0.9 before starting the upgrade from rand 0.9 to 0.10.
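
For anyone who wants to try a rule like this on a smaller repo, a check along these lines is enough to get started. This is a sketch, not the actual fbsource tooling: `compat_key` assumes plain `x.y.z` version strings and Cargo's convention that the leftmost nonzero component decides semver compatibility, and the `limit` of two mirrors the policy described above.

```python
from collections import defaultdict


def compat_key(version: str) -> str:
    """Bucket a plain x.y.z version by its leftmost nonzero component."""
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    if major > 0:
        return f"{major}"
    if minor > 0:
        return f"0.{minor}"
    return f"0.0.{patch}"


def violations(deps, limit: int = 2) -> dict:
    """Return crates that appear in more than `limit` incompatible buckets.

    `deps` is an iterable of (crate_name, version) pairs collected from
    every manifest in the repository.
    """
    buckets = defaultdict(set)
    for name, version in deps:
        buckets[name].add(compat_key(version))
    return {
        name: sorted(keys) for name, keys in buckets.items() if len(keys) > limit
    }
```

With `rand` at 0.8.x and 0.9.x the result is empty. Adding a 0.10.x user while 0.8.x is still around reports `rand`, which is exactly the "finish one upgrade before starting the next" constraint.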

Re: "isn't particularly great or sustainable", I suspect this is
really a function of scale and not the policy. Allowing different
versions creates its own problems in other ways because most modern
languages with the exception of TypeScript predominantly or
exclusively use nominal typing, preventing type reuse across versions
unless you're using the "semver trick" whenever there's a breaking
change...

It is true that this is a matter of scale, yeah. I guess my point is
that I'd put a massive asterisk on "this ceiling is probably a lot
higher than you think".

  commented:
Ouch. That sounds like someone got burned hard in the past before they
imposed that rule.

  commented:
You're right, you're right, dependency attacks then.  <3

  commented:
The Zig package manager imho is a really cool compromise:
All packages are pinned with a content hash, so lockfile-by-default.
It doesn't suffer from the "upstream suddenly got malicious" problem,
but still has the issue of "upstream's gone".
Except that it has both a global and local cache, content
hash-addressed, so when your upstream is gone, you just yeet a
tarball of your local copy wherever you need it.
It's a really good compromise between "vendor sources" and "simple,
reusable software".
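
The general shape of that scheme, as a sketch rather than a description of Zig's actual implementation: `ContentStore`, `fetch`, and the use of plain SHA-256 over the raw bytes are assumptions made for illustration, and Zig's real hash format and cache layout differ.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """A directory of blobs, each named by the hash of its contents."""

    def __init__(self, root: Path) -> None:
        self.root = root
        root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        digest = content_hash(data)
        (self.root / digest).write_bytes(data)
        return digest

    def get(self, digest: str) -> Optional[bytes]:
        path = self.root / digest
        if not path.exists():
            return None
        data = path.read_bytes()
        # Re-check on read so a corrupted or swapped blob is never returned.
        return data if content_hash(data) == digest else None


def fetch(
    digest: str,
    local: ContentStore,
    shared: ContentStore,
    upstream: Callable[[], Optional[bytes]],
) -> bytes:
    """Try the local cache, then the shared cache, then upstream."""
    for store in (local, shared):
        data = store.get(digest)
        if data is not None:
            local.put(data)  # keep a project-local copy for next time
            return data
    data = upstream()
    if data is None:
        raise LookupError("upstream is gone and no cached copy exists")
    if content_hash(data) != digest:
        raise ValueError("upstream content does not match the pinned hash")
    local.put(data)
    return data
```

Because the key is the content hash, it does not matter where a copy comes from. A tarball handed over by a coworker is as trustworthy as the original download, provided it hashes to the pinned value.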

  commented:
You could extend that to all of your software actually, and it'd be
pretty cool. Keep all of your sources in a content-addressable store,
and hash each program based on the hash of its inputs.
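
A minimal sketch of what "hash each program based on the hash of its inputs" could look like. The `input_hash` function and the recipe fields (`name`, `builder`, `sources`, `deps`) are made up for illustration and do not correspond to any real tool's format.

```python
import hashlib
import json


def input_hash(name: str, builder: str, sources: dict, deps: list) -> str:
    """Identify a build by everything that goes into it.

    `sources` maps file paths to bytes. `deps` is a list of input hashes of
    the builds this one depends on, so a change anywhere below propagates up.
    """
    recipe = {
        "name": name,
        "builder": builder,
        "sources": {
            path: hashlib.sha256(data).hexdigest()
            for path, data in sorted(sources.items())
        },
        "deps": sorted(deps),
    }
    encoded = json.dumps(recipe, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()
```

Note that this identifies a build by what went in, not by what came out. Two machines agreeing on the input hash says nothing by itself about whether they produced the same binary.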

  commented:
Maybe I'm misunderstanding but this sounds like what Nix does.

  commented:
That's exactly what Nix does, and I was being a bit cheeky with my
description in GP. It's a shame that it's still not as great at being
the project-level build tool, because it's really great for everything
up to that.

  commented:
Interestingly, although Nix does hash the inputs (ideally¹ meaning
source code), it doesn't actually protect the outputs too well against
subversion... because it regularly downloads the outputs (i.e.
compiled binaries) from a cache, and thus requires extending the trust
to the cache server... The core packages presumably download from the
official cache, but some thirdparty packages do like to try and
happily mandate extending global trust permanently towards external
caches, posing a concrete security concern. AFAIU there's a (slow
going, maybe even stuck) effort to try and support content-addressed
fetching of binaries (i.e. outputs) as well; I kinda hope for a future
when the nixpkgs repo includes expected hashes of the build outputs,
and the nix command downloads them via DHT from torrents or somewhere
such, with the official Nix Hydra server only working as a torrent
seed.
¹ Quite often, package definitions unfortunately use prebuilt binary
blobs, i.e. "releases" from github, as "inputs". At least they have to
hash them, thus pinning/"content-addressing", but then again the trust
is moved onto the person who uploaded the compiled binary to the
github releases page, thus introducing a disconnect from the actual
source code.

  commented:
Haha, I thought you might be referring to Nix, but I didn't want to
assume.
I find Nix to be an acceptable build tool, not really much better or
worse than other tools of a similar complexity, although I haven't
tried something like Bazel. I'm interested to know what you see as a
great build tool?

  commented:
Nix is fantastic at orchestrating packages, though it makes the
unfortunate reality of build gnarliness inescapable (for good and
ill). If you watch it build without substituters you see amusing
things like curl depend on brotli which depends on cmake which depends
on another curl which depends on nghttp2 which depends on tzdata which
depends on a tarball of zone definitions which is fetched by a third
curl.
Where it's less good is within a package that you're actively
developing, because if you breathe on any file it usually wants to do
a full rebuild, and sandbox set-up/teardown as well as the delay
looking for substitutes means that extracting fine-grained derivations
for each build step doesn't give you the speedup you might hope for.

  commented:
looks at all-your-codebase and grins, completely ignoring nix

  commented:
I mostly agree, I'm just a little curious about how to attack that
setup.  I suppose you'd have to modify a lockfile or find a hash
collision, neither of which sound very easy.
I just am not entirely thrilled by the idea because I'm used to the
cargo world where upgrading a dependency will also tend to upgrade all
its transitive dependencies without really saying much about it, along
with anything else you have that's semver-compatible.

  commented:
yeah, hash collision or actual source modification are the only two
attack vectors besides implementation bugs.
the cool thing is that zig now copies your deps into a zig-out folder
next to your build script.
this gives you the capability of:

editing your deps locally
rewriting your build script to use file paths
inspecting your deps

there's a plan for doing some kind of version selection inside
transitive dependencies, and @kristoffit proposed to use min version
selection.
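
for reference, the core of min version selection is small. this is a sketch of the idea as described for Go modules, not of anything zig has implemented: the `requirements` graph shape and the `x.y.z` version strings are assumptions for illustration.

```python
def parse(version: str) -> tuple:
    """Turn "1.4.0" into (1, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def minimal_version_selection(root: tuple, requirements: dict) -> dict:
    """Pick, for each module, the highest of the minimum versions anyone asks for.

    `root` is a (module, version) pair. `requirements` maps a (module, version)
    pair to the list of (module, minimum_version) pairs it declares.
    """
    selected = {}
    seen = set()
    stack = list(requirements.get(root, []))
    while stack:
        module, version = stack.pop()
        if (module, version) in seen:
            continue
        seen.add((module, version))
        if module not in selected or parse(version) > parse(selected[module]):
            selected[module] = version
        # Follow this exact version's own declared minimums.
        stack.extend(requirements.get((module, version), []))
    return selected
```

the important property is that a newer upstream release changes nothing until some manifest in the graph asks for it, so the result is reproducible without a separate lock file.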
the cool thing with zig in general is that it accepts the fact of "one
dep, multiple versions", and you can have the same dependency with two
versions in your project, the buildsystem doesn't care at all

  commented:

There’s a ceiling to the complexity this will tolerate, though
companies like Google and Facebook

The Third Networking Truth:

With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea.

A lot of practices that are cited from Google/Facebook/etc. only work
because those companies can put "sufficient thrust" behind them. For
example: I know for a fact that some of those places support their
chosen practices around monorepos and dependencies with teams whose
headcount is higher than the entire company I work for. Which is
something they can afford to do, but is not something most of the rest
of us can afford to do.

  commented:
I feel like vendoring is not the solution that many people on the
internet tout it to be, because it mostly changes the shape of the
problem, but not the problem itself.
Bloat and multiple versions? You still have bloat, and even more of
it, due to the lack of package-manager reuse. Supply chain attacks?
Good luck reviewing a vendored mountain of code. Vulnerabilities and
security updates? Vendoring is much worse. Visibility into
dependencies? Thanks, I like my cargo tree and cargo bloat just fine.
At the same time, the crisis is real. I don't know what a solution looks
like. Maybe it will be library-level capabilities and/or sandboxing,
maybe some kind of a new social contract around opensource. Maybe
there is no solution and we'll have to surrender to (hopefully
benevolent) entities that ensure law and order. Maybe clankers will
get cheap, fast, and effective enough to review and eliminate crude
supply chain attacks of today (while replacing them with much more
sophisticated and subtle threats). But I know that reverting to bad
old days (romanticized by some people like Jonathan Blow) is not a
solution.

  commented:
Despite a huge body of objections and my own internal deliberations, I
am increasingly convinced that vendoring does exactly one thing: it
turns a problem into tomorrow me's problem. Unfortunately, it seems to
be enough for a whole lot of people.

  commented:
I might be wrong, but I think one downside would be that scanners
wouldn't be able to flag that your copied dependency has a bug. If
that's true, it means you could have a latent issue that you aren't
notified about that you would be otherwise.

  commented:
From the number of false positives these scanners produce, this may be
an upside. These scanners are extremely helpful for letting you see
what could be a problem, and they are extremely problematic when they
suddenly cause you to set aside other planned work to fix what a
scanner thinks is a problem, but isn't.

  commented:
I agree that scanners are stupid. I've had a case at work where I had
to update a Docker image from Debian to Ubuntu just to upgrade the
"version number" of nginx that was hypothetically affected by a CVE.
Of course it was patched by Debian. And the CVE in question was for
some feature we didn't use.

  commented:
Good take.

I believe vendoring all your deps will also have a desirable soft
side-effect: It will increase the cost of using dependencies.

YES YES YES.

There’s tons of libs out there that actually do pure computation, or
which only really touch the world through very basic and portable I/O
like files and network sockets. Just vendor ’em. Compression lib?
Copy-paste that sucker. libcurl? Copy-paste that sucker.

Nit pick, but please do not copy-paste libcurl. It's a good strategy
for most libraries but using it for C programs which deal with hostile
input is not good advice. You are not going to do a better job than
your operating system at keeping libcurl safe.
One thing I never thought of before is how it's at least a little bit
weird that end-user package managers like apt came first, and
language-level package managers came later. I think this actually
caused a ton of problems; like if you look at early-2000s rubygems,
it's pretty obvious that they were trying to make "apt, but for ruby"
with the way it defaulted to system-wide installs instead of managing
projects each on an individual basis. It took decades to undo the
damage of that mistake by adding bundler to the mix, but bundler would
not have been necessary had the original design acknowledged the need
for project isolation. Python is still working thru the chaos of
fixing this. I'd imagine Perl is too, tho I don't know as much about
that.

  commented:

Nit pick, but please do not copy-paste libcurl.

Ah, so there is a limit.  :-)  Just hard to know exactly where to draw
the line.

One thing I never thought of before is how it's at least a little bit
weird that end-user package managers like apt came first, and
language-level package managers came later.

My take on the history: Package managers were originally a way to
build systems, and these systems often had multiple users, desktop
environments with lots of cooperating software, etc.  Building
software also took a lot of time and memory, and you had a lot of
software compared to the amount of disk space and RAM you had, so
reusing libs and stuff was a big deal.  The rise of the webapp made
most computers that mattered instead be servers that spent their life
running a small handful of programs, and disk space and RAM got cheap
enough that the size of code binaries wasn't very important.  The
system-building tools didn't really keep up with the times as much, so
most people building software really only needed and wanted tools that
were good at building single programs, not big interlocking systems
with lots of shared libs.
There's a parallel track to this history that can be summed up as "C
doesn't have a real goddamn module system", but that's less important
for this.

  commented:

The rise of the webapp made most computers that mattered instead be
servers that spent their life running a small handful of programs, and
disk space and RAM got cheap enough that the size of code binaries
wasn't very important

This is part of it, but I think a bigger part of it is that the target
audience of apt must be assumed to be a non-technical user. If you ask
it to install something, it has to just work.
The target audience of the tool that builds the webapp must be
technical, and thus they can be required to make decisions about
resolving dependencies which the target user of apt cannot; this
forces the packagers for apt to make a ton of decisions up-front
around integration. Those can't be the 100% best decisions for every
single user of the system; they have to compromise for the best
general case, but some flexibility is necessarily sacrificed.
Using apt is a joy; using maven is a job.

  commented:

One thing I never thought of before is how it's at least a little bit
weird that end-user package managers like apt came first, and
language-level package managers came later.

CTAN and CPAN predated the Linux package managers, so some communities
had at least something resembling package management before Linux
distros.

  commented:

Proposed solution
include all the dependencies for your software, with your software.
[...]
Copy-paste upstream source control into your git repo and commit that
fucker. [...]
Get sick of doing this by hand? Make the build tool automate it,
that’s its job.

And at that point we're full circle and are including 3rd party
software unseen again?

  commented:
Keep reading:

(You could also get the same effect by ditching any concept of semver
or other “these two different pieces of code should behave the same”
in the build system, and treating every version number as unique and
unrelated to any other. But that doesn’t solve the problem of
dependencies vanishing or otherwise being subverted, or someone
tampering with the contents of a package in other ways. It’s an
optimization, and in my mind a premature one; we might get there
eventually but shouldn’t start there.)


  commented:
I may be wrong in this, but I feel this text currently kinda mixes two
or three things together that arguably could (should?) be discussed
separately: vendoring, version-pinning, and binaries caching.
Just vendoring alone doesn't necessarily imply version-pinning. If I
aggressively upgrade my dependency versions, and re-vendor them from
the internets every so often, I'm still not protected against
malicious source upgrades (e.g. the xz attack, or the uid = 0 one you
mentioned), right? I'm hopefully at least protected against binary
artifacts attacks (like github releases, etc.), and availability
attacks (thirdparty server goes down).
Some degree of "protection" against malicious source upgrades (e.g.
the xz attack) in the proposed solution you describe comes, I believe,
actually from version-pinning. This part doesn't really necessitate
vendoring. As others mention, if we wanted to mostly solve
availability, we could fetch the dependencies from
DHT/BitTorrent/IPFS/radicle/... based on their hash. A notable attempt
at solving/mitigating it is AFAIU Go(lang)'s/rsc's MVS. I'm actually
surprised you didn't mention it in the article - makes me sincerely
wonder whether you're aware of it? It would feel obvious to me that
MVS should be discussed in context of the article, to clarify how your
proposed solution addresses the issues MVS tries to solve. Notably,
one challenge raised against MVS/version-pinning is that of security
patches. How do we address it? Imagine I have a bunch of Java apps
which happen to use the vulnerable version of log4j somewhere in their
deps tree - how do I learn about it, and how do I make sure I close
that vulnerability when doing version-pinning? Also, I'm not really
sure if this differs much from the upgrade cooldowns you mention; with
version-pinning, we still need to choose some version when we're first
pinning, and how do we make sure it's neither "too fresh" (thus being
the guinea pig) nor "too old" (thus staying in some ancient era).
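
To make the "too fresh" versus "too old" trade-off concrete, here is one possible selection rule, purely as a sketch. The function name `pick_version` and the thresholds of 14 and 365 days are arbitrary assumptions, not values taken from the article or from any tool.

```python
from datetime import date
from typing import Optional


def pick_version(
    releases,
    today: date,
    cooldown_days: int = 14,
    max_age_days: int = 365,
) -> Optional[str]:
    """Pick the newest release that is neither too fresh nor too old.

    `releases` is an iterable of (version, release_date) pairs. Returns None
    when nothing falls inside the window, which is a signal that a human
    needs to decide.
    """
    candidates = [
        (released, version)
        for version, released in releases
        if cooldown_days <= (today - released).days <= max_age_days
    ]
    return max(candidates)[1] if candidates else None
```

A rule like this does nothing for the security-patch case: a fix published yesterday sits inside the cooldown window, so there has to be some override path for known vulnerabilities.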
Finally, "just vendor it" seems to also imply we're vendoring the
sources, and thus skipping/shorting the binaries/packages caching
attacks. Interestingly, many "package repositories" (e.g. docker hub,
or github releases) introduce an easy to miss disconnect between the
source code presumed to be of a library, and the actual library that
is being downloaded. To my surprise, there can also be a "source code
vs. source code" disconnect, where for example IIUC the source code I
submit to crates.io does not have to be the same as the code on
github.com, so an attacker could steal a maintainer's crates.io
credentials and keep uploading subverted sources there, under
assumption that most people read what's on github instead (if at all).
But again, I think it could be good to mention this explicitly, not
just rely on "just vendor it" doing that accidentally. Especially if
somebody would start to think maybe "just vendoring a JAR file" is
enough.
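
One cheap way to catch the "source code vs. source code" disconnect is to diff the unpacked registry archive against a checkout of the tagged commit. The sketch below assumes both trees are already on disk. The names `tree_digests` and `diff_trees` are made up for illustration.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_trees(published: Path, repository: Path) -> dict:
    """Report files that differ between a published archive and the repo."""
    a = tree_digests(published)
    b = tree_digests(repository)
    return {
        "only_published": sorted(a.keys() - b.keys()),
        "only_repository": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Packaging tools may legitimately add or rewrite some files, and repositories usually contain files that are never published, so a non-empty report is a prompt for review rather than proof of tampering.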

  commented:

I believe vendoring all your deps will also have a desirable soft
side-effect: It will increase the cost of using dependencies.

This is... Not a desirable effect? Last thing we want is bad old days
of everyone reinventing all their own everything when a library could,
should, or does exist.

  commented:
Better have thousands of copies of a couple of functions than everyone
rely on the same ones and those become malicious or disappear.
There's a balance to be struck for sure, but the cost of adding new
dependencies should make it so that people at least take a moment to
understand the risk and that they are trusting the upstream
maintainers (and whoever might phish them).