commented: But there’s one really big problem: supply chain attacks. No signed contract with an offer and consideration; not a supply chain! Separately, from the point of view of ensuring your dependencies don't change out from underneath you, a lock file with hashes (or the Golang minimal versioning scheme) are identical to vendoring your dependencies. I hear you on the friction argument where vendoring truly is different. But consider that in the limit, it pushes you to either write your own implementations of things, or—even worse—vibe code your dependencies, and I'd rather use well-tested software written by domain experts. edit: wanted to add: There’s a ceiling to the complexity this will tolerate, though companies like Google and Facebook(?) with giant monorepos demonstrate that this ceiling is probably a lot higher than you think. I worked at Facebook on this stuff, and I wouldn't wish its third-party dependency story on anybody. (There can be at most two semver-incompatible versions of a particular direct dependency to a Rust crate across all of fbsource at any given time. If you want to update a dependency, you have to take on the burden of updating all of fbsource.) I think what Facebook does is okay for Facebook, but it isn't particularly great or sustainable. commented: Out of curiosity, why "at most two"? For being able to migrate incrementally from an older version to a newer version? Re: "isn't particularly great or sustainable", I suspect this is really a function of scale and not the policy. Allowing different versions creates its own problems in other ways because most modern languages with the exception of TypeScript predominantly or exclusively use nominal typing, preventing type reuse across versions unless you're using the "semver trick" whenever there's a breaking change... 
As one anecdote, when Log4Shell came out, I distinctly remember that companies which had lots of different versions, and versions scattered in more places, had a harder time upgrading compared to companies with few versions/pinning. commented: Out of curiosity, why "at most two"? For being able to migrate incrementally from an older version to a newer version? Yeah -- so you must finish the upgrade from (for example) rand 0.8 to 0.9 before starting the upgrade from rand 0.9 to 0.10. Re: "isn't particularly great or sustainable", I suspect this is really a function of scale and not the policy. Allowing different versions creates its own problems in other ways because most modern languages with the exception of TypeScript predominantly or exclusively use nominal typing, preventing type reuse across versions unless you're using the "semver trick" whenever there's a breaking change... It is true that this is a matter of scale, yeah. I guess my point is that I'd put a massive asterisk on "this ceiling is probably a lot higher than you think". commented: Ouch. That sounds like someone got burned hard in the past before they imposed that rule. commented: You're right, you're right, dependency attacks then. <3 commented: The Zig package manager imho is a really cool compromise: All packages are pinned with a content hash, so lockfile-by-default. It doesn't suffer from the "upstream suddenly got malicious" problem, but still has the issue of "upstream's gone". Except that it has both a global and local cache, content hash-addressed, so when your upstream is gone, you just yeet a tarball of your local copy wherever you need it. It's a really good compromise between "vendor sources" and "simple, reusable software". commented: You could extend that to all of your software actually, and it'd be pretty cool. Keep all of your sources in a content-addressable store, and hash each program based on the hash of its inputs.
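For anyone who wants the two ideas above made concrete, here is a minimal sketch in Python. It assumes SHA-256 as the content hash, and `ContentStore`, `verify_pinned`, and `build_key` are hypothetical names made up for illustration; this is not how Zig or any other tool actually implements it, only the general shape of "pin by content hash" and "key a build by the hashes of its inputs".

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of a blob (the assumed content hash)."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(data: bytes, pinned_hash: str) -> bytes:
    """Accept a fetched dependency only if it matches the pinned hash."""
    actual = content_hash(data)
    if actual != pinned_hash:
        raise ValueError(f"hash mismatch: pinned {pinned_hash}, got {actual}")
    return data


class ContentStore:
    """Toy in-memory content-addressable store: the key is the content hash."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = content_hash(data)
        self._blobs[key] = data  # storing the same bytes twice is a no-op
        return key

    def get(self, key: str) -> bytes:
        # Re-verify on read so a corrupted entry is never handed out.
        return verify_pinned(self._blobs[key], key)


def build_key(recipe: str, input_keys: list[str]) -> str:
    """Derive a key for a build from its recipe and the hashes of its inputs.

    Changing any input, or the recipe itself, changes the key.
    Inputs are sorted so the key does not depend on listing order.
    """
    h = hashlib.sha256()
    h.update(recipe.encode())
    for key in sorted(input_keys):
        h.update(b"\0" + key.encode())
    return h.hexdigest()
```

With this shape, "upstream's gone" stops mattering as long as any copy of the bytes survives somewhere: whoever hands you the tarball, `verify_pinned` tells you whether it is the one you pinned.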
commented: Maybe I'm misunderstanding but this sounds like what Nix does. commented: That's exactly what Nix does, and I was being a bit cheeky with my description in GP. It's a shame that it's still not as great at being the project-level build tool, because it's really great for everything up to that. commented: Interestingly, although Nix does hash the inputs (ideally¹ meaning source code), it doesn't actually protect the outputs too well against subversion... because it regularly downloads the outputs (i.e. compiled binaries) from a cache, and thus requires extending the trust to the cache server... The core packages presumably download from the official cache, but some third-party packages do like to try and happily mandate extending global trust permanently towards external caches, posing a concrete security concern. AFAIU there's a (slow going, maybe even stuck) effort to try and support content-addressed fetching of binaries (i.e. outputs) as well; I kinda hope for a future when the nixpkgs repo includes expected hashes of the build outputs, and the nix command downloads them via DHT from torrents or somewhere such, with the official Nix Hydra server only working as a torrent seed. ¹ Quite often, package definitions unfortunately use prebuilt binary blobs, i.e. "releases" from github, as "inputs". At least they have to hash them, thus pinning/"content-addressing", but then again the trust is moved onto the person who uploaded the compiled binary to the github releases page, thus introducing a disconnect from the actual source code. commented: Haha, I thought you might be referring to Nix, but I didn't want to assume. I find Nix to be an acceptable build tool, not really much better or worse than other tools of a similar complexity, although I haven't tried something like Bazel. I'm interested to know what you see as a great build tool?
commented: Nix is fantastic at orchestrating packages, though it makes the unfortunate reality of build gnarliness inescapable (for good and ill). If you watch it build without substituters you see amusing things like curl depend on brotli which depends on cmake which depends on another curl which depends on nghttp2 which depends on tzdata which depends on a tarball of zone definitions which is fetched by a third curl. Where it's less good is within a package that you're actively developing, because if you breathe on any file it usually wants to do a full rebuild, and sandbox set-up/teardown as well as the delay looking for substitutes means that extracting fine-grained derivations for each build step doesn't give you the speedup you might hope for. commented: looks at all-your-codebase and grins, completely ignoring nix commented: I mostly agree, I'm just a little curious about how to attack that setup. I suppose you'd have to modify a lockfile or find a hash collision, neither of which sound very easy. I just am not entirely thrilled by the idea because I'm used to the cargo world where upgrading a dependency will also tend to upgrade all its transitive dependencies without really saying much about it, along with anything else you have that's semver-compatible. commented: yeah, hash collision or actual source modification are the only two attack vectors besides implementation bugs. the cool thing is that zig now copies your deps into a zig-out folder next to your build script. this gives you the capability of: editing your deps locally, rewriting your build script to use file paths, and inspecting your deps. there's a plan for doing some kind of version selection inside transitive dependencies, and @kristoffit proposed to use min version selection.
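For readers who haven't met minimal version selection, here is a rough sketch of the core idea in Python, in the spirit of Go's MVS. It is not Zig's planned design and not Go's actual implementation. The function name, the shape of the `graph` mapping, and the example modules are all hypothetical, and it assumes versions are comparable tuples within a single compatible line per module.

```python
Version = tuple[int, ...]
Requirements = dict[str, Version]  # module name -> minimum version required


def minimal_version_selection(
    root: Requirements,
    graph: dict[tuple[str, Version], Requirements],
) -> dict[str, Version]:
    """Pick, for each reachable module, the highest of the stated minimums.

    `graph` maps (module, version) to that release's own requirements.
    Nothing newer than what some requirement explicitly asks for is chosen.
    """
    selected: dict[str, Version] = {}
    seen: set[tuple[str, Version]] = set()
    worklist = list(root.items())
    while worklist:
        name, version = worklist.pop()
        if (name, version) in seen:
            continue
        seen.add((name, version))
        # Keep the largest minimum version requested so far for this module.
        if name not in selected or version > selected[name]:
            selected[name] = version
        # Follow the requirements declared by this exact release.
        worklist.extend(graph.get((name, version), {}).items())
    return selected


# Hypothetical example: a and b disagree about c; d 1.3 exists upstream
# but nobody asks for it, so it is not selected.
example_graph = {
    ("a", (1, 2)): {"c": (1, 3)},
    ("b", (1, 2)): {"c": (1, 4)},
    ("c", (1, 3)): {"d": (1, 2)},
    ("c", (1, 4)): {"d": (1, 2)},
    ("d", (1, 3)): {},
}
build_list = minimal_version_selection({"a": (1, 2), "b": (1, 2)}, example_graph)
```

The point for this thread: the result depends only on what the manifests say, not on what happens to be the newest release upstream today, so a freshly published version can't sneak into a build without someone asking for it.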
the cool thing with zig in general is that it accepts the fact of "one dep, multiple versions", and you can have the same dependency with two versions in your project, the buildsystem doesn't care at all commented: There’s a ceiling to the complexity this will tolerate, though companies like Google and Facebook The Third Networking Truth: With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. A lot of practices that are cited from Google/Facebook/etc. only work because those companies can put "sufficient thrust" behind them. For example: I know for a fact that some of those places support their chosen practices around monorepos and dependencies with teams whose headcount is higher than the entire company I work for. Which is something they can afford to do, but is not something most of the rest of us can afford to do. commented: I feel like vendoring is not the solution that many people on the internet tout it to be, because it mostly changes the shape of the problem, but not the problem itself. Bloat and multiple versions? You still have bloat, and even more of it, due to the lack of package-manager reuse. Supply chain attacks? Good luck reviewing a vendored mountain of code. Vulnerabilities and security updates? Vendoring is much worse. Visibility into dependencies? Thanks, I like my cargo tree and cargo bloat just fine. At the same time, the crisis is real. I don't know what a solution is like. Maybe it will be library-level capabilities and/or sandboxing, maybe some kind of a new social contract around opensource. Maybe there is no solution and we'll have to surrender to (hopefully benevolent) entities that ensure law and order. Maybe clankers will get cheap, fast, and effective enough to review and eliminate crude supply chain attacks of today (while replacing them with much more sophisticated and subtle threats). But I know that reverting to bad old days (romanticized by some people like Jonathan Blow) is not a solution. 
commented: Despite a huge body of objections and my own internal deliberations, I am increasingly convinced that vendoring does exactly one thing: it turns a problem into tomorrow me's problem. Unfortunately, it seems to be enough for a whole lot of people. commented: I might be wrong, but I think one downside would be that scanners wouldn't be able to flag that your copied dependency has a bug. If that's true, it means you could have a latent issue that you aren't notified about that you would be otherwise. commented: From the number of false positives these scanners produce, this may be an upside. These scanners are extremely helpful for letting you see what could be a problem, and they are extremely problematic when they suddenly cause you to set aside other planned work to fix what a scanner thinks is a problem, but isn't. commented: I agree that scanners are stupid. I've had a case at work where I had to update a Docker image from Debian to Ubuntu just to upgrade the "version number" of nginx that was hypothetically affected by a CVE. Of course it was patched by Debian. And the CVE in question was for some feature we didn't use. commented: Good take. I believe vendoring all your deps will also have a desirable soft side-effect: It will increase the cost of using dependencies. YES YES YES. There’s tons of libs out there that actually do pure computation, or which only really touch the world through very basic and portable I/O like files and network sockets. Just vendor ’em. Compression lib? Copy-paste that sucker. libcurl? Copy-paste that sucker. Nit pick, but please do not copy-paste libcurl. It's a good strategy for most libraries but using it for C programs which deal with hostile input is not good advice. You are not going to do a better job than your operating system at keeping libcurl safe.
One thing I never thought of before is how it's at least a little bit weird that end-user package managers like apt came first, and language-level package managers came later. I think this actually caused a ton of problems; like if you look at early-2000s rubygems, it's pretty obvious that they were trying to make "apt, but for ruby" with the way it defaulted to system-wide installs instead of managing projects each on an individual basis. It took decades to undo the damage of that mistake by adding bundler to the mix, but bundler would not have been necessary had the original design acknowledged the need for project isolation. Python is still working thru the chaos of fixing this. I'd imagine Perl is too, tho I don't know as much about that. commented: Nit pick, but please do not copy-paste libcurl. Ah, so there is a limit. :-) Just hard to know exactly where to draw the line. One thing I never thought of before is how it's at least a little bit weird that end-user package managers like apt came first, and language-level package managers came later. My take on the history: Package managers were originally a way to build systems, and these systems often had multiple users, desktop environments with lots of cooperating software, etc. Building software also took a lot of time and memory, and you had a lot of software compared to the amount of disk space and RAM you had, so reusing libs and stuff was a big deal. The rise of the webapp made most computers that mattered instead be servers that spent their life running a small handful of programs, and disk space and RAM got cheap enough that the size of code binaries wasn't very important. The system-building tools didn't really keep up with the times as much, so most people building software really only needed and wanted tools that were good at building single programs, not big interlocking systems with lots of shared libs. 
There's a parallel track to this history that can be summed up as "C doesn't have a real goddamn module system", but that's less important for this. commented: The rise of the webapp made most computers that mattered instead be servers that spent their life running a small handful of programs, and disk space and RAM got cheap enough that the size of code binaries wasn't very important This is part of it, but I think a bigger part of it is that the target audience of apt must be assumed to be a non-technical user. If you ask it to install something, it has to just work. The target audience of the tool that builds the webapp must be technical, and thus they can be required to make decisions about resolving dependencies which the target user of apt cannot; this forces the packagers for apt to make a ton of decisions up-front around integration. Those can't be the 100% best decisions for every single user of the system; they have to compromise for the best general case, but some flexibility is necessarily sacrificed. Using apt is a joy; using maven is a job. commented: One thing I never thought of before is how it's at least a little bit weird that end-user package managers like apt came first, and language-level package managers came later. CTAN and CPAN predated the Linux package managers, so some communities had at least something resembling package management before Linux distros. commented: Proposed solution include all the dependencies for your software, with your software. [...] Copy-paste upstream source control into your git repo and commit that fucker. [...] Get sick of doing this by hand? Make the build tool automate it, that’s its job. And at that point we're full circle and are including 3rd party software unseen again? 
commented: Keep reading: (You could also get the same effect by ditching any concept of semver or other “these two different pieces of code should behave the same” in the build system, and treating every version number as unique and unrelated to any other. But that doesn’t solve the problem of dependencies vanishing or otherwise being subverted, or someone tampering with the contents of a package in other ways. It’s an optimization, and in my mind a premature one; we might get there eventually but shouldn’t start there.) commented: I may be wrong in this, but I feel this text currently kinda mixes three things together that arguably could (should?) be discussed separately: vendoring, version-pinning, and binary caching. Just vendoring alone doesn't necessarily imply version-pinning. If I aggressively upgrade my dependency versions, and re-vendor them from the internets every so often, I'm still not protected against malicious source upgrades (e.g. the xz attack, or the uid = 0 one you mentioned), right? I'm hopefully at least protected against binary artifact attacks (like github releases, etc.), and availability attacks (third-party server goes down). Some degree of "protection" against malicious source upgrades (e.g. the xz attack) in the proposed solution you describe comes, I believe, actually from version-pinning. This part doesn't really necessitate vendoring. As others mention, if we wanted to mostly solve availability, we could fetch the dependencies from DHT/BitTorrent/IPFS/radicle/... based on their hash. A notable attempt at solving/mitigating it is AFAIU Go(lang)'s/rsc's MVS. I'm actually surprised you didn't mention it in the article - makes me sincerely wonder whether you're aware of it? It would feel obvious to me that MVS should be discussed in context of the article, to clarify how your proposed solution addresses the issues MVS tries to solve. Notably, one challenge raised against MVS/version-pinning is that of security patches.
How do we address it? Imagine I have a bunch of Java apps which happen to use the vulnerable version of log4j somewhere in their deps tree - how do I learn about it, and how do I make sure I close that vulnerability when doing version-pinning? Also, I'm not really sure if this differs much from the upgrade cooldowns you mention; with version-pinning, we still need to choose some version when we're first pinning, and how do we make sure it's neither "too fresh" (thus being the guinea pig) nor "too old" (thus staying in some ancient era). Finally, "just vendor it" seems to also imply we're vendoring the sources, and thus skipping/shorting the binaries/packages caching attacks. Interestingly, many "package repositories" (e.g. docker hub, or github releases) introduce an easy-to-miss disconnect between the source code presumed to be of a library, and the actual library that is being downloaded. To my surprise, there can also be a "source code vs. source code" disconnect, where for example IIUC the source code I submit to crates.io does not have to be the same as the code on github.com, so an attacker could steal a maintainer's crates.io credentials and keep uploading subverted sources there, under the assumption that most people read what's on github instead (if at all). But again, I think it could be good to mention this explicitly, not just rely on "just vendor it" doing that accidentally. Especially if somebody would start to think maybe "just vendoring a JAR file" is enough. commented: I believe vendoring all your deps will also have a desirable soft side-effect: It will increase the cost of using dependencies. This is... Not a desirable effect? Last thing we want is bad old days of everyone reinventing all their own everything when a library could, should, or does exist. commented: Better have thousands of copies of a couple of functions than everyone rely on the same ones and those become malicious or disappear.
There's a balance to be struck for sure, but the cost of adding new dependencies should make it so that people at least take a moment to understand the risk and that they are trusting the upstream maintainers (and whoever might phish them).