[HN Gopher] I imported the full Linux kernel git history into pgit
___________________________________________________________________
I imported the full Linux kernel git history into pgit
Author : ImGajeed76
Score : 161 points
Date : 2026-04-05 12:08 UTC (4 days ago)
HTML web link (oseifert.ch)
TEXT w3m dump (oseifert.ch)
| gurjeet wrote:
| Technically correct title would be: s/Kernel into/Kernel Git History into/
|
| Pgit: I Imported the Linux Kernel Git History into PostgreSQL
| worldsayshi wrote:
| Wow that has a very different meaning from what I thought.
| JodieBenitez wrote:
| Read the title and immediately thought "what a weird way to solve
| the performance loss with kernel 7..." The mind tricking itself
| :)
| tombert wrote:
| If I recall correctly, the Fossil SCM uses SQLite under the
| covers for a lot of its stuff.
|
| Obviously that's not surprising considering its creator, but
| hearing that was kind of the first time I had ever considered
| that you could translate something like Git semantics to a
| relational database.
|
| I haven't played with Pgit...though I kind of think that I should
| now.
| gjvc wrote:
| "If I recall correctly, the Fossil SCM uses SQLite under the
| covers for a lot of its stuff."
|
| a fossil repository file is a .sqlite file, yes
| tombert wrote:
| Makes sense, I haven't used the software in quite a while.
| ptdorf wrote:
| So SQLite is versioned in SQLite.
| yjftsjthsd-h wrote:
| Yep :) To be fair, I expect git to be stored in git,
| mercurial to be in mercurial, and... Actually now I wonder
| how svn/cvs are developed/versioned.
| deepsun wrote:
| SVN in SVN for sure, it's a well-made product. The market
| just didn't like its architecture/UX, which dictates what
| features are available.
|
| CVS is not much different from copying files around, so I
| would not be surprised if they copied the files around to
| mimic what CVS does. CVS revolutionized how we think of
| code versioning, so its main contribution is to the
| processes, not the architecture/features.
| vidarh wrote:
| The market did like it just fine until Git came around.
| It just had a very brief moment in the sun....
| tombert wrote:
| My first software job, I was a junior person, and every
| Friday, we would have The Merge, where we'd merge every
| SVN branch into trunk. We always spoke of it like it was
| this dreadful proper noun, like Voldemort or something.
|
| The junior engineers were the ones doing this, and
| generally my entire day would be spent fixing merge
| conflicts. Usually they were easy to resolve, but
| occasionally I'd hit one that would take me a very long
| time (it didn't help that I was still pretty
| inexperienced and consequently these things were just
| sort of inherently harder for me). I just assumed that
| this was the way that the world was until I found
| `git-svn`.
|
| `git-svn` made a task that often took an entire day take
| something like 45 minutes, usually much less. It was like
| a light shining down from heaven; I absolutely hated
| doing The Merge, and this just made it mostly a _solved_
| problem.
|
| After that job, I sort of drew a soft line in the sand
| that I will not work with SVN again, because at that
| point I knew that merging could be less terrible. I
| wasn't necessarily married to git in particular, but I
| knew that whatever the hell it was that SVN was doing, I
| didn't like it.
| anitil wrote:
| The sqlite project actually benefited from this dogfooding.
| Interestingly, recursive CTEs [0] were added to sqlite because
| the developers wanted to trace commit history [1]
|
| [0] https://sqlite.org/lang_with.html#recursive_query_examples
|
| [1] https://fossil-scm.org/forum/forumpost/5631123d66d96486 -
| My memory was roughly correct, the title of the discussion is
| 'Is it possible to see the entire history of a renamed file?'
| anitil wrote:
| Oh, and of course, the discussion board is itself hosted in a
| sqlite file!
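| To make the recursive-CTE point concrete, here is a minimal
| sketch of walking commit ancestry in SQL, assuming a
| hypothetical two-table layout (a `commits` table plus a
| `parents` edge table, one row per parent, so merges get several
| rows). This is not Fossil's actual schema; the table names and
| the helpers `demo_history` and `ancestors` are invented for
| illustration, and it uses Python's built-in sqlite3 module.

```python
import sqlite3


def demo_history() -> sqlite3.Connection:
    """Build a tiny in-memory history: a <- b <- c, a <- d, and merge e = (c, d)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE commits (id TEXT PRIMARY KEY, message TEXT NOT NULL);
        -- One row per parent edge, so a merge commit simply has two rows.
        CREATE TABLE parents (
            commit_id TEXT NOT NULL REFERENCES commits(id),
            parent_id TEXT NOT NULL REFERENCES commits(id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO commits VALUES (?, ?)",
        [(c, f"commit {c}") for c in "abcde"],
    )
    conn.executemany(
        "INSERT INTO parents VALUES (?, ?)",
        [("b", "a"), ("c", "b"), ("d", "a"), ("e", "c"), ("e", "d")],
    )
    return conn


def ancestors(conn: sqlite3.Connection, start: str) -> list[str]:
    """Return every commit reachable from `start` via parent links, including itself."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestry(id) AS (
            SELECT ?
            -- UNION (not UNION ALL) skips rows already produced, so a commit
            -- reachable through both sides of a merge is visited only once.
            UNION
            SELECT p.parent_id
            FROM parents AS p
            JOIN ancestry AS a ON p.commit_id = a.id
        )
        SELECT id FROM ancestry ORDER BY id
        """,
        (start,),
    ).fetchall()
    return [row[0] for row in rows]
```

| For example, `ancestors(demo_history(), "e")` returns
| `["a", "b", "c", "d", "e"]`. With `UNION ALL` the query would
| still terminate on an acyclic history, but `a` would be emitted
| once per path through the merge.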
| 20after4 wrote:
| When you import a repository into Phabricator, it parses
| everything into a MySQL database. That's how it manages to
| support multiple version control systems seamlessly as well as
| providing a more straightforward path to implementing all of
| the web-based user interface around repo history.
| adastra22 wrote:
| Git was a (poor) imitation of the monotone DVCS, which stored
| its data in sqlite.
| xeubie wrote:
| True, git poorly imitated monotone's performance problems.
| niobe wrote:
| Very cool
| tonnydourado wrote:
| That was an informative post but Jesus Christ on a bicycle, rein
| in the LLM a bit. The whole thing was borderline painful to read,
| with so many "GPTisms" I almost bailed out a couple of times. If
| you're gonna use this stuff to write for you, at least *try* to
| make it match a style of your own.
| vidarh wrote:
| To add a tip on _how_ to make it match your own style: You can
| get decently far by pointing it to a page or so of your own
| writing, and simply tell it to review the post section by
| section and edit it to match the tone and style of the example.
| It's not perfect by any means, but it will tend to edit out
| the type of language you're not likely to use, so really to
| make it sound less LLM-like, almost any writing sample from a
| human author works.
| mplanchard wrote:
| You can also just write it.
|
| I'd much rather read someone's imperfect writing than the
| soulless regression-to-the-mean that LLMs produce. If you're
| not a native speaker or don't have confidence in your
| writing, I'd urge you to first ask for an edit by another
| human, but if that's not an option, to be extremely firm in
| your LLM prompting to just have it fix issues of grammar,
| spelling, etc.
| vidarh wrote:
| Almost nobody recognises well written AI texts. I've seen
| plenty of AI written text pass right by people who are sure
| they can always tell. It takes very little, because the
| vast majority of AI writing you spot involves people doing
| nothing to clean up the style.
| erichanson wrote:
| "soulless regression-to-the-mean", damn, that's the quote of
| the day.
| darkwater wrote:
| 100% agreed. Maybe this inner reaction will disappear over the
| years of being exposed to the GPT writing style, or maybe LLMs
| will be "smarter" in this regard and able to use
| different styles even by default. But I had the same exact
| feelings as you reading this piece.
| vidarh wrote:
| It's really simple to fix by asking an LLM to apply a style
| from a sample, so my guess is a lot of products will build in
| style selection, and some providers will add more aggressive
| rules to their system prompts over time.
| mplanchard wrote:
| It's not even just about the style. It's a matter of
| respect for your readers. If you can't be bothered to take
| the time to write it, why on earth should I care enough to
| take the time to read it?
| vidarh wrote:
| If the content has value, I could not care less.
| jillesvangurp wrote:
| I would recommend using guard rails to guide tone,
| phrasing, etc. This helps prevent whole categories of bad
| phrasing. It also helps if you provide good inputs for what
| you actually want to write about and don't rely too much on
| it just filling empty space with word soup. And iterate on
| both the guard rails and the text.
| multjoy wrote:
| Or, you know, just write it yourself.
| darkwater wrote:
| Yes, but you need a style first :) But in the case of TFA's
| author, he actually had a few other blog posts that don't feel
| LLM-generated and could serve as an example, I agree.
| vidarh wrote:
| But for plenty of applications it doesn't need to be your
| _personal_ style. It only needs to be your personal style
| if you want to present it as your own writing. Otherwise
| it just matters that it's well written. A catalogue of
| styles would work well for lots of uses.
| 47282847 wrote:
| "Rewrite in a style appealing to Hacker News users
| critical of AI slop".
| consp wrote:
| I stopped at "pgit handled it.". The tldr was appreciated
| though, as now I don't have to sift through the LLM bloat.
| mplanchard wrote:
| I did bail out because of this, despite being pretty interested
| in the content. I love reading, but I cannot stand LLM
| "writing" output, and few things are important enough for me to
| force myself through the misery of ingesting ChatGPT "prose." I
| only made it to the second section of this one.
| spit2wind wrote:
| > only a handful of VCS besides git have ever managed a full
| import of the kernel's history. Fossil (SQLite-based, by the
| SQLite team) never did.
|
| I find this hard to believe. I searched the Fossil forums and
| found no mention of such an attempt (and failure). Unfortunately,
| I don't have a computer handy to verify or disprove. Is there any
| evidence for this claim?
| gritzko wrote:
| I was giving students an assignment to import git repo into
| fossil and the other way around. git was a tad faster, but not
| dramatically.
| ImGajeed76 wrote:
| i did look into this before writing the post. there's a fossil-
| users mailing list post by Isaac Jurado where he reported that
| importing Django took ~20 minutes and importing glibc on a 16GB
| machine had to be interrupted after a couple of hours. he
| explicitly warned against trying the linux kernel. the largest
| documented import on the fossil site itself was NetBSD pkgsrc
| (~550MB) which already showed scaling issues. so "never did" is
| fair - not because anyone tried and failed, but because it was
| known to be impractical and explicitly discouraged.
| corbet wrote:
| I hate to blow our own horn, but I'm gonna...if you are
| interested in seeing this kind of kernel-development data mining,
| fully human-written, LWN posts it every development cycle. The
| 6.17 version (https://lwn.net/Articles/1038358/) included the
| buggiest commit and much surrounding material. See our kernel
| index (https://lwn.net/Kernel/Index/#Releases) for information on
| every kernel release since 2.6.20.
|
| Or see LWN on Monday for the 7.0 version :)
| ImGajeed76 wrote:
| Thanks! LWN's development cycle reports are incredible and were
| actually an inspiration. The goal here wasn't to replace that
| kind of expert analysis but to show what becomes possible when
| you can just write SQL against the raw history. Your reports
| add the context and understanding that no database query can
| provide.
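| As an illustration of "writing SQL against the raw history",
| here is a hypothetical sketch that ranks commits by how many
| later commits cite them in a kernel-style `Fixes:` trailer (the
| kernel convention cites at least the first 12 characters of the
| SHA). It uses Python's built-in sqlite3 and an invented
| one-table schema with made-up rows, not pgit's actual
| PostgreSQL schema; `demo_connection` and `most_fixed` are
| illustrative names. In PostgreSQL, `instr` would become `strpos`.

```python
import sqlite3

# Hypothetical sample rows; the hashes are made up, but long enough that a
# 12-character prefix is meaningful, as in kernel "Fixes:" trailers.
SAMPLE_COMMITS = [
    ("111111111111aaaa", "mm: add feature"),
    ("222222222222bbbb", "net: add driver"),
    ("333333333333cccc", 'mm: fix leak\n\nFixes: 111111111111 ("mm: add feature")'),
    ("444444444444dddd", 'mm: fix race\n\nFixes: 111111111111 ("mm: add feature")'),
    ("555555555555eeee", 'net: fix NULL deref\n\nFixes: 222222222222 ("net: add driver")'),
]


def demo_connection() -> sqlite3.Connection:
    """Load the sample rows into an in-memory table with an invented schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE commits (hash TEXT PRIMARY KEY, message TEXT NOT NULL)")
    conn.executemany("INSERT INTO commits VALUES (?, ?)", SAMPLE_COMMITS)
    return conn


def most_fixed(conn: sqlite3.Connection, limit: int) -> list[tuple[str, int]]:
    """Rank commits by how many other commits name them in a Fixes: trailer."""
    return conn.execute(
        """
        SELECT target.hash, COUNT(*) AS fix_count
        FROM commits AS target
        JOIN commits AS fixer
          -- Naive substring match on the 12-char abbreviated hash: fine for a
          -- sketch, but a quadratic scan on a kernel-sized table.
          ON instr(fixer.message, 'Fixes: ' || substr(target.hash, 1, 12)) > 0
        GROUP BY target.hash
        ORDER BY fix_count DESC, target.hash
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
```

| With the sample rows, `most_fixed(demo_connection(), 2)`
| returns `[("111111111111aaaa", 2), ("222222222222bbbb", 1)]`.
| A real import would want a parsed trailers table with an index
| instead of substring matching over every message.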
| anonair wrote:
| I wish that one day tools like gitlab and forgejo would ditch
| filesystem storage for git repos and put everything in an SQL
| DB. I'm tired of replicating files for DR.
___________________________________________________________________
(page generated 2026-04-09 23:02 UTC)