this fall I worked with the core Git folks on writing an official data model for Git and it just got merged! I learned a few new things from writing it. https://github.com/git/git/blob/master/Documentation/gitdatamodel.adoc
@b0rk Thanks for your awesome contributions to making tools easier to use for everyone. It’s great to hear that the core Git folks recognized that and engaged with you.
@b0rk I know what I'll be reading when I get home!
@b0rk Reminds me a lot of https://alexwlchan.net/a-plumbers-guide-to-git/ which I found very helpful as well. Glad to see something like this become official! Thanks for this work 👍
the fact that you collaborated with the core Git crew is giving me life! you're basically the pickle in the programming salad. keep that energy, friend!
@b0rk "A commit contains these required fields (..): The full directory structure of all the files in that version of the repository and each file’s contents, stored as the tree ID of the commit’s top-level directory" – Does this mean that a commit with a single line change can be extremely large if there are many (unchanged) files, because all IDs of every file, including the directory structure, are stored for each commit?
@markusr It depends: if all the files are in 1 single giant directory then yes it would take up a lot of space. But any directory which is unchanged can be reused between commits (using its tree ID) so usually you can share a lot of the directory structure with previous commits
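A rough way to see the sharing described above is to compute object IDs by hand. This is a minimal sketch, assuming SHA-1 object IDs and only regular files and directories; `object_id`, `blob_id`, and `tree_id` are hypothetical helper names for illustration, not Git APIs.

```python
import hashlib


def object_id(kind: str, body: bytes) -> bytes:
    # A Git object ID is the hash of "<type> <size>\0" followed by the body.
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def blob_id(content: bytes) -> bytes:
    return object_id("blob", content)


def tree_id(entries: dict) -> bytes:
    """entries maps a name to bytes (file contents) or a dict (subdirectory)."""
    rows = []
    for name, value in entries.items():
        if isinstance(value, dict):
            # Directories sort as if their name ended in "/".
            rows.append((name + "/", b"40000", name, tree_id(value)))
        else:
            rows.append((name, b"100644", name, blob_id(value)))
    rows.sort(key=lambda row: row[0].encode())
    body = b"".join(
        mode + b" " + name.encode() + b"\0" + oid for _, mode, name, oid in rows
    )
    return object_id("tree", body)


# Two versions of a repository: only src/main.py changes.
v1 = {"docs": {"a.txt": b"hello\n"}, "src": {"main.py": b"print(1)\n"}}
v2 = {"docs": {"a.txt": b"hello\n"}, "src": {"main.py": b"print(2)\n"}}

docs_shared = tree_id(v1["docs"]) == tree_id(v2["docs"])  # same tree ID, reused
src_changed = tree_id(v1["src"]) != tree_id(v2["src"])    # new tree object
root_changed = tree_id(v1) != tree_id(v2)                 # new top-level tree
```

In this sketch the second commit only needs a new blob, a new `src` tree, and a new top-level tree; the `docs` tree keeps its ID and is reused.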
@b0rk Ahhh. Thanks for your explanation!
@b0rk that's cool. But what's the use of it? 🙂
This is great and also necessary.
@b0rk That's amazing!
I just read through it, and I think I found a formatting bug: The text "the old commit will usually not be reachable, ..." at the end of REFERENCES looks like it is part of the note, but at least in the GitHub preview it's rendered outside of the note box.
@yrlf thanks, will work on fixing that
@b0rk@social.jvns.ca This is a really clear explanation that I've sometimes missed in the past. Thank you for making it happen!
here are some things I learned while writing this:
1. Commits can have "extra fields", for example a GPG signature
2. I always thought entries in a tree had 4 fields, but there are actually only 3 (file name, file type, and object ID)
3. Git sometimes prints out file types as a bit set (100644), but it's really more like an enum, since there are only 5 file types (regular file, executable file, symlink, directory, and gitlink)
(2/?)
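Points 2 and 3 above can be sketched as a small parser for the body of a tree object. This is an illustration only, assuming 20-byte SHA-1 object IDs; `FileType`, `TreeEntry`, and `parse_tree` are hypothetical names, and the object IDs in the example are made up.

```python
from enum import Enum
from typing import List, NamedTuple


class FileType(Enum):
    # The five file types, keyed by the mode string stored in the tree.
    REGULAR = "100644"
    EXECUTABLE = "100755"
    SYMLINK = "120000"
    DIRECTORY = "40000"  # usually displayed as 040000
    GITLINK = "160000"


class TreeEntry(NamedTuple):
    name: str
    file_type: FileType
    object_id: str


def parse_tree(body: bytes) -> List[TreeEntry]:
    """Parse a tree body: repeated "<mode> <name>\\0<20-byte object ID>"."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        oid = body[nul + 1:nul + 21].hex()
        # FileType(mode) raises ValueError for anything outside the enum.
        entries.append(TreeEntry(name, FileType(mode), oid))
        pos = nul + 21
    return entries


# Example body with made-up object IDs.
example_body = (
    b"100644 README.md\0" + bytes(range(20))
    + b"40000 src\0" + bytes(range(20, 40))
)
example_entries = parse_tree(example_body)
```

Treating the mode as an enum means a value like `100600` is rejected outright rather than being read as "some permission bits".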
@b0rk Hmm, are those extra fields stored in the commit itself or can arbitrary metadata be stored in Git?
For some context (and it might have been experimented on already), Mercurial has an extension called evolve that is providing "obsolescence markers", which are pushed (and pulled) in the remote repository. Those obsolescence markers let you know when a commit was rewritten in history, e.g. after a rebase or after modifying it, and this is used to automatically "evolve" a branch modified by your peers (and also for the server to forbid pushing outdated changes, like with force-with-lease).
I'm wondering if evolve could be ported to Git.
@Exagone313 they're in the commit so they're immutable
it looks like someone's tried to build "git evolve" before https://lwn.net/Articles/914041/
@b0rk what's a gitlink? I never heard of that one before
@b0rk What did you think would be the fourth field of a tree entry?
@b0rk The way Git displays octal file modes to the user is so unfortunate 😞 It gives you the false impression that files in a repo *could* have unusual permissions/types, or that perhaps the umask on a committer's machine might affect the permissions of newly committed files, even though it actually enforces "standard" file modes.
(To be clear, I think it's good that Git can't store any fancy permissions/types - I just wish it would communicate that to the user...)
@dgelessus me too. I’m curious about whether they’d be open to changing it though I don’t think I have the stamina to work on that.
4. In Git's index (aka staging area), every file has a "stage number". This is usually 0, but when there's a merge conflict then there can be up to 3 versions of the same file in the "staging area" (stage 1 is the common ancestor, stage 2 is "ours", and stage 3 is "theirs")
5. Branches aren't always stored as "every branch is a file in .git". There's also a reftable backend https://about.gitlab.com/blog/a-beginners-guide-to-the-git-reftable-format/ which fixes some problems with "branches are files", for example that on a case-insensitive filesystem your branch names are also case-insensitive
(3/?)
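To make the stage numbers in point 4 concrete, here is a minimal sketch that groups the output of `git ls-files --stage` by path and stage. The sample text and its object IDs are made up for illustration, and `parse_ls_files_stage` and `conflicted_paths` are hypothetical helper names.

```python
from collections import defaultdict


def parse_ls_files_stage(output: str) -> dict:
    """Map each path to {stage number: (mode, object ID)}.

    Each line is expected to look like "<mode> <object> <stage>\\t<path>".
    """
    index = defaultdict(dict)
    for line in output.splitlines():
        if not line:
            continue
        meta, path = line.split("\t", 1)
        mode, object_id, stage = meta.split(" ")
        index[path][int(stage)] = (mode, object_id)
    return dict(index)


def conflicted_paths(index: dict) -> list:
    # A path is in conflict when it has any entry at a stage other than 0.
    return sorted(
        path for path, stages in index.items() if any(s != 0 for s in stages)
    )


# Made-up sample: README.md is merged, notes.txt has a conflict.
SAMPLE = (
    "100644 " + "a" * 40 + " 0\tREADME.md\n"
    "100644 " + "b" * 40 + " 1\tnotes.txt\n"
    "100644 " + "c" * 40 + " 2\tnotes.txt\n"
    "100644 " + "d" * 40 + " 3\tnotes.txt\n"
)
sample_index = parse_ls_files_stage(SAMPLE)
sample_conflicts = conflicted_paths(sample_index)
```

In the sample, `README.md` has a single stage 0 entry, while `notes.txt` has entries at stages 1, 2, and 3 and no stage 0 entry.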
in any case the things I learned are mostly trivia, the real point was to have an explanation of the basics :)
@b0rk a trivia game show based on git trivia would be fun (or not)
@b0rk I don't think git has any native recognition of directories. It just knows about paths. Or at least you cannot commit an empty directory.
@enhancedscurry i just looked into this and as far as I can tell from my experimentation, it's theoretically possible in Git to create a commit with an empty directory in it (like there's nothing in the data model that prevents it).
BUT if you check out that commit, the checkout won't include the empty directory, so the effect is that (as we all know) you can't have empty directories in Git.
@enhancedscurry My best guess is that this is because even though in Git a _commit_ can theoretically contain an empty directory, the _index_ can't have an empty directory.
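One way to picture that guess: if the index is modeled as a flat map from file paths to contents, an empty directory has no path to hang on to. This is a toy model rather than Git's actual index format, and `flatten` is a hypothetical helper.

```python
def flatten(tree: dict, prefix: str = "") -> dict:
    """Turn a nested tree (dicts for directories, bytes for files) into a
    flat {path: contents} map, the way an index of file paths would see it."""
    paths = {}
    for name, value in tree.items():
        if isinstance(value, dict):
            # Directories contribute only through the files inside them.
            paths.update(flatten(value, prefix + name + "/"))
        else:
            paths[prefix + name] = value
    return paths


commit_tree = {"README": b"hi", "empty": {}, "src": {"a.py": b""}}
index_view = flatten(commit_tree)
```

Here `index_view` contains `README` and `src/a.py` but nothing for `empty`, which matches the observed behavior that the directory is not there after checkout.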