-
-
Notifications
You must be signed in to change notification settings - Fork 90
Architecture
git-branchless
is implemented in Rust. The package is called git-branchless
, and it is implemented in a Rust crate called branchless
.
Aside: in retrospect, the crate name
branchless
could be confused with some kind of library for high-performance branchless programming. Unfortunately, this library only aids high-velocity software development.
This page is intended for git-branchless
developers or curious users. The main concepts are implemented in branchless::core
.
git-branchless
watches for events in the Git repo by installing various hooks (see branchless::hooks
). These hooks add events to the event log, which is an ordered sequence of events stored on disk in a SQLite database. See the Event
documentation for details about the types of events which can be recorded.
At present, on startup, git-branchless
loads all events into memory, and then replays them to determine the current state of the repository (see EventReplayer
). This could be slow if the user has done many operations and the event log is long.
The undo feature is implemented by taking recent events from the event log and then applying their inverses. For example, if a commit A was rewritten to B, then the inverse operation is to rewrite B to A.
This might not be the best implementation, since some inverses don't make sense. For example, if the user rewrites a draft commit A into its upstream version contained in the main branch, should the inverse really rewrite a main branch commit into a draft commit? That results in the case of main branch commits being obsolete.
It might be best to introduce a dedicated "undo" event type, rather than attempt to invert previous events.
Not yet implemented: To avoid performance problems when the event log is long, it should be possible to add "checkpoints" to the event log. A checkpoint would be a synthetic event that contains a copy of the repository state. Rather than replay all events in the event log, we can find the most recent checkpoint, load the repository state, and replay events only from that point. In this way, we can arbitrarily bound the number of events that need to be read and replayed in the worst case.
So far, I haven't hit performance problems with a few thousand local events, so I haven't prioritized this.
Git has a concept called "reference logs", or "reflogs" for short. A reflog is a history of events that happened to a single Git reference. This is pretty similar to our event log. In fact, the first version of git-branchless
attempted to infer the repository history from the reflog for HEAD
.
So why don't we use reflogs? Unfortunately, they have a number of shortcomings:
- They don't store structured data. We have to guess what the event is doing based on the message.
- For rewrite events, it's particularly difficult to figure out what the "old" version of the commit was. On the other hand, this information is directly exposed in the
post-rewrite
hook, if we wish to record it ourselves. - It's difficult to insert our own synthetic events into the reflog. Either due to a bug in
libgit2
/pygit2
, or possibly due to a fundamental shortcoming in Git, I was unable to insert an entry for a reference which had a message but left the reference pointing to the same object as before. - Reflogs only exist for references which currently exist. Branches may be created and deleted. Once they're deleted, the reflog is also deleted!
- Usually, we use the reflog for
HEAD
to undo work. Sometimes the reflog forHEAD
isn't touched, such as creating and deleting a branch pointing to a non-HEAD commit. In these cases, the historical reference information is entirely unrecoverable using reflogs. - This means the user has to rely on unergonomic solutions to restore a deleted branch.
- In contrast, our
git undo
command can restore deleted branches — only because we don't delete event logs for deleted references.
- Usually, we use the reflog for
- Logically-related events in the same reflog aren't obviously grouped.
- The ordering of events between different reflogs is difficult to ascertain. A single operation can end up creating reflog entries with the same timestamp in different reflogs, and it's not clear which logically happened first.
- Reflogs can be edited or cleaned up.
- The user is free to modify the reflog as they like, which could break internal invariants, although this is probably not a big concern in practice.
- Git's garbage collection may prune reflog entries. It would be a better user experience for a
git undo
if we could tell the user "the operation cannot be undone because Git has garbage collected necessary objects", rather than have old events mysteriously missing from the history.
The reader might also be interested in Jujutsu
, an experimental Git-compatible VCS which also has an "operation log".
I'm not aware of other source control systems which also use a general-purpose event log. Please update this section if you know of another one.
git-branchless
implements a basic version of Mercurial's Changeset Evolution feature.
For the implementation details, there's a good technical document here: https://www.mercurial-scm.org/doc/evolution/concepts.html
Normally, when a commit is amended or rebased, the result is an entirely new Git object, which has no direct relation to the old one. By leveraging the event log and the post-rewrite
hook, we can record these relationships.
These are the important situations:
- One commit is rewritten into one commit (e.g. a rebase or amend): the
RewriteEvent
has the old OID and the new OID. - Many commits are rewritten into one commit (e.g. an interactive rebase
fixup
orsquash
): There are multipleRewriteEvents
with different old OIDs and the same new old OID. - One commit is rewritten into many commits (e.g. a
split
operation): agit split
command is not implemented at the time of this writing, so there is no corresponding sequence ofRewriteEvent
s.
Recording these events allows us to update the smartlog with the latest version of the commit, as well as undo these operations in a principled manner.
As of https://github.com/arxanas/git-branchless/commit/f6c540fea8392223d604c4994b081b603b3df850, the commit graph is based on Eden SCM's segmented changelog data structure. See the thread at https://github.com/quark-zju/gitrevset/issues/1 for more details. Some resources:
- Search the Wiki 🔎
- User guide
- Welcome
- Installation
- Tutorial
- Command overview
- General:
- Navigation:
- Committing:
- Rebasing:
- Verification:
- Collaboration:
- Workflows
- Advanced topics
- Reference
- Developer guide
- Onboarding offer
- Development
- Reference