Rewriting History With Git

The running joke among programmers is that barely anyone understands how to use git. While that can seem painfully true sometimes, I don't think the problem is actually that people don't understand git, but that it seems scary. Here are some not-scary and hopefully helpful tips.

Specifically, I'm going to be talking about rebasing. Most people who have used git will have collaborated with other people, so hopefully you're basically familiar with what rebasing is.

If not, think about those graphs you've seen in every git guide. A merge makes those side-by-side lines where one points back into the main branch (the merge commit); a rebase rewrites the commits so they're all stacked on one line. Pretty simple, right?
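
If you'd rather see the difference than picture it, here's a rough sketch that builds a throwaway repo and does it both ways. The branch names (main, feature, merged, rebased) and file names are made up for the demo, so treat it as an illustration rather than a recipe.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway repo so nothing real gets touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" on any git version
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "initial"

# One commit on a side branch, one on main, so the two have diverged.
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feature work"

git checkout -q main
echo "main" > main.txt
git add main.txt
git commit -q -m "main work"

# Option 1: merge. Both lines survive and are joined by a merge commit.
git checkout -q -b merged feature
git merge -q --no-edit main

# Option 2: rebase. "feature work" is replayed on top of main, one straight line.
git checkout -q -b rebased feature
git rebase -q main

git log --oneline --graph merged
git log --oneline --graph rebased

# Count merge commits on each result.
merge_commits_merged=$(git rev-list --merges --count merged)
merge_commits_rebased=$(git rev-list --merges --count rebased)
echo "merged: $merge_commits_merged merge commit(s), rebased: $merge_commits_rebased"
```

The two graphs at the end are the side-by-side lines and the single stacked line from the guides.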

Rebasing essentially rewrites history. If you've only ever rebased to stay up-to-date with your shared repo, you're missing out on why so many people love the command: the interactive rebase. All you have to do is:

git rebase -i

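The -i is key. On its own, that rebases onto your branch's upstream; you can also name a starting point, as in git rebase -i HEAD~3 for the last three commits. If you really want to go nuts, pop open a repo you've been using (preferably one without too many commits), and try: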

git rebase -i --root

You should be familiar with that screen; it's basically the editor you get for commit messages. The difference here is that there are individual commits listed, each with a word in front. At the bottom there's a helpful message:

# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#

This is the magic guide to rebasing. Picking a commit means it stacks that on for you - that's more-or-less what happens when you pull & rebase. Reword is nice when a typo is bugging you in an old commit. You'll get the option to rewrite the commit message. Squash and fixup are invaluable for local chronic committers.

If you choose to squash a commit, the changes in that commit will get moved into the previous one. This lets you clean up your history. You might commit changes dozens of times fiddling with things (especially if you have a testing/deployment workflow that involves pushing a branch), but if you're anything like me at least 5 of them are fixing typos or some other simple, boneheaded mistake. Rather than have the stream of 'FZZCKING SHIFt kEy'-style commits, squash them! Try it out on a test repo. Fixup does the same thing, but it throws away the fixup commit's message and keeps the earlier one, while squash opens the editor with all of the messages listed so you can combine them. That's helpful if you're like me and tend to do bullets, but only want to highlight the important parts.
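
Here's a small test repo you can try that on. Normally you'd change pick to fixup by hand in the editor; so the sketch can run unattended, it points GIT_SEQUENCE_EDITOR (the variable git consults for the rebase list) at a sed one-liner that makes the same change. The file name and commit messages are invented for the demo.

```shell
#!/usr/bin/env bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# One real commit followed by two typo fixes.
echo "helo" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
echo "hello" > greeting.txt
git commit -q -am "fix typo"
echo "hello!" > greeting.txt
git commit -q -am "fix typo again"

# Stand-in for hand editing: turn "pick" into "fixup" on every line after the first.
# (-i.bak is the in-place form that both GNU and BSD sed accept.)
GIT_SEQUENCE_EDITOR="sed -i.bak '2,\$s/^pick/fixup/'" git rebase -q -i --root

commit_count=$(git rev-list --count HEAD)
subject=$(git log -1 --format=%s)
content=$(cat greeting.txt)
echo "$commit_count commit(s): \"$subject\" with content \"$content\""
```

Three commits go in and one comes out, carrying the first commit's message and the final version of the file. Swap fixup for squash in the sed expression and git will open your editor with all three messages instead.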

Many group workflows encourage squashing commits down to the overall change, rather than keeping all the incremental steps. I like to squash my commits before offering them up for a merge regardless, as it lets me have a clear history of edits despite the wacky flow I tend to actually use.

I'm going to skip over exec (and the related git filter-branch command) as they're mostly useful for mass edits like scrubbing files or changing user information. If you need to do that kind of thing, git-scm has your back at the bottom of the page and GitHub has some help too. That's also the majority of answers you'll find on Stack Overflow and other resources relating to those options.

Editing commits, however, can be a lot of fun. You get to jump back in time to whenever you wrote the commit and start working on it again, as if you had just hit save. The important thing to remember here is if you want to modify files or the commit message on the same commit, you have to use:

git commit --amend

Not just git commit. A plain git commit adds a new commit at that point in history instead, which is handy in its own right (though the time will be off). When you're done with a stop, git rebase --continue moves on to the next one. I encourage you to keep an eye on your git log while you're interactively rebasing, especially if you have several edits in a row.
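
To make the edit flow concrete, here's a sketch of one full round trip: stop on an old commit, amend it, carry on. As a stand-in for typing edit in the rebase list yourself, it uses GIT_SEQUENCE_EDITOR with sed. The repo, file names, and messages are all hypothetical.

```shell
#!/usr/bin/env bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "First commit"
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "Second commit"

# Mark the first commit as "edit"; the rebase stops right after applying it.
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/edit/'" git rebase -q -i --root

# We're now back in time at the first commit. Slip a README into it.
echo "# Demo" > README.md
git add README.md
git commit -q --amend --no-edit   # --amend folds the change into the stopped commit

# Replay the remaining commits on top of the amended one.
git rebase --continue

git log --oneline
git show --stat --format=%s HEAD~1
```

The history still has two commits, but the first one now contains the README. Running a plain git commit at the stop instead of --amend would have left you with three.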

I know what you're thinking: Cool, I can go back and insert a bugfix at a time that makes more sense, and clean up my history; put a README in on the first commit instead of adding it randomly later... But what about those dates? If I mix in commits, I don't want to be jumping from April to January to April again. Here's a simple bash script to help you out:

Fix commit dates
#!/usr/bin/env bash
GIT_COMMITTER_DATE="$(date -d "$1")" git commit --amend --date "$(date -d "$1")"

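You can save this anywhere on your computer as datefix.sh, make it executable (chmod +x datefix.sh), then run it like ./datefix.sh '2015-12-31 23:59:59'. It amends whatever commit you're currently sitting on, so run it while the rebase is stopped on an edit. It'll load up your default commit editor and you can add in a Happy New Year message. Note that date -d is the GNU flavor of date, so on a Mac you'd need GNU coreutils installed. Similarly, you can change the author name and email of a commit with git commit --amend --author.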
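
If you want to check that a date change actually stuck, git log can print both dates side by side. This sketch is a variation on the script rather than a copy of it: it hands the date string straight to git instead of going through date -d, and it adds --no-edit so no editor opens. The name, email, and file are placeholders.

```shell
#!/usr/bin/env bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "Happy New Year" > note.txt
git add note.txt
git commit -q -m "Add note"

# Change the author first, since every amend resets the committer date.
git commit -q --amend --no-edit --author="Other Person <other@example.com>"

when='2015-12-31 23:59:59'

# --date sets the author date; GIT_COMMITTER_DATE sets the committer date.
GIT_COMMITTER_DATE="$when" git commit -q --amend --no-edit --date "$when"

# %ad is the author date, %cd the committer date, %an/%ae the author name/email.
dates=$(git log -1 --date=iso --format='%ad | %cd')
author=$(git log -1 --format='%an <%ae>')
echo "$dates"
echo "$author"
```

Both dates should come back as the last second of 2015 in your local timezone. If only one of them changed, the other half of the command got dropped somewhere.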

Play around with it in a repo you haven't made public yet, maybe because the git history is ridiculous. After you rewrite a couple entire histories, you'll feel like a pro at interactive rebasing, and be way more comfortable structuring commits.

The one big caveat (and why I keep recommending not-yet-public repos): You'll totally break merging with other people's copies if you go back changing history like this. I always try to keep my work saved separately so I can trim my commits down later without breaking my collaborators' repos. After playing around with this, though, you should know how to fix it!
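
One cheap way to keep a copy saved separately is a bookmark branch made right before you start rewriting. This is only a sketch of that idea, not the one true workflow, and backup-before-rewrite is just a name I picked. As before, GIT_SEQUENCE_EDITOR with sed stands in for editing the rebase list by hand.

```shell
#!/usr/bin/env bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

for n in 1 2 3; do
  echo "change $n" >> work.txt
  git add work.txt
  git commit -q -m "Change $n"
done

# Bookmark the current history before rewriting anything.
git branch backup-before-rewrite   # hypothetical branch name

# Rewrite: fold everything into the first commit.
GIT_SEQUENCE_EDITOR="sed -i.bak '2,\$s/^pick/fixup/'" git rebase -q -i --root
rewritten_count=$(git rev-list --count HEAD)

# Changed your mind? Point the current branch back at the bookmark.
git reset -q --hard backup-before-rewrite
restored_count=$(git rev-list --count HEAD)

echo "after rewrite: $rewritten_count commit(s), after restore: $restored_count"
```

Even without a bookmark, git reflog lists where HEAD has been, so a rewrite you regret is usually recoverable on your own machine. It's the copies other people have already pulled that you can't fix for them.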