Rewriting git history simply with git-filter-repo
In this post I describe how I used git-filter-repo to rewrite the history of a git repository to move files into a subfolder.
Background: rewriting git history
As a git user, I like to rebase. I like to make lots of small commits and tidy them up later using interactive rebase, and to rewrite my PRs to make them easier to understand (and review). I use git push origin --force-with-lease so much that I have it aliased as git pof.
What I don't do is rewrite the history of my main/master branch. There's a whole world of pain there, as other people will likely have started branches from it, and those branches can easily end up in a complete mess.
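The post doesn't show how the git pof alias is defined. One way to set it up, sketched here as an assumption, is a git alias created with git config. The demo writes it to the local config of a throwaway repository created with mktemp, so it doesn't touch your real settings; use --global instead of --local to make the alias available everywhere.

```shell
# Throwaway repository so the demo doesn't modify your real git config
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Assumed definition: `git pof` expands to a force-push that refuses to
# overwrite the remote branch if it has moved since you last fetched it
git config --local alias.pof 'push origin --force-with-lease'

# Print what the alias expands to
git config --get alias.pof
```

After this, running git pof in that repository behaves exactly like typing git push origin --force-with-lease.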
However, sometimes it makes sense.
I was working on a small side project the other day, when I realised it would really make sense for it to effectively be a "monorepo". So rather than having all the existing code in the root directory, I wanted to move it to a child directory.
So I started with a directory that looked like this:

And I wanted a directory that looked like this:

The notable points here are:
- Everything has been moved to an engine subfolder
- Except the .gitattributes and .gitignore files, which are still at the top level.
The simplest way to do this is to just move all the files and create a new commit with the changes, job done. The downside is that while git itself is reasonably good at tracking file moves (though it sometimes gets things wrong), moving every file can cause other issues.
For example, if you're looking at a file on GitHub, and you want to see what it looks like at a particular commit, then you can use the branch selector to change it. However, if the file has moved, you'll get a 404. Not a great experience.

If the odd file has moved, that's not a big deal, but if literally every file has moved, browsing the history on GitHub becomes painful.
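Part of the reason moves are awkward is that git doesn't record them: it infers renames by comparing file contents, and git log only follows a file across a rename when you ask it to. The following sketch, run in a throwaway repository with a made-up app.js file, shows the difference --follow makes after a plain "move everything" commit:

```shell
# Throwaway repository for the demo
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example"
git config user.email "example@example.com"

# Commit a file at the root
echo 'console.log("hi");' > app.js
git add app.js
git commit -q -m "Add app.js"

# Move it into a subfolder in a separate commit
mkdir engine
git mv app.js engine/app.js
git commit -q -m "Move app.js into engine/"

# Without --follow, the history of the new path starts at the move commit
git log --oneline -- engine/app.js
# With --follow, git detects the rename and also lists the original commit
git log --oneline --follow -- engine/app.js
```

Tools that look up a path at an older commit, like the GitHub branch selector, don't get this rename detection, which is why they fail after the move.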
So what's the alternative? Rewriting history!
Rewriting history: the options
With rewriting history, we update the git branches to make it look like all the files were originally committed to the engine subfolder. There's no "sudden move". The history shows them as always having been in the engine folder.
This sort of wholesale rewriting of your main/master branch is definitely not advisable if you are sharing the repo publicly. You will likely break all sorts of people's work!
Normally when I'm rewriting history I use git rebase -i in combination with git reset HEAD~. This lets me squash commits together, pause to split them apart, reorder them, or remove them entirely. That's great for when you're massaging a PR, but it's really not designed for wholesale rewriting of an entire repository.
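As a rough sketch of the git reset HEAD~ step, here is one commit being split into two in a throwaway repository with made-up file names. During git rebase -i you would mark the commit as edit and run the same reset when the rebase pauses on it; here the commit is already at the tip of the branch, so no rebase is needed.

```shell
# Throwaway repository for the demo
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > README.md
git add README.md
git commit -q -m "Initial commit"

# One commit that really contains two unrelated changes
echo "feature" > feature.txt
echo "fix" > fix.txt
git add feature.txt fix.txt
git commit -q -m "Add feature and fix"

# Mixed reset: undo the commit but keep its changes in the working tree
git reset -q HEAD~

# Re-commit the changes as two smaller commits
git add feature.txt
git commit -q -m "Add feature"
git add fix.txt
git commit -q -m "Add fix"

git log --oneline
```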
For wholesale rewriting of an entire repository, git filter-branch is a better option. This is a complex git command that, frankly, scares me. I have used it on occasion, but the syntax is janky, you typically have to incorporate a lot of bash, it's often slow, and you could mess up your whole repository. Yay!
Just take a look at this Stack Overflow question, which is about a similar requirement but in reverse: moving from the engine folder to the root. One of the answers suggests running the following command:
git filter-branch -f --index-filter 'PATHS=`git ls-files -s | sed "s/^engine//"`; \
    GIT_INDEX_FILE=$GIT_INDEX_FILE.new; \
    echo -n "$PATHS" | \
    git update-index --index-info \
    && if [ -e "$GIT_INDEX_FILE.new" ]; \
    then mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"; \
    fi' -- --all

That's definitely something. Does it work? Probably. Would you want to write your own? Almost certainly not.
So instead of trying to figure out how to mangle git filter-branch to my liking, I decided to look at a suggestion I saw elsewhere: git-filter-repo.
"Installing" git-filter-repo using Docker
git-filter-repo isn't built into git itself. In fact, it's a single Python file, but it's written to feel like a git plugin. And the really nice thing is that its API is far friendlier. That whole git filter-branch expression in the previous section could be rewritten with git-filter-repo as something like this:
git filter-repo --path-rename engine/:

I think you'll agree that's much clearer! The manual is also very good, with lots of examples.
The only problem, from my point of view, is that git-filter-repo is written in Python. Python on Windows can be problematic (even the install instructions make that clear), and while you can install Python from the Microsoft Store, I really didn't want to go through that. Docker to the rescue!
Docker is a great fit for something like this, where I want to quickly try a tool and don't want to risk messing up my machine. Instead of installing Python, I'll run a Docker image that already has Python installed, mount my project directory into the container, and work inside the container!
git-filter-repo requires Python 3.5+, so I searched for Python on Docker Hub and found the official images. The python:3 image is a bullseye (Debian 11) image, with Python 3.10 installed, which would do nicely.
I ran the following command from my project's directory to pull and run the Docker image, mount the current directory at /app inside the container, set the working directory to /app, and start a bash shell:
docker run --rm -it -v ${PWD}:/app -w /app python:3 /bin/bash

I now have a running Python container, but I don't have the git-filter-repo tool installed yet. The python:3 image uses Debian 11, and according to the git-filter-repo install instructions, I needed to use the "backports" repository to install it via apt-get:
A repository in this context refers to the server containing all the packages used by apt for installation on a Linux machine. It is separate from the concept of a "git repository".
Unfortunately, the backports repository isn't enabled by default in Debian 11, so I followed the instructions on the Debian backports website to add it to the sources list, and then installed the git-filter-repo package:
# Add the backports repo to apt's sources list
echo 'deb http://deb.debian.org/debian bullseye-backports main' > /etc/apt/sources.list.d/backports.list
# Update the list of available packages
apt-get update
# Install git-filter-repo, adding the required /bullseye-backports suffix
apt-get install -y git-filter-repo/bullseye-backports

The logs indicated this had installed correctly, so I was ready to take it for a spin!
Using git-filter-repo to move files into a subdirectory
My first attempt to use git-filter-repo wasn't very successful. I tried running:
git filter-repo --to-subdirectory-filter engine/

which seemed like it would do most of what I wanted, but I was presented with the following:
> git filter-repo --to-subdirectory-filter engine/
Aborting: Refusing to destructively overwrite repo history since this does not look like a fresh clone.
(expected freshly packed repo)
Please operate on a fresh clone instead. If you want to proceed anyway, use --force.

This is very interesting! Rewriting history is obviously a very destructive process in which you can lose work, and git-filter-repo is doing its best to make sure you don't hurt yourself. As long as your work is pushed to a remote git repository, you should be fine, but to be safe, git-filter-repo requires you to work in a fresh clone by default.
This seemed very sensible to me, so I did as it asked, created a fresh clone, and tried again:
> git filter-repo --to-subdirectory-filter engine/
Parsed 24 commits
New history written in 2.37 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 547b073 Use alternate robots.txt
Enumerating objects: 375, done.
Counting objects: 100% (375/375), done.
Delta compression using up to 4 threads
Compressing objects: 100% (161/161), done.
Writing objects: 100% (375/375), done.
Total 375 (delta 189), reused 327 (delta 189), pack-reused 0
Completely finished after 6.32 seconds.

That's much better! As you can see from the logs, git-filter-repo was very busy rewriting the commits. Taking a look at the results afterwards, everything except the .git folder had been moved to the engine subfolder:

and the history (shown with gitk here) shows the original commits as having always been made to the engine folder.
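Besides eyeballing gitk, a scripted check can confirm that no commit still contains a file outside engine/. The helper below (paths_outside is a hypothetical name, not part of git or git-filter-repo) lists every path in every commit reachable from any ref and prints the ones that don't start with the given prefix. A throwaway repository stands in for the rewritten clone.

```shell
# Hypothetical helper: print every path, in any commit reachable from any ref,
# that does not start with the given prefix (e.g. "engine/")
paths_outside() {
  git rev-list --all |
    while read -r commit; do git ls-tree -r --name-only "$commit"; done |
    sort -u |
    awk -v prefix="$1" 'index($0, prefix) != 1'
}

# Throwaway repository standing in for the rewritten clone
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example"
git config user.email "example@example.com"
mkdir engine
echo 'console.log("hi");' > engine/app.js
git add engine
git commit -q -m "Add app.js"

# Prints nothing: every path in history lives under engine/
paths_outside engine/
```

Run against the real clone after moving .gitignore and .gitattributes back to the root, the same check should list just those two files.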

This is almost exactly what I want, except I wanted the .gitignore and .gitattributes to remain at the top level.
I'll come back to those strange replace/* tags in the gitk image shortly.
The easiest way to fix the location of the .gitignore and .gitattributes files was more rewriting! I ran the following command to move them back up to the root folder:
> git filter-repo \
    --path-rename engine/.gitattributes:.gitattributes \
    --path-rename engine/.gitignore:.gitignore
Parsed 24 commits
New history written in 1.35 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at f554e31 Use alternate robots.txt
fatal: replace depth too high for object 8027f9f8670e3da4762099d39e733bcfa44fea39
fatal: failed to run pack-refs
Completely finished after 2.45 seconds.

That appeared to work, as I now had the folder structure I wanted. But there were two slightly worrying fatal error messages in the logs. On top of that, when I tried opening gitk, I got the following error message:

That's a bit concerning! Luckily, after a bit of Googling, I found I could fix the issue by running:
> git replace -d 8027f9f8670e3da4762099d39e733bcfa44fea39
Deleted replace ref '8027f9f8670e3da4762099d39e733bcfa44fea39'

After that, I could successfully open gitk, and could see that the .gitignore and .gitattributes files were back in the root, with everything else in the engine folder:

So with that, my work was pretty much done. But that fatal error was bugging me, as were all those extraneous replace/ refs.
It took me a little while to work out what those refs even were, but eventually I pinned them down to a git feature called git-replace. That feature is worth a whole blog post on its own, so for now I'll just point you to the docs if you're interested, and I'll walk through the feature in a subsequent post.
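In the meantime, here's a small sketch of what a replace ref is, run in a throwaway repository. git replace tells git to substitute one object for another whenever the original is looked up, and stores that mapping as a ref under refs/replace/, which gitk displays as replace/*. The git-filter-repo manual describes using these refs so that old commit IDs keep resolving to the rewritten commits; that's where the extra refs in gitk came from.

```shell
# Throwaway repository with two commits
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example"
git config user.email "example@example.com"
echo "one" > file.txt && git add file.txt && git commit -q -m "First"
echo "two" > file.txt && git commit -q -am "Second"

first=$(git rev-parse HEAD~)
# A new, parentless commit with the same tree as "First" but a different message
alt=$(git commit-tree -m "Alternative first" "$first^{tree}")

# Substitute $alt wherever $first is looked up; this creates refs/replace/$first
git replace "$first" "$alt"
git replace -l                                        # lists the replaced ID ($first)
git log -1 --format=%s "$first"                       # "Alternative first"
git --no-replace-objects log -1 --format=%s "$first"  # "First": replacements ignored

# Delete the replace ref again, the same command used to fix gitk above
git replace -d "$first"
```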
I decided to start again, and this time I told git-filter-repo I didn't need the extra replace/ references by passing --replace-refs delete-no-add:
# Move everything to the engine/ subfolder
git filter-repo --replace-refs delete-no-add --to-subdirectory-filter engine/

# Move .gitignore and .gitattributes back to the root
git filter-repo --replace-refs delete-no-add \
    --path-rename engine/.gitattributes:.gitattributes \
    --path-rename engine/.gitignore:.gitignore

This time there were no fatal errors in the logs, gitk opened without any errors, and all the replace/ references were gone. Success! With that I could exit the Docker container, double-check everything was correct, and do a git push origin --force-with-lease of my newly rewritten repo!
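Before that final force-push, a couple of quick command-line checks can back up the visual inspection in gitk. The sketch below uses two hypothetical helper functions, top_level and leftover_replace_refs, with a throwaway repository standing in for the rewritten clone: the first prints the entries at the root of HEAD, the second prints any refs remaining under refs/replace/ (there should be none after using --replace-refs delete-no-add).

```shell
# Hypothetical helper: print the entries at the root of HEAD's tree
top_level() { git ls-tree --name-only HEAD; }
# Hypothetical helper: print any leftover replace refs (expect no output)
leftover_replace_refs() { git for-each-ref --format='%(refname)' refs/replace; }

# Throwaway repository standing in for the rewritten clone
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example"
git config user.email "example@example.com"
mkdir engine
echo 'console.log("hi");' > engine/app.js
echo "bin/" > .gitignore
echo "* text=auto" > .gitattributes
git add -A
git commit -q -m "Initial commit"

top_level              # .gitattributes, .gitignore and engine
leftover_replace_refs  # no output
```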
All in all, I'm very impressed with git-filter-repo, and using it inside the Docker container is clean and painless, so I'd definitely recommend it!
Summary
In this post I described a scenario where I wanted to rewrite the history of a git repository to make it appear as though some files were originally created in a subfolder instead of the root folder. I described how to run a python:3 Docker container, how to install git-filter-repo, and the commands required to move all the files except .gitattributes and .gitignore to an engine subfolder. To make it simpler, I've reproduced the main steps here:
- Create a fresh clone of your repository, and cd to the clone directory

# Clone my/repo to output_directory
git clone https://github.com/my/repo output_directory
cd output_directory

- Run a python:3 Docker container interactively, and install git-filter-repo inside it

# Run the Docker container
docker run --rm -it -v ${PWD}:/app -w /app python:3 /bin/bash

# Inside the container, install git-filter-repo
# Add the backports repo to apt's sources list
echo 'deb http://deb.debian.org/debian bullseye-backports main' > /etc/apt/sources.list.d/backports.list
# Update the list of available packages
apt-get update
# Install git-filter-repo, adding the required /bullseye-backports suffix
apt-get install -y git-filter-repo/bullseye-backports

- Run the git-filter-repo commands to move all the files to the engine subdirectory, and then move the .gitignore and .gitattributes files back. Don't create replace/ refs.

# Move everything to the engine/ subfolder
git filter-repo --replace-refs delete-no-add --to-subdirectory-filter engine/

# Move .gitignore and .gitattributes back to the root
git filter-repo --replace-refs delete-no-add \
    --path-rename engine/.gitattributes:.gitattributes \
    --path-rename engine/.gitignore:.gitignore