+++
title = "Git binary clean up"
author = ["Elia el Lazkani"]
date = 2020-09-02T21:00:00+02:00
lastmod = 2021-06-28T00:01:34+02:00
tags = ["git", "git-filter-repo", "git-lfs"]
categories = ["revision-control"]
draft = false
+++

When I first started this blog, I simply started with experiments. The first iteration was a WordPress site, which was followed, very quickly, by Joomla. Neither of them lasted long. They are simply not for me.

I am lucky to be part of a small group that started in #dgplug on Freenode. In that group, I have access to a lot of cool and awesome people who can put me to shame in development. Then again, I live by a motto that says:

Always surround yourself with people smarter than yourself.

It's the best way to learn. Anyway, back to the topic at hand: they introduced me to static blog generators. That is where my journey started, but it started as a trial. I didn't give too much thought to the repository. It moved from GitHub to GitLab and finally here.

But, of course, you know how projects go, right?

Once you start with one, others closely follow, cropping up along the way. I put them on my TODO, literally. One of those items was that I had committed all the images to the repository. It wasn't until a few days ago that I added a .gitattributes file. Shameful, I know.

No more! Today it all changed.

First step first

Let's talk about what we need to do a little bit before we start. Plan it out in our head before doing the actual work.

I will itemize them here to make it easy to follow:

  • Clone a fresh repository to do the work in
  • Remove all the images from the git repository
  • Add the images again to git lfs

Sounds simple enough, doesn't it?
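Before removing anything, it can help to confirm which files actually weigh the history down. The following is a minimal sketch using only core git commands, run against a throwaway repository; the file names, the fake 20000-byte "image", and the commit identity are all assumptions made up for the illustration.

```shell
#!/usr/bin/env bash
set -eu

# Identity used only for the throwaway commit below (hypothetical values).
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Throwaway repository with one small text file and one larger fake image.
work=$(mktemp -d)
git init -q "$work/blog"
cd "$work/blog"
mkdir images
echo "short post" > post.md
head -c 20000 /dev/zero > images/fake.png
git add .
git commit -q -m "Add a post and an image"

# List the blobs reachable from any ref, largest first: size in bytes, then path.
# Note: awk prints only the first word of the path, so paths with spaces get cut.
largest=$(
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob" { print $2, $3 }' |
    sort -rn |
    head -n 10
)
echo "$largest"
```

On a real repository, the same pipeline run from the top of the clone should put the committed images near the top of the list, if they are indeed the heaviest objects.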

warning

If you follow along with this blog post, here's what you can expect.

  • You WILL lose all the files you delete from disk as well, so make a copy.
  • You WILL rewrite history. This means that the SHA of every commit since the first image was committed WILL most likely change.
  • You WILL essentially end up with a new repository that shares very few similarities with the original, so BACK UP!

Now that we got the warning out of the way, let's begin the serious work.
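Besides keeping a copy of the working directory, one extra safety net is a git bundle, which packs all refs and history into a single file. This is only a sketch on a throwaway repository, not what I did for the blog; the paths, the bundle file name, and the commit identity are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Identity used only for the throwaway commit below (hypothetical values).
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Throwaway repository standing in for the real checkout.
work=$(mktemp -d)
git init -q "$work/blog"
cd "$work/blog"
echo "hello" > post.md
git add post.md
git commit -q -m "First post"

# Pack every ref and all reachable history into a single file.
git bundle create "$work/blog-backup.bundle" --all

# Sanity-check the bundle before relying on it.
git bundle verify "$work/blog-backup.bundle"

# Restoring is an ordinary clone from the bundle file.
git clone -q "$work/blog-backup.bundle" "$work/restored"
original_head=$(git rev-parse HEAD)
restored_head=$(git -C "$work/restored" rev-parse HEAD)
echo "original: $original_head"
echo "restored: $restored_head"
```

If both hashes match, the bundle holds the same history as the repository it was made from, and it can be stored anywhere a regular file can.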

Clone the repository

I bet you can do this with your eyes closed by now.

$ # Back up your directory!
$ mv blog.lazkani.io blog-archive
$ git clone git@git.project42.io:Elia/blog.lazkani.io.git blog.lazkani.io
$ cd blog.lazkani.io

Easy peasy, lemon squeezy.

Remove images from history

Now, this is a tough one. Alright, let's browse.

Oh, what is that thing, git-filter-repo? Alright, looks good.

We can install it in different ways; check the project documentation. What I did, in a Python virtual environment, was:

$ pip install git-filter-repo

warning

BEWARE THE DRAGONS

git-filter-repo makes this job pretty easy to do.

$ git filter-repo --invert-paths --path images/
Parsed 43 commits
New history written in 0.08 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 17d3f5c Modifying a Nikola theme
Enumerating objects: 317, done.
Counting objects: 100% (317/317), done.
Delta compression using up to 2 threads
Compressing objects: 100% (200/200), done.
Writing objects: 100% (317/317), done.
Total 317 (delta 127), reused 231 (delta 88), pack-reused 0
Completely finished after 0.21 seconds.

That took almost no time. Nice!

Let's check the directory and, fair enough, it no longer has images/.
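Looking at the directory only tells us about the current checkout. To check the history too, one option is to count the commits that touch a path. The sketch below uses a throwaway repository that has NOT been rewritten, so it reports one commit for images/; the helper name `commits_touching` and all file names are made up for the example.

```shell
#!/usr/bin/env bash
set -eu

# Identity used only for the throwaway commits below (hypothetical values).
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Throwaway repository: one commit for a post, one commit for an image.
work=$(mktemp -d)
git init -q "$work/blog"
cd "$work/blog"
mkdir images
echo "post" > post.md
echo "not really a png" > images/fake.png
git add post.md
git commit -q -m "Add a post"
git add images
git commit -q -m "Add an image"

# Count the commits, on any ref, that touch the given path.
commits_touching() {
  git log --all --oneline -- "$1" | wc -l | tr -d ' '
}

images_commits=$(commits_touching images/)
missing_commits=$(commits_touching no-such-dir/)
echo "commits touching images/: $images_commits"
echo "commits touching no-such-dir/: $missing_commits"
```

In a repository where the path has been filtered out of every commit, I would expect the same check on images/ to print 0, just like the path that never existed.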

Add the images back!

Okay, for this you will need git-lfs. It should be easy to find in your package manager. This is a Debian 10 machine, so I did:

$ sudo apt-get install git-lfs

warning

Before you commit to using git-lfs, make sure that your git server supports it.

If you have a pipeline, make sure it doesn't break it.

I already stashed our original project like a big boy, so now I get to use it.

$ cp -r ../blog-archive/images .

Then we can initialize git-lfs.

$ git lfs install
Updated git hooks.
Git LFS initialized.

Okay! We are good to go.

Next step, we need to tell git-lfs which files we care about. In my case, my needs are very simple.

$ git lfs track "*.png"
Tracking "*.png"

I've only used PNG images so far, so now that they are tracked you should see a .gitattributes file created if you didn't have one already.
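To see what that tracking rule amounts to, we can ask git which filter it would apply to a given path. In this sketch the .gitattributes line is written by hand, in the form git-lfs normally uses for tracked patterns, so that the check runs even without git-lfs installed; the repository and both file paths are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway repository; nothing needs to be committed for this check.
work=$(mktemp -d)
git init -q "$work/blog"
cd "$work/blog"

# Written by hand here (assumption: this mirrors what `git lfs track "*.png"` adds).
printf '%s\n' '*.png filter=lfs diff=lfs merge=lfs -text' > .gitattributes

# Ask git which filter applies to a PNG and to a Markdown file.
png_filter=$(git check-attr filter -- images/example.png)
md_filter=$(git check-attr filter -- posts/example.md)
echo "$png_filter"
echo "$md_filter"
```

The PNG should report the lfs filter while the Markdown file reports none, which is why only the images are handed over to git-lfs.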

From this step onward, git-lfs doesn't differ too much from regular git. In this case, it went as follows.

$ git add .gitattributes
$ git add images/
$ git status
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   .gitattributes
	new file:   images/local-kubernetes-cluster-on-kvm/01-add-cluster.png
	new file:   images/local-kubernetes-cluster-on-kvm/02-custom-cluster.png
	new file:   images/local-kubernetes-cluster-on-kvm/03-calico-networkProvider.png
	new file:   images/local-kubernetes-cluster-on-kvm/04-nginx-ingressDisabled.png
	new file:   images/local-kubernetes-cluster-on-kvm/05-customize-nodes.png
	new file:   images/local-kubernetes-cluster-on-kvm/06-registered-nodes.png
	new file:   images/local-kubernetes-cluster-on-kvm/07-kubernetes-cluster.png
	new file:   images/my-path-down-the-road-of-cloudflare-s-redirect-loop/flexible-encryption.png
	new file:   images/my-path-down-the-road-of-cloudflare-s-redirect-loop/full-encryption.png
	new file:   images/my-path-down-the-road-of-cloudflare-s-redirect-loop/too-many-redirects.png
	new file:   images/simple-cron-monitoring-with-healthchecks/borgbackup-healthchecks-logs.png
	new file:   images/simple-cron-monitoring-with-healthchecks/borgbackup-healthchecks.png
	new file:   images/weechat-ssh-and-notification/01-weechat-weenotify.png

Now that the files are staged, we shall commit.

$ git commit -v
[master 6566fd3] Re-adding the removed images to git-lfs this time
 14 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 images/local-kubernetes-cluster-on-kvm/01-add-cluster.png
 create mode 100644 images/local-kubernetes-cluster-on-kvm/02-custom-cluster.png
 create mode 100644 images/local-kubernetes-cluster-on-kvm/03-calico-networkProvider.png
 create mode 100644 images/local-kubernetes-cluster-on-kvm/04-nginx-ingressDisabled.png
 create mode 100644 images/local-kubernetes-cluster-on-kvm/05-customize-nodes.png
 create mode 100644 images/local-kubernetes-cluster-on-kvm/06-registered-nodes.png
 create mode 100644 images/local-kubernetes-cluster-on-kvm/07-kubernetes-cluster.png
 create mode 100644 images/my-path-down-the-road-of-cloudflare-s-redirect-loop/flexible-encryption.png
 create mode 100644 images/my-path-down-the-road-of-cloudflare-s-redirect-loop/full-encryption.png
 create mode 100644 images/my-path-down-the-road-of-cloudflare-s-redirect-loop/too-many-redirects.png
 create mode 100644 images/simple-cron-monitoring-with-healthchecks/borgbackup-healthchecks-logs.png
 create mode 100644 images/simple-cron-monitoring-with-healthchecks/borgbackup-healthchecks.png
 create mode 100644 images/weechat-ssh-and-notification/01-weechat-weenotify.png

Yes, I use -v when I commit from the shell, try it.
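A side note on the numbers in that commit: with git-lfs, what git stores for each image is a small text pointer rather than the image itself. The sketch below builds such a pointer by hand to show its shape. It assumes the three-line pointer layout from the git-lfs specification, a fake 20000-byte file, and the `sha256sum` tool (on some systems the equivalent is `shasum -a 256`).

```shell
#!/usr/bin/env bash
set -eu

# A fake "image" standing in for a real PNG (hypothetical data).
work=$(mktemp -d)
head -c 20000 /dev/zero > "$work/fake.png"

# A pointer records the SHA-256 of the content and its size in bytes.
oid=$(sha256sum "$work/fake.png" | cut -d ' ' -f 1)
size=$(wc -c < "$work/fake.png" | tr -d ' ')

# Assemble the pointer text by hand, for illustration only.
pointer=$(printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size")
echo "$pointer"

pointer_lines=$(echo "$pointer" | wc -l | tr -d ' ')
echo "lines in pointer: $pointer_lines"
```

Three lines per pointer would be consistent with the commit above: 13 images at 3 lines each is 39 insertions, and the one-line change to .gitattributes would account for the remaining insertion and the single deletion.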

The interesting part of the previous step is that git-filter-repo left us without a remote. As I said, this repository bears very little resemblance to the original one, so the decision made by git-filter-repo is correct.

Let's add a new, empty remote repository to our rewritten repository and push.

$ git remote add origin git@git.project42.io:Elia/blog.lazkani.io.git
$ git push -u origin master

Locking support detected on remote "origin". Consider enabling it with:
  $ git config lfs.https://git.project42.io/Elia/blog.lazkani.io.git/info/lfs.locksverify true
Enumerating objects: 338, done.
Counting objects: 100% (338/338), done.
Delta compression using up to 2 threads
Compressing objects: 100% (182/182), done.
Writing objects: 100% (338/338), 220.74 KiB | 24.53 MiB/s, done.
Total 338 (delta 128), reused 316 (delta 127), pack-reused 0
remote: Resolving deltas: 100% (128/128), done.
remote: . Processing 1 references
remote: Processed 1 references in total
To git.project42.io:Elia/blog.lazkani.io.git
 * [new branch]      master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.

And the deed is done.

Note

If you were extremely observant so far, you might've noticed that I used the same link again even though I said a new repository.

Indeed, I did. The old repository was renamed and archived here. A new one with the name of the previous one was created instead.

Conclusion

After I pushed the repository, you can notice the change in size. It's not insignificant. I think it's cleaner now. The 1.2MB size of the repository is no longer bothering me.
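If you want to put a number on it yourself, git can report how much space its object store takes. This is a minimal sketch on a throwaway repository with made-up content, so the figures it prints say nothing about this blog.

```shell
#!/usr/bin/env bash
set -eu

# Identity used only for the throwaway commit below (hypothetical values).
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Throwaway repository with a single commit.
work=$(mktemp -d)
git init -q "$work/blog"
cd "$work/blog"
echo "post" > post.md
git add post.md
git commit -q -m "First post"

# Pack loose objects first so the report reflects packed history.
git gc -q

# Human-readable summary of the object store; size-pack is the line to watch.
size_report=$(git count-objects -vH)
echo "$size_report"
```

Running the last command in both the archived clone and the rewritten one gives a before-and-after comparison. As far as I know, it only covers git's own objects, so the images that git-lfs keeps locally are not part of that figure.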