+++
title = "Git binary clean up"
author = ["Elia el Lazkani"]
date = 2020-09-02
lastmod = 2020-09-02
tags = ["git", "git-filter-repo", "git-lfs"]
categories = ["revision-control"]
draft = false
+++

When I first started this blog, I simply started with experiments. The first iteration was a _WordPress_ site, which was followed, very fast, by _Joomla_. Neither of them lasted long. They are simply not for me.

I am lucky to be a part of a small group started in `#dgplug` on _Freenode_. In that group, I have access to a lot of cool and awesome people who can put me to shame in development. On the flip side, I live by a _motto_ that says:

> Always surround yourself with people smarter than yourself. It's the best way to learn.

Anyway, back to the topic at hand: they introduced me to _static blog generators_. That is where my journey started, but it started as a trial. I didn't give too much thought to the repository. It moved from _GitHub_ to _Gitlab_ and finally _here_.

But, of course, you know how projects go, right? Once you start one, others crop up along the way. I put them on my **TODO**, literally. One of those items was that I had committed all the images directly to the repository. It wasn't until a few days ago that I added a `.gitattributes` file. Shameful, I know.

No more! Today it all changed.

## First step first {#first-step-first}

Let's talk a little bit about what we need to do before we start, and plan it out in our heads before doing the actual work. I will itemize the steps here to make them easy to follow:

- Clone a fresh repository to do the work in
- Remove all the images from the _git_ repository
- Add the images again to _git lfs_

Sounds simple enough, doesn't it?

> **Warning**
>
> If you follow along with this blog post, here's what you can expect:
>
> - You **WILL** lose _all the files you delete from disk_ as well, so make a copy.
> - You **WILL** rewrite history. This means that the _SHA_ of every commit since the first image was committed **WILL** most likely change.
> - You **WILL** essentially end up with a new repository that has very little in common with the original, so **BACK UP**!

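
The `mv` in the next section keeps a copy of the working directory. If you also want a single-file snapshot of the full history before rewriting it, a `git bundle` is one option. Below is a minimal sketch using only plain git. The `backup_bundle` function name is hypothetical.

```shell
# Sketch: snapshot every ref of the current repository into a single
# bundle file before rewriting history. backup_bundle is a hypothetical
# helper name; pass whatever backup path suits you.
backup_bundle() {
  local bundle="${1:?usage: backup_bundle /path/to/backup.bundle}"
  # --all puts every ref (branches, tags) plus HEAD into the bundle;
  # verify then checks that the file is complete and readable.
  git bundle create "$bundle" --all && git bundle verify "$bundle"
}
```

For example, run `backup_bundle ../blog-backup.bundle` inside the old repository. Later, `git clone ../blog-backup.bundle blog-restored` gets the history back. A bundle only captures committed history, so uncommitted files still need the plain directory copy.
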
Now that we got the _warning_ out of the way, let's begin the serious work.

## Clone the repository {#clone-the-repository}

I bet you can do this with your eyes closed by now. A fresh clone also matters for the next step: _git-filter-repo_ refuses to rewrite a repository that does not look like a fresh clone unless you pass `--force`.

```text
$ # Back up your directory!
$ mv blog.lazkani.io blog-archive
$ git clone git@git.project42.io:Elia/blog.lazkani.io.git blog.lazkani.io
$ cd blog.lazkani.io
```

Easy peasy, lemon squeezy.

## Remove images from history {#remove-images-from-history}

Now, this is a tough one. Alright, let's browse. Oh, what is that thing? [git-filter-repo](https://github.com/newren/git-filter-repo)! Alright, looks good. There are several ways to install it, so check the project documentation. What I did, _in a Python virtual environment_, was:

```text
$ pip install git-filter-repo
```
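
Before pulling the trigger, it may help to see how much of the history actually touches `images/`. Here is a small sketch using only plain git. The `path_history_report` function name is hypothetical, and it assumes the images live under `images/`. Run it now, then again after the rewrite, where it should report 0 commits and no blobs.

```shell
# Sketch: report how much of the history still touches a path.
# path_history_report is a hypothetical helper; it defaults to images/,
# the directory used in this post.
path_history_report() {
  local target="${1:-images/}"
  # Number of commits, across all refs, that touch the path.
  echo "commits touching $target: $(git rev-list --all --count -- "$target")"
  # Every blob reachable from any ref, with its size in bytes, keeping
  # only those stored under the path, biggest first.
  # Note: paths containing spaces are not handled by this awk filter.
  git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
    | awk -v p="$target" '$1 == "blob" && index($4, p) == 1 { print $3, $4 }' \
    | sort -rn
}
```

Listing blobs from every ref, rather than just checking the working tree, is the point here. The files can be gone from disk while still sitting in old commits, which is exactly what bloats the repository.
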

> **Warning**
>
> **BEWARE THE DRAGONS**

_git-filter-repo_ makes this job pretty easy to do. `--path images/` selects everything under `images/`, and `--invert-paths` flips that selection, so every file *except* those is kept.

```text
$ git filter-repo --invert-paths --path images/
Parsed 43 commits
New history written in 0.08 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 17d3f5c Modifying a Nikola theme
Enumerating objects: 317, done.
Counting objects: 100% (317/317), done.
Delta compression using up to 2 threads
Compressing objects: 100% (200/200), done.
Writing objects: 100% (317/317), done.
Total 317 (delta 127), reused 231 (delta 88), pack-reused 0
Completely finished after 0.21 seconds.
```

That took almost no time. Nice! Let's check the directory. Fair enough, it no longer has `images/`.

## Add the images back! {#add-the-images-back}

Okay, for this you will need [git-lfs](https://git-lfs.github.com/). It should be easy to find in your package manager. This is a _Debian 10_ machine, so I did:

```text
$ sudo apt-get install git-lfs
```

> **Warning**
>
> Before you commit to using _git-lfs_, make sure that your _git_ server supports it. If you have a pipeline, make sure the switch doesn't break it.

I already stashed away the original project like a big boy, so now I get to use it.

```text
$ cp -r ../blog-archive/images .
```

Then we can initialize _git-lfs_.

```text
$ git lfs install
Updated git hooks.
Git LFS initialized.
```

Okay! We are good to go. Next, we need to tell _git-lfs_ which files we care about. In my case, my needs are very simple.

```text
$ git lfs track "*.png"
Tracking "*.png"
```

I've only used _PNG_ images so far. Now that they are tracked, you should see a `.gitattributes` file created if you didn't have one already. If you did, as I did, the new entry is added to it.

From this step onward, using _git-lfs_ doesn't differ much from regular _git_. In this case, it looked like this:

```text
$ git add .gitattributes
$ git add images/
$ git status
On branch master
Changes to be committed:
  (use "git restore --staged ..." to unstage)
        modified:   .gitattributes
        new file:   images/local-kubernetes-cluster-on-kvm/01-add-cluster.png
        new file:   images/local-kubernetes-cluster-on-kvm/02-custom-cluster.png
        new file:   images/local-kubernetes-cluster-on-kvm/03-calico-networkProvider.png
        new file:   images/local-kubernetes-cluster-on-kvm/04-nginx-ingressDisabled.png
        new file:   images/local-kubernetes-cluster-on-kvm/05-customize-nodes.png
        new file:   images/local-kubernetes-cluster-on-kvm/06-registered-nodes.png
        new file:   images/local-kubernetes-cluster-on-kvm/07-kubernetes-cluster.png
        new file:   images/my-path-down-the-road-of-cloudflare-s-redirect-loop/flexible-encryption.png
        new file:   images/my-path-down-the-road-of-cloudflare-s-redirect-loop/full-encryption.png
        new file:   images/my-path-down-the-road-of-cloudflare-s-redirect-loop/too-many-redirects.png
        new file:   images/simple-cron-monitoring-with-healthchecks/borgbackup-healthchecks-logs.png
        new file:   images/simple-cron-monitoring-with-healthchecks/borgbackup-healthchecks.png
        new file:   images/weechat-ssh-and-notification/01-weechat-weenotify.png
```

Now that the files are staged, we shall commit.
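
Before running that commit, it may be worth double-checking that every PNG will really go through _git-lfs_ rather than landing in history as a regular blob again. Below is a minimal sketch using plain git. The function name is hypothetical, and it assumes the `*.png` pattern tracked above.

```shell
# Sketch: print every PNG in the index whose "filter" attribute is not
# "lfs", i.e. files that would be committed as regular git blobs.
# pngs_missing_lfs is a hypothetical helper; "*.png" mirrors this post.
# Empty output means .gitattributes routes all of them through git-lfs.
pngs_missing_lfs() {
  # check-attr prints lines like "images/a.png: filter: lfs".
  git ls-files -- '*.png' \
    | git check-attr --stdin filter \
    | awk -F': ' '$3 != "lfs" { print $1 }'
}
```

Run it after `git add`. Any path it prints would be committed as an ordinary blob, which is exactly what we are trying to get rid of. Once the commit exists, `git lfs ls-files` gives the complementary view from the _git-lfs_ side.
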

```text
$ git commit -v
[master 6566fd3] Re-adding the removed images to git-lfs this time
 14 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 images/local-kubernetes-cluster-on-kvm/01-add-cluster.png
 create mode 100644 images/local-kubernetes-cluster-on-kvm/02-custom-cluster.png
 create mode 100644 images/local-kubernetes-cluster-on-kvm/03-calico-networkProvider.png
 create mode 100644 images/local-kubernetes-cluster-on-kvm/04-nginx-ingressDisabled.png
 create mode 100644 images/local-kubernetes-cluster-on-kvm/05-customize-nodes.png
 create mode 100644 images/local-kubernetes-cluster-on-kvm/06-registered-nodes.png
 create mode 100644 images/local-kubernetes-cluster-on-kvm/07-kubernetes-cluster.png
 create mode 100644 images/my-path-down-the-road-of-cloudflare-s-redirect-loop/flexible-encryption.png
 create mode 100644 images/my-path-down-the-road-of-cloudflare-s-redirect-loop/full-encryption.png
 create mode 100644 images/my-path-down-the-road-of-cloudflare-s-redirect-loop/too-many-redirects.png
 create mode 100644 images/simple-cron-monitoring-with-healthchecks/borgbackup-healthchecks-logs.png
 create mode 100644 images/simple-cron-monitoring-with-healthchecks/borgbackup-healthchecks.png
 create mode 100644 images/weechat-ssh-and-notification/01-weechat-weenotify.png
```

Yes, I use `-v` when I commit from the shell; try it. It shows the diff of what you are committing below the message in your editor.

The interesting part of the earlier _git-filter-repo_ step is that it left us without a _remote_. As I said, this repository bears little resemblance to the original one, so the decision made by _git-filter-repo_ is correct. Let's add a **new empty repository** as the _remote_ of our new repository and push.

```text
$ git remote add origin git@git.project42.io:Elia/blog.lazkani.io.git
$ git push -u origin master
Locking support detected on remote "origin". Consider enabling it with:
  $ git config lfs.https://git.project42.io/Elia/blog.lazkani.io.git/info/lfs.locksverify true
Enumerating objects: 338, done./13), 1.0 MB | 128 KB/s
Counting objects: 100% (338/338), done.
Delta compression using up to 2 threads
Compressing objects: 100% (182/182), done.
Writing objects: 100% (338/338), 220.74 KiB | 24.53 MiB/s, done.
Total 338 (delta 128), reused 316 (delta 127), pack-reused 0
remote: Resolving deltas: 100% (128/128), done.
remote: . Processing 1 references
remote: Processed 1 references in total
To git.project42.io:Elia/blog.lazkani.io.git
 * [new branch]      master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.
```

And the deed is done.
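
To confirm the push landed where expected, one option is to compare the local branch tip with what the remote advertises. This is a sketch. The function name is hypothetical, and the `origin`/`master` defaults simply mirror the push above.

```shell
# Sketch: succeed only if the remote branch points at the same commit
# as the local one. remote_matches_local is a hypothetical helper; the
# defaults "origin" and "master" match the push in this post.
remote_matches_local() {
  local remote="${1:-origin}" branch="${2:-master}"
  local local_sha remote_sha
  local_sha=$(git rev-parse "refs/heads/$branch") || return 1
  # ls-remote prints "<sha>\t<ref>"; keep only the sha.
  remote_sha=$(git ls-remote "$remote" "refs/heads/$branch" | cut -f1)
  [ -n "$remote_sha" ] && [ "$local_sha" = "$remote_sha" ]
}
```

`remote_matches_local && echo "in sync"` is enough for a quick check. It only compares commits, not the LFS objects, which live on the server's LFS storage separately.
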

> **Note**
>
> If you were extremely observant so far, you might've noticed that I used the same URL again even though I said a **new repository**. Indeed, I did. The old repository was renamed and archived [here](https://gitea.project42.io/Elia/blog.lazkani.io-20200902-historical). A new one with the name of the previous one was created in its place.

## Conclusion {#conclusion}

After I pushed the repository, the change in size was noticeable, and it's not insignificant. I think it's cleaner now. The **1.2MB** size of the _repository_ no longer bothers me.
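
The post doesn't show how the size was measured. If you'd like to compare locally, one rough option is `git count-objects`. Below is a sketch. The `pack_size` function name is hypothetical.

```shell
# Sketch: print the human-readable packed size of a repository.
# pack_size is a hypothetical helper taking the repository directory.
# size-pack only counts packfiles, so run "git gc" first if many
# objects are still loose.
pack_size() {
  git -C "$1" count-objects -vH | awk -F': ' '$1 == "size-pack" { print $2 }'
}
```

From inside the new clone, compare `pack_size ../blog-archive` with `pack_size .`. The directory names are the ones used earlier in this post. Expect these numbers to differ from what the git server reports, since the server accounts for storage its own way and the LFS objects live outside the pack entirely.
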