+++
title = "Git binary clean up"
author = ["Elia el Lazkani"]
date = 2020-09-02T21:00:00+02:00
lastmod = 2021-06-28T00:01:34+02:00
tags = ["git", "git-filter-repo", "git-lfs"]
categories = ["revision-control"]
draft = false
+++
When I first started this blog, I simply started with experiments. The first iteration was a _wordpress_ site, which was followed, very quickly, by _joomla_. Neither of them lasted long. They are simply not for me.
I am lucky to be a part of a small group started in `#dgplug` on _Freenode_. In that group, I have access to a lot of cool and awesome people who can put me to shame in development. On the flip side, I live by a _motto_ that says:
> Always surround yourself with people smarter than yourself.
It's the best way to learn. Anyway, back to the topic at hand, they introduced me to _static blog generators_. There my journey started but it started with a trial. I didn't give too much thought to the repository. It moved from _GitHub_ to _Gitlab_ and finally _here_.
But, of course, you know how projects go, right ?
Once you start with one, others closely follow and crop up along the way. I put them on my **TODO**, literally. One of those items was that I had committed all the images to the repository. It wasn't until a few days ago that I added a `.gitattributes` file. Shameful, I know.
No more ! Today it all changed.
<!--more-->
## First step first {#first-step-first}
Let's talk about what we need to do a little bit before we start. Plan it out in our head before doing the actual work.
I will itemize them here to make it easy to follow:
- Clone a fresh repository to do the work in
- Remove all the images from the _git_ repository
- Add the images again to _git lfs_
Sounds simple enough, doesn't it ?
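Before deleting anything, it helps to know what is actually weighing the history down. Here is a minimal sketch, assuming plain _git_ and the usual shell tools. `largest_blobs` is a hypothetical helper name of mine, not a _git_ command.

```shell
#!/bin/sh
# Hypothetical helper: list the biggest blobs reachable from any ref.
#   $1 = path to the repository
#   $2 = how many entries to print (defaults to 10)
# Each output line is "<size in bytes> <path>".
largest_blobs() {
    git -C "$1" rev-list --objects --all |
        git -C "$1" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -rn |
        head -n "${2:-10}"
}

# Usage: largest_blobs . 10
```

If the top of that list is full of images, as it was likely to be in my case, the plan above is the right one.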
<div class="admonition warning">
<p class="admonition-title">warning</p>
If you follow along with this blog post, here's what you can expect.
- You **WILL** lose _all the files you delete from disk_, as well, so make a copy
- You **WILL** re-write history. This means that the _SHA_ of every commit since the first image was committed **WILL** most likely change.
- You **WILL** end up essentially with a new repository that shares very few similarities with the original, so **BACKUP**!
</div>
Now that we got the _warning_ out of the way, let's begin the serious work.
## Clone the repository {#clone-the-repository}
I bet you can do this with your eyes closed by now.
```text
$ # Backup your directory !
$ mv blog.lazkani.io blog-archive
$ git clone git@git.project42.io:Elia/blog.lazkani.io.git blog.lazkani.io
$ cd blog.lazkani.io
```
Easy peasy, lemon squeezy.
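Moving the old checkout aside is one kind of backup. For a second safety net, here is a minimal sketch that packs every ref into a single file with `git bundle`. `backup_repo` and the paths in the usage line are hypothetical names of mine. Pass an absolute path for the bundle, since the command runs from inside the repository directory.

```shell
#!/bin/sh
# Hypothetical helper: save every ref of a repository into one bundle file.
#   $1 = path to the repository
#   $2 = absolute path of the bundle file to write
backup_repo() {
    git -C "$1" bundle create "$2" --all &&
        git -C "$1" bundle verify "$2"
}

# Usage: backup_repo "$PWD/../blog-archive" "$HOME/blog-backup.bundle"
```

A bundle can be cloned from like any other remote, so the old history stays recoverable even if the archive directory disappears.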
## Remove images from history {#remove-images-from-history}
Now, this is a tough one. Alright, let's browse.
Oh, what is that thing, [git-filter-repo](https://github.com/newren/git-filter-repo) ! Alright, looks good.
There are different ways to install it, check the project documentation, but what I did, _in a python virtual environment_, was:
```text
$ pip install git-filter-repo
```
<div class="admonition warning">
<p class="admonition-title">warning</p>
**BEWARE THE DRAGONS**
</div>
_git-filter-repo_ makes this job pretty easy to do.
```text
$ git filter-repo --invert-paths --path images/
Parsed 43 commits
New history written in 0.08 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 17d3f5c Modifying a Nikola theme
Enumerating objects: 317, done.
Counting objects: 100% (317/317), done.
Delta compression using up to 2 threads
Compressing objects: 100% (200/200), done.
Writing objects: 100% (317/317), done.
Total 317 (delta 127), reused 231 (delta 88), pack-reused 0
Completely finished after 0.21 seconds.
```
That took almost no time. Nice !
Let's check the directory and, fair enough, it no longer has `images/`.
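The directory being gone from disk only proves the working tree is clean. To check the history itself, a minimal sketch with plain _git_ follows. `commits_touching` is a hypothetical helper name of mine.

```shell
#!/bin/sh
# Hypothetical helper: print every commit, on any ref, that touches a path.
#   $1 = path to the repository
#   $2 = path to look for, for example images/
# Empty output means no commit in the history references that path.
commits_touching() {
    git -C "$1" log --all --oneline -- "$2"
}

# Usage: commits_touching . images/
```

Run right after the filter, this should come back empty for `images/`.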
## Add the images back ! {#add-the-images-back}
Okay, for this you will need [git-lfs](https://git-lfs.github.com/). It should be easy to find in your package manager.
This is a _debian 10_ machine, so I did:
```text
$ sudo apt-get install git-lfs
```
<div class="admonition warning">
<p class="admonition-title">warning</p>
Before you commit to using _git-lfs_, make sure that your _git_ server supports it.
If you have a pipeline, make sure _git-lfs_ doesn't break it.
</div>
I already stashed our original project like a big boy, so now I get to use it.
```text
$ cp -r ../blog-archive/images .
```
Then we can initialize _git-lfs_.
```text
$ git lfs install
Updated git hooks.
Git LFS initialized.
```
Okay ! We are good to go.
Next step, we need to tell _git-lfs_ which files we care about. In my case, my needs are very simple.
```text
$ git lfs track "*.png"
Tracking "*.png"
```
I've only used _PNG_ images so far, so now that they are tracked you should see a `.gitattributes` file created if you didn't have one already.
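For the curious, the line that `git lfs track "*.png"` writes into `.gitattributes` is `*.png filter=lfs diff=lfs merge=lfs -text`. Plain _git_ can report how it resolves that attribute for any path, even one that does not exist yet. A minimal sketch, where `lfs_filter_for` is a hypothetical helper name of mine:

```shell
#!/bin/sh
# Hypothetical helper: print the "filter" attribute git resolves for a path.
#   $1 = path to the repository
#   $2 = path to check, relative to the repository root
# "lfs" means the path is routed through git-lfs, "unspecified" means it is not.
lfs_filter_for() {
    git -C "$1" check-attr filter -- "$2" | sed 's/.*: filter: //'
}

# Usage: lfs_filter_for . images/some-post/screenshot.png
```

This is a cheap way to catch a pattern typo before anything is staged.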
From this step onward, _git-lfs_ doesn't differ too much from regular _git_. In this case, it went like this:
```text
$ git add .gitattributes
$ git add images/
$ git status
On branch master
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: .gitattributes
new file: images/local-kubernetes-cluster-on-kvm/01-add-cluster.png
new file: images/local-kubernetes-cluster-on-kvm/02-custom-cluster.png
new file: images/local-kubernetes-cluster-on-kvm/03-calico-networkProvider.png
new file: images/local-kubernetes-cluster-on-kvm/04-nginx-ingressDisabled.png
new file: images/local-kubernetes-cluster-on-kvm/05-customize-nodes.png
new file: images/local-kubernetes-cluster-on-kvm/06-registered-nodes.png
new file: images/local-kubernetes-cluster-on-kvm/07-kubernetes-cluster.png
new file: images/my-path-down-the-road-of-cloudflare-s-redirect-loop/flexible-encryption.png
new file: images/my-path-down-the-road-of-cloudflare-s-redirect-loop/full-encryption.png
new file: images/my-path-down-the-road-of-cloudflare-s-redirect-loop/too-many-redirects.png
new file: images/simple-cron-monitoring-with-healthchecks/borgbackup-healthchecks-logs.png
new file: images/simple-cron-monitoring-with-healthchecks/borgbackup-healthchecks.png
new file: images/weechat-ssh-and-notification/01-weechat-weenotify.png
```
Now that the files are staged, we shall commit.
```text
$ git commit -v
[master 6566fd3] Re-adding the removed images to git-lfs this time
14 files changed, 40 insertions(+), 1 deletion(-)
create mode 100644 images/local-kubernetes-cluster-on-kvm/01-add-cluster.png
create mode 100644 images/local-kubernetes-cluster-on-kvm/02-custom-cluster.png
create mode 100644 images/local-kubernetes-cluster-on-kvm/03-calico-networkProvider.png
create mode 100644 images/local-kubernetes-cluster-on-kvm/04-nginx-ingressDisabled.png
create mode 100644 images/local-kubernetes-cluster-on-kvm/05-customize-nodes.png
create mode 100644 images/local-kubernetes-cluster-on-kvm/06-registered-nodes.png
create mode 100644 images/local-kubernetes-cluster-on-kvm/07-kubernetes-cluster.png
create mode 100644 images/my-path-down-the-road-of-cloudflare-s-redirect-loop/flexible-encryption.png
create mode 100644 images/my-path-down-the-road-of-cloudflare-s-redirect-loop/full-encryption.png
create mode 100644 images/my-path-down-the-road-of-cloudflare-s-redirect-loop/too-many-redirects.png
create mode 100644 images/simple-cron-monitoring-with-healthchecks/borgbackup-healthchecks-logs.png
create mode 100644 images/simple-cron-monitoring-with-healthchecks/borgbackup-healthchecks.png
create mode 100644 images/weechat-ssh-and-notification/01-weechat-weenotify.png
```
Yes, I use `-v` when I commit from the shell, try it.
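If you want proof that the commit stored small pointer files instead of the images themselves, you can read the raw blob back. A _git-lfs_ pointer is a short text file whose first line is `version https://git-lfs.github.com/spec/v1`. A minimal sketch, where `is_lfs_pointer` is a hypothetical helper name of mine:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the blob stored in git is a git-lfs pointer.
#   $1 = path to the repository
#   $2 = object to inspect, written as <revision>:<path>
# cat-file prints the blob exactly as stored, without running any filter.
is_lfs_pointer() {
    git -C "$1" cat-file -p "$2" | head -n 1 |
        grep -qxF 'version https://git-lfs.github.com/spec/v1'
}

# Usage:
#   is_lfs_pointer . HEAD:images/weechat-ssh-and-notification/01-weechat-weenotify.png \
#       && echo "stored as an LFS pointer"
```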
The interesting part from the previous step is that _git-filter-repo_ left us without a _remote_. As I said, this repository bears very little resemblance to the original one, so the decision made by _git-filter-repo_ is correct.
Let's add a **new empty repository** as the _remote_ of our new repository and push.
```text
$ git remote add origin git@git.project42.io:Elia/blog.lazkani.io.git
$ git push -u origin master
Locking support detected on remote "origin". Consider enabling it with:
$ git config lfs.https://git.project42.io/Elia/blog.lazkani.io.git/info/lfs.locksverify true
Enumerating objects: 338, done.
Counting objects: 100% (338/338), done.
Delta compression using up to 2 threads
Compressing objects: 100% (182/182), done.
Writing objects: 100% (338/338), 220.74 KiB | 24.53 MiB/s, done.
Total 338 (delta 128), reused 316 (delta 127), pack-reused 0
remote: Resolving deltas: 100% (128/128), done.
remote: . Processing 1 references
remote: Processed 1 references in total
To git.project42.io:Elia/blog.lazkani.io.git
* [new branch] master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.
```
And the deed is done.
<div class="admonition note">
<p class="admonition-title">Note</p>
If you were extremely observant so far, you might've noticed that I used the same link again while I said a **new repository**.
Indeed, I did. The old repository was renamed and archived [here](https://gitea.project42.io/Elia/blog.lazkani.io-20200902-historical). A new one with the name of the previous one was created instead.
</div>
## Conclusion {#conclusion}
After I pushed the repository, you can notice the change in size. It's not insignificant.
I think it's cleaner now. The **1.2MB** size on the _repository_ is no longer bothering me.
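To put a number on the difference yourself, you can compare the packed size of the archived clone with the new one. A minimal sketch with plain _git_, where `repo_pack_size` is a hypothetical helper name of mine:

```shell
#!/bin/sh
# Hypothetical helper: print the on-disk size of a repository's pack files
# in human readable units, as reported by git itself.
#   $1 = path to the repository
repo_pack_size() {
    git -C "$1" count-objects -vH | sed -n 's/^size-pack: //p'
}

# Usage:
#   repo_pack_size ../blog-archive
#   repo_pack_size .
```

Keep in mind this only measures the local _git_ object store. The images tracked by _git-lfs_ live in their own storage and are not counted here.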