Backups for K8s and Beyond
Intro
Recently I have been moving my homelab to Kubernetes. This has presented the need for a backup solution for any persistent data I might have there. For quite some time, I have been using Duplicati for my backups, but I haven’t been completely content with its performance, and have heard many horror stories of restores not working properly. So, I wanted to take this opportunity to find a backup solution that worked well for my personal computers (I have Windows, Linux, and Darwin hosts), my storage servers (UnRaid and FreeNAS), as well as Kubernetes. Assuming that such a solution exists, of course!
Choosing a Tool
There were a few solutions that I had heard mentioned a lot on /r/homelab, and I took a look at all of them: Duplicacy, Borg, and Restic.
Duplicacy seems like a good solution for some people, and the interface looked very nice. However, you need to purchase a license to use all of its features, so I chose to avoid it unless I couldn’t find anything else that worked. I also wasn’t sure how I’d use it for my Kubernetes needs; it didn’t seem like a popular use case for the tool.
Borg and Restic both seemed like great tools. Ultimately, I decided to go with Restic purely because of its ecosystem, but again, they both seem like very nice solutions. Borg probably has the better ecosystem for the majority of users, but I believe Restic’s is better in my particular case.
Restic didn’t have any particularly nice client GUIs that I could find, but for someone like me who likes to keep as much as possible in version control, there’s resticprofile, a fantastic tool that makes managing Restic on client machines very easy; it works well on my Windows, Linux, and Darwin hosts. I also found that Restic Browser could serve as a very usable GUI for doing restores. It’s still very bare-bones, but it does the job. Restic also has several solutions for interacting with K8s that looked very promising. Furthermore, Restic and all of these other tools are written in Go, which I very much prefer to Python, which is what Borg is written in. I assume this is one of the main reasons the Kubernetes ecosystem around Restic is so much more developed.
Integrating with Kubernetes
There are several tools out there that exist to make backing up persistent storage on Kubernetes with Restic much easier. Typically, they are operators that allow you to define things like a backup schedule and which PVCs you want in which Restic repo. Again, I took a look at three relatively popular options.
The first product that I found was Stash. Stash is interesting because it has CRDs for a lot of different things you might want to back up or restore. I reached out to the sales team to see what an Enterprise License would cost (Enterprise is needed for the most useful features), but they did not reply to me, I assume because I only have a few Kubernetes nodes to my name. From there, I was going to see if I could just build from source with license checks disabled, but it’s clear to me that at least some Enterprise functionality isn’t present in the normal public repo, so that’s off the table as well.
Another very popular choice is Velero. However, I was immediately very apprehensive about it because it was made by Heptio, who sold out to VMware some time ago. This has led to a good amount of abandonware. It does look like Velero is still being supported, but it’s still important to realize that this acquisition altered the goals of the project, and I would have to pray that VMware does not alter them further. Additionally, and very annoyingly, despite heavily using Restic, Velero does not support the Restic REST server backend, meaning I would be hugely limited in my potential storage options.
Ultimately, I ended up going with K8up. In stark contrast to the other solutions I outlined, K8up is an active CNCF sandbox project, which makes me much more comfortable with using it. I really didn’t see any downsides to it for me personally, as it included most (if not all) of the enterprise features from Stash (such as support for backing up databases), and it also supports using the Restic REST server as a backend, which Velero was missing.
My Implementation
Below is a minimized account of how I implemented everything. For all the exact, ugly details, feel free to take a look at my homelab GitHub repo.
First, I created a backup namespace for everything centralized:
apiVersion: v1
kind: Namespace
metadata:
  name: backup
Next, I set up my Restic REST server. I used Rclone to do this, which basically allows you to use anything that Rclone supports as storage for Restic. I ended up creating a new Helm chart for Rclone, just because I couldn’t find any existing ones that I liked very much. Unlike many others, it just runs rclone rcd, so you can use this chart for basically anything, and just send commands to serve/copy/sync/etc. as needed.
Basically, the only extra values I supplied were to set my config file and add an extra port for Restic. This is my Kustomization:
helmCharts:
  - name: rclone
    repo: https://jacobcolvin.com/helm-charts/
    version: '0.3.0'
    releaseName: rclone
    namespace: backup
    valuesInline:
      image:
        repository: rclone/rclone
        tag: '1.60.1'
      configSecretName: rclone-config
      extraPorts:
        - name: restic
          containerPort: 50001
          protocol: TCP
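One note if you haven’t used the helmCharts field before: kustomize needs its Helm support switched on when rendering, e.g.:

kustomize build --enable-helm .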
I SSHed to the container and set up my remote, ResticRemote. Then I saved this to my secret provider for the rclone-config secret, so it won’t be lost.
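The chart presumably consumes that secret as the Rclone config file. For reference, here’s a minimal sketch of what the Secret might contain; the remote type and credentials are placeholders (any Rclone-supported backend works), and the exact key name the chart expects may differ:

apiVersion: v1
kind: Secret
metadata:
  name: rclone-config
  namespace: backup
type: Opaque
stringData:
  rclone.conf: |
    # Hypothetical B2 backend; substitute whatever remote you actually use.
    [ResticRemote]
    type = b2
    account = <key-id>
    key = <application-key>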
I then created a Job to run the following command after the sync completes to start the Restic server:
curl -v -X POST -H 'Content-Type: application/json' -d '{
  "_async": true,
  "_group": "job/restic",
  "command": "serve",
  "arg": ["restic", "ResticRemote:/"],
  "opt": {
    "addr": ":50001"
  }
}' http://rclone.backup.svc.cluster.local:5572/core/command
There are probably a lot of different ways to handle this, and I’m sure it’s mostly down to preference, so I won’t go into further detail on exactly how my Job is set up; if you’re curious, it’s all public on my GitHub repo.
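If you want to sanity-check that the serve job is actually running, the rc API can list active async jobs; this is the stock job/list endpoint, nothing specific to my setup:

curl -s -X POST http://rclone.backup.svc.cluster.local:5572/job/list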
For the machines I wanted to back up from outside the cluster, I used Traefik as an ingress. This is where I added things like authentication and certificates. Normally I use Cloudflare to proxy traffic, but in this case I thought it’d be better not to, as I am potentially sending quite a lot of data back and forth and don’t want to deal with any potential complications there. I also use both external-dns and cert-manager, so this was as simple as adding/replacing a few annotations to disable proxying and switch to my Let’s Encrypt issuer:
'external-dns.alpha.kubernetes.io/cloudflare-proxied': 'false'
'cert-manager.io/issuer': 'letsencrypt-prod'
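Put together, a standard Ingress for the REST server might look something like this sketch; the hostname, TLS secret, and service name are placeholders, and I’m assuming the Traefik ingress class:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: restic
  namespace: backup
  annotations:
    'external-dns.alpha.kubernetes.io/cloudflare-proxied': 'false'
    'cert-manager.io/issuer': 'letsencrypt-prod'
spec:
  ingressClassName: traefik
  rules:
    - host: restic.example.com # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rclone-restic # assumes this is the service exposing the extra port
                port:
                  number: 50001
  tls:
    - hosts:
        - restic.example.com
      secretName: restic-tls # cert-manager populates this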
From there I was able to start using Restic on my personal machines. I used resticprofile to do the vast majority of the heavy lifting here. If you would like to see examples of the profiles I configured, you can check out my dotfiles repo.
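As a taste of what resticprofile looks like, here’s a minimal sketch of a YAML profile pointed at the REST server; the repository URL, password file, and paths are placeholders rather than my real config:

default:
  repository: 'rest:https://restic.example.com/desktop' # placeholder URL behind the ingress
  password-file: 'key' # file holding the repo encryption password
  backup:
    source:
      - ~/Documents # whatever should be backed up
    exclude:
      - .git
  retention:
    after-backup: true # apply the retention policy right after each backup
    keep-daily: 7

With that in place, resticprofile backup runs the backup (and, with after-backup set, the retention policy) in one shot.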
Now we can move on to using this infrastructure to actually back up PVCs and other data that are also hosted in Kubernetes. First, I installed the K8up Backup Operator. Note that the resources part is required, because they don’t include CRDs in the Helm repo. You can also download the CRD and point to the file. Also, the BACKUP_GLOBAL_OPERATOR_NAMESPACE environment variable is important: it tells any Jobs in other namespaces that they should use the operator from the backup namespace. Obviously, you’d want to configure this differently if there were lots of people using one cluster.
helmCharts:
  - name: k8up
    repo: https://k8up-io.github.io/k8up
    version: '4.0.1'
    releaseName: k8up
    namespace: backup
    valuesInline:
      k8up:
        envVars:
          - name: BACKUP_GLOBAL_OPERATOR_NAMESPACE
            value: backup

resources:
  - https://github.com/k8up-io/k8up/releases/download/k8up-4.0.1/k8up-crd.yaml
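Before creating any Schedules, it’s worth confirming the operator and CRDs actually landed; this is plain kubectl, nothing K8up-specific:

kubectl get crds | grep k8up.io
kubectl -n backup get pods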
With that installed, we can now use the Schedule CR to start backing things up. While the operator is centralized in this configuration, the Schedule is not: there should be one CR in each namespace containing things you want to back up. Here’s an example Schedule for a foobar namespace.
apiVersion: k8up.io/v1
kind: Schedule
metadata:
  name: foobar-schedule
  namespace: foobar
spec:
  backend:
    rest:
      url: http://rclone-restic.backup.svc.cluster.local:50001/macropower/foobar
    repoPasswordSecretRef:
      name: restic-credentials
      key: repo-key
  backup:
    schedule: '0 4 * * *' # 04:00 daily
    failedJobsHistoryLimit: 2
    successfulJobsHistoryLimit: 2
  check:
    schedule: '0 1 * * 1' # 01:00 on Monday
    failedJobsHistoryLimit: 2
    successfulJobsHistoryLimit: 2
  prune:
    schedule: '0 1 * * 0' # 01:00 on Sunday
    failedJobsHistoryLimit: 2
    successfulJobsHistoryLimit: 2
  retention:
    keepLast: 3
    keepDaily: 7
    keepWeekly: 5
    keepMonthly: 12
Note that this Schedule has its very own repo to use at macropower/foobar, and also its own encryption key in the restic-credentials secret. A different namespace with its own Schedule could have its own repo, credentials, or any other attributes that aren’t configured on the operator.
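The restic-credentials secret itself is just an ordinary Kubernetes Secret; a minimal sketch, with an obviously placeholder password:

apiVersion: v1
kind: Secret
metadata:
  name: restic-credentials
  namespace: foobar
type: Opaque
stringData:
  repo-key: correct-horse-battery-staple # placeholder; becomes the repo's encryption password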
Once the Schedule is created, anything inside the namespace it lives in (foobar in this example) can be backed up via an annotation. For example, below I have two PVCs, one for music and one for anime:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: music
  namespace: foobar
  annotations:
    'k8up.io/backup': 'true'
spec:
  # ...
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: anime
  namespace: foobar
spec:
  # ...
music has the backup annotation, so it will be backed up every day per our Schedule. However, anime does not have this annotation, so it will not be included in backups.
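If you don’t want to wait for the cron to fire, K8up also lets you trigger a one-off run with a Backup resource. Here’s a minimal sketch reusing the same backend as the Schedule above (the name is arbitrary):

apiVersion: k8up.io/v1
kind: Backup
metadata:
  name: foobar-manual
  namespace: foobar
spec:
  backend:
    rest:
      url: http://rclone-restic.backup.svc.cluster.local:50001/macropower/foobar
    repoPasswordSecretRef:
      name: restic-credentials
      key: repo-key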
Here’s a diagram showing how everything works together:
Databases and Other Edge Cases
Lastly, to deal with databases: you of course can’t simply back up their PVC. Thankfully, K8up has a really simple way of addressing databases and basically any other edge case. You can add annotations on the Pod itself, and K8up will run commands inside your containers to collect and back up data. Personally, I use TimescaleDB, which is backed up in almost exactly the same way as Postgres. I was able to just add the following annotations:
podAnnotations:
  'k8up.io/backup': 'true'
  'k8up.io/backupcommand': sh -c 'PGUSER="postgres" PGPASSWORD="$PATRONI_SUPERUSER_PASSWORD" pg_dumpall --clean'
  'k8up.io/file-extension': .sql
This just creates a snapshot of the db.sql file resulting from the pg_dumpall command. I am not sure how, or even if, I could automate restores with Timescale, because they do require a bit of extra work compared to vanilla Postgres. But hopefully this isn’t something I’ll have to do very often.
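For what it’s worth, a manual restore of one of these dumps with plain restic would look something like this; the file path inside the snapshot is a guess, so list it first, and the connection details are placeholders:

# Assumes RESTIC_REPOSITORY and RESTIC_PASSWORD are exported for this repo.
# Find the dump file's path inside the latest snapshot:
restic ls latest

# Stream the dump straight into psql (path and user are hypothetical):
restic dump latest /default/db.sql | psql -U postgres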
Conclusion
I hope someone found this article helpful. If you would like to see my exact and up-to-date implementation of everything above, please check out my homelab repo.
And of course, a huge thanks to the authors of the projects mentioned above: Restic, resticprofile, Restic Browser, Rclone, K8up, Traefik, external-dns, and cert-manager.