How I've set up my highly-available Kubernetes cluster

Recently, I started looking into setting up a Kubernetes cluster to learn how it, and the whole “orchestration” concept, works. The basic idea: you have a “cluster” of likely-identical machines (or “nodes”) that apps get deployed to. You just tell the cluster how an app should be deployed, and it handles the rest.

Now, I’m not going to go in-depth with all the concepts – there are a bunch of other articles that do a better job of explaining those – however, I’ll lay out the steps I took to go from nothing to a working Kubernetes cluster.

Pre-requisites

In my case, I needed a highly-available Kubernetes cluster (powered by k3s), with three master/control-plane nodes and four worker nodes. (For those who are wondering: the fourth worker node exists so that all of its traffic goes through a site-to-site VPN.)

Here’s what you’ll need on your local machine:

  • k3sup, which we’ll use to install k3s on all the nodes
  • kubectl, to talk to the cluster afterwards
  • Helm (v3), to install Rancher
  • An SSH key pair (and ssh-copy-id), since k3sup connects to the nodes over SSH

And here’s what you’ll need for the cluster itself:

  • The specs of the nodes depend on how you’ll use your cluster. Mine needs a bit of grunt, so my VMs have been allocated 4 cores and 8GB RAM each. They could go as low as 1 core and 1GB of RAM, but only if you won’t be running a lot of apps.
  • You can choose between an external datastore and embedded etcd. The latter only recently got full support (as of v1.19.5+k3s1). I’ve done both, so I’ll list instructions for both. For the external datastore, you can create a VM with MariaDB on it; just take note of the username, password, and database name, since you’ll need them in the installation steps (there’s a quick sketch of setting that up right after this list).
  • k3sup requires seamless (key-based) SSH access to the nodes from your local machine; ssh-copy-id takes care of that, as shown after this list. Let’s also assume the non-root user we have on the nodes is named k3s.
  • You’ll also need a fixed registration address for registering the agent nodes against the master nodes properly. You can use either a load balancer or round-robin DNS. Since I’m just in my homelab, the latter will suffice: three A records, all named k3s.homelab, one pointing at each master node.
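
For the external datastore, creating the database and user on the MariaDB VM might look like this minimal sketch (the k3s database/user names and the placeholder password are mine, so adjust to taste):

# Create the database and a user that k3s can connect as
sudo mysql -e "CREATE DATABASE k3s;"
sudo mysql -e "CREATE USER 'k3s'@'%' IDENTIFIED BY '<password>';"
sudo mysql -e "GRANT ALL PRIVILEGES ON k3s.* TO 'k3s'@'%'; FLUSH PRIVILEGES;"

And seeding the SSH access is just a loop over the nodes (the IPs match the list in the next section, and I’m assuming your key pair already exists):

# Copy your public key to the k3s user on every node
for ip in 10.1.0.11 10.1.0.12 10.1.0.13 10.1.0.21 10.1.0.22 10.1.0.23 10.1.0.26; do
  ssh-copy-id k3s@"$ip"
done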

Installing k3s

At this point, we now have a total of seven or eight VMs, depending on whether you created the external datastore (the IPs are just an example):

  • External DB, only if you’re not using embedded etcd (10.1.0.5)
  • Control plane/master nodes
    • 10.1.0.11
    • 10.1.0.12
    • 10.1.0.13
  • Worker nodes
    • 10.1.0.21
    • 10.1.0.22
    • 10.1.0.23
    • 10.1.0.26

Now we can start installing stuff. Let’s start with k3s, using k3sup. First things first: create a working directory (k3sup will save the cluster’s kubeconfig there). Then, if we’re using an external DB, we’ll keep the datastore URI in a variable (replace the credentials appropriately):

export DATASTORE="mysql://<username>:<password>@tcp(10.1.0.5:3306)/<db-name>"

Then we install k3s to the master nodes:

k3sup install --user k3s --ip 10.1.0.11 --datastore="${DATASTORE}" --k3s-channel v1.19 --k3s-extra-args '--node-taint CriticalAddonsOnly=true:NoExecute'
k3sup install --user k3s --ip 10.1.0.12 --datastore="${DATASTORE}" --k3s-channel v1.19 --k3s-extra-args '--node-taint CriticalAddonsOnly=true:NoExecute'
k3sup install --user k3s --ip 10.1.0.13 --datastore="${DATASTORE}" --k3s-channel v1.19 --k3s-extra-args '--node-taint CriticalAddonsOnly=true:NoExecute'

These commands install k3s v1.19.x on the master nodes as the k3s user, point them at the external datastore, and taint the nodes so that only control plane/master node-related components get scheduled on them.

If we’re using embedded etcd instead, here’s what we’ll do:

k3sup install --cluster --user k3s --ip 10.1.0.11 --k3s-channel v1.19 --k3s-extra-args '--node-taint CriticalAddonsOnly=true:NoExecute'
k3sup join --server --user k3s --ip 10.1.0.12 --server-ip 10.1.0.11 --server-user k3s --k3s-channel v1.19 --k3s-extra-args '--node-taint CriticalAddonsOnly=true:NoExecute'
k3sup join --server --user k3s --ip 10.1.0.13 --server-ip 10.1.0.11 --server-user k3s --k3s-channel v1.19 --k3s-extra-args '--node-taint CriticalAddonsOnly=true:NoExecute'

Important Note! (As of 1/25/2021) Check first whether your Rancher version supports the current stable Kubernetes version. Right now, Rancher v2.5.5 does not support Kubernetes v1.20+, which is why the commands above pin the v1.19 channel.

After that, we can now install k3s agents to the worker nodes:

k3sup join --user k3s --ip 10.1.0.21 --server-ip 10.1.0.11 --k3s-channel v1.19
k3sup join --user k3s --ip 10.1.0.22 --server-ip 10.1.0.11 --k3s-channel v1.19
k3sup join --user k3s --ip 10.1.0.23 --server-ip 10.1.0.11 --k3s-channel v1.19
k3sup join --user k3s --ip 10.1.0.26 --server-ip 10.1.0.11 --k3s-channel v1.19

We’re almost done. We just need to replace the k3s URL in the agents with our fixed registration address, so the agents don’t depend on a single master being up. To do that, in each agent node, we’ll edit this line in /etc/systemd/system/k3s-agent.service.env:

# Old line: K3S_URL=https://10.1.0.11:6443
K3S_URL=https://k3s.homelab:6443
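
If you’d rather not open an editor on every node, a one-liner along these lines should make the same edit (a sketch, assuming the default file location):

# Point the agent at the fixed registration address
sudo sed -i 's|^K3S_URL=.*|K3S_URL=https://k3s.homelab:6443|' /etc/systemd/system/k3s-agent.service.env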

Then finally, reload systemd and restart the service:

sudo systemctl daemon-reload
sudo systemctl restart k3s-agent
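
To confirm each agent came back up and reconnected through the new address, a quick look at the service and its recent logs doesn’t hurt (the exact log wording varies between k3s versions):

sudo systemctl status k3s-agent
sudo journalctl -u k3s-agent --since "5 minutes ago"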

And we’re done! To see if our cluster is set up properly, we can use kubectl. Set the KUBECONFIG environment variable if you haven’t yet, then list the nodes:

export KUBECONFIG=$(pwd)/kubeconfig

kubectl get nodes
NAME      STATUS   ROLES         AGE   VERSION
k3s-a01   Ready    <none>        17h   v1.19.7+k3s1
k3s-a02   Ready    <none>        17h   v1.19.7+k3s1
k3s-a03   Ready    <none>        17h   v1.19.7+k3s1
k3s-a06   Ready    <none>        17h   v1.19.7+k3s1
k3s-m01   Ready    etcd,master   17h   v1.19.7+k3s1
k3s-m02   Ready    etcd,master   17h   v1.19.7+k3s1
k3s-m03   Ready    etcd,master   17h   v1.19.7+k3s1

Rancher and Longhorn

Now that we’ve got the cluster up, the first thing I do is install Rancher. There are more in-depth instructions in this Rancher article, but here’s the condensed version that I use:

# Add Helm repos
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo add jetstack https://charts.jetstack.io
helm repo update

# Create namespaces for Rancher and cert-manager
kubectl create namespace cattle-system
kubectl create namespace cert-manager

# Apply cert-manager CRD
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v1.0.4/cert-manager.crds.yaml

# Install cert-manager and Rancher (change hostname!)
helm install cert-manager jetstack/cert-manager --namespace cert-manager --version v1.0.4
helm install rancher rancher-latest/rancher --namespace cattle-system --set hostname=<hostname> --set ingress.tls.source=secret
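
Both charts take a little while to roll out, so it’s worth waiting on the deployments before opening the Rancher UI; a check along these lines also appears in Rancher’s own install docs:

# Block until each deployment reports a successful rollout
kubectl -n cert-manager rollout status deploy/cert-manager
kubectl -n cattle-system rollout status deploy/rancher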

After all the dust settles, installing Longhorn is simple, thanks to Rancher’s built-in app catalog. Just go to the Cluster Explorer > Apps & Marketplace page and install Longhorn.
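
If you’d rather stay on the CLI, Longhorn also ships an official Helm chart, and a minimal install looks roughly like this (check the Longhorn docs for chart values that fit your setup):

helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace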

The other stuff

There are a couple of notes that I need to mention:

  • You may notice that I installed cert-manager but didn’t set up the TLS certificate during the Rancher install. That’s because I don’t like using HTTP validation for Let’s Encrypt. Instead, I use the YAML manifest from this issue so I can use DNS-01 validation (there’s a rough sketch of what that looks like at the end of this post).
  • I encourage anyone who’s going for that high-availability life to try Longhorn. It’s a pretty cool and sophisticated distributed storage system on top of Kubernetes. It’s also a massive time-saver for migrating volumes between clusters, thanks to the backup/restore system, which can target NFS or even S3.
  • And speaking of high availability, the only thing that’s not highly available in my cluster is my “external-ish” load balancer. I’m still figuring out how to do that with multiple nginx nodes. To be continued…
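
As promised, here’s a rough sketch of the DNS-01 approach: it boils down to a cert-manager ClusterIssuer whose ACME solver talks to your DNS provider’s API instead of answering an HTTP challenge. I’m using Cloudflare as the example provider here; the issuer name, email, and token Secret are hypothetical placeholders, and the actual manifest from the linked issue will differ:

kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01            # hypothetical name
spec:
  acme:
    email: you@example.com           # placeholder; use your own
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-dns01-account-key
    solvers:
    - dns01:
        cloudflare:
          # The API token must live in a Secret in the cert-manager namespace
          apiTokenSecretRef:
            name: cloudflare-api-token   # hypothetical Secret name
            key: api-token
EOF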