Controlling Replicas With Kubernetes Node Labels and LINSTOR Auxiliary Properties
By using Kubernetes node labels and LINSTOR® auxiliary properties, you can better control the placement of your replicas within your cluster. This is useful when you need to avoid placing two replicas within a single failure domain, such as a rack or data center (DC).
Assume that you have a six-node Kubernetes cluster with LINSTOR configured using the LINSTOR Operator for persistent storage, and you have a LINSTOR storage pool named lvm-thin configured across all nodes.
# kubectl get nodes
NAME     STATUS   ROLES           AGE     VERSION
kube-0   Ready    control-plane   6h57m   v1.26.3
kube-1   Ready    <none>          6h57m   v1.26.3
kube-2   Ready    <none>          6h57m   v1.26.3
kube-3   Ready    <none>          6h57m   v1.26.3
kube-4   Ready    <none>          6h57m   v1.26.3
kube-5   Ready    <none>          6h57m   v1.26.3
LINSTOR ==> node list
╭───────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ NodeType ┊ Addresses ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════╡
┊ kube-0 ┊ SATELLITE ┊ 192.168.222.40:3366 (PLAIN) ┊ Online ┊
┊ kube-1 ┊ SATELLITE ┊ 192.168.222.41:3366 (PLAIN) ┊ Online ┊
┊ kube-2 ┊ SATELLITE ┊ 192.168.222.42:3366 (PLAIN) ┊ Online ┊
┊ kube-3 ┊ SATELLITE ┊ 192.168.222.43:3366 (PLAIN) ┊ Online ┊
┊ kube-4 ┊ SATELLITE ┊ 192.168.222.44:3366 (PLAIN) ┊ Online ┊
┊ kube-5 ┊ SATELLITE ┊ 192.168.222.45:3366 (PLAIN) ┊ Online ┊
┊ linstor-op-cs-controller-7c7d59d98d-d82lr ┊ CONTROLLER ┊ 172.16.186.2:3366 (PLAIN) ┊ Online ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────╯
LINSTOR ==> storage-pool list
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
8<------------------------------------------------------------snip---------------------------------------------------------------8<
┊ lvm-thin ┊ kube-0 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 8.00 GiB ┊ 8.00 GiB ┊ True ┊ Ok ┊ ┊
┊ lvm-thin ┊ kube-1 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 8.00 GiB ┊ 8.00 GiB ┊ True ┊ Ok ┊ ┊
┊ lvm-thin ┊ kube-2 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 8.00 GiB ┊ 8.00 GiB ┊ True ┊ Ok ┊ ┊
┊ lvm-thin ┊ kube-3 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 8.00 GiB ┊ 8.00 GiB ┊ True ┊ Ok ┊ ┊
┊ lvm-thin ┊ kube-4 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 8.00 GiB ┊ 8.00 GiB ┊ True ┊ Ok ┊ ┊
┊ lvm-thin ┊ kube-5 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 8.00 GiB ┊ 8.00 GiB ┊ True ┊ Ok ┊ ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Also assume that your six nodes are evenly distributed across three separate racks within your data center, or across three separate availability zones (AZs) within a cloud region. In our examples, we'll assume kube-0 and kube-1 are in one rack or AZ, kube-2 and kube-3 are in another, and kube-4 and kube-5 are in yet another.
LINSTOR, by default, is not aware of this distribution and therefore might place both replicas of a two-replica LINSTOR volume within the same rack or AZ. This would leave your data inaccessible during a rack or AZ outage. Alternatively, you might want to keep replicas within a single rack or AZ, either to isolate LINSTOR's replication traffic within that failure domain or to keep replication latency to an absolute minimum.
In either situation, you will first need to add Kubernetes labels to each node. The LINSTOR Operator will automatically import Kubernetes node labels into LINSTOR and apply them as auxiliary properties on the LINSTOR node objects. Using the assumptions above, you will add the following node labels to your Kubernetes nodes, using the key zone with the values a, b, and c to differentiate your racks or AZs.
# kubectl label nodes kube-{0,1} zone=a
node/kube-0 labeled
node/kube-1 labeled
# kubectl label nodes kube-{2,3} zone=b
node/kube-2 labeled
node/kube-3 labeled
# kubectl label nodes kube-{4,5} zone=c
node/kube-4 labeled
node/kube-5 labeled
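If you want to verify the labels from the Kubernetes side before moving on, kubectl's -L (--label-columns) flag can display the zone label as a column. This is an optional check that simply uses the label key from this example:

# kubectl get nodes -L zone

Each node should show the zone value (a, b, or c) that you just assigned.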
You’ll see the Kubernetes node labels on each of the respective LINSTOR node objects.
LINSTOR ==> node list-properties kube-0
╭────────────────────────────────────────────────────────────────────────────────╮
┊ Key ┊ Value ┊
╞════════════════════════════════════════════════════════════════════════════════╡
┊ Aux/beta.kubernetes.io/arch ┊ amd64 ┊
┊ Aux/beta.kubernetes.io/os ┊ linux ┊
┊ Aux/kubernetes.io/arch ┊ amd64 ┊
┊ Aux/kubernetes.io/hostname ┊ kube-0 ┊
┊ Aux/kubernetes.io/os ┊ linux ┊
┊ Aux/linbit.com/hostname ┊ kube-0 ┊
┊ Aux/linbit.com/sp-DfltDisklessStorPool ┊ true ┊
┊ Aux/linbit.com/sp-lvm-thick ┊ true ┊
┊ Aux/linbit.com/sp-lvm-thin ┊ true ┊
┊ Aux/node-role.kubernetes.io/control-plane ┊ ┊
┊ Aux/node.kubernetes.io/exclude-from-external-load-balancers ┊ ┊
┊ Aux/registered-by ┊ linstor-operator ┊
┊ Aux/zone ┊ a ┊
┊ CurStltConnName ┊ default ┊
┊ NodeUname ┊ kube-0 ┊
╰────────────────────────────────────────────────────────────────────────────────╯
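The label import shown above is handled by the LINSTOR Operator. If you manage a LINSTOR cluster outside of Kubernetes, or want an auxiliary property that does not come from a node label, you can set one directly with the LINSTOR client. The command below is only a sketch of that idea; verify the exact syntax for your client version with node set-property --help.

LINSTOR ==> node set-property --aux kube-0 zone a

In a cluster managed by the LINSTOR Operator, however, labeling the Kubernetes node as shown earlier is the preferred approach, because the Operator imports the node labels for you.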
Kubernetes StorageClasses that use the LINSTOR CSI provisioner can then be configured to avoid placing replicas within a single failure domain by using the StorageClass parameter replicasOnDifferent, naming the zone key.
cat << EOF > linstor-sc-on-diff.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-csi-lvm-thin-r2-on-diff
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: lvm-thin
  replicasOnDifferent: zone
reclaimPolicy: Delete
allowVolumeExpansion: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-csi-lvm-thin-r3-on-diff
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "3"
  storagePool: lvm-thin
  replicasOnDifferent: zone
reclaimPolicy: Delete
allowVolumeExpansion: true
EOF
kubectl apply -f linstor-sc-on-diff.yaml
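You can confirm that both StorageClasses were created by using standard kubectl commands. The names below are the ones defined in the manifest above:

# kubectl get storageclass linstor-csi-lvm-thin-r2-on-diff linstor-csi-lvm-thin-r3-on-diff

Both should list linstor.csi.linbit.com as their provisioner.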
Creating persistent volume claims (PVCs) that use the StorageClasses created above will result in replicas being distributed across nodes where the zone key has different values.
cat << EOF > pvcs-on-diff.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-vol-claim-diff-zone-0
spec:
  storageClassName: linstor-csi-lvm-thin-r3-on-diff
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1G
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-vol-claim-diff-zone-1
spec:
  storageClassName: linstor-csi-lvm-thin-r3-on-diff
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1G
EOF
kubectl apply -f pvcs-on-diff.yaml
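Before checking placement in LINSTOR, you can verify that the claims were provisioned and bound on the Kubernetes side:

# kubectl get pvc demo-vol-claim-diff-zone-0 demo-vol-claim-diff-zone-1

Both claims should report a STATUS of Bound once the LINSTOR CSI driver has provisioned the backing volumes.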
Within LINSTOR, you will see that the replicas of each LINSTOR resource are placed in different zones.
LINSTOR ==> resource list
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-c38af6c1-f02a-46db-b8ac-74b4eef20ca6 ┊ kube-0 ┊ 7000 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:39:18 ┊
┊ pvc-c38af6c1-f02a-46db-b8ac-74b4eef20ca6 ┊ kube-2 ┊ 7000 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:39:17 ┊
┊ pvc-c38af6c1-f02a-46db-b8ac-74b4eef20ca6 ┊ kube-4 ┊ 7000 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:39:18 ┊
┊ pvc-e8a5d0c8-9e61-46c3-afb5-f0ca975c4249 ┊ kube-1 ┊ 7001 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:39:17 ┊
┊ pvc-e8a5d0c8-9e61-46c3-afb5-f0ca975c4249 ┊ kube-3 ┊ 7001 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:39:19 ┊
┊ pvc-e8a5d0c8-9e61-46c3-afb5-f0ca975c4249 ┊ kube-4 ┊ 7001 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:39:19 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Kubernetes StorageClasses can also be configured to place replicas within the same zone by using the LINSTOR CSI parameter replicasOnSame, naming the respective key and value pair.
cat << EOF > linstor-sc-on-same.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-csi-lvm-thin-r2-on-same-a
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: lvm-thin
  replicasOnSame: zone=a
reclaimPolicy: Delete
allowVolumeExpansion: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-csi-lvm-thin-r2-on-same-b
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: lvm-thin
  replicasOnSame: zone=b
reclaimPolicy: Delete
allowVolumeExpansion: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-csi-lvm-thin-r2-on-same-c
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: lvm-thin
  replicasOnSame: zone=c
reclaimPolicy: Delete
allowVolumeExpansion: true
EOF
kubectl apply -f linstor-sc-on-same.yaml
Creating PVCs that use the StorageClasses created above will result in replicas being placed on nodes where the zone key has the specified value.
cat << EOF > pvcs-on-same.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-vol-claim-zone-a
spec:
  storageClassName: linstor-csi-lvm-thin-r2-on-same-a
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1G
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-vol-claim-zone-b
spec:
  storageClassName: linstor-csi-lvm-thin-r2-on-same-b
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1G
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-vol-claim-zone-c
spec:
  storageClassName: linstor-csi-lvm-thin-r2-on-same-c
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1G
EOF
kubectl apply -f pvcs-on-same.yaml
Within LINSTOR, you will see that the replicas of each LINSTOR resource are in the same zone.
LINSTOR ==> resource list
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-0ef85bf7-2a9a-4e6f-9d7b-a473518c6cee ┊ kube-2 ┊ 7001 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:17:55 ┊
┊ pvc-0ef85bf7-2a9a-4e6f-9d7b-a473518c6cee ┊ kube-3 ┊ 7001 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:17:52 ┊
┊ pvc-0fc56b3d-b249-4e6f-a225-41224cb367f9 ┊ kube-0 ┊ 7000 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:17:52 ┊
┊ pvc-0fc56b3d-b249-4e6f-a225-41224cb367f9 ┊ kube-1 ┊ 7000 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:17:53 ┊
┊ pvc-35144a76-d15f-4709-9911-b6c951e87cc1 ┊ kube-4 ┊ 7002 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:17:54 ┊
┊ pvc-35144a76-d15f-4709-9911-b6c951e87cc1 ┊ kube-5 ┊ 7002 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2023-03-24 22:17:56 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Written by: MDK - 3/24/23